Predicting User Churn on Sparkify

Sparkify is a subscription-based music application similar to Spotify or Pandora. As part of the Udacity Data Scientist Nanodegree program, we want to analyze and clean its data in order to build a model that predicts whether a user will downgrade their subscription or leave the platform.

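The methodology used to predict customer churn is roughly as follows: load and clean the data, explore it, engineer features, split the data into training and test sets, train and tune several models, and compare them by F1 score.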

The data used in this project is a two-month extract of logs from the platform. It contains information about user activity on Sparkify, with 18 columns and 286,500 records.

Data Set Schema

Now that we have a brief overview of the data, I performed some exploratory data analysis (EDA) to understand it better:

It looks like male users are more prone to churn than female users: around 25% of male users churn.
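A per-gender churn rate like this one can be computed by counting users per group and dividing the churned count by the total. The sketch below uses plain Python rather than Spark, and the records, the `churned` flag and the helper name `churn_rate_by_group` are made up for illustration; it is not the project's actual code or data.

```python
from collections import defaultdict

# Hypothetical per-user records; all values are made up for illustration.
users = [
    {"userId": "1", "gender": "M", "churned": True},
    {"userId": "2", "gender": "M", "churned": False},
    {"userId": "3", "gender": "M", "churned": False},
    {"userId": "4", "gender": "M", "churned": False},
    {"userId": "5", "gender": "F", "churned": True},
    {"userId": "6", "gender": "F", "churned": False},
    {"userId": "7", "gender": "F", "churned": False},
    {"userId": "8", "gender": "F", "churned": False},
    {"userId": "9", "gender": "F", "churned": False},
]


def churn_rate_by_group(records, key):
    """Return {group value: share of users in that group who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        if rec["churned"]:
            churned[rec[key]] += 1
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by_group(users, "gender")
for gender, rate in sorted(rates.items()):
    print(f"{gender}: {rate:.0%} churned")
```

The rate has to be computed per user, not per log row, otherwise very active users would be counted many times.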

Churn Count per Level

The number of paid users who churn is almost double the number of free users who leave the platform.

As the chart below shows, churned female users listen to around 75 songs per session on average and churned male users to around 50. In both cases the average is lower than for users who have not churned.

After the Exploratory Data Analysis, we found some features that might be useful for model implementation

Features and labels to feed the models:

Combining features and labels

The training and test sets are created by splitting the data 70% for training and 30% for testing.
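A 70/30 split amounts to shuffling the rows once and cutting the list at 70%. Here is a minimal plain-Python sketch; the function name `train_test_split`, the seed and the toy rows are assumptions, and the article does not show which call it actually used (in PySpark, `DataFrame.randomSplit` is the usual choice).

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical placeholders standing in for one feature row per user.
user_rows = [f"user_{i}" for i in range(10)]
train, test = train_test_split(user_rows)
print(len(train), len(test))
```

Fixing the seed keeps the comparison between models fair, since every model then sees the same training and test users.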

We are going to compare the performance of three models, Logistic Regression, Random Forest and SVM, for predicting customer churn. Because the counts of churned and non-churned users are imbalanced, accuracy is not a representative metric, so we use the F1 score as the evaluation metric.
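To see why accuracy misleads on imbalanced labels, here is a small sketch with made-up predictions. It computes the F1 score of the churned class only; the article does not say which F1 variant it used (Spark's multiclass evaluator, for instance, reports a weighted F1), so treat this as an illustration of the idea rather than the project's exact metric.

```python
def f1_score(y_true, y_pred):
    """F1 of the positive class (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no churner found: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 2 churned users out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_stay = [0] * 10                     # never predicts churn
one_hit = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # one churner found, one false alarm

print(accuracy(y_true, always_stay), f1_score(y_true, always_stay))
print(accuracy(y_true, one_hit), f1_score(y_true, one_hit))
```

Both predictors reach 0.8 accuracy, yet the one that never predicts churn has an F1 of 0.0 while the other scores 0.5. Accuracy cannot tell them apart; F1 can.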

F1-Score per model

The performance of the three models on the validation data is as follows:

From the F1 scores it is possible to conclude that the Random Forest classifier performed best at predicting customer churn.

We implemented a model to predict customer churn. We removed rows with no userId, converted gender and level to binary numeric columns, and engineered 6 features for the model.
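The cleaning and encoding steps can be sketched in a few lines of plain Python. Only the column names `userId`, `gender` and `level` come from the article; the sample rows, the helper name `clean_and_encode` and the choice of which value maps to 1 are assumptions made for illustration.

```python
# Hypothetical log rows; values are made up for illustration.
raw_logs = [
    {"userId": "10", "gender": "M", "level": "paid"},
    {"userId": "", "gender": None, "level": "free"},  # row with no userId
    {"userId": "11", "gender": "F", "level": "free"},
]

# Assumed encodings: the article does not say which value becomes 1.
GENDER_TO_INT = {"M": 1, "F": 0}
LEVEL_TO_INT = {"paid": 1, "free": 0}


def clean_and_encode(rows):
    """Drop rows without a userId and turn gender and level into 0/1."""
    cleaned = []
    for row in rows:
        if not row["userId"]:
            continue  # skip rows that cannot be tied to a user
        cleaned.append({
            "userId": row["userId"],
            "gender": GENDER_TO_INT[row["gender"]],
            "level": LEVEL_TO_INT[row["level"]],
        })
    return cleaned


encoded = clean_and_encode(raw_logs)
print(encoded)
```

Filtering before encoding matters here: the row without a userId also has no gender, so encoding it first would fail on the missing value.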

We selected 3 models based on our knowledge: Logistic Regression, Random Forest and SVM. We used cross-validation and grid search to fine-tune our model. With Random Forest we achieved an F1 score of 0.71.
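Grid search with cross-validation means trying each candidate hyperparameter value, scoring it on several held-out folds, and keeping the value with the best mean score. The article does not show its tuning code (in Spark ML this is commonly done with `CrossValidator` and `ParamGridBuilder`), so the sketch below is only a stand-in: it tunes the neighbour count of a tiny one-feature k-nearest-neighbour classifier, which is not one of the three models above, on made-up data.

```python
def f1(y_true, y_pred):
    """F1 of the positive class (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cross_val_f1(xs, ys, k, n_folds=3):
    """Mean validation F1 over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        val = [i for i in range(len(xs)) if i % n_folds == fold]
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        preds = [knn_predict(train_x, train_y, xs[i], k) for i in val]
        scores.append(f1([ys[i] for i in val], preds))
    return sum(scores) / len(scores)


def grid_search(xs, ys, grid):
    """Return (best k, its mean cross-validated F1); first best value wins ties."""
    best_k, best_score = None, -1.0
    for k in grid:
        score = cross_val_f1(xs, ys, k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Made-up feature (songs per session) and churn labels (1 = churned).
songs_per_session = [30, 35, 40, 45, 50, 55, 90, 95, 100, 105, 110, 115]
churned = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

best_k, best_score = grid_search(songs_per_session, churned, grid=[1, 3, 5])
print(best_k, best_score)
```

Each candidate is scored only on rows it was not fitted on, which is what keeps the chosen value from simply memorising the training data.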

We also used it to derive the important features that may have led to customer churn. By identifying customers with a high chance of churning, companies can target and retain them with attractive offers or incentives. This project also gave good exposure to the Spark environment for analyzing a large volume of data.
