Predicting User Churn on Sparkify

Sparkify is a subscription-based music streaming application similar to Spotify or Pandora. As part of the Udacity Data Scientist Nanodegree program, we analyze and clean its usage data in order to build a model that predicts whether a user will downgrade their subscription or leave the platform.

The methodology used to predict customer churn is roughly as follows:

The data used in this project is a two-month extract of the platform's logs. It contains information about user activity on Sparkify, with 18 columns and 286,500 records.

Now that we have a brief overview of the data, I performed some exploratory data analysis (EDA) to understand it better:

Male users appear more prone to churn than female users: around 25% of male users churn.

The number of paid users who churn is almost double the number of free users who leave the platform.

As the chart below shows, female users who churned listened to around 75 songs per session on average, and male users who churned to around 50. In both cases the average is lower than for users who have not churned.
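To make the gender comparison concrete, here is a minimal sketch of how a churn rate per group could be computed. It uses plain Python rather than Spark, and the records are made-up toy data, not the Sparkify extract. The important detail is that the rate is computed over one row per user, not over raw log events, so heavy listeners do not count many times.

```python
from collections import defaultdict

# Hypothetical per-user records: (user_id, gender, churned).
# Toy data for illustration only, not taken from the Sparkify logs.
users = [
    ("u1", "M", True),
    ("u2", "M", False),
    ("u3", "M", False),
    ("u4", "M", False),
    ("u5", "F", True),
    ("u6", "F", False),
    ("u7", "F", False),
    ("u8", "F", False),
    ("u9", "F", False),
]


def churn_rate_by_group(records):
    """Return {group: share of users in that group who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for _user_id, group, has_churned in records:
        totals[group] += 1
        if has_churned:
            churned[group] += 1
    return {group: churned[group] / totals[group] for group in totals}


rates = churn_rate_by_group(users)
print(rates)
```

With this toy data, one of four male users and one of five female users churn, so the male rate is higher.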

After the exploratory data analysis, we identified some features that might be useful for the model.

Features and labels to feed the models:

The training and test sets are created by splitting the data into 70% for training and 30% for testing.
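A 70/30 split like the one described can be sketched in plain Python as follows. The function name, the seed, and the choice to split a list of user IDs are assumptions made for illustration; the article does not say how the split was implemented.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=42):
    """Shuffle a copy of items and cut it into (train, test) parts."""
    shuffled = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical user IDs, for illustration only.
user_ids = [f"user_{i}" for i in range(10)]
train_ids, test_ids = train_test_split(user_ids)
print(len(train_ids), len(test_ids))
```

Splitting on users rather than on individual log rows keeps all of one user's activity on the same side of the split.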

We compare the performance of three models (logistic regression, Random Forest and SVM) for predicting customer churn. Because the counts of churned and non-churned users are imbalanced, accuracy is not a representative metric, so we use the F1 score as the evaluation metric.

The performance of the three models on the validation data is as follows:

From the F1 scores, it is possible to conclude that the Random Forest classifier performed best at predicting customer churn.
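The reason accuracy misleads on imbalanced data can be shown with a small worked example. The sketch below computes accuracy and a binary F1 score for the churn class by hand, on made-up labels where only 2 of 10 users churn. It is an illustration of the metric, not the evaluator used in the original project.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no churner was found, so precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = churned, 0 = stayed. Only 2 of 10 users churn.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_stay = [0] * 10                       # never predicts churn
one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]     # finds one of two churners

print(accuracy(y_true, always_stay), f1_score(y_true, always_stay))
print(accuracy(y_true, one_hit), f1_score(y_true, one_hit))
```

A model that never predicts churn still reaches 80% accuracy here, while its F1 score for the churn class is 0, which is why F1 is the more informative metric for this problem.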

We built a model to predict customer churn. We removed rows with no userId, converted gender and level to binary numeric columns, and engineered 6 features for the model.
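The cleaning steps above (dropping rows without a userId and encoding gender and level as binary columns) can be sketched as follows. This is plain Python on made-up rows rather than the Spark code from the project. The specific mapping (M and paid become 1) is an assumption; the article only says the columns were made binary.

```python
# Hypothetical log rows using the column names mentioned in the text.
rows = [
    {"userId": "30", "gender": "M", "level": "paid"},
    {"userId": "", "gender": None, "level": "free"},  # no userId
    {"userId": "9", "gender": "F", "level": "free"},
]


def clean_and_encode(rows):
    """Drop rows with an empty or missing userId and binary-encode two columns."""
    cleaned = []
    for row in rows:
        if not row.get("userId"):
            continue  # skip rows that cannot be tied to a user
        cleaned.append({
            "userId": row["userId"],
            # Assumed encoding: M -> 1, F -> 0; paid -> 1, free -> 0.
            "gender": 1 if row["gender"] == "M" else 0,
            "level": 1 if row["level"] == "paid" else 0,
        })
    return cleaned


cleaned = clean_and_encode(rows)
print(cleaned)
```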

We selected three models (logistic regression, Random Forest and SVM) based on our knowledge, and used cross-validation and grid search to fine-tune them. The best result was an F1 score of 0.71 with Random Forest.

We also used the model to derive the important features that may have led to customer churn. By identifying customers with a high chance of churning, companies can target and retain them with attractive offers or incentives. This project also gave good exposure to the Spark environment for analyzing a large volume of data.
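To show what cross-validation combined with grid search does, here is a minimal sketch in plain Python. The model is a deliberately simple stand-in (a one-feature boundary between class means), not logistic regression, Random Forest or SVM. The data, the `weight` hyperparameter, and all function names are hypothetical. In Spark ML this role is usually played by `CrossValidator` with `ParamGridBuilder`, though the article does not show which tools were used.

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """Binary F1 for the churn class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit(xs, ys, weight):
    """Toy model: a boundary placed between the two class means."""
    churn_mean = mean(x for x, y in zip(xs, ys) if y == 1)
    stay_mean = mean(x for x, y in zip(xs, ys) if y == 0)
    boundary = weight * churn_mean + (1 - weight) * stay_mean
    return boundary, churn_mean < stay_mean


def predict(model, xs):
    boundary, churn_is_low = model
    # Predict churn when x falls on the churn side of the boundary.
    return [int((x < boundary) == churn_is_low) for x in xs]


def cross_val_f1(xs, ys, weight, k=3):
    """Mean validation F1 over k strided folds."""
    n = len(xs)
    scores = []
    for fold in range(k):
        val_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in val_idx]
        train_y = [ys[i] for i in range(n) if i not in val_idx]
        val_x = [xs[i] for i in sorted(val_idx)]
        val_y = [ys[i] for i in sorted(val_idx)]
        model = fit(train_x, train_y, weight)
        scores.append(f1_score(val_y, predict(model, val_x)))
    return mean(scores)


def grid_search(xs, ys, grid, k=3):
    """Return the hyperparameter value with the best cross-validated F1."""
    best_weight, best_score = None, -1.0
    for weight in grid:
        score = cross_val_f1(xs, ys, weight, k)
        if score > best_score:
            best_weight, best_score = weight, score
    return best_weight, best_score


# Toy feature (songs per session) and labels (1 = churned).
xs = [40, 90, 55, 100, 45, 110, 60, 95, 50, 105, 70, 85]
ys = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
grid = [0.1, 0.5, 0.9]
best_weight, best_score = grid_search(xs, ys, grid)
print(best_weight, best_score)
```

Each candidate value is scored only on folds the model was not fitted on, so the chosen setting is less likely to be tuned to one particular split.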
