Sparkify is a subscription-based music streaming application similar to Spotify or Pandora. As part of the Udacity Data Scientist Nanodegree program, we want to clean and analyze its data in order to build a model that predicts whether a user will downgrade their subscription or leave the platform.
The methodology that we will use to predict customer churn is roughly as follows:
The data used in this project is an extract of two months of logs from the platform. It contains information about user activity on Sparkify, with 18 columns and 286500 records.
Now that we have a brief overview of the data, I performed some EDA to understand it better:
It looks like male users are more prone to churn than female users: around 25% of male users churn.
The paid users who churn are almost double the free users who leave the platform.
As we see in the chart below, female users who churned listened to around 75 songs per session on average and male users who churned to around 50. In both cases the average is lower than for users who have not churned.
After the exploratory data analysis, we found some features that might be useful for the model implementation.
Features and labels to feed the models:
The training and test sets are created by splitting the data into 70% training and 30% test.
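To make the 70/30 split concrete, here is a minimal sketch in plain Python. The helper name `train_test_split`, the seed, and the `users` rows are hypothetical and only for illustration; in a Spark pipeline the same idea is commonly expressed with `DataFrame.randomSplit([0.7, 0.3], seed=...)`, though the exact call used in this project is not shown here.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) parts."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical per-user rows: (user_id, churn_label)
users = [(user_id, 1 if user_id % 4 == 0 else 0) for user_id in range(100)]

train, test = train_test_split(users)
print(len(train), len(test))
```

Fixing the seed matters when comparing several models, since each of them should see exactly the same training and test users.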
We are going to compare the performance of three models, Logistic Regression, Random Forest, and SVM, for predicting customer churn. Because the counts of churned and non-churned users are imbalanced, accuracy is not a representative metric, so we are going to use the F1 score as the evaluation metric.
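A small worked example shows why accuracy can mislead on imbalanced labels. The numbers are made up (100 users, 20 of whom churn) and the functions are a plain Python sketch of the binary F1 score for the churn class, not the evaluator used in this project, which may compute a different F1 variant.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (churn) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 20 churned users, 80 who stayed.
y_true = [1] * 20 + [0] * 80
# A useless "model" that predicts that nobody ever churns.
always_stay = [0] * 100

print(accuracy(y_true, always_stay))  # 0.8
print(f1_score(y_true, always_stay))  # 0.0
```

The model that never predicts churn still scores 80% accuracy, while its F1 score of 0 reflects that it does not find a single churned user.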
The performance of the three models on the validation data is as follows:
From the F1 scores, it is possible to conclude that the Random Forest classifier performed best at predicting customer churn.
We implemented a model to predict customer churn. We removed rows with no userId, converted gender and level to binary numeric columns, and engineered 6 features for our model.
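The cleaning steps can be sketched in plain Python on a few log records. The column names `userId`, `gender`, and `level` come from the data, but the raw values (`"M"`/`"F"`, `"paid"`/`"free"`), the choice of which value maps to 1, and the helper name `clean_and_encode` are assumptions made for illustration.

```python
def clean_and_encode(records):
    """Drop records without a userId and encode gender/level as 0/1."""
    cleaned = []
    for record in records:
        user_id = record.get("userId")
        # Treat both missing and blank userId values as "no user".
        if user_id is None or str(user_id).strip() == "":
            continue
        row = dict(record)  # copy so the input records stay untouched
        row["gender"] = 1 if record.get("gender") == "M" else 0  # assumed: M -> 1
        row["level"] = 1 if record.get("level") == "paid" else 0  # assumed: paid -> 1
        cleaned.append(row)
    return cleaned


# Hypothetical log records
logs = [
    {"userId": "30", "gender": "M", "level": "paid"},
    {"userId": "", "gender": None, "level": "free"},
    {"userId": "9", "gender": "F", "level": "free"},
]

cleaned_logs = clean_and_encode(logs)
print(cleaned_logs)
```

The record with the blank `userId` is dropped, and the remaining two carry numeric `gender` and `level` values that a model can consume directly.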
We selected three models based on our knowledge: Logistic Regression, Random Forest, and SVM. We used cross-validation and grid search to fine-tune our model, and achieved an F1 score of 0.71 with Random Forest.
We also used it to derive the important features that may have led to customer churn. By identifying customers with a high chance of churning, companies can target and retain them with attractive offers and incentives. This project also gave good exposure to the Spark environment for analyzing a large volume of data.