Medium Blog Recommendation

Before we start we need to answer one very important question that is the very basis of this article.

“Why do we need a recommender system for blogs?”

Recommender systems are an important class of machine learning algorithms that offer “relevant” suggestions to users. Netflix, YouTube, Tinder, and Amazon all rely on recommender systems.

Machine learning algorithms in recommender systems are typically classified into two categories: content-based and collaborative filtering methods, although modern recommenders combine both approaches. Content-based methods rely on the similarity of item attributes, whereas collaborative methods calculate similarity from user-item interactions.

To begin, let’s understand why Medium needs a recommendation system for its blogs.

and this number is increasing day by day. Recommending the right stories to the right users is therefore important for giving readers the best experience.

Just as a product recommendation system helps increase a company’s revenue, a blog recommendation system helps increase the popularity of blogs and gives readers a good experience. In the following section, we consider the features available on the Medium platform to design our methodology.

Data is the most important raw material for building any machine learning model, so we need to collect some before building a blog recommender system. Medium has data on each of its users, whether they are readers or writers. From that data, we can extract features that serve our purpose.

Then there is a need to prepare these features according to our needs. So let’s start with the feature engineering part.

Selecting the right set of features is a must when building any machine learning model, and recommender systems are no exception. After collecting the user data, content data, and user-content interaction data, we explored all the features and engineered some new ones by combining them.

We came up with the following features:

The title of a blog is an important factor on Medium: it enables readers to find the blog and makes them want to click through to read more. Headlines that catch visitors’ attention and spark their curiosity encourage them to stick around longer and come back for more.

Blog Title

We can extract the title of each blog post and vectorize it with relevant NLP techniques, then use the resulting vectors in our recommendation system.
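As a rough illustration of what vectorizing titles could look like, here is a minimal sketch using TF-IDF weights and cosine similarity. TF-IDF is only one possible choice, not necessarily the technique used here; the function names and sample titles are made up for the example.

```python
import math
from collections import Counter


def tokenize(title):
    # Lowercase the title and split on anything that is not a letter or digit.
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return cleaned.split()


def tfidf_vectors(titles):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per title."""
    docs = [Counter(tokenize(title)) for title in titles]
    n_docs = len(docs)
    # Document frequency: in how many titles each term appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total_terms = sum(doc.values())
        vectors.append({
            term: (count / total_terms)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)  # smoothed IDF
            for term, count in doc.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical titles, for illustration only.
titles = [
    "Building a Recommender System in Python",
    "A Gentle Intro to Recommender Systems",
    "Ten Tips for Better Sleep",
]
vectors = tfidf_vectors(titles)
```

With these sample titles, the first two vectors share terms and get a positive similarity, while the first and third share none and score zero.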

When it comes to getting blogs noticed, tags are the second most important factor after the title. Picking the right tags can make the difference between an article getting popular and an article dying on delivery. People generally choose topic tags that reflect the content they have written, and on Medium a post is limited to 5 tags.

Tags

Tags are an important feature in this recommendation system because they let us group articles by topic.

The claps and comments on a blog reflect readers’ engagement with it and, in turn, its popularity. We can combine these two features into a new feature called Readers reaction.

Claps and Comments

Read time and member reading time are two important features of Medium blogs.

Approximate Read Time

We can combine these two features into a new feature called Readers engagement, which is the ratio of the average member reading time to the read time.
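The two engineered features could be computed along the following lines. This is only a sketch: how claps and comments are combined is not specified here, so the weighted sum and the `comment_weight` value are assumptions, as are the field names.

```python
from dataclasses import dataclass


@dataclass
class BlogStats:
    claps: int
    comments: int
    read_time_min: float             # approximate read time shown on the post
    avg_member_read_time_min: float  # average time members spent reading


def readers_reaction(stats, comment_weight=5.0):
    # Assumption: one comment counts as much as `comment_weight` claps.
    return stats.claps + comment_weight * stats.comments


def readers_engagement(stats):
    # Ratio of average member reading time to the approximate read time.
    if stats.read_time_min <= 0:
        return 0.0
    return stats.avg_member_read_time_min / stats.read_time_min


# Hypothetical post, for illustration only.
post = BlogStats(claps=120, comments=8, read_time_min=6.0,
                 avg_member_read_time_min=4.5)
reaction = readers_reaction(post)
engagement = readers_engagement(post)
```

An engagement value near 1 would suggest that readers typically finish the post, while a value well below 1 would suggest they leave early.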

On Medium, one can also see a “Trending on Medium” column of blog posts. These recommendations are based entirely on claps, followers, views, and average read time, from which user-blog interaction scores can be derived; the top 6 blogs are recommended.

This recommendation doesn’t depend on the content a user usually interacts with or the topics the user follows. It is a general recommendation for everyone, based on trending blogs.

Popularity based Recommendation
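A popularity-based ranking of this kind might be sketched as follows. The scoring rule (min-max scaling each signal and averaging with equal weights) is an assumption made for illustration; the actual way Medium derives its scores is not known here.

```python
def popularity_ranking(blogs, top_n=6):
    """Rank blogs by an equally weighted average of min-max scaled signals."""
    signals = ("claps", "followers", "views", "avg_read_time")
    # Record the observed range of each signal so it can be scaled to [0, 1].
    ranges = {}
    for signal in signals:
        values = [blog[signal] for blog in blogs]
        ranges[signal] = (min(values), max(values))

    def score(blog):
        total = 0.0
        for signal in signals:
            low, high = ranges[signal]
            total += (blog[signal] - low) / (high - low) if high > low else 0.0
        return total / len(signals)

    return sorted(blogs, key=score, reverse=True)[:top_n]


# Hypothetical data: blog i is slightly more popular than blog i - 1.
blogs = [
    {"id": i, "claps": 10 * i, "followers": i, "views": 100 * i,
     "avg_read_time": 1.0 + i}
    for i in range(8)
]
trending = popularity_ranking(blogs)
```

Because the score ignores who is asking, every reader would receive the same six blogs.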

These recommendations differ for every reader, based on their interactions with blogs on Medium. Broadly, there are two types.

The first one is collaborative filtering. It is a method for obtaining automatic predictions (filtering) about a user’s interests by collecting preferences or taste information from a large number of users (collaborating). Finding patterns among multiple readers is what this entails in the context of Medium.

If a group of articles attracts the attention of multiple readers, it’s highly probable that a reader who begins reading one of these pieces will want to read the others in the group. As a result, suggestions are offered to similar readers based on the reading behavior of other readers.

Memory-based:

This approach computes user similarities based on the blogs they’ve interacted with (user-based approach) or computes blog similarities based on the users who’ve interacted with them (item-based approach).
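The user-based variant can be sketched as below, assuming interactions are stored as a nested dict of user -> blog -> strength. The data, names, and the similarity-weighted scoring rule are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users' sparse interaction vectors."""
    shared = set(a) & set(b)
    dot = sum(a[blog] * b[blog] for blog in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_user_based(target, interactions, top_n=3):
    """Suggest unseen blogs, weighted by how similar their readers are to `target`."""
    seen = interactions[target]
    scores = {}
    for other, blogs in interactions.items():
        if other == target:
            continue
        similarity = cosine(seen, blogs)
        if similarity <= 0:
            continue  # users with nothing in common contribute nothing
        for blog, strength in blogs.items():
            if blog not in seen:
                scores[blog] = scores.get(blog, 0.0) + similarity * strength
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical interaction data.
interactions = {
    "ana": {"b1": 1.0, "b2": 1.0},
    "ben": {"b1": 1.0, "b2": 1.0, "b3": 1.0},
    "cy": {"b4": 1.0, "b5": 1.0},
}
suggestions = recommend_user_based("ana", interactions)
```

Here "ana" overlaps only with "ben", so the only suggestion is the blog "ben" read that "ana" has not. The item-based variant would apply the same similarity to blog columns instead of user rows.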

Model-based:

This approach uses a user-item matrix whose entries are interaction strengths calculated from claps, comments, and the ratio of read time to expected time. Blogs can then be recommended to a specific user based on these strengths.
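One way to make this concrete is a small matrix factorization trained with stochastic gradient descent. This is a sketch, not the method described here: the weights in `interaction_strength`, the clap cap of 50, the choice of matrix factorization, and all hyperparameters are assumptions.

```python
import random


def interaction_strength(claps, commented, read_ratio):
    # Assumed weights: 40% claps (capped at 50), 20% comment, 40% read ratio.
    return (0.4 * min(claps, 50) / 50
            + 0.2 * (1.0 if commented else 0.0)
            + 0.4 * min(read_ratio, 1.0))


def predict(P, Q, user, blog):
    # Predicted strength is the dot product of the user and blog factors.
    return sum(p * q for p, q in zip(P[user], Q[blog]))


def factorize(strengths, n_factors=2, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Learn latent factors so that P[user] . Q[blog] approximates each strength."""
    rng = random.Random(seed)
    users = sorted({user for user, _ in strengths})
    blogs = sorted({blog for _, blog in strengths})
    P = {u: [rng.uniform(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {b: [rng.uniform(0.0, 0.1) for _ in range(n_factors)] for b in blogs}
    for _ in range(epochs):
        for (user, blog), strength in strengths.items():
            error = strength - predict(P, Q, user, blog)
            for f in range(n_factors):
                p, q = P[user][f], Q[blog][f]
                P[user][f] += lr * (error * q - reg * p)
                Q[blog][f] += lr * (error * p - reg * q)
    return P, Q


def mse(strengths, P, Q):
    # Mean squared error over the observed user-blog pairs.
    errors = [(s - predict(P, Q, u, b)) ** 2 for (u, b), s in strengths.items()]
    return sum(errors) / len(errors)


# Hypothetical raw events: (claps, commented, read time / expected time).
raw = {
    ("ana", "b1"): (50, True, 1.0), ("ana", "b2"): (30, True, 0.9),
    ("ben", "b1"): (40, True, 1.0), ("ben", "b3"): (0, False, 0.5),
    ("cy", "b2"): (20, False, 1.0), ("cy", "b3"): (0, False, 0.3),
}
strengths = {pair: interaction_strength(*event) for pair, event in raw.items()}
P, Q = factorize(strengths)
unseen_score = predict(P, Q, "ana", "b3")  # a pair with no recorded interaction
```

Unobserved pairs such as ("ana", "b3") receive a predicted strength, and the highest-scoring unseen blogs would be the ones recommended to that user.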

The other one is a content-based recommendation engine. It differs in that it provides recommendations based on the blog articles themselves and the words included in them (mostly tags). If a person reads an article that includes the terms Machine Learning and Data Science, it’s likely that the same user enjoys reading other articles that include these terms.

For each article, three other relevant articles are suggested to the user for continued reading on Medium. The suggested articles were recently published by the same publication and contain any of the tags from the user’s current article.
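The rule just described (same publication, at least one shared tag, most recent first, three results) can be sketched like this. The article fields and sample catalog are hypothetical.

```python
from datetime import date


def related_articles(current, catalog, limit=3):
    """Pick recent articles from the same publication that share a tag."""
    current_tags = set(current["tags"])
    candidates = [
        article for article in catalog
        if article["id"] != current["id"]
        and article["publication"] == current["publication"]
        and current_tags & set(article["tags"])
    ]
    # Most recently published first.
    candidates.sort(key=lambda article: article["published"], reverse=True)
    return candidates[:limit]


# Hypothetical catalog, for illustration only.
catalog = [
    {"id": 1, "publication": "DataPub", "tags": ["Machine Learning"],
     "published": date(2023, 1, 5)},
    {"id": 2, "publication": "DataPub", "tags": ["Data Science", "Python"],
     "published": date(2023, 3, 1)},
    {"id": 3, "publication": "DataPub", "tags": ["Cooking"],
     "published": date(2023, 4, 1)},
    {"id": 4, "publication": "OtherPub", "tags": ["Machine Learning"],
     "published": date(2023, 5, 1)},
    {"id": 5, "publication": "DataPub", "tags": ["Machine Learning"],
     "published": date(2023, 2, 1)},
]
current = {"id": 0, "publication": "DataPub",
           "tags": ["Machine Learning", "Data Science"],
           "published": date(2023, 6, 1)}
suggested = related_articles(current, catalog)
```

Article 3 is skipped because it shares no tag, and article 4 because it belongs to a different publication.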

Personalized Recommendation

Collaborative filtering examines users’ reading habits rather than the content of the articles themselves. So, if users who read blog entries about recommendation engines also read posts about other ML algorithms, even if the content of the other articles is substantially different, collaborative filtering will promote additional articles to these users.

Collaborative filtering has a drawback in that it requires a large amount of past user reading behavior data to detect these patterns. Content-based recommendations can be made with little to no previous data, making them simpler to deploy. We can see that Medium uses a hybrid technique that combines collaborative and content-based filtering.

Medium utilizes a hybrid approach

There are two primary ways of checking whether your recommendation system is performing as expected: online and offline evaluation.

Online metrics are the empirical results observed in your users’ interactions with real-time recommendations in a live environment. The most effective way to collect them is an A/B test: in a live environment, the control is your existing system and the variant is the recommender system under test. Online metrics matter most because user behavior is the ultimate test of our work.
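To judge whether the variant really beats the control, one common option is a two-proportion z-test on a rate such as click-through. The sketch below assumes click-through rate is the online metric and uses made-up counts; neither comes from this article.

```python
import math


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Return (z, two-sided p-value) for the difference in click-through rate."""
    rate_a = clicks_a / shown_a
    rate_b = clicks_b / shown_b
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical counts: control (A) is the existing system, variant (B) the new one.
z_score, p_value = two_proportion_z_test(
    clicks_a=200, shown_a=5000, clicks_b=260, shown_b=5000
)
```

A positive z with a small p-value would suggest the variant's higher click-through rate is unlikely to be due to chance alone.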

Then why use offline techniques for evaluation?

Because they are the ONLY indicators you can look at while developing your recommendation system. Always relying on online metrics to collect user behavior and score your system is expensive and time-consuming. Moreover, when users are continually asked for feedback, they might become hesitant to use the platform or stop using it altogether. Good accuracy on offline metrics followed by good online A/B scores is what you are looking for.

Accuracy in the above methods depends on historical data: the model tries to predict what actual users have already seen. If the data collected is too old, then however high the accuracy may be, it won’t mean much, as your interests a year back will not be the same as your interests a year from now!

Some of the offline techniques for evaluation are as follows:

RMSE is similar to MAE; the only difference is that the residual (the difference between the actual and the predicted value) is squared instead of taken as an absolute value, and the square root of the mean of these squares is taken for comparison.

The advantage of using RMSE over MAE is that it penalizes large errors more heavily. (Note that RMSE is always greater than or equal to MAE; the two are equal only when every residual has the same magnitude.)

RMSE Score Formula
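Written out, the two error metrics take the following form. The notation is assumed for this sketch: y_i is the actual rating, \hat{y}_i the predicted rating, and n the number of test ratings.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
```

For example, residuals of 1, 1, and 4 give MAE = 6/3 = 2 but RMSE = sqrt(18/3) ≈ 2.45, which shows how the single large error pulls RMSE up more than MAE.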

Hit Rate is often a better alternative to MAE or RMSE for top-N recommendation. To measure it, we first generate the top-N recommendations for every user in the test data set. If a user’s top-N recommendations contain something that user actually rated, that counts as 1 hit; the hit rate is the number of hits divided by the number of users. There are several variants of this metric, one being the cumulative hit rate.
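A minimal sketch of the hit-rate computation follows. It assumes a leave-one-out setup in which one rated blog per user is held out of training, which is a common arrangement but is not stated here; the data is hypothetical.

```python
def hit_rate(top_n_by_user, held_out_by_user):
    """Fraction of users whose held-out blog appears in their top-N list."""
    if not held_out_by_user:
        return 0.0
    hits = sum(
        1 for user, held_out in held_out_by_user.items()
        if held_out in top_n_by_user.get(user, [])
    )
    return hits / len(held_out_by_user)


# Hypothetical top-N lists and held-out blogs.
top_n = {"ana": ["b1", "b2"], "ben": ["b3"], "cy": ["b4"]}
held_out = {"ana": "b2", "ben": "b9", "cy": "b4"}
rate = hit_rate(top_n, held_out)
```

Two of the three users find their held-out blog in their list, so the hit rate is 2/3.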

A recommender system typically produces an ordered list of recommendations for each user in the test set. MAP@K gives insight into how relevant the list of recommended items is, whereas MAR@K gives insight into how well the recommender is able to recall all the items the user has rated positively in the test set.
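Both metrics can be sketched in a few lines. Definitions vary between libraries; the version below assumes average precision is normalized by min(K, number of relevant items) and takes MAR@K to be the mean of plain recall@K, and it assumes each recommendation list has no duplicates.

```python
def average_precision_at_k(recommended, relevant, k):
    """Average of precision@rank over the ranks where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(k, len(relevant))


def recall_at_k(recommended, relevant, k):
    """Share of the relevant items that appear in the top K."""
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & relevant) / len(relevant)


def map_at_k(recs_by_user, relevant_by_user, k):
    users = list(relevant_by_user)
    return sum(average_precision_at_k(recs_by_user[u], relevant_by_user[u], k)
               for u in users) / len(users)


def mar_at_k(recs_by_user, relevant_by_user, k):
    users = list(relevant_by_user)
    return sum(recall_at_k(recs_by_user[u], relevant_by_user[u], k)
               for u in users) / len(users)


# Hypothetical ranked lists and positively rated items.
recs = {"ana": ["a", "b", "c", "d"], "ben": ["x", "y"]}
liked = {"ana": {"a", "c", "e"}, "ben": {"z"}}
map3 = map_at_k(recs, liked, k=3)
mar3 = mar_at_k(recs, liked, k=3)
```

For "ana", relevant items appear at ranks 1 and 3, so AP@3 = (1/1 + 2/3) / 3 = 5/9 and recall@3 = 2/3; "ben" scores zero on both.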

Other important metrics include coverage, which is the percentage of items in the training data that the model is able to recommend on a test set. Personalization uses the dissimilarity (1 - cosine similarity) between users’ lists of recommendations. The higher the score, the greater the dissimilarity, meaning the recommendations are more personalized.
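These two metrics might be computed as sketched below. Each recommendation list is treated as a binary vector over the catalog, so the cosine similarity of two lists reduces to their overlap divided by the square root of the product of their sizes; the data is hypothetical.

```python
import math
from itertools import combinations


def coverage(recs_by_user, catalog):
    """Percentage of catalog items that appear in at least one list."""
    recommended = {item for recs in recs_by_user.values() for item in recs}
    return 100.0 * len(recommended & set(catalog)) / len(catalog)


def personalization(recs_by_user):
    """1 minus the mean pairwise cosine similarity between users' lists."""
    lists = [set(recs) for recs in recs_by_user.values()]
    similarities = []
    for a, b in combinations(lists, 2):
        denominator = math.sqrt(len(a) * len(b))
        similarities.append(len(a & b) / denominator if denominator else 0.0)
    if not similarities:
        return 0.0
    return 1.0 - sum(similarities) / len(similarities)


# Hypothetical recommendation lists.
recs = {"ana": ["b1", "b2"], "ben": ["b1", "b2"], "cy": ["b3", "b4"]}
catalog = ["b1", "b2", "b3", "b4", "b5"]
cov = coverage(recs, catalog)
pers = personalization(recs)
```

Four of five catalog items are recommended (80% coverage). One pair of users has identical lists and the other two pairs share nothing, giving a personalization score of 2/3.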

If we can’t make a machine learning model work in production, then that model is of no use. It’s like a written blog that never got published. So, that’s why the deployment of the model is one of the most important steps to get benefit out of the developed model. Hence, we also need to deploy the recommender system after testing and tuning of the model is done.

A few factors make a difference in the deployment of a recommender system, e.g. the scalability of the system, its latency, and whether it runs offline or online. Let’s have a look at a few of them. The scalability of the model defines how it will respond when the number of users and the amount of content increase. Scalability problems have grown significantly with the rapid growth of the e-commerce industry: modern recommendation engines are required to generate real-time results for large-scale applications. In other words, the performance of the recommendation model is measured in terms of throughput (number of inferences per second) and latency (time for each inference).
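Throughput and latency can be measured with a simple timing harness like the sketch below. The `recommend` callable stands in for whatever model is deployed, and the dummy workload is purely illustrative; a real benchmark would also account for concurrency and network overhead.

```python
import time


def measure(recommend, requests):
    """Time sequential calls and report throughput plus latency statistics."""
    if not requests:
        raise ValueError("need at least one request to measure")
    latencies = []
    start = time.perf_counter()
    for request in requests:
        t0 = time.perf_counter()
        recommend(request)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    return {
        "throughput_per_s": len(requests) / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_s": sum(latencies) / len(latencies),
        "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


def dummy_recommend(user_id):
    # Placeholder workload standing in for a real model inference.
    return sorted(range(1000), key=lambda blog_id: (blog_id * 31 + user_id) % 97)[:6]


stats = measure(dummy_recommend, list(range(200)))
```

Tracking a high percentile such as p95 alongside the mean helps reveal slow outlier requests that an average would hide.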

Factors most important during production

We have discussed a complete methodology for designing a blog recommendation system in this article. Other improvements and modifications can be made to this design. We have mentioned a few of them in the future plans section of this blog. In further blogs, we plan to integrate deep learning, computer vision, and other advanced techniques to improve the efficiency of the blog recommendation system.
