In the last blog, we discussed various algorithms for building a recommendation system. The real challenge, however, lies in designing the system and taking various other business parameters into account.
Heuristic Solution
Although machine learning (ML) is commonly used to build recommendation systems, it isn’t the only option. There are many ways to build a recommendation system, and simpler approaches can be the better fit, for example when we have very little data or want to build a minimal solution quickly.
In such cases, we can start with some heuristic solutions. In fact, there are lots of hacks we can do to build a simple recommendation system. For instance, based on apps a user has used, we can simply suggest apps from the same category. We can also suggest apps with similar titles or labels. If we use the popularity (number of installs, usage) as another signal, the recommendation system can work pretty well as a baseline.
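The category-plus-popularity heuristic can be made concrete with a minimal Python sketch. The catalog, app names and install counts are hypothetical. Ranking first by the user’s affinity for a category and second by installs is one reasonable ordering; it is not a prescribed one.

```python
from collections import Counter

# Hypothetical catalog: app name -> (category, number of installs).
CATALOG = {
    "MapsPro": ("travel", 50_000),
    "TripPlanner": ("travel", 120_000),
    "FlightDeals": ("travel", 80_000),
    "PhotoFix": ("photo", 200_000),
    "FilterCam": ("photo", 90_000),
    "BudgetBook": ("finance", 30_000),
}


def recommend_same_category(used_apps, catalog, k=3):
    """Suggest unused apps from the categories the user already uses.

    Candidates are ordered by how many of the user's apps share their
    category, then by global popularity (install count).
    """
    category_counts = Counter(catalog[app][0] for app in used_apps if app in catalog)
    candidates = [
        app
        for app, (category, _) in catalog.items()
        if category in category_counts and app not in used_apps
    ]
    candidates.sort(
        key=lambda app: (category_counts[catalog[app][0]], catalog[app][1]),
        reverse=True,
    )
    return candidates[:k]


print(recommend_same_category({"MapsPro", "PhotoFix"}, CATALOG))
# ['TripPlanner', 'FilterCam', 'FlightDeals']
```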
Ways of Recommending
Depending on the business and the product in which we want to leverage the recommendation system, we may build a single system that powers the whole product or use different ways of recommending in different placements. If the product has multiple placements, each with a specific meaning to the business, then the type of recommendation plays an important role. For example, in a short video app, finding the most relevant content while a user is scrolling calls for a single recommendation system. In a shopping app, on the other hand, the user goes through different stages of making a decision, so the type of recommendation should vary too.
When a single recommendation system powers the app, a sequence-based recommendation system can be a good choice. It takes into account the user’s most recent events in the order they happened, which brings diversity to the recommendations. When multiple types of recommendations are spread across the app, simple algorithms can also work well. The right choice depends entirely on the stage the user is at in the app.
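As a rough illustration of the sequence-based idea, the sketch below uses a first-order Markov chain, one of the simplest sequence models. It counts which item tends to follow which, then recommends what usually comes after the user’s latest event. The session data and function names are hypothetical. Production systems typically use richer sequence models, so this only shows the principle.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often item b directly follows item a across user sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_item, next_item in zip(session, session[1:]):
            transitions[current_item][next_item] += 1
    return transitions


def recommend_next(history, transitions, k=2):
    """Recommend items most often seen right after the user's latest event,
    skipping items already in the user's recent history."""
    if not history:
        return []
    last_item = history[-1]
    ranked = [item for item, _ in transitions[last_item].most_common() if item not in history]
    return ranked[:k]


# Hypothetical sessions: each list is one user's ordered events.
sessions = [
    ["shoes", "socks", "laces"],
    ["shoes", "socks", "insoles"],
    ["shoes", "insoles"],
    ["jacket", "socks", "laces"],
]
transitions = build_transitions(sessions)
print(recommend_next(["jacket", "socks"], transitions))  # ['laces', 'insoles']
```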
On a product page, showing the most similar products makes sense because it helps streamline the user’s decision. Showing products that are not similar might confuse the user. Here we can use similarity-based algorithms or content-based filtering. Another option on the product page is to show complementary products, which helps with cross-selling. A frequent itemset mining approach such as market basket analysis is useful for this.
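Both product-page strategies can be sketched in a few lines of Python:

- `similar_products` is content-based filtering using cosine similarity over hand-made feature vectors.
- `complementary_products` is a bare-bones form of market basket analysis. It ranks items by the confidence of the rule `product -> item`, i.e. the share of baskets containing the product that also contain the item.

The feature vectors, baskets and `min_support` threshold are illustrative assumptions. A real pipeline would more likely run an algorithm such as Apriori or FP-Growth over much larger transaction logs.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_products(product, features, k=2):
    """Content-based: rank other products by feature similarity."""
    scores = {p: cosine(features[product], f) for p, f in features.items() if p != product}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def complementary_products(product, baskets, min_support=2, k=2):
    """Market basket: rank items by confidence(product -> item)."""
    pair_counts, item_counts = Counter(), Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        for a, b in combinations(sorted(items), 2):
            pair_counts[(a, b)] += 1
    confidence = {}
    for (a, b), count in pair_counts.items():
        if count < min_support:
            continue  # too rare to trust
        if a == product:
            confidence[b] = count / item_counts[a]
        elif b == product:
            confidence[a] = count / item_counts[b]
    return sorted(confidence, key=confidence.get, reverse=True)[:k]


FEATURES = {  # hypothetical vectors: [is_phone, is_accessory, price_level]
    "phone_a": [1.0, 0.0, 1.0],
    "phone_b": [1.0, 0.0, 0.8],
    "case": [0.0, 1.0, 0.1],
    "charger": [0.0, 1.0, 0.2],
}
BASKETS = [
    ["phone_a", "case", "charger"],
    ["phone_a", "case"],
    ["phone_a", "screen_protector"],
    ["case", "charger"],
]
print(similar_products("phone_a", FEATURES))       # ['phone_b', 'charger']
print(complementary_products("phone_a", BASKETS))  # ['case']
```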
Every user has a distinctive taste. Some prefer products similar to what they are already using, while others like to explore new products that similar users are already interested in. A hybrid approach that combines content-based filtering with collaborative (user-based) filtering can serve both groups. Such an approach lets us overcome the limits of content-based and collaborative filtering while leveraging the advantages of both techniques.
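The sketch below shows one such hybrid. It assumes binary interaction data and small hand-made item feature vectors; all names are hypothetical. It blends two scores with a weight `alpha`:

- a content-based score: how similar a candidate is to what the user already uses;
- a user-based collaborative score: a similarity-weighted vote of other users.

A weighted linear blend is only one way to build a hybrid. Switching and cascade hybrids are common alternatives.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def content_score(user, item, interactions, item_features):
    """Average similarity between the item and the items the user already used."""
    used = interactions[user]
    if not used:
        return 0.0
    return sum(cosine(item_features[item], item_features[u]) for u in used) / len(used)


def collaborative_score(user, item, interactions, all_items):
    """User-based CF: similarity-weighted share of other users who used the item."""
    def vector(u):
        return [1.0 if i in interactions[u] else 0.0 for i in all_items]

    me = vector(user)
    num = den = 0.0
    for other in interactions:
        if other == user:
            continue
        sim = cosine(me, vector(other))
        num += sim * (1.0 if item in interactions[other] else 0.0)
        den += sim
    return num / den if den else 0.0


def hybrid_recommend(user, interactions, item_features, alpha=0.5, k=2):
    """alpha=1.0 is pure content-based filtering, alpha=0.0 pure collaborative."""
    all_items = sorted(item_features)
    candidates = [i for i in all_items if i not in interactions[user]]
    scores = {
        i: alpha * content_score(user, i, interactions, item_features)
        + (1 - alpha) * collaborative_score(user, i, interactions, all_items)
        for i in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


ITEM_FEATURES = {  # hypothetical [outdoor, finance] vectors
    "hiking_app": [1.0, 0.0],
    "camping_app": [0.9, 0.1],
    "stock_app": [0.0, 1.0],
    "news_app": [0.2, 0.8],
}
INTERACTIONS = {
    "alice": {"hiking_app"},
    "bob": {"hiking_app", "stock_app"},
    "carol": {"hiking_app", "news_app"},
}
print(hybrid_recommend("alice", INTERACTIONS, ITEM_FEATURES))  # ['camping_app', 'news_app']
```

With `alpha=0.0`, the content signal disappears and Alice is instead pointed to what Bob and Carol use (`news_app`, `stock_app`). The weight controls which kind of user taste the system favors.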
Freshness
Freshness can be a very important factor, so we need to work out how to recommend fresh content. In practice, a product’s relevance to a user comes with an expiry date: a product that is relevant now might not be relevant the next day. For example, a user interested in a travel app today might not be interested in it tomorrow, once the purpose has been fulfilled.
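One common way to encode this expiry is to decay each interaction’s weight with its age. The sketch below assumes exponential decay with a hypothetical 24-hour half-life, so an interaction from three days ago counts for one eighth of a fresh one. The right half-life depends on the product and would need tuning.

```python
def decayed_weight(age_hours, half_life_hours=24.0):
    """Weight of an interaction age_hours old; it halves every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def interest_by_category(events, half_life_hours=24.0):
    """Sum time-decayed weights per category.

    events: list of (category, age_hours) pairs for one user.
    """
    interest = {}
    for category, age_hours in events:
        interest[category] = interest.get(category, 0.0) + decayed_weight(age_hours, half_life_hours)
    return interest


# Two travel events a few days ago vs. one finance event two hours ago.
events = [("travel", 72), ("travel", 48), ("finance", 2)]
interest = interest_by_category(events)
print(sorted(interest, key=interest.get, reverse=True))  # ['finance', 'travel']
```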
Sequence-based and session-based recommendations already take care of freshness. With any other type of recommendation, however, we should keep adjusting or re-ordering the results based on the user’s recent activity. One way to do this is to generate candidate items with batch recommendations and then re-rank the filtered items for each user with another recommendation algorithm.
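The batch-then-re-rank pattern might look like the following sketch. It makes two assumptions: the offline job has produced `(item, batch_score)` pairs, and each item has a category. The re-ranking step drops already-seen items. It then adds a boost proportional to how often the item’s category appears in the user’s most recent events. The additive `boost` formula is an illustrative assumption, not a standard.

```python
from collections import Counter


def rerank(batch_candidates, item_category, recent_events, already_seen, boost=0.5):
    """Re-rank batch candidates using the user's most recent activity.

    batch_candidates: list of (item, batch_score) produced offline.
    recent_events: categories of the user's latest interactions.
    """
    recent = Counter(recent_events)
    total = sum(recent.values()) or 1
    reranked = []
    for item, batch_score in batch_candidates:
        if item in already_seen:
            continue  # filter step: never re-show seen items
        recency_affinity = recent[item_category[item]] / total
        reranked.append((item, batch_score + boost * recency_affinity))
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in reranked]


BATCH = [("hotel_app", 0.9), ("budget_app", 0.7), ("recipe_app", 0.6), ("flight_app", 0.8)]
CATEGORY = {"hotel_app": "travel", "budget_app": "finance", "recipe_app": "food", "flight_app": "travel"}
print(rerank(BATCH, CATEGORY, ["food", "food", "finance"], {"flight_app"}))
# ['recipe_app', 'hotel_app', 'budget_app']
```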
Handling the Corner Cases
The most common problem with recommendation systems is the cold start problem, i.e. when we have a new user or a new item. For new users, popularity-based models work pretty well. To make the recommendations more specific, we can show the items that are popular within the user’s cohort.
For a new item, we can map it to the most similar existing item and occasionally show the new item in that item’s place to improve its discoverability. Over time, as the new item collects interactions, the recommendation system will start picking it up.
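Both cold-start fallbacks are sketched below under three simple assumptions:

- users are assigned to a cohort, for example by signup attributes;
- install events are available as `(cohort, item)` pairs;
- each new item has already been mapped to its most similar existing item.

`exposure_rate` is a hypothetical knob that controls how often the new item takes the existing item’s slot.

```python
import random
from collections import Counter


def popular_in_cohort(cohort, installs, k=3):
    """New user: most-installed items within the user's cohort.

    installs: list of (cohort, item) events. Falls back to global
    popularity when the cohort has no data.
    """
    counts = Counter(item for c, item in installs if c == cohort)
    if not counts:
        counts = Counter(item for _, item in installs)
    return [item for item, _ in counts.most_common(k)]


def with_new_item_exposure(recommendations, new_item_for, rng, exposure_rate=0.1):
    """New item: occasionally show it in the slot of its most similar existing item.

    new_item_for maps an existing item to the new item judged most similar to it.
    """
    result = []
    for item in recommendations:
        new_item = new_item_for.get(item)
        if new_item is not None and new_item not in result and rng.random() < exposure_rate:
            result.append(new_item)
        else:
            result.append(item)
    return result


INSTALLS = (
    [("students", "notes")] * 3
    + [("students", "music")] * 2
    + [("students", "maps")]
    + [("retirees", "news")] * 4
)
recs = popular_in_cohort("students", INSTALLS)  # ['notes', 'music', 'maps']
print(with_new_item_exposure(recs, {"notes": "notes_v2"}, random.Random(42)))
```

Passing a seeded `random.Random` keeps the exposure decisions reproducible in tests. In production, the rate would be tuned against how quickly new items gather enough interactions.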
Feature Engineering
Usually, there are two types of features: explicit and implicit. Explicit features include ratings, favorites, etc.; on YouTube, they can be the like/share/subscribe actions. Implicit features are less obvious. If a user watched a video for only a couple of seconds, that is probably a negative sign. Given a list of recommended videos, if a user clicks one over another, it can mean that they prefer the one clicked. Implicit features usually require a lot of exploration.
There are several features that are quite obvious:
- Like/share/subscribe: as mentioned above, these are strong signals of a user’s preferences.
- Watch time
- Video title/labels/categories
- Freshness
- Purchase history
It’s worth noting that building machine learning systems involves a lot of experimenting with different combinations of features, because you won’t know which one works well until you try it.
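The sketch below turns a single watch event into some of the features listed above (explicit actions, watch time, freshness) and derives an implicit label from it. The `WatchEvent` schema, the 3-second skip threshold and the 50% engagement threshold are hypothetical values chosen for illustration, and such choices need experimentation. Events that are neither clearly positive nor clearly negative are returned as `None`, so they can, for example, be left out of training.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatchEvent:
    """One user-video interaction (hypothetical schema)."""
    liked: bool
    shared: bool
    subscribed: bool
    watch_seconds: float
    video_seconds: float
    video_age_days: float


def to_features(e: WatchEvent) -> dict:
    """Explicit actions as 0/1, watch time as a fraction, freshness in (0, 1]."""
    watch_fraction = min(e.watch_seconds / e.video_seconds, 1.0) if e.video_seconds > 0 else 0.0
    return {
        "liked": float(e.liked),
        "shared": float(e.shared),
        "subscribed": float(e.subscribed),
        "watch_fraction": watch_fraction,
        "freshness": 1.0 / (1.0 + e.video_age_days),
    }


def implicit_label(e: WatchEvent, skip_seconds=3.0, engaged_fraction=0.5) -> Optional[int]:
    """1 = positive, 0 = negative, None = ambiguous."""
    if e.liked or e.shared or e.subscribed:
        return 1  # explicit feedback is a strong positive signal
    if e.watch_seconds < skip_seconds:
        return 0  # abandoned after a couple of seconds: likely negative
    if to_features(e)["watch_fraction"] >= engaged_fraction:
        return 1  # watched most of the video
    return None


print(implicit_label(WatchEvent(False, False, False, 2, 60, 0)))  # 0
```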
Infrastructure
Since computing similar users/videos is too time-consuming to do while the user is in the app, this part should run in offline pipelines. We can therefore divide the whole system into online and offline parts.
For the offline part, all the user models and products need to be stored in distributed systems. Pipelines that compute similar users/products also run regularly to keep the data up to date. In fact, most machine learning systems process big data in offline pipelines, since such jobs can’t be expected to finish within a few seconds.
For the online part, we should be able to provide a list of recommended products from the offline data, based on the user profile and their actions (like products just viewed). Normally, the recommendation engine fetches more products than needed and then filters and ranks them on the fly. We can filter out products that are obviously irrelevant, like products the user has already seen, and then rank the remaining suggestions. Factors to consider include product popularity (share/comment/like numbers), freshness, quality, and so on.
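Here is a minimal sketch of the online step. It assumes the offline pipeline has already written a candidate list per user; a plain dict stands in for the distributed store. The code drops products the user has seen. It then ranks the rest by a weighted sum of three signals:

- normalized popularity (shares + comments + likes);
- freshness;
- an offline quality score.

The weights and the `Candidate` fields are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    shares: int
    comments: int
    likes: int
    age_days: float
    quality: float  # hypothetical offline quality score in [0, 1]


# Stand-in for the offline store: user -> precomputed candidates (more than needed).
OFFLINE_CANDIDATES = {
    "u1": [
        Candidate("p_a", 10, 5, 85, age_days=30, quality=0.9),
        Candidate("p_b", 2, 1, 17, age_days=0, quality=0.8),
        Candidate("p_c", 50, 20, 130, age_days=1, quality=0.5),
        Candidate("p_d", 300, 200, 500, age_days=0, quality=0.9),
    ]
}


def online_recommend(user_id, seen, offline_candidates, k=2, weights=(0.5, 0.3, 0.2)):
    """Fetch precomputed candidates, filter seen ones, rank the rest on the fly."""
    w_pop, w_fresh, w_quality = weights
    fresh = [c for c in offline_candidates.get(user_id, []) if c.product_id not in seen]
    if not fresh:
        return []
    max_engagement = max(c.shares + c.comments + c.likes for c in fresh) or 1

    def score(c):
        popularity = (c.shares + c.comments + c.likes) / max_engagement
        freshness = 1.0 / (1.0 + c.age_days)
        return w_pop * popularity + w_fresh * freshness + w_quality * c.quality

    return [c.product_id for c in sorted(fresh, key=score, reverse=True)[:k]]


print(online_recommend("u1", {"p_d"}, OFFLINE_CANDIDATES))  # ['p_c', 'p_b']
```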
Evaluation
Evaluation is an essential component of a recommendation system because it tells us how well the system works. No single solution is best, and any algorithm’s performance degrades as user behavior changes. Reinforcement learning is one way to adapt to these changes, but it still needs to be monitored.
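Offline, a common starting point is to check how many of the top-k recommendations the user actually went on to interact with. The functions below compute precision@k and recall@k for one user. Averaging them over users gives a single number for comparing algorithms before moving on to online tests. The example data is hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def mean_metric(metric, per_user, k):
    """Average a metric over users; per_user maps user -> (recommended, relevant)."""
    return sum(metric(rec, rel, k) for rec, rel in per_user.values()) / len(per_user)


recommended = ["a", "b", "c", "d"]  # ranked output for one user
relevant = {"b", "d", "e"}          # items the user later interacted with (held out)
print(precision_at_k(recommended, relevant, 2))  # 0.5
print(recall_at_k(recommended, relevant, 4))     # 0.6666666666666666
```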
Always run an A/B/N test between the challenger and defender strategies and pick the one that performs better. After that, we should keep monitoring model drift and data patterns, and decide accordingly whether to retrain or revisit the algorithm. Finally, don’t forget to feed user feedback back into the strategy or ML model to avoid irrelevant recommendations.
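For a click-through-rate (CTR) metric, the comparison can be sketched as a two-proportion z-test of each challenger against the defender. With several challengers (A/B/N), the sketch applies a Bonferroni correction to the significance level, which is one conservative choice among several. The click and view counts are made up.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in CTR between A and B."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        return 0.0, 1.0  # no variation, nothing to conclude
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def pick_strategy(defender, challengers, alpha=0.05):
    """defender: (clicks, views); challengers: name -> (clicks, views).

    Returns the highest-CTR challenger that beats the defender significantly,
    otherwise "defender".
    """
    corrected_alpha = alpha / max(len(challengers), 1)  # Bonferroni for A/B/N
    best_name, best_ctr = "defender", defender[0] / defender[1]
    for name, (clicks, views) in challengers.items():
        z, p_value = two_proportion_z_test(defender[0], defender[1], clicks, views)
        ctr = clicks / views
        if z > 0 and p_value < corrected_alpha and ctr > best_ctr:
            best_name, best_ctr = name, ctr
    return best_name


print(pick_strategy((500, 10_000), {"challenger_1": (580, 10_000), "challenger_2": (510, 10_000)}))
# challenger_1
```

In this made-up example, `challenger_2`’s small lift is not significant, so only `challenger_1` is considered. After the rollout, the winning strategy would become the new defender for the next test.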