Machine Learning for Building Recommender System in Python

Recommender systems are widely used to suggest products such as music, movies, books, news, research articles, and restaurants [1][5].

There are two popular methods for building recommender systems:

Collaborative filtering [3][4][5]
Content-based filtering [6]

The collaborative filtering method [5] predicts (filters) a user’s interest in a product by collecting preference information from many other users (collaborating). The assumption behind the method is that if a person P1 has the same opinion as another person P2 on one issue, P1 is more likely to share P2’s opinion on a different issue than the opinion of a randomly chosen person [5].

The content-based filtering method [6] uses product features/attributes to recommend other products similar to what the user likes, based on that user’s previous actions or explicit feedback such as product ratings.
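To make the content-based idea concrete, here is a minimal sketch in plain Python. The movie names, the three genre flags, and the helper functions are all hypothetical and chosen only for illustration; real systems use far richer item features. The sketch builds a user profile as the rating-weighted sum of the feature vectors of the items the user rated, then ranks unseen items by cosine similarity to that profile.

```python
import math

# Hypothetical item features: [action, comedy, romance] genre flags.
ITEM_FEATURES = {
    "Movie A": [1, 0, 0],
    "Movie B": [1, 1, 0],
    "Movie C": [0, 0, 1],
    "Movie D": [0, 1, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profile(user_ratings, item_features):
    """Rating-weighted sum of the feature vectors of the items the user rated."""
    n_features = len(next(iter(item_features.values())))
    profile = [0.0] * n_features
    for item, rating in user_ratings.items():
        for idx, value in enumerate(item_features[item]):
            profile[idx] += rating * value
    return profile


def recommend(user_ratings, item_features, top_n=2):
    """Rank the items the user has not rated by similarity to the user's profile."""
    profile = build_profile(user_ratings, item_features)
    scores = {
        item: cosine(profile, features)
        for item, features in item_features.items()
        if item not in user_ratings
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# A hypothetical user who loved an action movie and disliked a romance.
user_ratings = {"Movie A": 5, "Movie C": 1}
print(recommend(user_ratings, ITEM_FEATURES))
```

Note that only this user's own ratings and the item features are used; no other user's data enters the computation, which is the key difference from collaborative filtering.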

A recommender system may use either or both of these two methods.

In this article, I use the Kaggle Netflix prize data [2] to demonstrate how to use a model-based collaborative filtering method to build a recommender system in Python.

The rest of the article is arranged as follows:

1. Overview of collaborative filtering
2. Build a recommender system in Python
3. Summary

1. Overview of Collaborative Filtering

As described in [5], the main idea behind collaborative filtering is that one person often gets the best recommendations from another with similar interests. Collaborative filtering uses various techniques to match people with similar interests and make recommendations based on shared interests.

The high-level workflow of a collaborative filtering system can be described as follows:

A user rates items (e.g., movies, books) to express his or her preferences
The system treats the ratings as an approximate representation of the user’s interest in the items
The system matches this user’s ratings against other users’ ratings and finds the people with the most similar ratings
The system recommends items that the similar users have rated highly but that this user has not yet rated

Typically a collaborative filtering system recommends products to a given user in two steps [5]:

Step 1: Look for people who share the same rating patterns with the given user
Step 2: Use the ratings from the people found in step 1 to predict the rating the given user would give to a product

This is called user-based collaborative filtering. One specific implementation of this method is the user-based Nearest Neighbor algorithm.
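The two steps above can be sketched in plain Python as follows. The ratings, user names, and function names are hypothetical, and cosine similarity over co-rated items is just one possible similarity measure, so treat this as an illustration under those assumptions rather than a reference implementation.

```python
import math

# Hypothetical 1-5 star ratings, user -> {item: rating}; not taken from the Netflix data.
RATINGS = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 5, "m2": 5, "m3": 1, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}


def user_similarity(a, b, ratings):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][i] * ratings[b][i] for i in common)
    norm_a = math.sqrt(sum(ratings[a][i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings[b][i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings, k=2):
    """Predict a rating as the similarity-weighted mean over the k nearest neighbours."""
    # Step 1: rank the other users who rated the item by similarity to `user`.
    candidates = [
        (user_similarity(user, other, ratings), other)
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    # Step 2: combine the neighbours' ratings, weighting each by its similarity.
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour rated this item
    return sum(sim * ratings[other][item] for sim, other in neighbours) / total


# Alice has not rated m4; her tastes resemble Bob's more than Carol's.
print(predict("alice", "m4", RATINGS))
```

Because all ratings are positive, plain cosine similarity still gives Carol a sizeable weight even though her tastes are opposite to Alice's. Subtracting each user's mean rating before comparing is a common refinement.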

As an alternative, item-based collaborative filtering (e.g., users who are interested in x are also interested in y) works in an item-centric manner:

Step 1: Build an item-item matrix of the rating relationships between pairs of items
Step 2: Predict the rating of the current user on a product by examining the matrix and matching that user’s rating data
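A minimal sketch of these two item-based steps, again in plain Python with hypothetical data and function names. It assumes cosine similarity between item rating vectors, restricted to users who rated both items; other similarity measures are equally valid.

```python
import math
from itertools import combinations

# Hypothetical 1-5 star ratings, user -> {item: rating}.
RATINGS = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 5, "m2": 5, "m3": 1, "m4": 5},
    "carol": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}


def build_item_similarity(ratings):
    """Step 1: build a symmetric item-item similarity table from co-rating users."""
    # Invert the data to item -> {user: rating}.
    by_item = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[user] = rating

    sim = {}
    for a, b in combinations(sorted(by_item), 2):
        common = set(by_item[a]) & set(by_item[b])
        if common:
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            norm_a = math.sqrt(sum(by_item[a][u] ** 2 for u in common))
            norm_b = math.sqrt(sum(by_item[b][u] ** 2 for u in common))
            score = dot / (norm_a * norm_b)
        else:
            score = 0.0
        sim[(a, b)] = sim[(b, a)] = score
    return sim


def predict(user, item, ratings, sim):
    """Step 2: similarity-weighted mean of the user's own ratings on other items."""
    numerator = denominator = 0.0
    for other_item, rating in ratings[user].items():
        weight = sim.get((item, other_item), 0.0)
        numerator += weight * rating
        denominator += abs(weight)
    return numerator / denominator if denominator else None


item_sim = build_item_similarity(RATINGS)
print(predict("alice", "m4", RATINGS, item_sim))
```

The similarity table depends only on the rating data, not on the user being served, so it can be computed once and reused for every prediction.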

There are two types of collaborative filtering systems:

Model-based
Memory-based

In a model-based system, we develop models using different machine learning algorithms to predict users’ ratings of unrated items [5]. Model-based collaborative filtering algorithms include singular value decomposition (SVD), Bayesian networks, and clustering models [5].
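As a rough illustration of the model-based approach, the sketch below trains a small latent-factor model by stochastic gradient descent. This is the matrix-factorization variant that the recommender literature often labels "SVD"; it is not an exact singular value decomposition, and whether it matches the SVD implementation used later in this article is an assumption. The ratings, function names, and hyperparameters (n_factors, n_epochs, lr, reg) are hypothetical.

```python
import math
import random

# Hypothetical (user, item, rating) triples on a 1-5 scale.
RATINGS = [
    ("alice", "m1", 5), ("alice", "m2", 4), ("alice", "m3", 1),
    ("bob", "m1", 5), ("bob", "m2", 5), ("bob", "m3", 1), ("bob", "m4", 5),
    ("carol", "m1", 1), ("carol", "m2", 2), ("carol", "m3", 5), ("carol", "m4", 1),
]


def train(ratings, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user and item factor vectors so that mean + p_u . q_i approximates each rating."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random initial factors; exact zeros would never move under these updates.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(a * b for a, b in zip(p[u], q[i])))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step on the squared error with L2 regularization.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q


def predict(model, user, item):
    """Predicted rating; unknown users or items fall back to the global mean."""
    mean, p, q = model
    if user not in p or item not in q:
        return mean
    return mean + sum(a * b for a, b in zip(p[user], q[item]))


def rmse(model, ratings):
    """Root mean squared error of the model on the given ratings."""
    squared = [(r - predict(model, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


model = train(RATINGS)
print("training RMSE:", rmse(model, RATINGS))
print("alice on m4:", predict(model, "alice", "m4"))
```

Unlike the neighbourhood approaches, the rating data is needed only at training time; a prediction afterwards uses just the learned factor vectors.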

A memory-based system uses users’ rating data to compute the similarity between users or items. Typical examples of this type of system are neighbourhood-based methods and item-based/user-based top-N recommendation [5].

This article describes how to build a model-based collaborative filtering system using the SVD model.

