Anime recommendations by using Collaborative Filtering

This study aims to recommend anime to people using MyAnimeList user ratings. The recommendation method is Frequent Pattern Mining, and the tool used is Apache Spark. For data preprocessing, the Python pandas library is used in a Jupyter notebook. Anime is not as popular as TV series or movies, so finding good recommendations is more difficult. I hope my findings can help someone :)

Data Selection

First, rating.csv, which contains MyAnimeList user scores, is selected. The columns are:

· user_id — non identifiable randomly generated user id.

· anime_id — the anime that user has rated.

· rating — rating out of 10 the user has assigned (-1 if the user watched it but didn’t assign a rating).

Second, anime.csv, which comes from the same source as rating.csv, is selected to get the anime type (TV, Movie, OVA and so on). Finally, the related column is selected from AnimeList.csv. Since all of the data are fetched from MyAnimeList, the files can be joined to each other on the anime_id attribute.
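As a rough illustration of the join described above, here is a minimal pandas sketch. The two small frames are made-up stand-ins for rating.csv and anime.csv (the ids and values are hypothetical), and a left join is only one reasonable choice, not necessarily the one used in the study.

```python
import pandas as pd

# Made-up stand-ins for rating.csv and anime.csv (ids and values are hypothetical).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "anime_id": [101, 102, 101, 103],
    "rating": [8, -1, 10, 4],
})
anime = pd.DataFrame({
    "anime_id": [101, 102, 103],
    "type": ["TV", "Movie", "OVA"],
})

# A left join keeps every rating row and attaches the anime type to it.
ratings_with_type = ratings.merge(
    anime[["anime_id", "type"]], on="anime_id", how="left"
)
print(ratings_with_type)
```

With the real files, the two frames would be loaded with pd.read_csv instead of being built by hand.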

Data Preprocessing

In this part, the data is prepared for the rule mining algorithm: lower ratings and unimportant types are dropped, and season data are merged.

A. Binning

First of all, it is not proper to consider lower ratings for making recommendations.

Fig. 1. The histogram that shows rating distribution

The histogram above is drawn to decide on the cutoff. The value “-1” is used when the user chose not to rate an anime, but that does not mean the user disliked it: people tend to rate an anime only when they like it a lot. So “-1” values are kept. Ignoring “-1”, the mean rating is 7.80. Since ratings are this high overall, 0–5 points can be ignored. To summarize, 6–10 and -1 points are kept, and 0–5 points are counted as dislikes.
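The binning rule can be sketched in pandas as follows. The frame is a small made-up sample, and the variable names are my own; only the thresholds (keep 6–10 and -1, drop 0–5) come from the text.

```python
import pandas as pd

# Made-up sample of rating rows (ids and values are hypothetical).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "anime_id": [101, 102, 101, 103, 102],
    "rating": [8, -1, 3, 6, 5],
})

# Keep rows treated as "liked": explicit ratings of 6-10 and the unrated marker -1.
liked_mask = (ratings["rating"] >= 6) | (ratings["rating"] == -1)
liked = ratings[liked_mask]

# Mean of explicit ratings only, i.e. with the -1 rows ignored.
mean_rating = ratings.loc[ratings["rating"] != -1, "rating"].mean()

print(liked)
print(mean_rating)
```

On the real data, the same mean computed without the -1 rows is the 7.80 figure quoted above.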

B. Type Filtering

Some anime types consist of a few episodes that tell side stories of a main anime. They must not be considered for rule mining, so OVA, ONA, Music and Special anime must be dropped. The type data and rating data are joined, and everything except TV and Movie anime is removed. It is also seen that most of the dropped anime were not rated. As a result, the unrated anime that remain became more valuable than before.
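A minimal sketch of the type filter, assuming the type column uses the labels named above. The sample rows are made up, and the "share of unrated rows among the dropped ones" check is my own illustration of the observation in the paragraph.

```python
import pandas as pd

# Made-up sample data (ids and values are hypothetical).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "anime_id": [101, 102, 103, 104],
    "rating": [8, -1, -1, 7],
})
anime = pd.DataFrame({
    "anime_id": [101, 102, 103, 104],
    "type": ["TV", "Movie", "OVA", "Special"],
})

merged = ratings.merge(anime, on="anime_id", how="inner")

# Keep only TV and Movie rows; OVA, ONA, Music and Special rows are dropped.
is_main = merged["type"].isin(["TV", "Movie"])
kept = merged[is_main]
dropped = merged[~is_main]

# Fraction of the dropped rows that carry the unrated marker -1.
unrated_share_in_dropped = (dropped["rating"] == -1).mean()
print(kept)
print(unrated_share_in_dropped)
```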

C. Season data merging

Each season of an anime has its own anime_id. As an example, Sailor Moon has five seasons, each with its own anime_id.

  • Sailor Moon: 530
  • Sailor Moon R: 740
  • Sailor Moon S: 532
  • Sailor Moon Super S: 1239
  • Sailor Moon Sailor Stars: 996
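One possible way to merge seasons is to map every season id to a single series id and then drop the duplicate (user, series) pairs. This is only a sketch: the mapping table, the choice of 530 as the shared id, and the series_id column name are assumptions, and the id 9999 stands for an unrelated hypothetical anime.

```python
import pandas as pd

# Season ids from the Sailor Moon list above; using 530 as the shared id is an assumption.
SEASON_TO_SERIES = {740: 530, 532: 530, 1239: 530, 996: 530}

# Made-up rows: user 1 watched two seasons, user 2 watched one season and one other anime.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "anime_id": [530, 740, 996, 9999],
})

# Ids that are not in the mapping are left unchanged by replace().
ratings["series_id"] = ratings["anime_id"].replace(SEASON_TO_SERIES)

# A user who watched several seasons now counts once per series.
merged = ratings.drop_duplicates(subset=["user_id", "series_id"])
print(merged)
```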

Item-Based Collaborative Filtering in Python

A hands-on walkthrough of item-based collaborative filtering in Python.

Item-based collaborative filtering is a recommendation technique that uses the similarity between items, computed from user ratings. In this article, I explain its basic concept and practice building an item-based collaborative filter in Python.
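To make "similarity between items" concrete, here is a small plain-Python sketch that compares two items by the cosine similarity of their rating vectors. Cosine similarity is one common choice rather than something this article prescribes, and the users, items and ratings are invented. Missing ratings are treated as zero.

```python
from math import sqrt

# Invented ratings: user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 1},
    "u3": {"A": 1, "C": 5},
}

def item_vector(item):
    """Return {user: rating} for every user who rated the item."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine_similarity(item_a, item_b):
    """Cosine similarity of two items, with missing ratings counted as 0."""
    va, vb = item_vector(item_a), item_vector(item_b)
    common = set(va) & set(vb)
    if not common:
        return 0.0
    dot = sum(va[u] * vb[u] for u in common)
    norm_a = sqrt(sum(v * v for v in va.values()))
    norm_b = sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b)

print(cosine_similarity("A", "B"))
print(cosine_similarity("A", "C"))
```

In this toy data, items A and B are rated alike by the same users, so they come out more similar to each other than A and C do.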

Basic Concept

Making a Movie Recommender

Recommendation System: Collaborative Filtering

This article contains detailed steps for implementing collaborative filtering in Python from scratch, without any external libraries.

As the name suggests, this is part 2 of the Recommendation System article. Part 1 focused on content-based recommendation; this article focuses on the collaborative filtering approach, i.e. harnessing the quality judgments of other users.

_The main idea in collaborative filtering revolves around predicting the rating of an item for user X based on the ratings given by a set of similar users._

Let us try to understand this definition. Here, ‘similar users’ refers to a set of users whose likes and dislikes are similar to user X’s. For example, if user X has disliked an item _a_, then similar users should also dislike item _a_, and vice versa. The strength of the similarity depends on the ratings the users have provided. Fig. 2 describes this process, where we are trying to predict likable items for Mr. A.

Fig. 2

Once we have a set of users who have rated items similarly to user X, we can start predicting ratings for the items that user _X_ has not yet used, and the items with the highest predicted ratings are recommended to user X. This approach is also known as user-user collaborative filtering because we are matching user profiles rather than item profiles; matching item profiles would be item-item collaborative filtering.

Sounds simple! Well let’s try to implement it.

Data Set:

I am using the same data set as in part 1, i.e. the anime dataset from Kaggle. The data set contains two files: rating.csv, which holds users’ ratings for different anime in 3 columns, and anime.csv, which contains details for every anime such as name, type and average rating. In total there are 12,294 unique anime, 73,516 unique users and 7,813,737 ratings.

In this approach we will mostly use rating.csv, i.e. the file containing the ratings each user gave to anime. A rating of -1 marks a missing value: the user has watched the anime but has not rated it. The global average rating is 7.8.

Implementation Steps:

The implementation is mainly divided into 3 tasks:

**Task 1:** Find a set of users similar to user X. To measure the similarity between two users, we use the Pearson Correlation Coefficient between user X and every other user. Once done, this returns a list of the N most similar users.

Fig. 2 (Pearson Correlation Coefficient)

**Task 2:** After getting the list of users similar to user X, we can predict ratings for the anime that user _X_ has not watched but that similar users from the set N have watched.
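The two tasks above can be sketched in plain Python, in keeping with the no-external-libraries approach. This is a minimal illustration, not the article's actual code: the ratings are invented, the function names are my own, and the prediction uses a simple similarity-weighted average over positively correlated neighbours, which is one common choice. Rows with a rating of -1 are assumed to have been removed beforehand.

```python
from math import sqrt

# Invented ratings: user -> {anime: rating}. User "X" has not watched "d".
ratings = {
    "X": {"a": 9, "b": 7, "c": 3},
    "u1": {"a": 10, "b": 8, "c": 2, "d": 9},
    "u2": {"a": 2, "b": 4, "c": 9, "d": 3},
    "u3": {"a": 8, "b": 7, "c": 4, "d": 8},
}

def pearson(u, v):
    """Pearson correlation of two users over the anime both have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)

def most_similar(user, n):
    """Task 1: the n users with the highest Pearson correlation to `user`."""
    scores = [(pearson(user, other), other) for other in ratings if other != user]
    scores.sort(reverse=True)
    return scores[:n]

def predict(user, item, n=2):
    """Task 2: similarity-weighted average rating of `item` among the neighbours."""
    neighbours = [
        (sim, other)
        for sim, other in most_similar(user, n)
        if sim > 0 and item in ratings[other]
    ]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour has rated this item
    return sum(sim * ratings[other][item] for sim, other in neighbours) / total

print(most_similar("X", 2))
print(predict("X", "d"))
```

Here u1 and u3 rate much like X while u2 rates in the opposite direction, so the prediction for "d" is driven by the ratings of u1 and u3 only.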

Collaborative Filtering on Anime Dataset using fastai2

This post aims to describe what Collaborative Filtering (abbreviated as CF for the rest of this post) is all about, and then elaborates on how to build a model to perform this task using fastai2.


In today’s world, where data is oil, one way of utilising data is to make suggestions and recommendations for individuals. In this fast-paced world where content is created at an astounding rate, viewers like being suggested content similar to what they’ve seen before.

To do so, the choices, likes and tastes of users are recorded as ratings or scores, typically bounded in a finite range (most commonly 0–5 or 0–10), where 0 means the user strongly disliked the content and 5 or 10 means the user found the content very entertaining and to their liking.

Using this data to figure out what to suggest to a user next is what collaborative filtering is all about. _In place of user-anime or user-movie, it could be anything, such as consumer-product, user-news article or subscriber-social media post._

The more feedback that is obtained from the user, the more relevant the suggestions become because the algorithm gets to understand the tastes of an individual even better.

There are several ways to perform collaborative filtering, and today we’ll be discussing two of them. We’ll be using fastai2, a library built by Sylvain Gugger and Jeremy Howard that provides an awesome interface on top of PyTorch for deep learning experiments. So, without further ado, let’s start by understanding the intuition behind CF.

Janae  Haag


Beer Recommendations using Collaborative Filtering with Neo4j

In this post, I’ll outline how to use a Neo4j graph database to generate user recommendations for a data set consisting of users, products, and user ratings for those products.
For my data set, I’m using a database of 30,000 different beers (pulled from brewDB’s open API) and 100 users (I asked Facebook friends to rate some beers).

