Hal Sauer

Variational Autoencoders for Collaborative Filtering with Mxnet and Pytorch

This post and the code here are part of a larger repo called RecoTour, where I normally explore and implement recommendation algorithms that I consider interesting and/or useful (see RecoTour and RecoTourII). In every directory, I have included a README file and a series of explanatory notebooks that I hope help explain the code. I keep adding algorithms from time to time, so stay tuned if you are interested.

As always, let me first acknowledge the relevant people who did the hard work. This post and the companion repo are based on the papers “Variational Autoencoders for Collaborative Filtering” [1] and “Auto-Encoding Variational Bayes” [2]. The code here and in that repo is partially inspired by the implementation from Younggyo Seo. I have adapted the code to my coding preferences and added a number of options and the flexibility to run multiple experiments.

The reason to take a deep dive into variational autoencoders for collaborative filtering is that they seem to be one of the few Deep Learning based algorithms (if not the only one) that obtain better results than those using non-Deep Learning techniques [3].

Throughout this exercise I will use two datasets: the Amazon Movies and TV dataset [4] [5] and the Movielens dataset. The latter is used so I can make sure I am obtaining results consistent with those in the paper. The Amazon dataset is significantly more challenging than the Movielens dataset, as it is ∼13 times sparser.

All the experiments in this post were run using a p2.xlarge EC2 instance on AWS.

The more detailed, original version of this post is published on my blog. This post is intended as a summary of the content there, and it focuses more on the implementation/code and the corresponding results and less on the math.

1. Partially Regularized Multinomial Variational Autoencoder: the Loss function

I will assume in this section that the reader has some experience with Variational Autoencoders (VAEs). If this is not the case, I recommend reading Kingma and Welling’s paper, the Liang et al. paper, or the original post. There, the reader will find a detailed derivation of the loss function we will be using when implementing the Partially Regularized Multinomial Variational Autoencoder (Mult-VAE). Here I will only include the final expression and briefly introduce some additional pieces of information that I consider useful to understand the Mult-VAE implementation and the loss below in Eq (1).

Let me first describe the notational convention. Following Liang et al., 2018, I will use u ∈ {1,…,U} to index users and i ∈ {1,…,I} to index items. The user-by-item binary interaction matrix (i.e. the click matrix) is X ∈ ℕ^{U × I} and I will use lower case xᵤ = [X_{u1},…,X_{uI}] ∈ ℕ^I to refer to the click history of an individual user u.

With that notation, the Mult-VAE Loss function is defined as:
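
L(xᵤ; θ, ϕ) = (1/M) Σᵤ [ log p_θ(xᵤ|zᵤ) − β · KL( q_ϕ(zᵤ|xᵤ) ‖ p(zᵤ) ) ]   (1)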

where M is the mini-batch size. The first element within the summation is simply the log-likelihood of the click history xᵤ conditioned on the latent representation zᵤ, i.e. log p(xᵤ|zᵤ) (see below). The second element is the Kullback–Leibler divergence for VAEs when both the approximate posterior q(zᵤ|xᵤ) and the prior p(zᵤ) are Gaussian (see here).

We just need a bit more detail before we can jump to the code. xᵤ, the click history of user u, is defined as:
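
xᵤ ∼ Mult(Nᵤ, π(zᵤ))   (2)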

where Nᵤ = Σᵢ Nᵤᵢ is the total number of clicks for user u. As I mentioned before, zᵤ is the latent representation of xᵤ, and it is assumed to be drawn from a standard Gaussian prior p(zᵤ) ∼ N(0, I). During the implementation of the Mult-VAE, zᵤ needs to be sampled from an approximate posterior q(zᵤ|xᵤ) (which is also assumed to be Gaussian). Since computing gradients when sampling is involved is…“complex”, Kingma and Welling introduced the so-called reparameterization trick (please read the original paper, the original post, or any of the many online resources for more details), so that the sampled zᵤ is computed as:
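
zᵤ = μ(xᵤ) + ϵ ⊙ σ(xᵤ)   (3)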

μ and σ in Eq (3) are functions computed by neural networks and ϵ ∼ N(0, I) is Gaussian noise. Their computation will become clearer later in the post when we see the corresponding code. Finally, π(zᵤ) in Eq (2) is π(zᵤ) = Softmax(zᵤ).

At this stage we have almost all the information we need to implement the Mult-VAE and its loss function in Eq (1): we know what xᵤ is; zᵤ, μ and σ will be functions of our neural networks; and π is just the Softmax function. The only “letter” left to discuss from Eq (1) is β.

Looking at the loss function in Eq (1) within the context of VAEs, we can see that the first term is the “reconstruction loss”, while the KL divergence acts as a regularizer. With that in mind, Liang et al. add a factor β to control the strength of the regularization, and propose β < 1. For a more in-depth reflection on the role of β, and in general a better explanation of the form of the Mult-VAE loss function, please read the original paper or the original post.

Without further ado, let’s move to the code:
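
To make the pieces above concrete before diving into the repo, here is a minimal PyTorch sketch of the model and the loss in Eq (1). The layer sizes, the tanh nonlinearity and the input dropout follow the paper; the names MultVAE and mult_vae_loss are illustrative and differ from the full code in the repo.

import torch
import torch.nn.functional as F
from torch import nn

class MultVAE(nn.Module):
    def __init__(self, n_items, hidden_dim=600, latent_dim=200, dropout=0.5):
        super().__init__()
        self.encode = nn.Linear(n_items, hidden_dim)
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        self.decode = nn.Linear(latent_dim, n_items)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # L2-normalize and corrupt the click history, as in the paper
        h = torch.tanh(self.encode(self.dropout(F.normalize(x, dim=1))))
        mu, logvar = self.mu(h), self.logvar(h)
        if self.training:
            # reparameterization trick, Eq (3): z = mu + eps * sigma
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        else:
            z = mu
        return self.decode(z), mu, logvar

def mult_vae_loss(logits, x, mu, logvar, beta=0.5):
    # multinomial log-likelihood: log softmax(logits) weighted by the clicks
    neg_ll = -(F.log_softmax(logits, dim=1) * x).sum(dim=1).mean()
    # closed-form Gaussian KL divergence, scaled by beta as in Eq (1)
    kld = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return neg_ll + beta * kld

In practice β is annealed from 0 up to its final value during training, as proposed in the paper.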

#pytorch #mxnet #variational-autoencoder #recommendation-system #python


Collaborative Filtering in Pytorch

Collaborative filtering is a tool that companies are increasingly using. Netflix uses it to recommend shows for you to watch. Facebook uses it to recommend who you should be friends with. Spotify uses it to recommend playlists and songs. It’s incredibly useful in recommending products to customers.

In this post, I construct a collaborative filtering neural network with embeddings to understand how users would feel towards certain movies. From this, we can recommend movies for them to watch.

The dataset is taken from here. This code is loosely based on the fastai notebook.

First, let's load the ratings and movies data.

import pandas as pd
ratings = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')

Now let's get rid of the annoyingly complex user ids. We can make do with plain old integers; they're much easier to handle.

u_uniq = ratings.userId.unique()
user2idx = {o:i for i,o in enumerate(u_uniq)}
ratings.userId = ratings.userId.apply(lambda x: user2idx[x])
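
Then we'll do the same thing for the movie ids. The original block for this step isn't shown, but it presumably mirrors the user-id remapping above:

m_uniq = ratings.movieId.unique()
movie2idx = {o:i for i,o in enumerate(m_uniq)}
ratings.movieId = ratings.movieId.apply(lambda x: movie2idx[x])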

We’ll need to get the number of users and the number of movies.

n_users=int(ratings.userId.nunique())
n_movies=int(ratings.movieId.nunique())

First, let's create some random weights. We need to call super().__init__(), which allows us to avoid calling the base class explicitly. This makes the code more maintainable.

These weights will be uniformly distributed between 0 and 0.05. The _ at the end of uniform_ denotes an in-place operation.

import torch
from torch import nn

n_factors = 50  # number of latent factors; 50 is an illustrative choice

class EmbeddingDot(nn.Module):
    def __init__(self, n_users, n_movies):
        super().__init__()
        # one embedding matrix for users, one for movies
        self.u = nn.Embedding(n_users, n_factors)
        self.m = nn.Embedding(n_movies, n_factors)
        # initialize the weights in-place, uniformly in [0, 0.05)
        self.u.weight.data.uniform_(0, 0.05)
        self.m.weight.data.uniform_(0, 0.05)

Next, we add our Embedding matrices and latent factors.

We're creating an embedding matrix for our user ids and our movie ids. An embedding is basically an array lookup. When we multiply our one-hot encoded user ids by our weight matrix, most calculations cancel to 0 (0 * number = 0), and all we're left with is a particular row in the weight matrix. That's basically just an array lookup.

So we don’t need the matrix multiply and we don’t need the one-hot encoded array. Instead, we can just do an array lookup. This reduces memory usage and speeds up the neural network. It also reveals the intrinsic properties of the categorical variables. This idea was applied in a recent Kaggle competition and achieved 3rd place.
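
As a quick sanity check of that claim, an embedding lookup returns exactly the row you would get from the one-hot matrix multiply (a toy example, with made-up sizes):

import torch
from torch import nn

emb = nn.Embedding(4, 3)       # 4 ids, 3 latent factors
one_hot = torch.eye(4)[2]      # one-hot vector for id 2
lookup = emb(torch.tensor(2))  # the embedding lookup for id 2
assert torch.allclose(one_hot @ emb.weight, lookup)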

The size of these embedding matrices is determined by n_factors, which sets the number of latent factors per user and per movie.
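
For completeness, here is a minimal sketch of the forward pass that could complete EmbeddingDot, mirroring the fastai version (the users and movies arguments are assumed to be tensors of integer ids):

    # inside EmbeddingDot
    def forward(self, users, movies):
        # look up each id's latent factors and take a per-example dot product
        return (self.u(users) * self.m(movies)).sum(1)

The predicted rating for a (user, movie) pair is then simply the dot product of their latent factors.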

#machine-learning #collaborative-filtering #deep-learning #pytorch #deep learning

Ray Patel

Item-Based Collaborative Filtering in Python

A practical walk-through of item-based collaborative filtering in Python.

Item-based collaborative filtering is a recommendation approach that uses the similarity between items, computed from users' ratings. In this article, I explain its basic concept and show how to build an item-based collaborative filter in Python.

Basic Concept

Making a Movie Recommender
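
As a toy sketch of the idea (the ratings matrix and movie names here are made up for illustration):

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# toy user-by-item ratings matrix (rows: users, columns: movies)
ratings = pd.DataFrame(
    {"Movie A": [5, 4, 0, 1],
     "Movie B": [4, 5, 1, 0],
     "Movie C": [1, 0, 5, 4]},
    index=["u1", "u2", "u3", "u4"])

# item-item cosine similarity, computed over the rating columns
item_sim = pd.DataFrame(
    cosine_similarity(ratings.T),
    index=ratings.columns, columns=ratings.columns)

# predict u1's score for "Movie C" as the similarity-weighted
# average of the ratings u1 gave to the other movies
sims = item_sim["Movie C"].drop("Movie C")
user = ratings.loc["u1"].drop("Movie C")
print((user * sims).sum() / sims.sum())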

#item-based-cf #python #collaborative-filtering #movie-recommendation #item-based collaborative filtering #recommender

Percy Ebert

How to Choose the Team Collaboration Software

Teams spread across remote locations as well as the office are increasingly in vogue, and the spread of the coronavirus pandemic in 2020 has further increased reliance on distributed teams. These teams bring a huge number of benefits, but they also pose a set of unique problems. Not surprisingly, businesses are continually updating how they manage distributed teams, and they are increasingly using team collaboration software to overcome the challenges that distributed teams pose. So continue reading to find the latest information on software that helps in coordinating and managing distributed teams.

Overview of Distributed Team Management

1. Communication is the key

The cornerstone of managing a distributed team is communication. In a distributed team, relying on email alone is just not an option, so there is an ever-present need to adopt the best team communication software tools available and to provide feedback routinely.

2. Management of productivity

A distributed team’s productivity can be high at certain times and then fall off. To ensure that team productivity remains at optimum levels managers need to be able to monitor it. Managers also need the best team messaging collaboration solution to manage the productivity of individuals and the team.

3. Solid tech infrastructure

Cutting edge tech infrastructure is the backbone of a business communication solution. Using this infrastructure, managers can efficiently monitor employees and their work. Additionally, employees can use the tech infrastructure to ensure work is completed on time.

4. Advanced security features

Working in distributed teams often means having to deal with security issues that crop up when people work from home. Within such teams, steps also have to be taken to ensure that data and customer information remain secure when transmitted.

5. Elevating team spirit and morale

Within an office, teams interact easily and managers can keep an eye on morale and team spirit. In distributed teams, the process of keeping team spirit and morale bubbling with energy is much more complex. The process requires the team and manager to put in extra effort and rely on the aid of group collaboration software.

#team-collaboration #collaboration #online-collaboration-tools #collaboration-tools #api

Dicanio Rol

Complete Guide to build an AutoEncoder in Pytorch and Keras

This article is a continuation of my previous article, which is a complete guide to building a CNN using PyTorch and Keras.

Taking input from standard or custom datasets is already covered in the complete guide to CNNs using PyTorch and Keras, so we can start with the necessary introduction to autoencoders and then implement one.

AutoEncoders

An autoencoder is a neural network that learns to encode data with minimal loss of information.
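
As a minimal PyTorch sketch of the idea (the layer sizes here are illustrative, e.g. for flattened 28×28 images):

import torch
from torch import nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        # compress the input to a small latent code...
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
        # ...and reconstruct the input from that code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

# train by minimizing the reconstruction error, e.g.:
model = AutoEncoder()
x = torch.rand(16, 784)  # a dummy batch
loss = nn.MSELoss()(model(x), x)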

There are many variants of the above network. Some of them are:

Sparse AutoEncoder

This autoencoder reduces overfitting by regularizing the activations of the hidden nodes, for instance with an L1 penalty.
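
Reusing the AutoEncoder and the dummy batch x from the sketch above, an L1 sparsity penalty could look like this (l1_coeff is a hypothetical hyperparameter):

l1_coeff = 1e-4      # sparsity strength (illustrative value)
h = model.encoder(x) # hidden activations
loss = nn.MSELoss()(model.decoder(h), x) + l1_coeff * h.abs().mean()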

Denoising AutoEncoder

This autoencoder is trained by adding noise to its input, so that at evaluation time it can remove noise from the input.
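
Again reusing model and x from above, the training step corrupts the input but reconstructs the clean target:

noisy_x = x + 0.1 * torch.randn_like(x)  # 0.1 is an illustrative noise level
loss = nn.MSELoss()(model(noisy_x), x)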

#keras #variational-autoencoder #pytorch