Exploratory Data Analysis

First, let’s analyze the data. It comes from the IBM Watson Studio platform and records users and the articles each of them has interacted with. Email addresses are encrypted to preserve privacy.

Here are a few rows of the data after loading it into a pandas DataFrame:

Sample rows from the data

Let’s see how many articles a user typically reads by plotting a histogram. As the plot shows, most users read up to 10 articles, and the mean is around 9 articles per user.

Distribution of article interactions per user
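The per-user counts behind such a histogram can be computed roughly as follows. This is a minimal sketch on a toy table; the column names `email` and `article_id` are assumptions for illustration, not taken from the original dataset.

```python
import pandas as pd

# Toy stand-in for the interactions table (one row per user-article interaction).
# The column names "email" and "article_id" are assumed for illustration.
df = pd.DataFrame({
    "email": ["a1", "a1", "a1", "b2", "b2", "c3"],
    "article_id": [10, 11, 10, 10, 12, 13],
})

# Counting rows per email gives the number of interactions per user.
per_user = df.groupby("email")["article_id"].count()

# Summary statistics of the distribution.
mean_interactions = per_user.mean()
median_interactions = per_user.median()

# How many users fall on each interaction count (the histogram's raw data).
hist = per_user.value_counts().sort_index()
```

With matplotlib installed, `per_user.plot.hist()` would draw the histogram directly from the same Series.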

The IBM platform has 1051 articles in total, of which 741 have been interacted with at least once. There are 5148 users and 45993 user-article interactions.
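Counts like these can be derived with `nunique` and `len`. The sketch below assumes two hypothetical tables, one listing every article on the platform (`catalog`) and one listing interactions (`interactions`); both names and their columns are illustrative, and the toy values are not the real data.

```python
import pandas as pd

# Hypothetical interactions table: one row per user-article interaction.
interactions = pd.DataFrame({
    "email": ["a1", "a1", "b2", "b2", "c3"],
    "article_id": [10, 11, 10, 12, 10],
})

# Hypothetical catalog table: every article on the platform, read or not.
catalog = pd.DataFrame({"article_id": [10, 11, 12, 13, 14]})

summary = {
    # Distinct articles available on the platform.
    "total_articles": catalog["article_id"].nunique(),
    # Distinct articles that appear in at least one interaction.
    "articles_with_interactions": interactions["article_id"].nunique(),
    # Distinct users; nunique() skips missing emails by default.
    "unique_users": interactions["email"].nunique(),
    # Every row is one interaction, so the row count is the total.
    "total_interactions": len(interactions),
}
```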

Finally, a function converts the encrypted email addresses to user IDs for easier processing.
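One way such a mapping could look is shown below. This is a sketch, not the author's function: the name `add_user_ids` and the `email` column are hypothetical, and it uses `pd.factorize` to assign integer IDs in order of first appearance.

```python
import pandas as pd

def add_user_ids(df, email_col="email"):
    """Return a copy of df with an integer user_id column (hypothetical helper)."""
    # factorize assigns 0, 1, 2, ... in order of first appearance;
    # missing values get the sentinel -1.
    codes, _ = pd.factorize(df[email_col])
    out = df.copy()
    # Shift by one so real users start at 1; missing emails end up as 0.
    out["user_id"] = codes + 1
    return out

# Toy example with made-up hashed emails.
sample = pd.DataFrame({"email": ["x9f", "k2a", "x9f", "m7c"]})
mapped = add_user_ids(sample)
```

The same email always maps to the same ID, so the `email` column can be dropped afterwards without losing the ability to group interactions by user.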

