Exploratory Data Analysis
First, let’s analyze the data. It comes from the IBM Watson Studio platform and records each user together with the articles they have interacted with. Email addresses are encrypted to protect privacy.
Sample rows from the data
Let’s see how many articles a user typically reads by plotting a histogram. As can be seen, most users read up to 10 articles, with a mean of around 9 articles per user.
Distribution of article interactions per user
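The per-user counts behind a histogram like this can be sketched as follows. This is a minimal illustration on toy data, not the original notebook code: the column names `email` and `article_id` and the helper `plot_interactions` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the interactions table. The column names
# `email` and `article_id` are assumed, not taken from the source.
df = pd.DataFrame({
    "email": ["a1", "a1", "a1", "b2", "b2", "c3"],
    "article_id": [10, 11, 12, 10, 13, 10],
})

# Each row is one interaction, so counting rows per email
# gives the number of interactions per user.
interactions_per_user = df.groupby("email")["article_id"].count()


def plot_interactions(counts, bins=30):
    """Hypothetical helper: histogram of interactions per user (needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = counts.plot.hist(bins=bins)
    ax.set_xlabel("Interactions per user")
    ax.set_ylabel("Number of users")
    plt.show()


# Summary statistics to read alongside the plot.
mean_interactions = interactions_per_user.mean()
median_interactions = interactions_per_user.median()
print(mean_interactions, median_interactions)
```

Calling `plot_interactions(interactions_per_user)` draws the histogram. Reporting the median next to the mean is useful here because a few very active users can pull the mean well above what a typical user reads.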
The IBM platform holds 1051 articles in total, of which 741 have been interacted with at least once. There are 5148 users and 45993 user-article interactions.
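Counts of this kind can be derived with a few pandas calls. The sketch below uses toy data and assumes two tables, an interactions table `df` and an article catalogue `articles`, with hypothetical column names `email` and `article_id`.

```python
import pandas as pd

# Toy interactions table (assumed columns: `email`, `article_id`).
df = pd.DataFrame({
    "email": ["a1", "a1", "b2", None, "c3"],
    "article_id": [10, 11, 10, 12, 10],
})

# Toy article catalogue (assumed column: `article_id`).
articles = pd.DataFrame({"article_id": [10, 11, 12, 13, 14]})

# Articles available on the platform.
total_articles = articles["article_id"].nunique()

# Articles with at least one interaction.
interacted_articles = df["article_id"].nunique()

# Distinct users; nunique() skips missing emails by default.
unique_users = df["email"].nunique()

# Every row is one user-article interaction.
total_interactions = len(df)

print(total_articles, interacted_articles, unique_users, total_interactions)
```

Note that `nunique()` ignores missing values, so rows with a null email still count toward the interaction total but not toward the user total.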
Finally, a function is used to convert the encrypted email addresses to user ids for easier processing.
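One way such a mapping could look is sketched below. The function name `email_mapper` and the column names are assumptions for illustration; the original function may differ. Each distinct email receives the next integer ID in order of first appearance.

```python
import pandas as pd

# Toy interactions table (assumed columns: `email`, `article_id`).
df = pd.DataFrame({
    "email": ["a1", "b2", "a1", "c3", "b2"],
    "article_id": [10, 10, 11, 12, 13],
})


def email_mapper(emails):
    """Hypothetical helper: map each distinct email to an integer ID.

    IDs start at 1 and follow the order in which emails first appear.
    Missing emails are not handled specially in this sketch.
    """
    ids = {}
    # setdefault returns the stored ID, or stores and returns a new one.
    return [ids.setdefault(email, len(ids) + 1) for email in emails]


# Replace the encrypted email column with compact integer IDs.
df["user_id"] = email_mapper(df["email"])
df = df.drop(columns="email")
print(df)
```

Integer IDs are shorter to display and work directly as row labels in a user-item matrix, which is the usual motivation for this step.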