When I searched for the keyword “machine learning” on GitHub, I found 246,632 machine learning repositories. Since the top results are among the most prominent repositories in machine learning, I expect their owners and contributors to be experts, or at least competent, in machine learning. So I decided to extract the profiles of these users to gain some insight into their backgrounds and statistics.
To scrape, I use three tools:
I scrape the owners as well as the top 30 contributors of the top 90 repositories that show up in the search.
After removing duplicates and dropping profiles that belong to organizations such as udacity, I obtain a list of 1,208 users. For each user, I scrape the 20 data points listed below:
new_profile.info()
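Here is a minimal sketch of the deduplication and organization filter described above. It is an illustration rather than the code used for this article. It assumes each scraped account is a dictionary with a login field and a type field that is either "User" or "Organization". Both the record layout and the helper name unique_individuals are hypothetical.

```python
from typing import Dict, List


def unique_individuals(accounts: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop organization accounts and repeated logins, keeping first-seen order."""
    seen = set()
    users = []
    for account in accounts:
        # Compare logins case-insensitively so "Alice" and "alice" count once.
        login = account["login"].lower()
        if account.get("type") == "Organization" or login in seen:
            continue
        seen.add(login)
        users.append(account)
    return users


# Hypothetical sample: one organization and one repeated contributor.
accounts = [
    {"login": "alice", "type": "User"},
    {"login": "udacity", "type": "Organization"},
    {"login": "alice", "type": "User"},
    {"login": "bob", "type": "User"},
]
print([a["login"] for a in unique_individuals(accounts)])  # ['alice', 'bob']
```

A contributor who appears in several of the 90 repositories is kept only once, which is why the final list is much shorter than 90 owners plus 90 × 30 contributors.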
Specifically, the first 13 data points are obtained from here
The rest of the data points are obtained from the repositories of a user:
total_stars is the total number of stars across all repositories
max_star is the highest star count of any single repository
forks is the total number of forks across all repositories
descriptions are the descriptions of all of a user's repositories
contribution is the number of contributions within the last year
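To make the repository-level fields concrete, here is a rough sketch of how they could be derived once a user's repositories have been collected. It assumes each repository is a dictionary with stars, forks and description keys. Those key names and the function summarize_repos are hypothetical, not taken from the scraper itself. The sketch leaves out contribution, on the assumption that it is a single per-user count rather than something summed over repositories.

```python
from typing import Any, Dict, List


def summarize_repos(repos: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-repository numbers into per-user data points."""
    stars = [repo.get("stars", 0) for repo in repos]
    return {
        # Sum of stars over every repository the user owns.
        "total_stars": sum(stars),
        # Star count of the single most-starred repository (0 if none).
        "max_star": max(stars, default=0),
        # Sum of forks over every repository.
        "forks": sum(repo.get("forks", 0) for repo in repos),
        # Keep only repositories that actually have a description.
        "descriptions": [
            repo["description"] for repo in repos if repo.get("description")
        ],
    }


# Hypothetical repositories for one user.
example_repos = [
    {"stars": 120, "forks": 30, "description": "Deep learning notes"},
    {"stars": 15, "forks": 2, "description": None},
]
summary = summarize_repos(example_repos)
```

With the two example repositories, summary holds total_stars of 135, max_star of 120, forks of 32, and a single description.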