In the fall of 2018, I analyzed the most in-demand skills and technologies for data scientists. That article resonated with folks. It has over 11,000 claps on Medium, was translated into several languages, and was the most popular story on KDnuggets for November 2018.
A little over a year has passed. Let’s see what’s new.
By the end of this article you’ll know which technologies are becoming more popular with employers and which are becoming less popular.
In my original 2018 article I looked at demand for general skills such as statistics and communication. I also looked at demand for technologies such as Python and R. Demand for software technologies changes much faster than demand for general skills, so I include only technologies in this updated analysis.
I searched SimplyHired, Indeed, Monster, and LinkedIn to see which keywords appeared with “Data Scientist” in job listings in the United States. This time I decided to write the code to scrape the job listings instead of searching by hand. This endeavor proved fruitful for SimplyHired, Indeed, and Monster. I was able to use the Requests and Beautiful Soup Python libraries. You can see the Jupyter notebook with the code for the scraping and analysis at my GitHub repo.
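For readers who want a feel for this kind of scraping, here is a minimal sketch using Requests and Beautiful Soup. It is not the code from the notebook. The search URL, the "q" query parameter, the CSS selector for the result count, and the helper names are all hypothetical, since each site has its own page structure and terms of use that you would need to check first.

```python
import re
import time


def parse_count(text):
    """Pull the first integer out of text such as '1,234 jobs'."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return int(match.group().replace(",", ""))


def build_query(keyword, title="Data Scientist"):
    """Quote the job title and the keyword so both must appear."""
    return f'"{title}" "{keyword}"' if keyword else f'"{title}"'


def fetch_listing_count(search_url, keyword, count_selector, pause=2.0):
    """Fetch one results page and return the listing count it reports.

    search_url and count_selector are assumptions supplied by the caller.
    """
    # Third-party imports live here so the helpers above run without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(
        search_url,
        params={"q": build_query(keyword)},  # "q" is an assumed parameter name
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    element = soup.select_one(count_selector)
    if element is None:
        raise ValueError("count element not found; check the selector")
    time.sleep(pause)  # pause between requests to avoid hammering the site
    return parse_count(element.get_text())
```

Calling fetch_listing_count once with an empty keyword would give the total number of "Data Scientist" listings for a site, and once per keyword would give the counts to compare against that total.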
Scraping LinkedIn proved far more arduous. Authentication is required to see an exact count of job listings. I decided to use Selenium for headless browsing. In September 2019, the U.S. Court of Appeals for the Ninth Circuit ruled against LinkedIn in hiQ Labs v. LinkedIn, allowing LinkedIn’s publicly available data to be scraped. Nonetheless, I was unable to access my account after several scraping attempts. This issue might have stemmed from rate limiting. Update: I’m back in now, but concerned I’ll get locked out if I try to scrape it again.
For what it’s worth, Microsoft owns LinkedIn, Randstad Holding owns Monster, and Recruit Holdings owns Indeed and SimplyHired.
LinkedIn’s data might not have provided an apples-to-apples comparison from last year to this year, anyway. This summer I noticed that LinkedIn started having huge fluctuations from week to week for some tech job search terms. I hypothesize that they might have been experimenting with their search results algorithm by using natural language processing to gauge intent. In contrast, relatively similar numbers of job listings for ‘Data Scientist’ appeared for the three other search sites over both years.
For these reasons, I excluded LinkedIn from the analysis for 2019 and 2018 in this article.
For each job search website, I calculated the percentage of total data scientist job listings for that site that each keyword appeared in. I then averaged those percentages across the three sites for each keyword.
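The calculation just described can be sketched in a few lines of Python. The listing counts below are made up for illustration, and the function names are my own rather than anything from the notebook.

```python
def keyword_share(keyword_count, total_count):
    """Percentage of a site's data scientist listings that mention the keyword."""
    return 100.0 * keyword_count / total_count


def average_share(counts_by_site, totals_by_site):
    """Unweighted mean of the per-site percentages."""
    shares = [
        keyword_share(counts_by_site[site], totals_by_site[site])
        for site in totals_by_site
    ]
    return sum(shares) / len(shares)


# Placeholder numbers, not the scraped data.
totals = {"SimplyHired": 2000, "Indeed": 4000, "Monster": 1000}
python_counts = {"SimplyHired": 1500, "Indeed": 2800, "Monster": 800}

print(average_share(python_counts, totals))  # 75.0
```

Averaging the percentages, rather than pooling the raw counts, gives each site equal weight, so a site with many more listings does not dominate the result.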
I manually investigated new search terms and scraped those that looked promising. No new terms reached an average of five percent of listings in 2019, the cutoff I used for inclusion in the results below.
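As a small illustration of that cutoff, assuming the averages are held in a dictionary keyed by search term (the candidate names and values here are placeholders, not results):

```python
CUTOFF = 5.0  # average percent of listings needed for inclusion


def keep_above_cutoff(averages, cutoff=CUTOFF):
    """Keep only the terms whose average share reaches the cutoff."""
    return {term: share for term, share in averages.items() if share >= cutoff}


# Placeholder values for hypothetical new search terms.
candidates = {"Candidate A": 6.2, "Candidate B": 3.1, "Candidate C": 5.0}

print(keep_above_cutoff(candidates))
```

This sketch treats a term that sits exactly at five percent as having reached the cutoff.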
Let’s see what we found!
There are at least four ways to look at the results for each keyword:
Let’s look at the first three options with bar charts. Then I’ll show a table with the data and discuss the results.
Here’s the chart for number 1 above for 2019, showing that Python appears in nearly 75% of listings.
Here’s the chart for number 2 above, showing the gains and losses in the average percentage of listings between 2018 and 2019. AWS shows an increase of about 5 percentage points: it appeared in an average of 19.4% of listings in 2019 and 14.6% in 2018.
Here’s the chart for number 3 above, showing the percentage change year over year. PyTorch had 108.1% growth compared to the average percentage of listings it appeared in for 2018.
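The difference between a change in percentage points and a percentage change is easy to mix up, so here is a short worked example using the AWS averages quoted above. The function names are illustrative only.

```python
def point_change(avg_old, avg_new):
    """Difference in percentage points between two average shares."""
    return avg_new - avg_old


def percent_change(avg_old, avg_new):
    """Relative change, as a percentage of the earlier average."""
    return 100.0 * (avg_new - avg_old) / avg_old


aws_2018, aws_2019 = 14.6, 19.4  # averages quoted in the article

print(round(point_change(aws_2018, aws_2019), 1))    # 4.8
print(round(percent_change(aws_2018, aws_2019), 1))  # 32.9
```

By the same formula, growth of 108.1% means a keyword’s average share slightly more than doubled from one year to the next.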
The charts were all made with Plotly. If you want to learn how to use Plotly to make interactive visualizations, check out my guide. If you want to see the interactive charts, check out the HTML file in my GitHub repo. The Jupyter notebook for scraping, analysis, and visualizations is there, too.
Below is the information in the charts above, only in table format, sorted by the percentage change in the average percentage of listings from 2018 to 2019.
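A table like this, including how many spots each keyword moved in the rankings, could be assembled roughly as follows. Only the AWS averages come from this article. "Tool A" and "Tool B" are placeholders, the column names are my own, and sorting from largest gain to largest loss is an assumption.

```python
def ranks(averages):
    """Map each keyword to its rank, where 1 is the highest average."""
    ordered = sorted(averages, key=averages.get, reverse=True)
    return {keyword: position for position, keyword in enumerate(ordered, start=1)}


def build_table(avg_2018, avg_2019):
    """Return one row per keyword, sorted by percentage change (largest first)."""
    rank_2018, rank_2019 = ranks(avg_2018), ranks(avg_2019)
    rows = []
    for keyword in avg_2019:
        old, new = avg_2018[keyword], avg_2019[keyword]
        rows.append({
            "keyword": keyword,
            "avg_2019": new,
            "avg_2018": old,
            "pct_change": round(100 * (new - old) / old, 1),
            # Positive means the keyword moved up the rankings.
            "rank_change": rank_2018[keyword] - rank_2019[keyword],
        })
    return sorted(rows, key=lambda row: row["pct_change"], reverse=True)


# Only the AWS figures are from the article; the rest are placeholders.
avg_2018 = {"AWS": 14.6, "Tool A": 20.0, "Tool B": 16.0}
avg_2019 = {"AWS": 19.4, "Tool A": 18.0, "Tool B": 16.8}

for row in build_table(avg_2018, avg_2019):
    print(row)
```

Note that a keyword can gain share and still slip in the rankings if others grow faster, which is what happens to the placeholder "Tool B" here.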
I know these different measures can get confusing, so here’s a guide to what you’re looking at in the table above.
There were some pretty substantial changes in less than 14 months!
Python is still on top. It’s by far the most frequent keyword. It’s in nearly three out of four listings. Python saw a decent increase from 2018.
SQL is ascendant. It almost passed R for the second-highest average score. If trends continue, it will be number two very soon.
The most prominent deep learning frameworks grew in popularity. PyTorch had the largest percentage increase of any keyword. Keras and TensorFlow posted large gains, too. Both Keras and PyTorch moved up four spots in the rankings and TensorFlow moved up three spots. Note that PyTorch was starting from a low average — TensorFlow’s average is still twice as high as PyTorch’s.
Cloud platform skills are becoming more in demand for data scientists. AWS showed up in nearly 20% of listings and Azure showed up in about 10%. Azure jumped four spots in the rankings.
Those are the technologies that are most on the move! 🚀
R had the largest overall average decline. This finding isn’t surprising given the findings from other surveys. Python has pretty clearly overtaken R as the language of choice for data science. Nonetheless, R remains very popular, showing up in about 55% of listings. If you know R, don’t despair, but think about learning Python too, if you want a more in-demand skill.
Many Apache products fell in popularity, including Pig, Hive, Hadoop, and Spark. Pig fell five spots in the rankings, more than any other technology. Spark and Hadoop are still commonly desired skills, but my findings show a trend away from them and toward other big-data technologies.
Proprietary statistical software packages MATLAB and SAS saw dramatic declines. MATLAB dropped four spots in the rankings and SAS dropped from the sixth to eighth most common. Both languages saw large percentage declines compared to their 2018 averages.
There are a lot of technologies on this list. 😀 You certainly don’t need to know them all. The mythical data scientist is called a unicorn for a reason. 😉
I suggest that if you are starting out in data science, you concentrate on the technologies that are in demand and growing.
Focus on learning one.
Technology.
At.
A.
Time.
(That’s very good advice, even though I haven’t always followed it. 😁)
Here’s my recommended learning path, in order:
That’s my general learning path advice. Tailor it to fit your needs or ignore it and do what you want!
I hope you found this guide to the most in-demand technologies for data scientists useful. If you did, please share it on your favorite social media so other folks can find it, too. 👍
Happy Learning!