As data science enthusiasts know, excelling in the field takes more than technical ability. Data professionals need a wide range of skills that extend well beyond data manipulation and analysis.

This week’s episode of the Alter Everything podcast showcases Carlene Jones, data and analytics consultant, and Nynne Haagensen, a data enthusiast who worked with Carlene. Their conversation reinforces that people skills, communication abilities and business savvy are all critical to success in data science and analytics.

What are all those skills? To explore online conversations about this skill set, I (naturally) decided to gather and analyze some data, inspired by this fantastic topic modeling trilogy (part 3 is coming soon!). This seemed like a fun opportunity to apply topic modeling with Alteryx Designer to what folks have discussed out there on the interwebz about the data science skill set. (Topic Modeling is part of the Alteryx Intelligence Suite, which includes some new text mining tools.)

Gathering Opinions

I built a workflow in Designer that scraped 64 articles from the data science site KDnuggets tagged “skills” and cleaned up the text. I also used Text Pre-processing to quickly prep the remaining text before sending it into the Topic Modeling and Word Cloud tools. The word cloud below gives you a preview of some of the prominent ideas, but topic modeling lets us dig a little deeper.

Word cloud of terms related to data science skills
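
For readers who prefer code to a drag-and-drop workflow, here is a minimal Python sketch of the kind of cleanup and term counting that sits behind a word cloud. It is not what the Designer tools do internally: the stop-word list, the `preprocess` and `term_frequencies` helpers, and the two sample sentences are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools ship much longer ones).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are",
              "for", "that", "it", "with", "but", "how"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def term_frequencies(documents):
    """Count how often each cleaned token appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts

# Hypothetical stand-ins for the scraped article text.
sample_articles = [
    "Communication is a key skill for the data scientist.",
    "Data skills and business skills are both in demand.",
]

frequencies = term_frequencies(sample_articles)
print(frequencies.most_common(3))
```

A word cloud is essentially a picture of these counts, with the most frequent terms drawn largest.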


I asked the Topic Modeling tool to identify three dominant topics in the text of these articles. You should definitely read all the details on how this process works, but in a nutshell: This is an unsupervised approach, meaning that I’m not specifying what I want the model to find in advance, but rather letting it identify on its own the key ideas in the text of the articles. This tool assumes that each chunk of text I feed it is a mixture of those three different topics, since I asked for three. It figures out how those topics are represented in each chunk based on the probability that certain words occur together. It doesn’t give a name to the topics it finds, though; it needs us to figure out what its groupings of words mean.
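
The description above matches latent Dirichlet allocation (LDA), a common topic modeling technique. To make the "mixture of three topics" idea concrete, here is a small standard-library Python sketch of LDA fitted with collapsed Gibbs sampling. Treat it as an illustration under assumptions, not as the tool's implementation: the `fit_lda` function, its `alpha` and `beta` smoothing values, and the toy documents are all hypothetical.

```python
import random

def fit_lda(docs, num_topics=3, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab): topic proportions per document,
    word probabilities per topic, and the sorted vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start by assigning every word occurrence to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][v] -= 1
                topic_totals[k] -= 1
                # Weight each topic by how much this document uses it and
                # how much the topic uses this word.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][v] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][v] += 1
                topic_totals[k] += 1

    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic_counts, docs)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[t] + vocab_size * beta)
         for c in topic_word_counts[t]]
        for t in range(num_topics)
    ]
    return doc_topic, topic_word, vocab

# Made-up, already-tokenized chunks of text for illustration only.
toy_docs = [
    ["python", "sql", "hadoop", "python", "sql"],
    ["team", "question", "understand", "team"],
    ["market", "value", "demand", "value"],
    ["python", "team", "value", "sql", "question"],
]

doc_topic, topic_word, vocab = fit_lda(toy_docs)
for proportions in doc_topic:
    print([round(p, 2) for p in proportions])
```

Each printed row is one chunk's mixture over the three topics and sums to 1. As in the article, the topics come back as unnamed lists of word probabilities, so interpreting them is still up to us.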

Technical Skills and More


The topic model that results from this analysis is open to interpretation, but here’s what I see. Topic 1 appears to describe the role of the data analyst or data scientist within an organization, with some technical terms mentioned (Python, SQL, Hadoop). However, it also includes concepts like “value,” “market” and “demand” that could reflect the business expertise a skilled data professional brings to the organization. Some of the chunks of original text that scored highly for the presence of Topic 1 include:

  • “… a data scientist doesn’t just possess technical skills, they also have domain expertise”
  • “Knowing the basic principles of data science and machine learning is still required, but knowing how to apply them to your problem is even more valuable”
  • “Remember, my goal wasn’t to invent a new machine learning algorithm; it was to demonstrate to a client the potential machine learning had or didn’t have for their business”


Topic 2 has “learning” as its most relevant term and “machine” in second place, so a quick conclusion would be that Topic 2 reflects the prominence of machine learning skills in data science. However, a closer review suggests that “learning” could also be interpreted another way. Some of the chunks of text that scored highly for Topic 2 include:

  • “Apart from classroom learning, you can practice what you learned in the classroom by building an app, starting a blog, or exploring data analysis to enable you to learn more”
  • “Communication problems are harder than technical problems”
  • “If you’re stuck on a problem, sitting and staring at code may solve it or may not. Instead talk it out in language with a teammate”

Some of the other terms included in this topic are “question,” “understand,” “team,” “approach” and “offer.” This topic seems to have a theme of ongoing learning and skill development for the data professional.


Sources Agree: Data Science Skills Go Beyond Data