Modern Data Science Skills: 8 Categories, Core Skills, and Hot Skills

We analyze the results of the Data Science Skills poll, covering 8 categories of skills, the 13 core skills that over 50% of respondents have, the emerging/hot skills that data scientists want to learn, and the top skill that Data Scientists want to learn.


The latest KDnuggets Poll was a follow-up to last year’s very popular poll on Data Science Skills, and asked the same two questions:

1. Which skills / knowledge areas do you currently have (at the level you can use in work or research)? and

2. Which skills do you want to add or improve?

The classical Data Science Venn Diagram which Drew Conway proposed in 2013 has 3 main areas: Hacking (Programming), Math & Statistics, and Business/Domain Knowledge.
However, the Data Science field has been evolving at such speed that these 3 areas are no longer sufficient. Data Science now includes additional areas, such as Deep Learning algorithms and Cloud Computing Platforms. More Math knowledge (especially Algebra and Calculus) is needed for Deep Learning. The COVID pandemic added demand for Survival Analysis and Epidemiology. Deploying Data Science requires an understanding of software development and DevOps, and the use of GitHub, Docker, and similar tools.

We reviewed many blogs and articles on Data Science skills, and updated and expanded the list of skills/knowledge areas from 30 items last year to 50 in this poll. To better organize this list, we divided it into 8 categories, adding 5 more to the 3 in Conway's Venn diagram:

  • Programming Languages: Python, R, Java, C++, MATLAB, SAS, Scala, Julia
  • Math & Stats: Algebra & Calculus, Probability & Stats, Survival Analysis, Epidemiology
  • Business & Communication: Business Understanding, Critical Thinking, Communications Skills, Excel, Data Visualization, Tableau, PowerBI
  • Data Science / ML Tools/Methods: Data Cleaning / Prep, ML Algorithms, Scikit-learn, Text Processing, XGBoost, Unstructured Data, Kaggle, Reinforcement Learning
  • Software Development: Github, Software Engineering, Docker, DevOps, Kubernetes
  • SQL / Databases: SQL/Database Coding, No-SQL Databases, Graph Databases
  • Big Data / Cloud: AWS, Apache Spark, Dask, Microsoft Azure, Google Cloud, Hadoop, Other Big Data Tools, Other Cloud Computing Platforms
  • Deep Learning: DL algorithms, Keras, NLP, TensorFlow, Computer Vision, PyTorch, Other DL frameworks

The above list and categorization are not complete or perfect, but they are a useful way to understand the current state of skills of Data Scientists, as the poll results show.

This poll received nearly 1000 votes.
An average respondent had 16 skills (vs 10 in 2019) and wanted to add or improve 18 skills (vs 6.5 in 2019).

Fig. 1 below shows a radar chart of skills by category, with the blue line indicating skills respondents have and the orange line indicating skills they want. Since there are many entries in each category, we used the maximum percentage (the most popular entry) to represent that category.

Fig. 1: 8 Categories of Modern Data Science-related Skills, Have vs Want

We note that a typical Data Scientist does well on 5 of those categories: Programming, Math & Stats, Business & Comm, DS/ML Tools, and SQL/Databases, with %Have ranging from 79% down to 69%.
Software Development trails at 54%. The two most modern areas, Big Data & Cloud and Deep Learning, show gaps in current skills, with the % wanting the skill exceeding the % having it.

Table 1: % Have vs % Want by Category

| Category | Max %Have | Max %Want |
|---|---|---|
| Programming Languages | 79% | 43% |
| Math & Stats | 73% | 39% |
| Business & Communication | 72% | 38% |
| Data Science & ML Tools/Methods | 70% | 52% |
| Software Development | 54% | 45% |
| SQL/Databases | 69% | 43% |
| Big Data & Cloud | 20% | 49% |
| Deep Learning | 34% | 51% |
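
As a rough illustration of how a category is represented by its most popular entry, the sketch below takes the Deep Learning figures reported later in this article (Table 3) and computes the maximum %Have and maximum %Want independently. The function name `category_summary` and the tuple layout are illustrative assumptions, not the code used to produce the chart.

```python
# (skill, category, %Have, %Want) for the Deep Learning entries listed in Table 3
deep_learning = [
    ("TensorFlow", "Deep Learning", 26.0, 51.2),
    ("Deep Learning Algorithms", "Deep Learning", 34.0, 50.8),
    ("PyTorch", "Deep Learning", 12.5, 50.1),
    ("NLP", "Deep Learning", 48.7 and 27.3, 48.7),
    ("Computer Vision", "Deep Learning", 20.7, 42.7),
    ("Keras", "Deep Learning", 28.2, 41.1),
    ("Other DL frameworks", "Deep Learning", 6.0, 30.0),
]


def category_summary(rows):
    """Return {category: (max %Have, max %Want)}.

    The two maxima are taken independently, so they may come
    from different skills within the same category.
    """
    summary = {}
    for _skill, category, have, want in rows:
        best_have, best_want = summary.get(category, (0.0, 0.0))
        summary[category] = (max(best_have, have), max(best_want, want))
    return summary


summary = category_summary(deep_learning)
print(summary)
```

For Deep Learning, the maximum %Have comes from Deep Learning Algorithms and the maximum %Want from TensorFlow, which agrees with the rounded 34% and 51% in Table 1.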

Modern Data Science is not done by unicorns that have all needed skills, but by teams of people, and it would be useful to examine the skills of different job profiles – Researcher, Data Scientist, Data Analyst, Machine Learning Engineer, Business Analyst, etc. We leave this until a future time.

Next, we examine the popularity of individual Data Science skills in this poll.

Fig. 2: Modern Data Science-related Skills, Have vs Want

The X-axis shows % Have Skill (answers to the first poll question), and the Y-axis shows % Want Skill (answers to the second poll question).

Shape represents the category – see below. Shape size is proportional to the % of voters that have that skill. The color depends on the ratio of Want/Have: red is high (more than 1.2), grey is between 0.8 and 1.2, and blue is low (less than 0.8).
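
The color rule can be sketched as a small function. This is a minimal illustration, not the code behind the chart; the name `want_have_color` is hypothetical, and treating ratios of exactly 0.8 or 1.2 as grey is an assumption, since the text does not say how the boundaries are handled.

```python
def want_have_color(pct_have, pct_want):
    """Map a skill's Want/Have ratio to the color used in Fig. 2.

    Assumption: ratios of exactly 0.8 or 1.2 fall in the grey band.
    """
    if pct_have <= 0:
        raise ValueError("pct_have must be positive to compute a ratio")
    ratio = pct_want / pct_have
    if ratio > 1.2:
        return "red"   # many more people want the skill than have it
    if ratio < 0.8:
        return "blue"  # mostly already acquired
    return "grey"      # want and have are roughly balanced


# Values taken from the tables in this article
print(want_have_color(78.8, 43.1))  # Python
print(want_have_color(5.8, 41.3))   # Kubernetes
print(want_have_color(22.2, 22.3))  # Java
```

With the published percentages, Python falls in the blue band, Kubernetes in red, and Java in grey.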

[Legend: shape used for each skill category]

As in last year's Data Science skills poll, we can see two main clusters.

Cluster 1, in the blue dashed rectangle on the right side of the chart, includes all the skills that over 50% of respondents have.
All shapes in this cluster are blue, indicating that Want/Have is less than 0.8. As last year, we call this set Core Data Science Skills. They are listed in Table 2.

Table 2: 13 Core Data Science Skills, in decreasing order of %Have

| Skill | Category | %Have | %Want | %Want/%Have |
|---|---|---|---|---|
| Python | Programming Lang | 78.8% | 43.1% | 0.55 |
| Probability & Statistics | Math & Stats | 73.4% | 38.7% | 0.53 |
| Data Visualization | Business & Communication | 71.6% | 37.7% | 0.53 |
| Math (Algebra & Calculus) | Math & Stats | 70.7% | 28.6% | 0.40 |
| Critical Thinking | Business & Communication | 70.3% | 28.8% | 0.41 |
| Data Cleaning / Data Preparation / ETL | Data Science & ML Tools | 70.1% | 31.9% | 0.45 |
| Communications Skills | Business & Communication | 69.4% | 33.4% | 0.48 |
| Excel | Business & Communication | 69.4% | 15.0% | 0.22 |
| SQL | SQL/Databases | 69.2% | 29.1% | 0.42 |
| Machine Learning Techniques | Data Science & ML Tools | 61.9% | 42.2% | 0.68 |
| Business Understanding | Business & Communication | 60.9% | 34.9% | 0.57 |
| Github | Software Development | 54.2% | 41.1% | 0.76 |
| Scikit-learn | Data Science & ML Tools | 52.3% | 37.6% | 0.72 |

The core skills are almost the same as in the 2019 poll, with two exceptions. R declined in popularity from 45% to 40% this year and is not included among the core skills. One new skill was added: Github (not in the 2019 poll).

The most common categories among core skills are Business & Communication (5) and Data Science & ML Tools (3).

The poll also allowed people to select both “have” and “want to add or improve” for the same skill (which explains why for some skills %Have + %Want > 100%). Among the core skills, the ones people most want to improve are:

  • Python, 33% of those that have it want to improve it
  • Machine Learning Techniques, 33%
  • Probability & Statistics, 31%
  • Data Visualization, 30%
  • Scikit-learn, 29%

The core skills that people are least interested in improving are:

  • Excel, 11%
  • SQL, 18%

The second cluster, on the left in Fig. 2, marked with a red border, includes skills that fewer people currently have (%Have mostly under 30%) but that more people want to add, with %Want/%Have > 1.2 and with at least 15% of respondents wanting them.

We call them Hot / Emerging Data Science Skills, and they are listed in Table 3. We see that the hottest skills, with the highest percentage that want to learn them, are Reinforcement Learning, TensorFlow, Deep Learning Algorithms, and PyTorch.

Table 3: Hot / Emerging Modern Data Science Skills, in decreasing order of %Want

| Skill | Category | %Want | %Have | %Want/%Have |
|---|---|---|---|---|
| Reinforcement Learning | Data Science & ML Tools | 51.9% | 13.8% | 3.8 |
| TensorFlow | Deep Learning | 51.2% | 26.0% | 2.0 |
| Deep Learning Algorithms | Deep Learning | 50.8% | 34.0% | 1.5 |
| PyTorch | Deep Learning | 50.1% | 12.5% | 4.0 |
| AWS (Amazon Web Services) | Big Data & Cloud | 48.8% | 20.1% | 2.4 |
| NLP | Deep Learning | 48.7% | 27.3% | 1.8 |
| Apache Spark | Big Data & Cloud | 45.3% | 17.8% | 2.5 |
| Docker | Software Development | 44.9% | 17.0% | 2.6 |
| No-SQL Databases | SQL/Databases | 43.0% | 25.5% | 1.7 |
| Computer Vision | Deep Learning | 42.7% | 20.7% | 2.1 |
| Kubernetes | Software Development | 41.3% | 5.8% | 7.2 |
| Keras | Deep Learning | 41.1% | 28.2% | 1.5 |
| Unstructured Data | Data Science & ML Tools | 40.8% | 29.4% | 1.4 |
| Graph Databases | SQL/Databases | 39.4% | 14.2% | 2.8 |
| Survival Analysis | Math & Stats | 37.7% | 19.8% | 1.9 |
| Google Cloud Computing | Big Data & Cloud | 37.4% | 14.7% | 2.5 |
| Microsoft Azure | Big Data & Cloud | 37.3% | 15.3% | 2.4 |
| DevOps | Software Development | 36.2% | 14.9% | 2.4 |
| Kaggle | Data Science & ML Tools | 36.0% | 25.9% | 1.4 |
| PowerBI | Business & Communication | 33.6% | 25.1% | 1.3 |
| Big Data Tools other than Hadoop or Spark | Big Data & Cloud | 32.6% | 9.5% | 3.4 |
| Hadoop | Big Data & Cloud | 32.5% | 13.1% | 2.5 |
| Other DL frameworks | Deep Learning | 30.0% | 6.0% | 5.0 |
| Julia | Programming Lang | 29.1% | 2.0% | 14.9 |
| Scala | Programming Lang | 28.4% | 5.9% | 4.8 |
| Epidemiology | Math & Stats | 27.4% | 8.2% | 3.3 |
| Dask | Big Data & Cloud | 21.2% | 3.4% | 6.2 |
| Other Cloud Computing Platforms | Big Data & Cloud | 18.4% | 4.4% | 4.2 |
| SAS | Programming Lang | 17.6% | 11.6% | 1.5 |

The most common categories among emerging Data Science skills are:

  • Big Data & Cloud, 8
  • Deep Learning, 7
  • Data Science & ML Tools, 3
  • Programming Lang, 3
  • Software Development, 3

The remaining skills are those where demand is not growing strongly (Want/Have is at most 1.2) and the current popularity is less than 50%.
They can still be very useful in many areas. This group is shown in Table 4.

Table 4: Useful / Other Data Science Skills, in decreasing order of %Have

| Skill | Category | %Have | %Want | %Want/%Have |
|---|---|---|---|---|
| R Language | Programming Lang | 40.6% | 34.8% | 0.86 |
| Text Processing | Data Science & ML Tools | 37.5% | 39.9% | 1.1 |
| Software Engineering | Software Development | 33.9% | 31.8% | 0.94 |
| Tableau | Business & Communication | 31.8% | 35.5% | 1.1 |
| XGBoost | Data Science & ML Tools | 29.5% | 34.6% | 1.2 |
| Java | Programming Lang | 22.2% | 22.3% | 1.0 |
| C++ | Programming Lang | 21.0% | 24.9% | 1.2 |
| MATLAB | Programming Lang | 18.8% | 16.1% | 0.86 |
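
The three groups used in this article (core, hot/emerging, and other) can be sketched as a simple rule over %Have and %Want. This is a minimal illustration under stated assumptions: the function name `classify_skill` is hypothetical, the core check is applied first, and the ratio is computed from the published, rounded percentages, so skills shown with a rounded ratio of 1.2 (such as XGBoost) actually fall just below the threshold.

```python
def classify_skill(pct_have, pct_want):
    """Assign a skill to one of the article's three groups.

    Thresholds follow the text: core if more than 50% have the skill;
    hot if Want/Have > 1.2 and at least 15% want it; otherwise other.
    Assumption: the core rule takes precedence over the hot rule.
    """
    if pct_have <= 0:
        raise ValueError("pct_have must be positive to compute a ratio")
    if pct_have > 50:
        return "core"
    ratio = pct_want / pct_have
    if ratio > 1.2 and pct_want >= 15:
        return "hot"
    return "other"


# Percentages taken from Tables 2, 3, and 4
examples = {
    "Python": (78.8, 43.1),
    "Kubernetes": (5.8, 41.3),
    "R Language": (40.6, 34.8),
    "XGBoost": (29.5, 34.6),
}
groups = {name: classify_skill(have, want) for name, (have, want) in examples.items()}
print(groups)
```

Applied to these four rows, the rule places Python in the core group, Kubernetes in the hot group, and R and XGBoost in the other group, matching the tables.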

Let us know what we missed and what you think – comment below!

Author: admin