The results show that, despite the deluge of Big Data, a large majority of respondents still work with Gigabyte- or Megabyte-size datasets. Data Scientists work with the largest datasets, followed by Data Engineers, Data Analysts, and Business Analysts. Read on for details.

Despite the deluge of big data, the results of the 2020 KDnuggets Poll on the largest dataset analyzed show that most respondents still work in the gigabyte range, with essentially the same curve as in the previous polls that have asked this question since 2012. The data also show a small but notable segment working with web-scale data of over 100 Petabytes.


Fig. 1: KDnuggets Poll: Largest Dataset Analyzed, 2020, 2018, 2016

The 2020 data are shown as columns, to set them apart from the lines for previous years.

The results are based on 562 participants.

Note that the poll asks about the largest dataset analyzed, so a typical dataset analyzed is expected to be significantly smaller.

Highlights:

  • **Most data people still work in the Gigabyte range:** The majority of answers (78% in 2020, 80% in 2018, 83% in 2016) are in the Gigabyte or Megabyte range. The overall median response was once again between 11 and 100 GB (which fits comfortably on a single laptop), as it has been in every poll since 2012.
  • **Consistency:** The shape of the curve is almost the same each year. The 2020 curve shows some change, with more respondents at the lower end, perhaps reflecting the entry of many junior people into the field, but the overall shape is still the same.
  • **Petabyte Big Data Scientists still stand apart:** There is a small but significant gap, with almost no answers in the 1-100 PB range, separating analysts who work with Terabyte-size commercial data warehouses from those who work with web-scale data stores of 100+ Petabytes.
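Because poll answers are size ranges rather than exact sizes, a median like "between 11 and 100 GB" is itself a range: one way to find it is to order the bins from smallest to largest and take the first bin where the running count of responses reaches half of the total. The minimal Python sketch below does that, and also computes the share of answers up to a given bin (the kind of figure behind "78% in the Gigabyte or Megabyte range"). The function names `median_bin` and `share_up_to` are hypothetical, the four coarse bin labels are simplified stand-ins for the poll's actual answer options, and the counts are toy numbers, not poll data.

```python
from itertools import accumulate


def median_bin(labels, counts):
    """Return the label of the bin that holds the median response.

    `labels` must be ordered from smallest to largest size, and `counts`
    gives the number of responses in each bin. For an even number of
    responses this returns the bin of the lower median.
    """
    if len(labels) != len(counts):
        raise ValueError("labels and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses")
    half = total / 2
    # Walk the running total until it first reaches half of all responses.
    for label, running in zip(labels, accumulate(counts)):
        if running >= half:
            return label


def share_up_to(labels, counts, cutoff):
    """Return the fraction of responses in bins up to and including `cutoff`."""
    idx = labels.index(cutoff)
    return sum(counts[: idx + 1]) / sum(counts)


# Toy, made-up counts for illustration only; these are NOT the poll's data.
toy_labels = ["Megabytes", "Gigabytes", "Terabytes", "Petabytes"]
toy_counts = [2, 5, 2, 1]

print(median_bin(toy_labels, toy_counts))               # Gigabytes
print(share_up_to(toy_labels, toy_counts, "Gigabytes"))  # 0.7
```

With these toy counts the running totals are 2, 7, 10, 10, so half of the 10 responses is first reached in the "Gigabytes" bin, and 7 of 10 answers fall at or below it.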

This poll also asked about employment type; the breakdown was:

  • Company or Self-Employed, 62% (amazingly, this was also 62% in both 2018 and 2016)
  • Student, 20% (was 17% in 2018, 20% in 2016)
  • Academia/University, 8% (was 13% in 2018, 10% in 2016)
  • Government/non-profit, 4.4% (was 4.8% in 2018, 5.1% in 2016)
  • Unemployed or Other, 5% (was 3.2% in 2018, 2.4% in 2016)

We also asked a new question, “What was your main data role?”, and the responses were:

Fig. 2: KDnuggets Poll: Largest Dataset Analyzed, by Role

