If you work as a data analyst, chances are high that you've come across a dataset that caused you trouble because of its size or complexity. Most data analysts today rely on a combination of visualization and spreadsheet tools to make sense of the data around them, but the "curse of scattered files" still stands, particularly in large companies.

As we leave behind the first two decades of the millennium, we are witnessing huge growth in the creation of new data sources. Not only do data analysts need to make sense of data produced within the organization (and as organizations try to become more data-savvy, the volume of data produced and stored grows rapidly), but they are sometimes asked to make sense of external data from outside the company. This diversity calls for new approaches to problems that old tools cannot solve.
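To make the idea concrete before diving in, here is a minimal sketch of the kind of pipeline this post builds toward. It assumes a hypothetical `data/` folder of CSV exports with `order_date` and `amount` columns (names chosen for illustration, not taken from a real project): instead of juggling scattered spreadsheets, the reading, combining, and aggregating steps live in one reproducible script.

```r
# A minimal sketch: consolidate "scattered files" into a single pipeline.
# The data/ folder and the order_date / amount columns are hypothetical.
library(readr)
library(dplyr)
library(purrr)

monthly_revenue <- list.files("data", pattern = "\\.csv$", full.names = TRUE) |>
  map(read_csv) |>        # read every CSV export into a list of data frames
  bind_rows() |>          # stack them into one table
  mutate(month = format(order_date, "%Y-%m")) |>
  group_by(month) |>
  summarise(revenue = sum(amount, na.rm = TRUE))
```

Because every step is code, rerunning the whole pipeline when a new file lands is a single call rather than a manual copy-paste routine.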

