At smaller companies, access to and control of data is one of the biggest challenges that data analysts and data scientists face. The same is true at larger companies when an analytics team must navigate bureaucracy, cybersecurity restrictions, and an overtaxed IT department rather than benefit from a team of data engineers dedicated to collecting good data and making it available.

Creative, persistent analysts find ways to get access to at least some of this data. Through a combination of daily processes to save email attachments, run database queries, and copy and paste from internal web pages, one might build up a mighty collection of data sets on a personal computer, in a team shared drive, or even in a database.

But this solution does not scale well, and it is rarely documented or understood well enough for someone else to take it over if a particular analyst moves on to a different role or company. In addition, it is a nightmare to maintain. One may spend a significant part of each day executing these processes and troubleshooting failures, leaving little time to actually use the data!

I lived this for years at different companies. We found ways to be effective, but data management took up far too much of our time and energy. Often, we did not have the data we needed to answer a question. I continued to learn from the ingenuity of others and from my own trial and error, which led me to the theoretical framework that I will present in this blog series: building a self-managed data library.

A data library is _not_ a data warehouse, data lake, or any other formal BI architecture. It does not require any particular technology or skill set (coding is not required, but it will greatly increase the speed at which you can build and the degree of automation possible). So what is a data library, and how can a small data analytics team use it to overcome the challenges I’ve described?


Introduction to Data Libraries for Small Data Science Teams