Instead of focusing on where the data lives, focus on making the analytics experience as smooth as possible for everyone in your organization.

Right now, the world of analytical data engineering and data architecture is awash with confusion and controversy about how we should handle data for analytics. A lot of the controversy centers on which is better: a data warehouse architecture, a data lake architecture, or some combination of the two.

I think we’re all asking the wrong question.

Analytics end users don’t care where or how data is stored.

Executives, business analysts, data scientists, even line-of-business workers – they’re interested in the analytics, but not as much in the data.

As data engineers and architects, we have to care where the data is stored, how it gets there, and how it gets cleaned and managed to feed analytics. We have to worry about real-time pipelines, historical storage, and combining incoming time-series sensor data with geographic data about the weather at that timestamp.

However, the people driving the business, the people who write our paychecks, don’t care. As a profession, we need to stop forcing them to worry about where their data resides.

Unify the analytics, not the data

Once upon a time, data warehouse architectures were designed to gather data, combine it, polish it, and present it to visualization tools that showed everyone how the business was performing. Business analysts ran SQL queries as needed.

Then, along came Doug Laney’s three V’s: massive increases in data volume, velocity, and variety. Streaming real-time data from devices, in many ways, encompassed all three. Also, new people called data scientists, who looked a lot like our old quants, statisticians, and actuaries, needed all that data to do sophisticated predictive analytics and machine learning.

The data lake was touted as the solution. Dump everything here and do analytics on top of that crazy mess. It’ll be great.

But it wasn’t so great.

Governance wasn’t there; security wasn’t there. Most important to end users, concurrency and response times weren’t there. The architecture could no longer support all the people who wanted to perform analytics, much less expand to let more people in the company use data to drive their decisions. Nor could it deliver analytical answers as fast as people wanted to ask questions, much less at the speed automation requires.


Deliver Analytics Like Amazon Delivers Packages