The need for data engineers and analysts to run interactive, ad hoc analytics on large amounts of data continues to grow rapidly. Data platform teams are increasingly using PrestoDB, a federated SQL query engine, to run such analytics across a wide range of data lakes and databases in place, without the need to move data.
In this post, we will explore the following:
As enterprises rely on more AWS services as well as purpose-built databases installed in the cloud, data and metadata end up spread across many systems. Platform teams have resorted to heavy data pipelining, moving data across multiple platforms and, in some cases, even creating multiple copies of the same data to make it more accessible. In addition, self-service analytics requires platform engineers to integrate many business intelligence tools, reporting tools, data science notebooks, and data visualization tools with every data source in use.
The obvious downsides are added latency, because consumers must wait longer for those data pipelines to complete and for their tools to be connected, and added cost, because duplicate data consumes additional storage and data movement burns compute cycles. On top of that, all the platform engineering, management, and monitoring tasks add up. Given the complexity of these activities, platform teams are looking to simplify their approach, and we often see the following requirements from users: