Even the best data does not matter if no one can access it. It is therefore crucial to make data accessible to everyone, without requiring them to learn complicated programming languages. This is the problem that SQL query engines try to solve: they give anyone with some SQL knowledge access to data stored in typical big data systems.

Problem solved? Maybe not. In many cases, your data does not live in an easy-to-access data lake such as S3, and the computations you want to apply to it are machine learning algorithms or other complex (typically Python-based) operations. That does not play well with the typically very strict and often Java-based SQL query engines, such as Presto, Apache Impala or Apache Hive.

#machine-learning #sql #python #dask

Deploying a Python SQL Engine to Your Cluster