Lots of people talk about “democratizing” data science and machine learning. What could be more democratic — in the sense of widely accessible — than SQL, PyData, and scaling data science to larger datasets and models?
Dask is rapidly becoming a go-to technology for scalable computing. Despite a strong and flexible dataframe API, Dask has historically not supported SQL for querying most raw data.
In this post, we look at dask-sql, an exciting new open-source library that offers a SQL front-end to Dask. Follow along with this notebook. You can also load it up on Coiled Cloud if you want to access some serious Dask clusters for free with a single click! To do so, log into Coiled Cloud, navigate to our example notebooks, and launch the dask-sql notebook.
In this post, we:
Many thanks to Nils Braun, the creator of dask-sql, for his thoughtful and constructive feedback on this post.