Today, Spark SQL is one of the most valuable components of Apache Spark. It powers both SQL queries and the DataFrame API. At its core, the Catalyst optimizer, which leverages advanced Scala features to build an extensible and extremely powerful query optimizer.

In this article, we will try to understand the workflow of the Catalyst Optimizer and then dive deep into the new optimizations that **Adaptive Query Execution **enables.

Adaptive Query Execution (AQE) is a new feature available in Apache Spark 3.0 that allows it to_ optimize and adjust query plans based on runtime statistics _collected while the query is running. To understand how it works, let’s first have a look at the optimization stages that the Catalyst Optimizer performs.

Catalyst Optimizer 101

The catalyst optimizer applies optimizations during logical and physical planning stages. It optimizes the query logically then generates a range of physical plans and selects the most efficient one based on a cost model.

#spark #data-science #data #spark-sql #big-data

Apache Spark 3.0 Adaptive Query Execution
1.35 GEEK