Deciphering Python: How to use Abstract Syntax Trees (AST) to understand code

Let’s get a little “meta” about programming.

How does the Python program (better known as the interpreter) “know” how to run your code? If you’re new to programming, it may seem like magic. In fact, it still seems like magic to me after being a professional for more than a decade.

The Python interpreter is not magic (sorry to disappoint you). It follows a predictable set of steps to translate your code into instructions that a machine can run.

At a fairly high level, here’s what happens to your code:

1. The code is tokenized (i.e., split up) into a list of pieces called tokens. A set of rules decides which characters belong together and what kind of token they form. For instance, the keyword if is a different token than a numeric value like 42.
2. The parser takes the tokens and builds an Abstract Syntax Tree (AST), which is the subject we will explore more in this post. An AST is a collection of nodes linked together according to the grammar of the Python language. Don’t worry if that made no sense yet; we’ll shine more light on it momentarily.
3. From the abstract syntax tree, the interpreter compiles a lower-level form of instructions called bytecode. These are simple instructions like BINARY_ADD (replaced by the more general BINARY_OP in Python 3.11) and are meant to be generic enough that the interpreter’s virtual machine can execute them quickly.
4. With the bytecode instructions available, the interpreter can finally run your code. Its evaluation loop executes each instruction, calling into C functions and, when needed, your operating system, which ultimately interact with the CPU and memory to run the program.
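
The four stages can be modeled with standard library tools: tokenize for the tokens, ast.parse for the tree, the built-in compile for the bytecode, and exec to run it. This is a minimal sketch, not a trace of the interpreter: CPython performs these steps internally in C, and the modules below simply expose equivalent stages to Python code. The sample source string is made up for illustration.

```python
import ast
import io
import tokenize

# A made-up two-statement program to push through each stage.
source = "answer = 40 + 2\nif answer == 42:\n    print('found it')\n"

# Stage 1: split the source text into tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens[:6]:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Stage 2: build an abstract syntax tree from the source.
tree = ast.parse(source)
print(type(tree.body[1]).__name__)  # the if statement becomes an ast.If node

# Stage 3: compile the tree into a code object that holds the bytecode.
code = compile(tree, filename="<example>", mode="exec")
print(type(code.co_code).__name__)  # the raw bytecode is a bytes object

# Stage 4: run the bytecode in a fresh namespace.
namespace = {}
exec(code, namespace)
```

In tokenize’s output, if comes back as a NAME token and 42 as a NUMBER token; it is the parser that later treats if as the start of an if statement.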

Many more details could fit into that description, but that’s the rough sketch of how the characters you type end up running on a computer’s CPU.

ASTs as analysis tools

By the time your source code is turned into bytecode, it’s too late to gain much understanding about what you wrote. Bytecode is very low-level and heavily tuned to make the interpreter fast. In other words, bytecode is designed for computers rather than people.
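
The standard library’s dis module makes this concrete by disassembling a function into its bytecode. The total function below is a made-up example, and the exact instruction names depend on your Python version (a + b compiles to BINARY_ADD before 3.11 and to BINARY_OP from 3.11 on), so treat the printed listing as illustrative rather than exact.

```python
import dis

def total(a, b):
    # A tiny function whose bytecode we want to inspect.
    return a + b

# Print a human-readable disassembly of the function's bytecode.
dis.dis(total)

# The same instructions as data: a flat sequence of stack-machine
# operations rather than a nested tree of expressions and statements.
opnames = [instr.opname for instr in dis.get_instructions(total)]
print(opnames)
```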

On the other hand, abstract syntax trees have enough structured information within them to make them useful for learning about your code. ASTs still aren’t very people-friendly, but they are more sensible than the bytecode representation.
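
Here is a small illustration of that extra structure, using the standard library’s ast module and assuming Python 3.9 or later for the indent argument of ast.dump. Where bytecode is a flat list of instructions, the tree keeps the assignment, the operators, and how they nest. The one-line source string is a made-up example.

```python
import ast

source = "total = price + price * tax"
tree = ast.parse(source)

# ast.dump shows every node type and field name; indent= needs Python 3.9+.
print(ast.dump(tree, indent=2))

# The structure is queryable: the right-hand side is a BinOp whose own
# right operand is another BinOp, mirroring operator precedence.
assign = tree.body[0]
print(type(assign.value).__name__, type(assign.value.op).__name__)
print(type(assign.value.right).__name__, type(assign.value.right.op).__name__)
```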

Because Python is a “batteries included” language, the tools you need to use ASTs are built into the standard library.

The primary tool to work with ASTs is the ast module. Let’s look at an example to see how this works.
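
As a minimal sketch of that workflow (not necessarily the example this post goes on to use), the snippet below parses some source with ast.parse and walks the tree with an ast.NodeVisitor subclass to report which modules the code imports. The class name ImportCollector and the sample source are hypothetical.

```python
import ast

source = """
import os
from collections import Counter
import json as js

def main():
    print(os.getcwd())
"""

class ImportCollector(ast.NodeVisitor):
    """Hypothetical helper that records the module names a program imports."""

    def __init__(self):
        self.modules = []

    def visit_Import(self, node):
        # "import json as js" -> one alias per module; alias.name is "json".
        for alias in node.names:
            self.modules.append(alias.name)
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        # "from collections import Counter" -> node.module is "collections".
        # node.module is None for relative imports like "from . import x".
        if node.module is not None:
            self.modules.append(node.module)
        self.generic_visit(node)

tree = ast.parse(source)
collector = ImportCollector()
collector.visit(tree)
print(collector.modules)  # ['os', 'collections', 'json']
```

Because the source is only parsed and never executed, an analysis like this works even when the imported modules aren’t installed.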

mattlayman.com