The Downside of Functional Data Structure

Functional programming creates a lot of benefits in creating a distributed system. You will deal with fewer bugs in a concurrent environment because everything is immutable. It also helps us local reasoning and composition — we can construct applications and programs like lego building blocks that will produce a predictable outcome. With having a predictable outcome, we can understand code to scale with the size of the codebase.

However, there is also a downside if you strictly make everything “pure” and immutable. For instance, in my previous articles, having a functional data structure, such as List, can be costly. Let’s take the below as an example:

List(1,2,3,4).map(_ + 1).filter(_ % 2 == 0).map(_ * 2)

Each of these operations above will make its input and create a new intermediate List on the output. The expression above for the map function will introduce a new List(2,3,4,5). Then, it goes to filter function and filters out all the odd numbers, which produces a new List, List(2,4) that gets a pass to the last map operation. Each transformation gets to produce a shortlist that gets used as an input to the next transformation and is then immediately discarded. If we manually produce a trace for its output evaluation, it will be something like this:

List(1,2,3,4).map(_ + 1).filter(_ % 2 == 0).map(_ * 2)

List(2,3,4,5).filter(_ % 2 == 0).map(_ * 2)

List(2,4).map(_ * 2)

List(4, 8)

You must be thinking right now — the syntax is compositional and succinct, but the performance and memory usage are terrible. Ideally, we will want to fuse the sequence transformation on each element like this in one go. We don’t want to wait for all the List to finished transforming the first element to have its second transformation. We also don’t want to create a temporary list element in the middle of each change.

How do we do that?

We can change the structure to a while loop and for a loop. However, we want to persist in the compositional structure of the transformation rather than writing monolithic circuits. One way of doing this is to make your data structure non-strict (lazy).

What is non-strict (lazy)?

Before we talk about what is a non-strict, I want to explain what is strict.

The formal definition of strictness is — “If an expression’s evaluation runs forever or throws an error instead of returning a definite value, we say that the expression doesn’t terminate, or that it evaluates to bottom. A function f is strict if the expression f(x) evaluates to bottom for all x that value to bottom."

#scala #programming #data-structures #functional-programming #stream #data analysis

What is non-strict (lazy)?

medium.com

The Downside of Functional Data Structure