
Since the release of Node.js v10.5.0, there’s a new worker_threads module available, and it has been stable since Node.js v12 LTS.

What exactly is the worker_threads module, and why do we need it? In this post, we will talk about the historical reasons concurrency works the way it does in JavaScript and Node.js, the problems we might run into, the current solutions, and the future of parallel processing with Worker threads.
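To make the module concrete before getting into the history, here is a minimal sketch of the worker_threads API. The main thread starts a Worker, passes it input through workerData, and receives the result as a message. The worker code is passed as a string with eval: true only so the sketch fits in one file, and the summing job (sumInWorker is a hypothetical helper) stands in for real CPU-heavy work. It assumes a recent Node.js version; the node: prefix in require needs v14.18 or later.

```javascript
const { Worker } = require('node:worker_threads')

// Code for the worker thread, passed as a string (eval: true) to keep one file.
// It sums 0..n-1, a stand-in for some hypothetical CPU-heavy job.
const workerSource = `
  const { parentPort, workerData } = require('node:worker_threads')
  let sum = 0
  for (let i = 0; i < workerData.n; i++) sum += i
  parentPort.postMessage(sum)
`

// Runs the job on a separate thread and resolves with the worker's result.
function sumInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { n } })
    worker.once('message', resolve)
    worker.once('error', reject)
  })
}

sumInWorker(1e6).then((sum) => console.log('sum:', sum)) // prints "sum: 499999500000"
```

While the worker is summing, the main thread's event loop stays free to handle other callbacks.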

Living in a single-threaded world

JavaScript was conceived as a single-threaded programming language that ran in a browser. Being single-threaded means that only one set of instructions is executed at any time in the same process (the browser, in this case, or just the current tab in modern browsers).

This made things easier for implementation and for developers using the language. JavaScript was initially a language only useful for adding some interaction to webpages, form validations, and so on — nothing that required the complexity of multithreading.

Ryan Dahl, the creator of Node.js, saw this limitation as an opportunity. He wanted to implement a server-side platform based on asynchronous I/O, which means developers don’t need to manage threads, and that makes things a lot easier. Concurrency can be a very hard problem to solve: when many threads access the same memory, they can produce race conditions that are very hard to reproduce and fix.

Is Node.js single-threaded?

So, our Node.js applications are single-threaded, right? Well, kind of.

Actually, we can run things in parallel, but we don’t create threads, and we don’t sync them. The virtual machine and the operating system run the I/O in parallel for us, and when it’s time to send data back to our JavaScript code, the JavaScript part is the one that runs in a single thread.
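As a rough illustration, the sketch below starts several crypto.pbkdf2 hashes at once. Node.js hands each call to libuv's thread pool, so the hashing runs in parallel off the main thread, while our callbacks still run one at a time on the single JavaScript thread. The salt, iteration count, and passwords are arbitrary illustrative values, and hashAll is a hypothetical helper.

```javascript
const crypto = require('node:crypto')

// Starts one PBKDF2 hash per password. The hashing itself runs on libuv's
// thread pool; only the callbacks come back to the JavaScript thread.
function hashAll(passwords) {
  const log = []
  return new Promise((resolve, reject) => {
    let pending = passwords.length
    for (const password of passwords) {
      // 'hypothetical-salt' and 100_000 iterations are illustrative values only.
      crypto.pbkdf2(password, 'hypothetical-salt', 100_000, 32, 'sha256', (err, key) => {
        if (err) return reject(err)
        log.push(`hashed ${password}: ${key.toString('hex').slice(0, 8)}`)
        if (--pending === 0) resolve(log)
      })
    }
    // Runs before any callback: the loop above only *started* the work.
    log.push('all hashes started')
  })
}

hashAll(['a', 'b', 'c', 'd']).then((log) => console.log(log.join('\n')))
```

The first line printed is always "all hashes started"; the order of the "hashed" lines depends on which thread-pool job finishes first.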

In other words, everything runs in parallel except for our JavaScript code. Synchronous blocks of JavaScript code are always run one at a time:

let flag = false
function doSomething() {
  flag = true
  // More code (that doesn't change `flag`)...

  // We can be sure that `flag` here is true.
  // There's no way another code block could have changed
  // `flag` since this block is synchronous.
}
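The guarantee in that comment can be checked directly. This sketch (doSomethingWithTimer is a hypothetical name) schedules a timer callback that tries to reset flag, then keeps the thread busy for 50 ms. Even though the timer is overdue, its callback only runs after the synchronous block has finished.

```javascript
// Hypothetical variant of doSomething(): a timer callback tries to reset `flag`.
function doSomethingWithTimer() {
  let flag = false

  // Another block of code that *would* change `flag`, due after 0 ms.
  const otherBlock = new Promise((resolve) => {
    setTimeout(() => {
      flag = false
      resolve()
    }, 0)
  })

  flag = true
  // Busy-wait for 50 ms, long past the timer's deadline.
  const start = Date.now()
  while (Date.now() - start < 50) {
    // spin: the event loop cannot run anything else meanwhile
  }
  const seenInBlock = flag // still true

  // Only after this function returns can the timer callback run.
  return otherBlock.then(() => ({ seenInBlock, afterTimer: flag }))
}

doSomethingWithTimer().then(console.log) // { seenInBlock: true, afterTimer: false }
```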

This is great if all we do is asynchronous I/O. Our code consists of small synchronous blocks that run quickly and hand data off to files and streams, so no single block runs long enough to hold up the others. Far more time is spent waiting for I/O events than executing JavaScript. Let’s see this with a quick example:

db.findOne('SELECT ... LIMIT 1', function(err, result) {
  if (err) return console.error(err)
  console.log(result)
})
console.log('Running query')
setTimeout(function() {
  console.log('Hey there')
}, 1000)

Maybe this query to the database takes a minute, but the “Running query” message will be shown immediately after invoking the query, and the “Hey there” message will appear one second later, whether or not the query is still running.

Our Node.js application just invokes the function and does not block the execution of other pieces of code. It will get notified through the callback when the query is done, and we will receive the result.
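Because db is not defined in the snippet above, here is a runnable sketch of the same flow with a hypothetical fake db whose findOne answers through a timer. The durations are scaled down (a 200 ms “query” and a 100 ms greeting instead of a minute and a second) so it finishes quickly; the order of the output is what matters.

```javascript
// Hypothetical stand-in for the `db` client: it "answers" after delayMs.
const db = {
  findOne(sql, callback, delayMs = 200) {
    setTimeout(() => callback(null, { sql, row: 'hypothetical row' }), delayMs)
  },
}

// Same flow as the snippet above, with the 1 s timer scaled down to 100 ms.
// `log` defaults to console.log; resolves once the query callback has run.
function runExample(log = console.log) {
  return new Promise((resolve) => {
    db.findOne('SELECT ... LIMIT 1', function (err, result) {
      if (err) log(err)
      else log(result)
      resolve()
    })
    log('Running query')
    setTimeout(() => log('Hey there'), 100)
  })
}

runExample()
// Running query
// Hey there
// { sql: 'SELECT ... LIMIT 1', row: 'hypothetical row' }
```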

CPU-intensive tasks

What happens if we need to do intensive synchronous work, such as complex in-memory calculations on a large dataset? Then we might have a synchronous block of code that takes a long time and blocks the rest of the code.

Imagine that a calculation takes 10s. If we are running a web server, that means that all of the other requests get blocked for at least 10s because of that calculation. That’s a disaster. Anything more than 100ms could be too much.

JavaScript and Node.js were not meant to be used for CPU-bound tasks. Since JavaScript is single-threaded, this will freeze the UI in the browser and queue any I/O event in Node.js.

Going back to our previous example, imagine we now have a query that returns a few thousand results, and we need to decrypt the values in our JavaScript code:

db.findAll('SELECT ...', function(err, results) {
  if (err) return console.error(err)

  // Heavy computation and many results
  for (const encrypted of results) {
    const plainText = decrypt(encrypted)
    console.log(plainText)
  }
})

We will get the results in the callback once they are available. Then, no other JavaScript code is executed until our callback finishes its execution.

Usually, as we said before, this code is minimal and fast enough. In this case, however, we have many results and a heavy computation for each one. This might take a few seconds, and during that time any other JavaScript execution is queued, which means that if the same application is also running a server, we might be blocking all of our users.
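To make this concrete, the sketch below fills in decrypt with AES-256-CBC from Node.js's built-in crypto module (an assumption: the article does not say which cipher is used), decrypts 20,000 hypothetical rows, and measures how long a 0 ms timer scheduled just before the loop actually waits. The timer cannot fire until the whole loop has finished.

```javascript
const crypto = require('node:crypto')
const { performance } = require('node:perf_hooks')

// Hypothetical key and IV. A fixed IV only keeps the sketch short; real code
// should use a fresh random IV per value (and usually an authenticated mode).
const key = crypto.randomBytes(32)
const iv = crypto.randomBytes(16)

function encrypt(text) {
  const cipher = crypto.createCipheriv('aes-256-cbc', key, iv)
  return Buffer.concat([cipher.update(text, 'utf8'), cipher.final()])
}

function decrypt(encrypted) {
  const decipher = crypto.createDecipheriv('aes-256-cbc', key, iv)
  return Buffer.concat([decipher.update(encrypted), decipher.final()]).toString('utf8')
}

// Decrypts every row synchronously and reports how long a 0 ms timer,
// scheduled just before the loop, had to wait.
function decryptAll(rows) {
  const start = performance.now()
  const timerFired = new Promise((resolve) => {
    setTimeout(() => resolve(performance.now() - start), 0)
  })
  const plainTexts = rows.map(decrypt) // nothing else can run meanwhile
  const loopMs = performance.now() - start
  return timerFired.then((timerDelayMs) => ({ plainTexts, loopMs, timerDelayMs }))
}

const results = Array.from({ length: 20000 }, (_, i) => encrypt(`row ${i}`))
decryptAll(results).then(({ loopMs, timerDelayMs }) => {
  console.log(`decrypt loop: ${loopMs.toFixed(1)} ms, 0 ms timer waited ${timerDelayMs.toFixed(1)} ms`)
})
```

The exact timings depend on the machine, but the timer's wait is always at least as long as the decryption loop.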

Why we will never have threads in JavaScript

So, at this point, many people might think somebody needs to add a new module in the Node.js core and allow us to create and sync threads. That should be it, right? It’s a shame we don’t have a nice way of solving this use case in a mature server-side platform such as Node.js.

Well, if we add threads, then we are changing the nature of the language. We cannot just add threads as a new set of classes or functions available. We need to change the language. Languages that support multithreading have keywords such as “synchronized” in order to enable threads to cooperate.

For example, in Java, even some numeric types are not atomic: a write to a non-volatile long or double variable may be performed as two separate 32-bit writes. If two threads write the same variable without synchronization, a thread reading it can see the first 32 bits from one write and the second 32 bits from the other, a value that neither thread ever wrote.
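The same kind of problem appears in JavaScript as soon as memory is actually shared. Node.js worker threads can opt in to sharing memory through a SharedArrayBuffer, and the sketch below (runCounter is a hypothetical helper) has two workers each increment one shared counter a million times. A plain ++ is a separate read, add, and write, so updates can be lost and the total may finish below 2,000,000, depending on the machine and timing; Atomics.add is indivisible and always reaches exactly 2,000,000.

```javascript
const { Worker } = require('node:worker_threads')

// Worker code (eval: true keeps the sketch in one file). Each worker adds 1
// to a shared 32-bit counter `iterations` times.
const workerSource = `
  const { workerData, parentPort } = require('node:worker_threads')
  const counter = new Int32Array(workerData.buffer)
  for (let i = 0; i < workerData.iterations; i++) {
    if (workerData.atomic) {
      Atomics.add(counter, 0, 1) // indivisible read-modify-write
    } else {
      counter[0]++ // separate read, add, and write: other threads can interleave
    }
  }
  parentPort.postMessage('done')
`

// Starts `workers` threads sharing one SharedArrayBuffer; resolves with the final count.
function runCounter(atomic, workers = 2, iterations = 1_000_000) {
  const buffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT)
  const jobs = Array.from({ length: workers }, () => new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { buffer, iterations, atomic } })
    worker.once('message', resolve)
    worker.once('error', reject)
  }))
  return Promise.all(jobs).then(() => Atomics.load(new Int32Array(buffer), 0))
}

runCounter(false).then((n) => console.log('plain ++    :', n)) // may be below 2000000
runCounter(true).then((n) => console.log('Atomics.add :', n)) // always 2000000
```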
