About Multi-Threading and Multiple Processes in Node.js

NodeJS is a free-to-use, cross-platform JavaScript runtime environment that, although single-threaded by design, uses multiple threads in the background to execute asynchronous code.

In this article, let's talk about multithreading and multiprocessing in Node.js and develop a deeper understanding of how both work. I hope those looking for a reference on the topic find it helpful.

Due to the non-blocking nature of Node.js, different threads execute different callbacks that are first delegated to the event loop. The NodeJS runtime is responsible for handling all of this.

Why NodeJS?

JS was originally built as a single-threaded programming language meant to run only inside a web browser. What this means is that in a process, only a single set of instructions could execute at any given moment in time.

The execution moved on to the next code block only when the current code block finished execution. The single-threaded nature of JS, however, made implementation easy.

In its humble beginnings, JavaScript was useful for adding only a little interaction to websites. Therefore, there was nothing that demanded multithreading. However, times have changed, user requirements have intensified, and JavaScript has become “the popular programming language of the web.”

Multithreading has now become common. Because JS is a single-threaded language, achieving multithreading in it directly isn’t possible. Thankfully, there is a great workaround available for this situation: NodeJS.

There is no scarcity of NodeJS frameworks, thanks to the popularity enjoyed by the JS runtime environment in particular and JavaScript in general. Before continuing with the article, let’s take in some important points about Node.js:

  1. It is possible to pass messages to the forked process and to the master process from the forked process using the send function
  2. Support for forking multiple processes is available
  3. State isn’t shared between the master and forked processes

Why Fork a Process?

There are two cases when we need to fork a process:

  1. For enhancing the speed by delegating tasks to some other process
  2. For freeing up memory and unloading a single process

It is possible to send data to the forked process as well as send it back.
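
Here is a minimal sketch of both directions of messaging (the file name child.js and the message shape are illustrative):

// main.js
import { fork } from 'child_process';

const child = fork('./child.js');
child.on('message', (msg) => {
  console.log('result from child:', msg.result);
});
child.send({ value: 21 }); // pass a message to the forked process

// child.js
process.on('message', (msg) => {
  // send the result back to the master process
  process.send({ result: msg.value * 2 });
});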

The Way of NodeJS

Node.js makes use of two types of threads:

  1. The main thread, handled by the event loop, and
  2. Many auxiliary threads in the worker pool

The event loop is responsible for taking callbacks, or functions, and registering them for execution at some point in the future. It operates in the same thread as the proper JS code. When a JS operation blocks the thread, the event loop gets blocked as well.

The worker pool is an execution model responsible for spawning and handling different threads. It synchronously performs the task and then returns the result to the event loop, which then executes the provided callback with the stated result.

To summarize, the worker pool takes care of asynchronous I/O operations, i.e., interactions with the system’s disk and network. Modules like fs and crypto are the ones that primarily use the worker pool.
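
For instance, an asynchronous crypto call like the one below runs on a worker-pool thread behind the scenes while the main thread stays free (the parameters here are just illustrative):

import crypto from 'crypto';

// pbkdf2 is CPU-heavy; libuv runs it on a worker-pool thread
crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', (err, key) => {
  if (err) throw err;
  console.log(key.toString('hex')); // the callback runs back on the main thread
});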

As the worker pool is implemented in the libuv library, there is a slight delay whenever Node.js needs to communicate internally between JS and C++. However, this is almost imperceptible.

Everything is good until we come across the requirement of synchronously executing a complex operation. Any function that requires a long time to run will block the main thread.
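
For example, a synchronous CPU-bound call like this one blocks the event loop until it returns; no callbacks or I/O can be processed in the meantime:

// a deliberately slow, synchronous computation
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

console.log(fib(45)); // blocks the main thread for several seconds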

If the application has several CPU-intensive functions, then it results in a significant drop in the throughput of the server. In the worst-case scenario, the server will freeze and there will be no way of delegating work to the worker pool.

Areas like AI, big data, and machine learning couldn’t benefit from NodeJS because such operations block the one and only main thread, rendering the server unresponsive. However, this changed with the advent of Node.js v10.5.0, which added support for multithreading.

The Challenge of Concurrency & CPU-Bound Tasks

Establishing concurrency in JavaScript can be difficult. Allowing several threads to access the same memory results in race conditions that are not only hard to reproduce but also challenging to solve.

NodeJS was originally implemented as a server-side platform based on asynchronous I/O. This made a lot of things easier by simply eliminating the need for threads. Yes, Node.js applications are single-threaded, but not in the typical fashion.

We can run things in parallel in Node.js. However, we do not need to create threads ourselves. The operating system and the virtual machine collectively run the I/O in parallel, and the JS code then runs in a single thread when it is time to send the data back to our JavaScript code.

Except for the JS code, everything runs in parallel in Node.js. Unlike asynchronous blocks, synchronous blocks of JS are always executed one at a time. Compared to code execution, a lot more time is spent waiting for I/O events to occur in JS.

A NodeJS application simply invokes the required functions, or callbacks, and doesn’t block the execution of other code. Originally, neither JavaScript nor NodeJS was meant to handle CPU-intensive or CPU-bound tasks.

When the code is minimal, the execution will be agile. However, the heavier the computations become, the slower the execution gets.

Attempting CPU-intensive tasks in JS used to freeze the UI in the browser and queue up every I/O event in Node. Nonetheless, we have come a long way from that. Now, we have the worker_threads module to save the day.

Multithreading Made Simple With The worker_threads Module

Released in June 2018, Node.js v10.5.0 introduced the worker_threads module. It facilitates implementing concurrency in the popular JavaScript runtime environment. The module allows creating fully functional multithreaded NodeJS apps.

Technically, a worker thread is some code that is spawned in a separate thread. To begin using worker threads, you first import the worker_threads module. Afterwards, you create an instance of the Worker class to spawn a worker thread.

When creating an instance of the Worker class, there are two arguments:

  1. The first one provides a path to the file, with a .js or .mjs extension, containing the worker thread’s code and,
  2. The second one provides an object containing a workerData property that holds the data meant to be accessed by the worker thread when it begins execution (see the sketch below)
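
For instance, a minimal sketch (the file name worker.js is illustrative):

import { Worker } from 'worker_threads';

const worker = new Worker('./worker.js', {
  workerData: { value: 42 }, // exposed to the worker as workerData
});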

Worker threads are capable of dispatching more than one message event. As such, the callback approach is given preference over the option of returning a promise.

The communication with worker threads is event-based, i.e., listeners are set up to be called as soon as an event is emitted by the worker thread. The four most common events are:

worker.on('error', (error) => {});

Emitted when there is an uncaught exception inside the worker thread. The worker thread then terminates, and the error is made available as the first argument in the provided callback.

worker.on('exit', (exitCode) => {});

Emitted when a worker thread exits. If process.exit() was called inside the worker thread, exitCode is provided to the callback. The code is 1 if worker.terminate() terminated the worker thread.

worker.on('message', (data) => {});

Emitted when a worker thread sends data to the parent thread.

worker.on('online', () => {});

Emitted when a worker thread stops parsing the JS code and begins execution. Although not commonly used, the online event can be informative in specific scenarios.

Ways of Using Worker threads

There are two ways for using worker threads:

  • Approach 1 – Involves spawning a worker thread, executing its code, and sending the result to the parent thread. This approach necessitates creating a new worker from scratch for every new task.
  • Approach 2 – Involves spawning a worker thread and setting up listeners for the message event. Every time the message is fired, the worker thread executes the code and sends the result back to the parent thread. The worker thread remains alive for future use.

Approach 2 is also known as worker pool. This is because the approach involves creating a pool of workers, putting them on waiting, and dispatching the message event to do the task when required.

Because creating a worker thread from scratch demands creating a virtual machine and parsing and executing code, the official NodeJS documentation recommends employing approach 2. Moreover, approach 2 is more efficient than approach 1.

Important Properties Available in the worker_threads Module

  • isMainThread – This property is true when not operating inside a worker thread. If need be, a simple if statement can be included at the start of the worker file to ensure that it runs only as a worker thread.
  • parentPort – An instance of MessagePort, it is used for communicating with the parent thread.
  • threadId – A unique identifier assigned to the worker thread.
  • workerData – Contains the data included in the worker thread’s constructor by the spawning thread.

Multiprocessing in NodeJS

In order to harness the power of a multi-core system, Node.js offers processes. The popular JS runtime environment features a module called cluster that provides support for multiprocessing.

The cluster module enables spawning multiple child processes, which can share a common port. A system using NodeJS can handle greater workloads when the child processes are put into action.
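
Here is a minimal sketch of clustering an HTTP server with one child process per CPU core (the port is illustrative):

import cluster from 'cluster';
import http from 'http';
import os from 'os';

if (cluster.isMaster) { // isPrimary in newer Node.js versions
  // spawn one child process per CPU core
  for (let i = 0; i < os.cpus().length; i += 1) {
    cluster.fork();
  }
} else {
  // all the child processes share the same port
  http.createServer((req, res) => {
    res.end(`handled by process ${process.pid}`);
  }).listen(3000);
}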

Node.js for the Backend

The internet has already become the platform of choice for millions, if not billions, around the world. Therefore, to push a business to its maximum potential, having a robust online presence is mandatory.

It all starts with a powerful, intuitive website. Making an impeccable website requires choosing the best frontend and backend technologies. Although single-threaded in nature, Node.js is a top choice for backend web development.

Despite having an abundance of multi-threaded backend options, reputable companies often opt for NodeJS. This is because Node.js offers a workaround for using multithreading in JavaScript, which is already “the most popular programming language of the web.”

Conclusion

The worker_threads module offers an easy way of implementing multithreading in Node.js applications. A server’s throughput can be significantly enhanced by delegating CPU-heavy computations to the worker threads.

With the newfound support for multithreading, NodeJS will continue to reach more and more developers, engineers, and other professionals from calculation-heavy fields such as AI, big data, and machine learning.

Getting Started With Threads in NodeJS

Many people wonder how a single-threaded Node.js can compete with multithreaded back ends. As such, it may seem counterintuitive that so many huge companies pick Node as their back end, given its supposed single-threaded nature. To know why, we have to understand what we really mean when we say that Node is single-threaded.

JavaScript was created to be just good enough to do simple things on the web, like validate a form or, say, create a rainbow-colored mouse trail. It was only in 2009 that Ryan Dahl, creator of Node.js, made it possible for developers to use the language to write back-end code.

Back-end languages, which generally support multithreading, have all kinds of mechanisms for syncing values between threads and other thread-oriented features. To add support for such things to JavaScript would require changing the entire language, which wasn’t really Dahl’s goal. For plain JavaScript to support multithreading, he had to create a workaround. Let’s explore …

How Node.js really works

Node.js uses two kinds of threads: a main thread handled by event loop and several auxiliary threads in the worker pool.

Event loop is the mechanism that takes callbacks (functions) and registers them to be executed at some point in the future. It operates in the same thread as the proper JavaScript code. When a JavaScript operation blocks the thread, the event loop is blocked as well.

Worker pool is an execution model that spawns and handles separate threads, which then synchronously perform the task and return the result to the event loop. The event loop then executes the provided callback with said result.

In short, it takes care of asynchronous I/O operations — primarily, interactions with the system’s disk and network. It is mainly used by modules such as fs (I/O-heavy) or crypto (CPU-heavy). Worker pool is implemented in libuv, which results in a slight delay whenever Node needs to communicate internally between JavaScript and C++, but this is hardly noticeable.

With both of these mechanisms, we are able to write code like this:

import fs from 'fs';
import path from 'path';

fs.readFile(path.join(__dirname, './package.json'), (err, content) => {
 if (err) {
   return null;
 }
 console.log(content.toString());
});

The aforementioned fs module tells the worker pool to use one of its threads to read the contents of a file and notify the event loop when it is done. The event loop then takes the provided callback function and executes it with the content of the file.

Above is an example of a non-blocking code; as such, we don’t have to wait synchronously for something to happen. We tell the worker pool to read the file and call the provided function with the result. Since worker pool has its own threads, the event loop can continue executing normally while the file is being read.

It’s all good until there’s a need to synchronously execute some complex operation: any function that takes too long to run will block the thread. If an application has many such functions, it could significantly decrease the throughput of the server or freeze it altogether. In this case, there’s no way of delegating the work to the worker pool.

Fields that require complex calculations — such as AI, machine learning, or big data — couldn’t really use Node.js efficiently due to the operations blocking the main (and only) thread, making the server unresponsive. That was the case up until Node.js v10.5.0 came about, which added support for multiple threads.

Introducing: worker_threads

The worker_threads module is a package that allows us to create fully functional multithreaded Node.js applications.

A thread worker is a piece of code (usually taken out of a file) spawned in a separate thread.

Note that the terms thread worker, worker, and thread are often used interchangeably; they all refer to the same thing.

To start using thread workers, we have to import the worker_threads module. Let’s start by creating a function to help us spawn these thread workers, and then we’ll talk a little bit about their properties.

import { Worker } from 'worker_threads';

type WorkerCallback = (err: any, result?: any) => any;
export function runWorker(path: string, cb: WorkerCallback, workerData: object | null = null) {
 const worker = new Worker(path, { workerData });
 worker.on('message', cb.bind(null, null));
 worker.on('error', cb);
 worker.on('exit', (exitCode) => {
   if (exitCode === 0) {
     return null;
   }
   return cb(new Error(`Worker has stopped with code ${exitCode}`));
 });
 return worker;
}

To create a worker, we have to create an instance of the Worker class. In the first argument, we provide a path to the file that contains the worker’s code; in the second, we provide an object containing a property called workerData. This is the data we’d like the thread to have access to when it starts running.

Note that whether you use JavaScript itself or something that transpiles to JavaScript (e.g., TypeScript), the path should always refer to files with either .js or .mjs extensions.

I would also like to point out why we used the callback approach as opposed to returning a promise that would be resolved when the message event is fired. This is because workers can dispatch many message events, not just one.

As you can see in the example above, the communication between threads is event-based, which means we are setting up listeners to be called once a given event is sent by the worker.

Here are the most common events:

worker.on('error', (error) => {});

The error event is emitted whenever there’s an uncaught exception inside the worker. The worker is then terminated, and the error is available as the first argument in the provided callback.

worker.on('exit', (exitCode) => {});

exit is emitted whenever a worker exits. If process.exit() was called inside the worker, exitCode would be provided to the callback. If the worker was terminated with worker.terminate(), the code would be 1.

worker.on('online', () => {});

online is emitted whenever a worker stops parsing the JavaScript code and starts the execution. It’s not used very often, but it can be informative in specific cases.

worker.on('message', (data) => {});

message is emitted whenever a worker sends data to the parent thread.

Now let’s take a look at how the data is being shared between threads.

Exchanging data between threads

To send the data to the other thread, we use the port.postMessage() method. It has the following signature:

port.postMessage(data[, transferList])

The port object can be either parentPort or an instance of MessagePort — more on that later.

The data argument

The first argument — here called data — is an object that is copied to the other thread. It can contain anything the copying algorithm supports.

The data is copied by the structured clone algorithm. Per Mozilla:

It builds up a clone by recursing through the input object while maintaining a map of previously visited references in order to avoid infinitely traversing cycles.

The algorithm doesn’t copy functions, errors, property descriptors, or prototype chains. It should also be noted that copying objects in this way is different than with JSON because it can contain circular references and typed arrays, for example, whereas JSON cannot.

By supporting the copying of typed arrays, the algorithm makes it possible to share memory between threads.
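
As a small sketch of these rules (assuming a worker instance created as shown earlier), circular references survive the copy, while functions are rejected:

const circular = { name: 'example' };
circular.self = circular; // a circular reference: fine here, impossible with JSON

worker.postMessage(circular); // works

worker.postMessage({ fn: () => {} }); // throws a DataCloneError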

Sharing memory between threads

People may argue that modules like cluster or child_process enabled the use of threads a long time ago. Well, yes and no.

The cluster module can create multiple node instances with one master process routing incoming requests between them. Clustering an application allows us to effectively multiply the server’s throughput; however, we can’t spawn a separate thread with the cluster module.

People tend to use tools like PM2 to cluster their applications as opposed to doing it manually inside their own code, but if you’re interested, you can read my post on how to use the cluster module.

The child_process module can spawn any executable regardless of whether it’s JavaScript. It is pretty similar, but it lacks several important features that worker_threads has.

Specifically, thread workers are more lightweight and share the same process ID as their parent threads. They can also share memory with their parent threads, which allows them to avoid serializing big payloads of data and, as a result, send the data back and forth much more efficiently.

Now let’s take a look at an example of how to share memory between threads. In order for the memory to be shared, an instance of ArrayBuffer or SharedArrayBuffer must be sent to the other thread as the data argument or inside the data argument.

Here’s a worker that shares memory with its parent thread:

import { parentPort } from 'worker_threads';
parentPort.on('message', () => {
 const numberOfElements = 100;
 const sharedBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT * numberOfElements);
 const arr = new Int32Array(sharedBuffer);
 for (let i = 0; i < numberOfElements; i += 1) {
   arr[i] = Math.round(Math.random() * 30);
 }
 parentPort.postMessage({ arr });
});

First, we create a SharedArrayBuffer with the memory needed to contain 100 32-bit integers. Next, we create an instance of Int32Array, which will use the buffer to save its structure, then we just fill the array with some random numbers and send it to the parent thread.

In the parent thread:

import path from 'path';
import { runWorker } from '../run-worker';
const worker = runWorker(path.join(__dirname, 'worker.js'), (err, data) => {
 if (err) {
   return null;
 }
 const { arr } = data; // destructure only after the error check; on error, data is undefined
 arr[0] = 5;
});
worker.postMessage({});

By changing arr[0] to 5, we actually change it in both threads.

Of course, by sharing memory, we risk changing a value in one thread and having it changed in the other. But we also gain a very nice feature along the way: the value doesn’t need to be serialized to be available in another thread, which greatly increases efficiency. Simply remember to manage references to the data properly in order for it to be garbage-collected once you finish working with it.

Sharing an array of integers is fine, but what we’re really interested in is sharing objects — the default way of storing information. Unfortunately, there is no SharedObjectBuffer or similar, but we can create a similar structure ourselves.

The transferList argument

transferList can only contain ArrayBuffer and MessagePort. Once they are transferred to the other thread, they can no longer be used in the sending thread; the memory is moved to the other thread and, thus, is unavailable in the sending one.

For the time being, we can’t transfer network sockets by including them in the transferList (which we can do with the child_process module).
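
A small sketch of the transfer semantics (again assuming a worker created as shown earlier):

const buffer = new ArrayBuffer(16);
console.log(buffer.byteLength); // 16

worker.postMessage({ buffer }, [buffer]); // transferred, not copied
console.log(buffer.byteLength); // 0: the memory now belongs to the other thread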

Creating a channel for communications

Communication between threads is made through ports, which are instances of the MessagePort class and enable event-based communication.

There are two ways of using ports to communicate between threads. The first is the default and the easier of the two. Within the worker’s code, we import an object called parentPort from the worker_threads module and use the object’s .postMessage() method to send messages to the parent thread.

Here’s an example:

import { parentPort } from 'worker_threads';
const data = {
// ...
};
parentPort.postMessage(data);

parentPort is an instance of MessagePort that Node.js created for us behind the scenes to enable communication with the parent thread. This way, we can communicate between threads by using parentPort and worker objects.

The second way of communicating between threads is to actually create a MessageChannel on our own and send it to the worker. Here’s how we could create a new MessagePort and share it with our worker:

import path from 'path';
import { Worker, MessageChannel } from 'worker_threads';
const worker = new Worker(path.join(__dirname, 'worker.js'));
const { port1, port2 } = new MessageChannel();
port1.on('message', (message) => {
 console.log('message from worker:', message);
});
worker.postMessage({ port: port2 }, [port2]);

After creating port1 and port2, we set up event listeners on port1 and send port2 to the worker. We have to include it in the transferList for it to be transferred to the worker side.

And now, inside the worker:

import { parentPort, MessagePort } from 'worker_threads';
parentPort.on('message', (data) => {
 const { port }: { port: MessagePort } = data;
 port.postMessage('heres your message!');
});

This way, we use the port that was sent by the parent thread.

Using parentPort is not necessarily a wrong approach, but it’s better to create a new MessagePort with an instance of MessageChannel and then share it with the spawned worker (read: separation of concerns).

Note that in the examples below, I use parentPort to keep things simple.

Two ways of using workers

There are two ways we can use workers. The first is to spawn a worker, execute its code, and send the result to the parent thread. With this approach, each time a new task comes up, we have to create a worker all over again.

The second way is to spawn a worker and set up listeners for the message event. Each time the message is fired, it does the work and sends the result back to the parent thread, which keeps the worker alive for later usage.

Node.js documentation recommends the second approach because of how much effort it takes to actually create a thread worker, which requires creating a virtual machine and parsing and executing the code. This method is also much more efficient than constantly spawning workers.

This approach is called worker pool because we create a pool of workers and keep them waiting, dispatching the message event to do the work when needed.

Here’s an example of a file that contains a worker that is spawned, executed, and then closed:

import { parentPort } from 'worker_threads';
const collection = [];
for (let i = 0; i < 10; i += 1) {
 collection[i] = i;
}
parentPort.postMessage(collection);

After sending the collection to the parent thread, it simply exits.

And here’s an example of a worker that can wait for a long period of time before it is given a task:

import { parentPort } from 'worker_threads';
parentPort.on('message', (data: any) => {
 const result = doSomething(data);
 parentPort.postMessage(result);
});

Useful properties available in the worker_threads module

There are a few properties available inside the worker_threads module:

isMainThread

The property is true when not operating inside a worker thread. If you feel the need, you can include a simple if statement at the start of a worker file to make sure it is only run as a worker.

import { isMainThread } from 'worker_threads';
if (isMainThread) {
 throw new Error('Its not a worker');
}

workerData

Data included in the worker’s constructor by the spawning thread.

const worker = new Worker(path, { workerData });

In the worker thread:

import { workerData } from 'worker_threads';
console.log(workerData.property);

parentPort

The aforementioned instance of MessagePort used to communicate with the parent thread.

threadId

A unique identifier assigned to the worker.
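
For example, inside a worker:

import { threadId } from 'worker_threads';

console.log(`running in thread ${threadId}`);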

Now that we know the technical details, let’s implement something and test out our knowledge in practice.

Implementing setTimeout

Our setTimeout is a busy loop that, as the name implies, times out the app. In each iteration, it checks whether the sum of the starting date and the given number of milliseconds is smaller than (or equal to) the actual date.

import { parentPort, workerData } from 'worker_threads';
const time = Date.now();
while (true) {
 if (time + workerData.time <= Date.now()) {
   parentPort.postMessage({});
   break;
 }
}

This particular implementation spawns a thread, executes its code, and then exits after it’s done.

Let’s try implementing the code that will make use of this worker. First, let’s create a state in which we’ll keep track of the spawned workers:

import path from 'path';
import { Worker } from 'worker_threads';
import { v4 as uuidv4 } from 'uuid'; // the UUID package mentioned below
import { runWorker } from '../run-worker'; // the helper defined earlier

const timeoutState: { [key: string]: Worker } = {};

And now the function that takes care of creating workers and saving them into the state:

export function setTimeout(callback: (err: any) => any, time: number) {
 const id = uuidv4();
 const worker = runWorker(
   path.join(__dirname, './timeout-worker.js'),
   (err) => {
     if (!timeoutState[id]) {
       return null;
     }
     timeoutState[id] = null;
     if (err) {
       return callback(err);
     }
     callback(null);
   },
   {
     time,
   },
 );
 timeoutState[id] = worker;
 return id;
}

First we use the UUID package to create a unique identifier for our worker, then we use the previously defined helper function runWorker to get the worker. We also pass to the worker a callback function to be fired once the worker sends some data. Finally, we save the worker in the state and return the id.

Inside the callback function, we have to check whether the worker still exists in the state, because cancelTimeout() may have removed it. If it does exist, we remove it from the state and invoke the callback passed to the setTimeout function.

The cancelTimeout function uses the .terminate() method to force the worker to quit and removes that worker from the state:

export function cancelTimeout(id: string) {
 if (timeoutState[id]) {
   timeoutState[id].terminate();
   timeoutState[id] = undefined;
   return true;
 }
 return false;
}

If you’re interested, I also implemented setInterval here, but since it has nothing to do with threads (we reuse the code of setTimeout), I have decided not to include the explanation here.

I have created a little test code for the purpose of checking how much this approach differs from the native one. You can review the code here. These are the results:

native setTimeout { ms: 7004, averageCPUCost: 0.1416 }
worker setTimeout { ms: 7046, averageCPUCost: 0.308 }

We can see that there’s a slight delay in our setTimeout — about 40ms — due to the worker being created. The average CPU cost is also a little bit higher, but nothing unbearable (the CPU cost is an average of the CPU usage across the whole duration of the process).

If we could reuse the workers, we would lower the delay and CPU usage, which is why we’ll now take a look at how to implement our own worker pool.

Implementing a worker pool

As mentioned above, a worker pool is a given number of previously created workers sitting and listening for the message event. Once the message event is fired, they do the work and send back the result.

To better illustrate what we’re going to do, here’s how we would create a worker pool of eight thread workers:

const pool = new WorkerPool(path.join(__dirname, './test-worker.js'), 8);

If you are familiar with limiting concurrent operations, then you will see that the logic here is almost the same, just a different use case.

As shown in the code snippet above, we pass to the constructor of WorkerPool the path to the worker and the number of workers to spawn.

import { Worker } from 'worker_threads';

export class WorkerPool<T, N> {
 private queue: QueueItem<T, N>[] = [];
 private workersById: { [key: number]: Worker } = {};
 private activeWorkersById: { [key: number]: boolean } = {};
 public constructor(public workerPath: string, public numberOfThreads: number) {
   this.init();
 }
}

Here, we have additional properties like workersById and activeWorkersById, in which we can save existing workers and the IDs of currently running workers, respectively. There’s also queue, in which we can save objects with the following structure:

type QueueCallback<N> = (err: any, result?: N) => void;
interface QueueItem<T, N> {
 callback: QueueCallback<N>;
 getData: () => T;
}

callback is just the default node callback, with error as its first argument and the possible result as the second. getData is the function passed to the worker pool’s .run() method (explained below), which is called once the item starts being processed. The data returned by the getData function will be passed to the worker thread.

Inside the .init() method, we create the workers and save them in the states:

private init() {
  if (this.numberOfThreads < 1) {
    return null;
  }
  for (let i = 0; i < this.numberOfThreads; i += 1) {
    const worker = new Worker(this.workerPath);
    this.workersById[i] = worker;
    this.activeWorkersById[i] = false;
  }
}

To avoid infinite loops, we first ensure the number of threads is at least 1. We then create the requested number of workers and save them by their index in the workersById state. We save information on whether they are currently running inside the activeWorkersById state, which, at first, is always false by default.

Now we have to implement the aforementioned .run() method to set up a task to run once a worker is available.

public run(getData: () => T) {
  return new Promise<N>((resolve, reject) => {
    const availableWorkerId = this.getInactiveWorkerId();
    const queueItem: QueueItem<T, N> = {
      getData,
      callback: (error, result) => {
        if (error) {
          return reject(error);
        }
        return resolve(result);
      },
    };
    if (availableWorkerId === -1) {
      this.queue.push(queueItem);
      return null;
    }
    this.runWorker(availableWorkerId, queueItem);
  });
}

Inside the function passed to the promise, we first check whether there’s a worker available to process the data by calling the .getInactiveWorkerId():

private getInactiveWorkerId(): number {
  for (let i = 0; i < this.numberOfThreads; i += 1) {
    if (!this.activeWorkersById[i]) {
      return i;
    }
  }
  return -1;
}

Next, we create a queueItem, in which we save the getData function passed to the .run() method as well as the callback. In the callback, we either resolve or reject the promise depending on whether the worker passed an error to the callback.

If the availableWorkerId is -1, then there is no available worker, and we add the queueItem to the queue. If there is an available worker, we call the .runWorker() method to execute the worker.

In the .runWorker() method, we have to set inside the activeWorkersById state that the worker is currently being used; set up event listeners for message and error events (and clean them up afterwards); and, finally, send the data to the worker.

private async runWorker(workerId: number, queueItem: QueueItem<T, N>) {
 const worker = this.workersById[workerId];
 this.activeWorkersById[workerId] = true;
 const messageCallback = (result: N) => {
   queueItem.callback(null, result);
   cleanUp();
 };
 const errorCallback = (error: any) => {
   queueItem.callback(error);
   cleanUp();
 };
 const cleanUp = () => {
   worker.removeAllListeners('message');
   worker.removeAllListeners('error');
   this.activeWorkersById[workerId] = false;
   if (!this.queue.length) {
     return null;
   }
   this.runWorker(workerId, this.queue.shift());
 };
 worker.once('message', messageCallback);
 worker.once('error', errorCallback);
 worker.postMessage(await queueItem.getData());
}

First, by using the passed workerId, we get the worker reference from the workersById state. Then, inside activeWorkersById, we set the [workerId] property to true so we know not to run anything else while the worker is busy.

Next, we create messageCallback and errorCallback to be called on message and error events, respectively, then register said functions to listen for the event and send the data to the worker.

Inside the callbacks, we call the queueItem’s callback, then call the cleanUp function. Inside the cleanUp function, we make sure event listeners are removed since we reuse the same worker many times. If we didn’t remove the listeners, we would have a memory leak; essentially, we would slowly run out of memory.

Inside the activeWorkersById state, we set the [workerId] property to false and check if the queue is empty. If it isn’t, we remove the first item from the queue and call the worker again with a different queueItem.

Let’s create a worker that does some calculations after receiving the data in the message event:

import { isMainThread, parentPort } from 'worker_threads';
if (isMainThread) {
 throw new Error('Its not a worker');
}
const doCalcs = (data: any) => {
 const collection = [];
 for (let i = 0; i < 1000000; i += 1) {
   collection[i] = Math.round(Math.random() * 100000);
 }
 return collection.sort((a, b) => {
   if (a > b) {
     return 1;
   }
   return -1;
 });
};
parentPort.on('message', (data: any) => {
 const result = doCalcs(data);
 parentPort.postMessage(result);
});

The worker creates an array of 1 million random numbers and then sorts them. It doesn’t really matter what happens as long as it takes some time to finish.

Here’s an example of a simple usage of the worker pool:

const pool = new WorkerPool<{ i: number }, number>(path.join(__dirname, './test-worker.js'), 8);
const items = [...new Array(100)].fill(null);
Promise.all(
 items.map(async (_, i) => {
   await pool.run(() => ({ i }));
   console.log('finished', i);
 }),
).then(() => {
 console.log('finished all');
});

We start by creating a pool of eight workers. We then create an array with 100 elements, and for each element, we run a task in the worker pool. First, eight tasks will be executed immediately, and the rest will be put in the queue and gradually executed. By using a worker pool, we don’t have to create a worker each time, which vastly improves efficiency.

Conclusion

worker_threads provide a fairly easy way to add multithreading support to our applications. By delegating heavy CPU computations to other threads, we can significantly increase our server’s throughput. With the official threads support, we can expect more developers and engineers from fields like AI, machine learning, and big data to start using Node.js.

How to get started Internationalization in JavaScript with NodeJS

Tutorial showing how to use the Intl JS API in NodeJS (i18n). We'll install a module to unlock the Intl API languages for Node and test out RelativeTimeFormat to translate and localise relative times in JavaScript. I'll tell you how to get started with the built-in internationalization library in JS for Node 12 and higher. We'll change the locale to see how the translation works and test different BCP 47 language tags.

Internationalization is a difficult undertaking, but using the Intl API is an easy way to get started. It's great to see this new API in the JS language and available for use. Soon, you'll be able to use it in the browser with confidence, as modern browsers support the major Intl features. Have a look at the browser compatibility charts to see which browsers and versions of Node are supported.

Use Intl.RelativeTimeFormat for language-sensitive relative time formatting.
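
For example, with the full ICU data available, relative times can be formatted and localised like this (the locales and units here are illustrative):

const rtf = new Intl.RelativeTimeFormat('en', { numeric: 'auto' });
console.log(rtf.format(-1, 'day')); // "yesterday"
console.log(rtf.format(3, 'hour')); // "in 3 hours"

const rtfEs = new Intl.RelativeTimeFormat('es', { numeric: 'auto' });
console.log(rtfEs.format(-1, 'day')); // "ayer"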

MDN Documentation:

https://developer.mozilla.org/en-US/d...

Full ICU NPM package:

https://www.npmjs.com/package/full-icu

Learning To Create Python Multi-Threaded And Multi-Process

This article is not aimed at experienced Python developers. It is rather a high-level overview of Python's multi-threading features for those who have recently started learning the language.

Unfortunately, you cannot find tons of material about multithreading in Python. Moreover, I quite often meet Python beginners who don’t know about the GIL, for example. In this article, I want to cover the most basic features of multi-threaded Python, explain what the GIL is, and show how to work with and without it.

Python is a great programming language that combines many programming paradigms. The majority of tasks a developer faces can be solved easily, elegantly, and concisely with it, and quite often a single-threaded solution is enough for all those tasks. Single-threaded programs are usually predictable and easy to debug. The same can’t be said for multi-threaded and multi-process Python applications.

Python Multi-Threaded Application

Python has a threading module that includes everything needed for multi-threaded programming: there you can find different types of locks, semaphores, and an event mechanism. In short, you get everything required for the vast majority of multi-threaded Python applications, and all of these tools are extremely easy to use. To see this, let’s look at an example application that runs two threads. The first thread prints ten “0”s, the second ten “1”s, strictly in turn.

import threading

def writer(x, event_for_wait, event_for_set):
    for i in range(10):
        event_for_wait.wait()   # wait for event
        event_for_wait.clear()  # clear event for the future
        print(x)
        event_for_set.set()     # set event for the neighbor thread

# init events
e1 = threading.Event()
e2 = threading.Event()

# init threads
t1 = threading.Thread(target=writer, args=(0, e1, e2))
t2 = threading.Thread(target=writer, args=(1, e2, e1))

# start threads
t1.start()
t2.start()

e1.set()  # initiate the first event

# join threads to the main thread
t1.join()
t2.join()

No magic or voodoo code here; it is accurate and consistent. As you can see, we created the threads out of a plain function, which is highly convenient for small tasks. Moreover, this code is quite flexible. Suppose you add one more thread that prints “2”; you get the following:

import threading

def writer(x, event_for_wait, event_for_set):
    for i in range(10):
        event_for_wait.wait()   # wait for event
        event_for_wait.clear()  # clear event for the future
        print(x)
        event_for_set.set()     # set event for the neighbor thread

# init events
e1 = threading.Event()
e2 = threading.Event()
e3 = threading.Event()

# init threads
t1 = threading.Thread(target=writer, args=(0, e1, e2))
t2 = threading.Thread(target=writer, args=(1, e2, e3))
t3 = threading.Thread(target=writer, args=(2, e3, e1))

# start threads
t1.start()
t2.start()
t3.start()

e1.set()  # initiate the first event

# join threads to the main thread
t1.join()
t2.join()
t3.join()

Here, we added a new event and a new thread and changed the parameters passed to the threads at startup (it is also possible to build a more general solution, with the help of MapReduce for example, but that is beyond the scope of this article). As you can see, there is nothing complicated or magical about it. Everything is simple and comprehensible. Let’s move forward.

Global Interpreter Lock

There are two especially widespread reasons for using threads. Firstly, they help exploit modern multi-core processor architectures, which means increased application performance. Secondly, threads are of utmost importance when we need to divide application logic into parallel, fully or partly asynchronous sections (e.g., to be able to ping several servers simultaneously).

In the first situation, we face a Python limitation called the Global Interpreter Lock (GIL). The GIL means that, at any given moment, only one thread may be executed by the interpreter. It was designed to prevent threads from competing for shared variables: an executing thread gains access to the whole environment. This feature of Python's thread implementation significantly simplifies working with threads and even provides a certain thread safety.

However, pay attention to the following point: it may seem that a multi-threaded Python application doing the same work as a single-threaded one will take exactly the same time. Here, you will face an unpleasant surprise. Consider the following code to understand what I mean:

with open('test1.txt', 'w') as fout:
    for i in range(1000000):
        print(1, file=fout)

This application just writes a million “1” lines to a file; it takes about 0.35s on my local machine. Now, let’s consider another program for comparison:

from threading import Thread

def writer(filename, n):
    with open(filename, 'w') as fout:
        for i in range(n):
            print(1, file=fout)

t1 = Thread(target=writer, args=('test2.txt', 500000,))
t2 = Thread(target=writer, args=('test3.txt', 500000,))

t1.start()
t2.start()

t1.join()
t2.join()

The second application creates two threads, and in each thread it writes half a million “1” lines to a separate file. The total amount of work is the same. However, if you measure the time, you will see an interesting effect: the application takes anywhere from 0.7 to 7 seconds. What is the reason for this?

It happens because when a thread does not need the CPU, it releases the GIL, and at that moment both threads may try to acquire it. At the same time, the operating system, knowing that there are many cores, can aggravate the situation by trying to distribute the threads between the cores.

UPD: Python 3.2 brought an improved implementation of the GIL, and this problem is partially solved. The fix is that each thread, after losing control, waits for a short period before it may acquire the GIL again.

“So it’s extremely difficult to create an effective multi-threaded application in Python?” you may ask. But keep calm; there is always a solution.

Python Multi-Process Applications

In order to solve the problem mentioned in the previous chapter, Python provides the subprocess module. We can write the work that should run in parallel as a separate program and launch it from several threads of another application. This significantly increases performance, because threads that merely wait for a launched process to finish release the GIL. However, this approach has problems of its own. The main issue is that transferring data between processes becomes difficult: we have to serialize objects and set up communication through pipes or other tools, which adds overhead and makes the code harder to understand.

For this reason, there is another useful approach: Python also has a multiprocessing module, which is quite similar to threading. For example, processes can be created in the same way, from plain functions, and the methods for working with processes are almost the same as in threading. However, in order to synchronize processes and share data, we need other tools: Queues and Pipes. Analogues of the locks, events, and semaphores found in threading are also available here.

Additionally, the multiprocessing module supports shared memory: it provides the Value and Array classes for variables that can be shared between processes. For more convenient work with shared variables, you can use Manager classes, which are more flexible and convenient, but slower.

Furthermore, the multiprocessing module provides an opportunity to create pools of processes. This mechanism is pretty convenient for implementing the master-worker template or a parallel map.

Among the basic problems of working with multiprocessing is the module’s relative platform dependence. Since different operating systems manage processes differently, the code is subject to several limitations. For example, Windows has no fork mechanism, so the point where processes are spawned must be wrapped in the following guard:

if __name__ =='__main__':

Using this construction is good style in any case.

What’s More …

There are also other libraries and approaches for creating parallel applications in Python. For example, you can use Hadoop+Python or various implementations of MPI for Python (pyMPI, mpi4py). It’s even possible to use wrappers for existing C++ or Fortran libraries. Here, we could mention frameworks/libraries such as Pyro, Twisted, Tornado, and so on. However, all of these are topics for other articles.