Understanding Streams in Node.js

In this post, we take a look at how Node.js helps us work with streams of data flowing through our applications.

Node.js is known for its asynchronous nature and ships with many core modules that we use every day but rarely get a chance to dive into more deeply. One of these core modules is streams.

Streams allow us to handle data flow asynchronously. There are two data handling approaches in Node.js:

  1. **Buffered approach:** In the buffered approach, a receiver can read the data only once the whole data set has been written to the buffer.
  2. **Streams approach:** In the streams approach, data arrives in chunks and can be read chunk by chunk, each chunk being a single part of the data (see the sketch right after this list).
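
To make the difference concrete, here is a minimal sketch of both approaches. It assumes a file named example.txt exists in the working directory; the file name is just a placeholder.

const fs = require("fs");

// Buffered approach: the whole file is loaded into memory before we can use it.
const whole = fs.readFileSync("./example.txt");
console.log(`buffered: got ${whole.length} bytes at once`);

// Streams approach: data is delivered in chunks as it becomes available.
const stream = fs.createReadStream("./example.txt");
let total = 0;
stream.on("data", (chunk) => {
  total += chunk.length;
  console.log(`stream: received a chunk of ${chunk.length} bytes`);
});
stream.on("end", () => console.log(`stream: done, ${total} bytes in total`));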

Types of streams available

Node.js provides four fundamental stream types: Readable (a source we read from), Writable (a destination we write to), Duplex (both readable and writable), and Transform (a Duplex stream that modifies data as it passes through). Let's see the Writable and Readable types in action.

  1. Let’s experiment by creating a big file:
const fs = require("fs");
const file = fs.createWriteStream("./big.file");

for (let i = 0; i <= 1e6; i++) {
  file.write(
    "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n"
  );
}

file.end();

We have created a file using a Writable stream. The fs module in Node.js can read from and write to files through a stream interface. Running the above code generates a file of roughly 400 MB.

  2. Now let’s serve the same big file over HTTP, first using the buffered fs.readFile approach, which loads the entire file into memory before responding:
const fs = require("fs");
const server = require("http").createServer();

server.on("request", (req, res) => {
  fs.readFile("./big.file", (err, data) => {
    if (err) throw err;

    res.end(data);
  });
});

server.listen(8000);

Optimized Solution for Data Transformation

Time Efficiency

For better efficiency, we can use a powerful feature that comes with streams in Node.js: piping. You can pipe two streams together so that the output of one stream becomes the input of the other.

A chunk of data arrives at stream 1, which is piped into stream 2, which can in turn be piped into further streams.

With Pipes:

const fs = require("fs");
const server = require("http").createServer();

server.on("request", (req, res) => {
  const src = fs.createReadStream("./big.file");
  src.pipe(res);
});

server.listen(8000);

This is how we can chain together and overlap the multiple stages a data chunk might go through. This strategy is called pipelining, and Node.js lets us pipeline our tasks with the help of streams.
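
As an example of such a pipeline, here is a sketch that compresses the big file on the fly while serving it, by piping the read stream through a gzip Transform stream from the built-in zlib module. The response header handling here is just an illustrative assumption.

const fs = require("fs");
const zlib = require("zlib");
const server = require("http").createServer();

server.on("request", (req, res) => {
  // Readable (file) -> Transform (gzip) -> Writable (HTTP response)
  res.writeHead(200, { "Content-Encoding": "gzip" });
  fs.createReadStream("./big.file")
    .pipe(zlib.createGzip())
    .pipe(res);
});

server.listen(8000);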

Node.js runs our JavaScript on a single thread, but this doesn’t mean we can’t do two tasks or run two processes at a time; this can be done via child processes in Node.js, as sketched below.
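
Here is a minimal sketch of spawning a child process. Note that the child’s stdout is itself a Readable stream, so it can be piped like any other stream; the script name some-task.js is a hypothetical placeholder.

const { spawn } = require("child_process");

// Run another Node.js script in a separate process.
const child = spawn("node", ["some-task.js"]);

// The child's stdout is a Readable stream, so we can pipe it like any other stream.
child.stdout.pipe(process.stdout);

child.on("exit", (code) => {
  console.log(`child process exited with code ${code}`);
});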

#node-js
