In this article, you'll learn how I built a Node.js service to clone AWS S3 buckets.
If you are an absolute beginner with Node.js, please go through this write-up first to get an overall understanding of the why and how of Node.js.

Since this write-up is about building a fully functional Node.js application, we won't go deep into the basics; instead, we'll quickly touch on the key points to look out for, along with a bit of insight into the application itself.
AWS is a cloud-based ecosystem where you can manage the whole infrastructure of your applications without needing any physical hardware or servers.

In short, AWS offers a wide range of solutions for companies seeking to store data, access data, run servers, scale their existing services and much more. Among these, one of the most important and simplest services is S3 (Simple Storage Service). S3 not only offers object storage, meaning you can store any file format on it, but also comes with a set of REST APIs through which you can perform CRUD operations.

These two technologies make a powerful combination for various use cases: whenever your Node.js application needs to store and retrieve arbitrary files, executables, or any other data format that is not supported by an ACID-compliant DB such as a SQL database. In short, S3 is a file storage system where you can store any type of data and easily access it.
In this project, we build a Node.js application that clones your S3 bucket locally and recursively.
TBH, this project is a one-liner using the AWS-CLI. Yes, you heard it right. So why are we doing it anyway?
“Why should every problem have only one solution? I simply like more than one, be it good or bad.”
Let’s look at the existing solution first. You install the AWS-CLI and run the following command:
```shell
aws s3 cp s3://my-s3-bucket/ ./ --recursive
```
Surprisingly, apart from using the AWS CLI, I didn't find any proper Node.js script or app that would do this for medium to large buckets using the AWS-SDK. The answers I found in Node.js posts online had several problems: half-baked scripts, scripts that tried to create files synchronously without knowing when to complete, and scripts that ignored empty folders entirely. Basically, they didn't do the job right. Hence, I decided to write one myself, properly.
I am positive that this will give you a better understanding of how a Node.js application should look and feel, regardless of its size and scope.
As I said above, I am not going to explain the code line by line, as I am publishing the entire codebase. Instead, I will talk about how I have architected the application, with a bit of insight into the core logic and key features, so you have an idea of what to expect by the end of this write-up.

Before we jump the gun, let's ask why we call this a service and not a server. I call it a service because it does just one job, and no external entities send requests to it. If external clients were sending requests to our application while it listened on a port, then I would probably call it a server.
When I was a junior frontend developer, I always thought that Node.js applications consisted of a single server file, and that the code would always be messy. While that is true to some extent, I later learnt that it doesn't have to be that way. As responsible developers, we should always aim to keep files as small as possible and separate concerns properly. That's the key to developing a scalable Node.js application.

There is no single prescribed approach to structuring a project; it can change from project to project based on the use case. Personally, I split the code into smaller independent modules. One module does one type of task, and one type of task only.
Let's say I have a network.js file: it does only network operations and doesn't modify the file structure or create new files.
Let's look at our project structure, for instance.

As I said before, there is no single way to structure your project, but it's ideal to pick a theme and group all your files under it. For me, that theme was activity: “what does this file handle, and how?”
Let’s start from the root and go step by step.
These are the project dependencies. They are essential for development and deployment, and mostly straightforward to understand:
Then comes the config file. It holds all of your application configuration: API keys, bucket name, target directory, third-party links and so on. Normally we would have two config files, one for the production and one for the development environment.
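To make this concrete, here is a minimal sketch of what such a config file might look like. The key names (`bucketName`, `rootDirectory`, `targetDirectory`) are illustrative, not the project's exact schema; real credentials would come from your environment.

```javascript
// config/development.js — a hedged sketch; key names are illustrative.
const config = {
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID || '<your-access-key>',
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY || '<your-secret>',
  },
  bucketName: 'my-s3-bucket',   // bucket to clone
  rootDirectory: '',            // S3 prefix; '' clones the whole bucket
  targetDirectory: './clone',   // local folder to clone into
};

module.exports = config;
```

A production config would mirror this shape with its own values, and the right file would be selected based on `NODE_ENV`.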
Once we have the skeleton of the application ready with its dependencies, we come to the core entities. In our application, the core entities are Handler, Service and Storage.
Handler is where we glue the entire application together: it creates the Services and Storages, injects the required dependencies, and exposes an API for index.js to call.

Service is where the core logic of the application lives, and all jobs are delegated to other dependencies from here.

Storage is where all our storage operations take place. In our application, S3 is the external storage from which we retrieve our files and data, so the AWS-SDK operations happen exclusively inside this storage module.
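The wiring between these three entities can be sketched as follows. This is a minimal illustration of the dependency-injection idea, not the project's exact code: `createStorage`, `createService` and `createHandler` are hypothetical names, and the storage here is a stub standing in for the AWS-SDK-backed module.

```javascript
// Stand-in for the AWS-SDK-backed storage module.
function createStorage(objects) {
  return { listObjects: () => objects };
}

// Core logic would clone each listed object; here we just count them.
function createService({ storage }) {
  return { cloneBucket: () => storage.listObjects().length };
}

// The handler glues storage and service together and exposes
// a single entry point for index.js to call.
function createHandler(objects) {
  const storage = createStorage(objects);
  const service = createService({ storage });
  return { run: () => service.cloneBucket() };
}
```

Because the service receives its storage as an injected dependency, it can be unit-tested with a stub like the one above instead of a live S3 connection.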
When the service runs, it needs to carry out all its tasks concurrently. For example, once we get the list of contents under a directory, we need to start creating/cloning those contents locally. This operation is delegated to cloner.js, a helper responsible only for cloning files and folders. The cloner, in turn, uses the fileOps.js module to create directories and files.
Under helpers, we have the following files doing their respective chores with or without the help of other helpers.
Cloner handles the cloning of files and directories with the help of the fileOps module.

Data Handler maps and parses the data from the S3 bucket into data the service can consume.

Downloader only downloads files from the S3 bucket and pipes the operation to a write stream; simply put, it takes care of downloading files asynchronously.

fileOps, as the name suggests, uses Node's fs module to create file streams and directories.

filePath provides the entire application with the target folder for cloning the S3 bucket's files and directories, and also returns the target bucket and target root directory on S3.

Logger, inside utils, returns an instance of a Bunyan logger, which can be used to ship logs to third-party tools like Kibana.
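As an example of the data-handler's job, here is a hedged sketch of mapping one page of an S3 `listObjectsV2`-style response into flat lists the cloner can consume. `mapListing` is a hypothetical name, and the response shape below is the standard one the SDK returns.

```javascript
// Flattens one page of an S3 listing into file and directory keys.
// Keys ending in '/' are the zero-byte placeholders S3 uses for folders.
function mapListing(response) {
  const files = [];
  const directories = [];
  for (const { Key } of response.Contents || []) {
    (Key.endsWith('/') ? directories : files).push(Key);
  }
  return { files, directories };
}
```

For example, `mapListing({ Contents: [{ Key: 'docs/' }, { Key: 'docs/a.txt' }] })` yields `{ files: ['docs/a.txt'], directories: ['docs/'] }`.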
Now that we have our project set up, let's look into the core logic of the service module: listing the contents of the bucket, parsing the listing, and cloning the files and folders locally.
Additionally, the service not only clones the entire bucket but can also clone specific folders, without losing the folder tree structure, based on the Prefix configuration specified as the rootDirectory for cloning.
Another important Node.js concept is using streams to upload and retrieve data from an external source. In our project, the external source is AWS S3.

When downloading files, we create a read stream from the AWS SDK's getObject method and pipe it to a write stream, which closes automatically once all the data is written. The advantage of using streams is low memory consumption: the application doesn't need to hold the whole file in memory, regardless of its size.

Our code inside the storage module, shown below, uses streams to asynchronously download the data without blocking the event loop.
Node.js streams with AWS getObject
To dig deeper into Node.js streams, please refer to this write-up.
This is the most straightforward topic in the whole application: you install the AWS-SDK and start accessing its methods. Taking a look at the storage file will give you a better understanding of how to import the SDK and call its methods.
Here you can find the entire code for this application. Hands-on practice will teach you more than reading alone and will help you understand the core concepts of this application. Feel free to fork it, play with it and, if you like it, leave a star on the repo.
This marks the end of this write-up. I hope it gave you a better understanding of how to plan, build and run a real-world Node.js service against a platform such as AWS. Feel free to leave your comments and responses, not to mention claps on the story if you liked it. Thanks for reading!
Originally published by Rajesh Babu at https://blog.bitsrc.io