Apache Hadoop is a platform, or framework, in which Big Data is stored in a distributed environment and processed in parallel. Hadoop's ability to handle both structured and unstructured data gives users more flexibility for collecting, processing, and analyzing data than relational databases or data warehouses provide. Apache Hadoop is so widespread that it has been adopted by many well-known companies such as Facebook, Yahoo, DataDog, Netflix, and Adobe. Apache Hadoop mainly consists of two core components –

HDFS
YARN
HDFS (Hadoop Distributed File System) is where all the data in a cluster is stored. To a user, HDFS appears as a single unit storing all the Big Data, but in reality the data is distributed across multiple nodes. HDFS follows a master-slave architecture.
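To make the "single unit" view concrete, here is a minimal sketch using Hadoop's Java FileSystem API. It assumes a hypothetical NameNode reachable at hdfs://namenode:9000 and an illustrative file path /demo/hello.txt; the point is only that the client sees one namespace while HDFS distributes the blocks behind the scenes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class HdfsQuickStart {
    public static void main(String[] args) throws Exception {
        // The client only needs the NameNode address; block placement across
        // DataNodes is handled transparently by HDFS.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical NameNode URI

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/hello.txt"); // illustrative path

            // Write a small file; HDFS splits it into blocks and replicates them.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("Hello from HDFS".getBytes(StandardCharsets.UTF_8));
            }

            // Read it back through the same single-namespace view.
            try (FSDataInputStream in = fs.open(file)) {
                byte[] buf = new byte[32];
                int n = in.read(buf);
                System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
        }
    }
}
```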

There are two kinds of nodes in HDFS: the NameNode and the DataNodes. The NameNode acts as the master and the DataNodes are the slaves. In general terms, the master node holds the metadata, i.e. information about which data is stored on which node, while the actual data is stored on the DataNodes. Since hardware failure rates are fairly high, the data is replicated across multiple DataNodes so that copies remain available in case of failure. You can also explore more about Apache Hadoop 3.0 in this insight.
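The NameNode-as-metadata idea can be seen directly from the client API: asking for block locations is a metadata query answered by the NameNode, while replication determines how many DataNodes hold each block. The sketch below reuses the hypothetical NameNode URI and file path from the previous example; the replication factor of 3 shown here is only an illustration (3 happens to be the HDFS default).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical NameNode URI

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/demo/hello.txt"); // file from the earlier sketch

            // Ask the NameNode (metadata only) where each block's replicas live.
            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("Block hosts: " + String.join(", ", block.getHosts()));
            }

            // Request that three DataNodes hold a copy of each block of this file.
            fs.setReplication(file, (short) 3);
        }
    }
}
```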

The data stored in HDFS then needs to be processed so that useful information can be derived from it; this is done with the help of YARN (Yet Another Resource Negotiator). YARN allocates resources and schedules the tasks that carry out this processing. YARN has two major components: the ResourceManager and the NodeManager. The ResourceManager is the master node; it receives requests and forwards them to the appropriate NodeManagers, where the actual processing takes place.
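The master-worker split in YARN is visible from the client side as well: a client only ever talks to the ResourceManager, which in turn reports on the NodeManagers and the applications running in their containers. Below is a minimal sketch using the YarnClient API, assuming a hypothetical ResourceManager address of resourcemanager:8032.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClusterView {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();
        conf.set("yarn.resourcemanager.address", "resourcemanager:8032"); // hypothetical RM address

        // The client talks to the ResourceManager (master) only; the
        // NodeManagers it reports on are where containers actually run.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        try {
            // List the worker nodes registered with the ResourceManager.
            for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
                System.out.println("NodeManager: " + node.getNodeId()
                        + " capacity: " + node.getCapability());
            }
            // List applications the ResourceManager is currently tracking.
            for (ApplicationReport app : yarnClient.getApplications()) {
                System.out.println("App: " + app.getApplicationId()
                        + " state: " + app.getYarnApplicationState());
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```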
