The aim of this blog is to give an overview of the Hadoop and Spark technologies, their use cases, and how they run in production systems, covering storage, processing, analysis, and visualisation of big data on a cloud platform. It will help you design the architecture of your big data projects in a cloud environment. The content is aimed at new projects (development from scratch) and discusses the types of data, the tools involved, use cases, and solution walkthroughs.
Contents in this Blog:
Here I cover the Hadoop and Spark platforms; the pros and cons of Hadoop; use cases that ELT/ETL might support; Hive partitioning; data warehouses; SQL queries; NoSQL databases; Hadoop distributions; stream processing; a Drill (query engine) use case; and execution in AWS. I cover five use cases: ELT processing in Hive, data ingestion using Sqoop, NoSQL/SQL data processing using connectors and Spark SQL, stream processing (log processing), and a Drill use case.
The use cases and the scale of the data determine whether big data tooling is appropriate at all. I cover two use cases in Hadoop, two in Spark, and one in Drill.
The title diagram gives an at-a-glance view of Hadoop in AWS.
The following steps show how to carry out a big data project in the cloud. Read this blog from beginning to end to get a clear picture of using a big data platform in a cloud environment; it may help you evaluate the platform for your own requirements or use cases.
Let me start by introducing the data sources. Your applications may have different architectures, but the one below is the common architecture used for Hadoop, Spark, and NoSQL platforms.
Schedulers: Schedulers run periodically to pull data from the different data sources and move it, compressed (for example, in zip format), to AWS S3.
Data gathering: Data can be gathered from different sources such as flat files, databases, NoSQL stores, data warehouses, and the web and social media (by writing scrapers to extract the data). It is first stored in a data lake such as AWS S3 and then moved to HDFS.
Once the data is in HDFS (the primary storage), the big data processing story begins.
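As a sketch of the scheduler step above: the snippet compresses a payload before shipping it to the data lake. The bucket name, the key, and the boto3 upload call are assumptions (boto3 and AWS credentials would be required at runtime), so the upload itself is left as a commented stub.

```python
import gzip
import io


def compress_for_upload(raw: bytes) -> bytes:
    """Gzip-compress a payload before moving it to the data lake."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        gz.write(raw)
    return buf.getvalue()


def upload_to_s3(payload: bytes, bucket: str, key: str) -> None:
    # Hypothetical upload step; requires boto3 and AWS credentials:
    # import boto3
    # boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=payload)
    pass


# A scheduler (e.g. cron) would run something like this periodically:
payload = compress_for_upload(b"event_id,ts\n1,2021-01-01\n")
upload_to_s3(payload, "my-data-lake", "raw/events.csv.gz")  # bucket/key are made up
```

From S3, the compressed files can then be copied into HDFS (for example with Hadoop's DistCp) for processing.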
Before processing the data we need to understand the type of data we are working with. There are usually three types:
1) Structured: data generated by RDBMSs and data warehouses (the schema travels with the data).
2) Semi-structured: data such as NoSQL documents, XML, and JSON files (partial structure mixed in with the data).
3) Unstructured: data with no predefined structure, for example plain text, images, and videos.
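A small illustration of why semi-structured data needs different handling than structured data: JSON records may each carry a different set of fields, so the schema has to be inferred from the data itself rather than read from a table definition (the records below are made up for illustration).

```python
import json

# Semi-structured: each JSON record may carry a different set of fields.
records = [
    '{"id": 1, "name": "alice"}',
    '{"id": 2, "tags": ["vip"]}',
]
parsed = [json.loads(r) for r in records]

# Unlike a structured RDBMS table, the combined schema must be
# discovered by scanning the records themselves.
all_fields = set().union(*(rec.keys() for rec in parsed))
print(sorted(all_fields))  # -> ['id', 'name', 'tags']
```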
Once we understand the data, we have to select the tools to process it. We will start with the tools in the Hadoop and Spark platforms.
Let's start with the Hadoop framework and its ecosystem.
‘Data is the new science. Big Data holds the key answers.’ - Pat Gelsinger
Hadoop provides huge storage for any form of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs.
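To make that processing model concrete: Hadoop popularized MapReduce, which splits work into a map phase (emit key/value pairs) and a reduce phase (aggregate values by key). Below is a minimal word-count sketch in plain Python, just to show the shape of the computation; a real job would be distributed across the cluster through Hadoop's MapReduce or Spark APIs.

```python
from collections import Counter
from itertools import chain


def mapper(line: str):
    # Map phase: emit a (word, 1) pair for every word in the line.
    return [(word.lower(), 1) for word in line.split()]


def reducer(pairs) -> Counter:
    # Reduce phase: sum the counts for each key (word).
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


lines = ["big data", "Big Data holds answers"]
result = reducer(chain.from_iterable(mapper(line) for line in lines))
print(result["big"], result["data"])  # -> 2 2
```

In a real cluster, the framework shuffles the mapper output so that all pairs sharing a key reach the same reducer; that shuffle is what lets the job scale across machines.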