Introduction to Big Data and the different techniques employed to handle it, such as MapReduce, Apache Spark and Hadoop.
According to Forbes, about 2.5 quintillion bytes of data are generated every day, and this figure is projected to keep growing in the coming years (90% of the data stored today was produced within the last two years) [1].
What makes Big Data different from any other large amount of data stored in relational databases is its heterogeneity: the data comes from different sources and is recorded in different formats.
Three different ways of formatting data are commonly employed:
- Structured data: organised according to a fixed schema, such as tables in a relational database.
- Semi-structured data: carries some organisational markers but no rigid schema, such as JSON or XML documents.
- Unstructured data: has no predefined model, such as free text, images, audio and video.
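As a minimal sketch (using hypothetical sample records), the contrast between these formats can be seen in how Python's standard library handles each: structured data maps cleanly onto rows with fixed fields, semi-structured data parses into nested objects whose fields can vary per record, and unstructured text offers no schema at all.

```python
import csv
import io
import json

# Structured: a fixed schema, e.g. a CSV table (hypothetical sample data).
csv_text = "id,name,age\n1,Alice,34\n2,Bob,29\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing but flexible, e.g. JSON.
# Note the second record carries an extra field the first one lacks.
json_text = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob", "tags": ["vip"]}]'
records = json.loads(json_text)

# Unstructured: free text with no schema; any structure must be inferred.
text = "Alice met Bob on Tuesday to discuss the quarterly report."
words = text.split()

print(rows[0]["name"])     # every CSV row has the same fields
print(records[1]["tags"])  # JSON fields can differ between records
print(len(words))          # raw text only yields tokens, not fields
```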
Big Data is defined by three properties (the three Vs):
- Volume: the sheer amount of data generated and stored.
- Velocity: the speed at which new data is produced and must be processed.
- Variety: the range of formats and sources the data comes in.
Big Data can be analysed using two different processing techniques:
- Batch processing: data is collected over a period of time and then processed in large blocks, which suits workloads where latency is not critical.
- Stream processing: data is processed as soon as it arrives, which suits real-time analytics.
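A minimal sketch of the batch-versus-stream distinction, using plain Python on a hypothetical list of measurements: the batch function materialises the whole dataset before producing a single answer, while the stream function yields an updated result after every event.

```python
from typing import Iterable, Iterator

events = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical stream of measurements

# Batch processing: collect everything first, then compute over the whole set.
def batch_total(batch: Iterable[int]) -> int:
    data = list(batch)  # the full dataset is materialised...
    return sum(data)    # ...before any result is produced

# Stream processing: update the result as each event arrives.
def stream_totals(stream: Iterable[int]) -> Iterator[int]:
    running = 0
    for event in stream:
        running += event
        yield running   # a fresh result is available after every event

print(batch_total(events))          # one answer at the end
print(list(stream_totals(events)))  # intermediate answers along the way
```

Both reach the same final value; the difference is when results become available, which is exactly the latency trade-off between the two techniques.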
Big Data can be processed using different tools such as MapReduce, Spark, Hadoop, Pig, Hive, Cassandra and Kafka. Each of these tools has its own advantages and disadvantages, which determine how companies might decide to employ them [2].
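To make the MapReduce model concrete, here is a toy word count in plain Python over a hypothetical three-document corpus. This only illustrates the map/shuffle/reduce phases conceptually; it is not the Hadoop API, where the shuffle is performed by the framework across many machines.

```python
from collections import defaultdict
from itertools import chain

documents = ["big data tools", "big data analysis", "data pipelines"]  # toy corpus

# Map phase: emit a (key, value) pair for every word in every document.
def map_phase(doc: str) -> list[tuple[str, int]]:
    return [(word, 1) for word in doc.split()]

# Shuffle phase: group all values by key (done by the framework in real Hadoop).
def shuffle_phase(pairs) -> dict[str, list[int]]:
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the grouped values for each key into a final result.
def reduce_phase(groups) -> dict[str, int]:
    return {key: sum(values) for key, values in groups.items()}

pairs = chain.from_iterable(map_phase(d) for d in documents)
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

Because each map call and each reduce call is independent, the framework can run them in parallel across a cluster, which is what makes the model scale to very large inputs.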
Big Data analysis is now commonly used by many companies to predict market trends, personalise customer experiences, speed up company workflows, and more.