What is NoSQL? The 4 Best NoSQL Databases Explained

Traditionally, relational databases, queried with the Structured Query Language (SQL), have been the most popular and common type of database. They rose to popularity in the 1970s, at a time when storage was extremely expensive (but then again, so were computers). Software engineers needed a way to normalize their databases to reduce data duplication and make more efficient use of what little storage they had.
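To make normalization concrete, here is a small Python sketch. It uses hypothetical order and customer data (the names and fields are illustrative only). It compares a denormalized table, which repeats each customer's details on every order, with a normalized layout that stores those details once and refers to them by ID.

```python
# Hypothetical order data, used only to illustrate normalization.
denormalized_orders = [
    {"order_id": 1, "customer_name": "Ada Lovelace", "customer_city": "London", "item": "lamp"},
    {"order_id": 2, "customer_name": "Ada Lovelace", "customer_city": "London", "item": "desk"},
    {"order_id": 3, "customer_name": "Alan Turing", "customer_city": "Manchester", "item": "chair"},
]


def normalize(rows):
    """Split rows into a customers table and an orders table that references it by ID."""
    customers = {}  # (name, city) -> customer_id
    orders = []
    for row in rows:
        key = (row["customer_name"], row["customer_city"])
        if key not in customers:
            customers[key] = len(customers) + 1  # assign the next free ID
        orders.append({"order_id": row["order_id"], "customer_id": customers[key], "item": row["item"]})
    customer_table = [
        {"customer_id": cid, "name": name, "city": city}
        for (name, city), cid in customers.items()
    ]
    return customer_table, orders


customer_table, order_table = normalize(denormalized_orders)
# Each customer's name and city are now stored once instead of once per order.
print(len(customer_table), "customers,", len(order_table), "orders")
```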

Eventually, technology outgrew the relational database, and a new type, dubbed NoSQL, was born. The term has two subtly different readings, “non-SQL” and “not only SQL”. Either way, NoSQL databases store data in a format other than relational tables, so both readings fall under the same umbrella.
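As an illustration of storing data "in a format other than relational tables", the Python sketch below reshapes two hypothetical relational tables into a single JSON-style document. This is the kind of model used by document stores such as MongoDB. The field names are assumptions chosen for the example, with `_id` borrowed from MongoDB's primary-key convention. This is plain Python, not a database driver.

```python
import json

# Hypothetical relational layout: two tables linked by customer_id.
customers = [{"customer_id": 1, "name": "Ada Lovelace"}]
orders = [
    {"order_id": 1, "customer_id": 1, "item": "lamp"},
    {"order_id": 2, "customer_id": 1, "item": "desk"},
]


def to_documents(customers, orders):
    """Embed each customer's orders inside one self-contained document."""
    docs = []
    for c in customers:
        docs.append({
            "_id": c["customer_id"],
            "name": c["name"],
            # In a document store, related records can live inside the parent record.
            "orders": [
                {"order_id": o["order_id"], "item": o["item"]}
                for o in orders
                if o["customer_id"] == c["customer_id"]
            ],
        })
    return docs


documents = to_documents(customers, orders)
print(json.dumps(documents[0], indent=2))
```

The application reads one document instead of joining two tables. The trade-off is that data shared across documents may end up duplicated.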

NoSQL databases appeared as the cost of storage per megabyte began to plummet, with drives growing in capacity while dropping in price. The primary cost of software development was no longer storage but the developers themselves. This shift filtered down into database design, moving the focus from reducing data duplication to optimizing developer productivity. That is a large part of why NoSQL databases are used today.

00:00 - Intro (what is NoSQL)
02:38 - SQL vs NoSQL
03:56 - Why NoSQL
05:39 - Apache Cassandra / CassDB
06:45 - Apache HBase
08:16 - MongoDB
09:27 - Neo4j

Blog article version: https://www.kofi-group.com/what-is-nosql-the-4-best-nosql-databases-explained/

Kofi Group helps startups outcompete FAANG (Facebook, Amazon, Apple, Netflix, Google) and big tech in the highly competitive war for talent.

Music - Throwaway 2 by XIAO-NIAO

Edureka Fan


HBase Tutorial for Beginners | Introduction to Apache HBase | Hadoop Training

This Edureka video on “HBase Tutorial” gives a detailed introduction to HBase and what it can do. It covers:

  • Why HBase is needed
  • What HBase is
  • The difference between HBase and HDFS
  • HBase storage
  • Features of HBase
  • HBase architecture
  • An HBase demo

Lenora Hauck


What Is HBase in Hadoop NoSQL?

HBase is a column-oriented data store that sits on top of the Hadoop Distributed File System (HDFS) and provides random, real-time lookups and updates on big data. HDFS is based on a “Write Once Read Many” architecture: once a file is written to the HDFS storage layer, it cannot be modified, only read any number of times. HBase works around this by layering a table schema over HDFS files, so the data can be read and updated any number of times. Updates are buffered in memory and written out as new files rather than by rewriting existing ones.
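HBase's logical data model is often described as a sparse, sorted map: row key → column (family:qualifier) → timestamp → value. The toy Python class below models that map in memory. It shows how an "update" is stored as a new timestamped version rather than an in-place edit. It sketches the data model only and is not HBase's client API; the class and method names are hypothetical. The toy also keeps every version, whereas real HBase bounds how many versions it retains per column family.

```python
import itertools
from collections import defaultdict


class ToyHBaseTable:
    """In-memory model of HBase's logical layout: row -> column -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = defaultdict(lambda: defaultdict(dict))
        self._clock = itertools.count(1)  # stands in for write timestamps

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Updates never overwrite: each put adds a new timestamped version.
        self.rows[row][column][next(self._clock)] = value

    def get(self, row, column):
        versions = self.rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]  # the newest version wins

    def scan(self, start, stop):
        """Return row keys in [start, stop) in sorted order, as HBase scans do."""
        return [r for r in sorted(self.rows) if start <= r < stop]


table = ToyHBaseTable(families=["info"])
table.put("user#42", "info:city", "Lisbon")
table.put("user#42", "info:city", "Porto")  # an update becomes a second version
print(table.get("user#42", "info:city"))  # Porto
print(len(table.rows["user#42"]["info:city"]))  # 2
```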

HBase Characteristics

Strong Consistency

HBase provides strong consistency for both reads and writes: a read always returns the latest data, and a write does not complete until all replicas have been updated.
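The guarantee described above can be illustrated with a toy model of synchronous replication in Python. A write returns only after every replica has applied it, so a read served by any replica sees the latest completed write. This is a simplified sketch of the property, not HBase's actual write path; in HBase, each region is served by a single RegionServer. The three-replica default is an assumption that mirrors HDFS's default replication factor.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledge the write


class SyncReplicatedStore:
    """Toy store whose writes complete only after every replica acknowledges them."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]

    def write(self, key, value):
        # Apply the write to every replica before reporting success.
        acks = [replica.apply(key, value) for replica in self.replicas]
        if not all(acks):
            raise RuntimeError("write was not acknowledged by every replica")
        return True

    def read(self, key, replica_index=0):
        # Every completed write is on every replica, so any replica returns the latest value.
        return self.replicas[replica_index].data.get(key)


store = SyncReplicatedStore()
store.write("row1", "v1")
store.write("row1", "v2")
print([store.read("row1", i) for i in range(3)])  # ['v2', 'v2', 'v2']
```

The cost of this design is write latency: a write is only as fast as the slowest replica.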

Horizontally Scalable

HBase shards tables automatically into regions, each holding a contiguous range of row keys, and distributes those regions across the cluster. When a region grows too large, it is split automatically, and the resulting regions can be spread across multiple machines.
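To illustrate range-based sharding, the Python sketch below keeps regions as sorted row-key ranges. It routes each key to its region with a binary search over the region start keys and splits a region in half once it exceeds a threshold. The row-count threshold and the class name are hypothetical stand-ins. Real HBase triggers splits based on region store-file size and also moves regions between servers, which this sketch does not model.

```python
import bisect


class ToyRegionMap:
    """Range sharding sketch: region i owns row keys in [start_keys[i], start_keys[i + 1])."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region  # hypothetical split threshold
        self.start_keys = [""]  # one region initially covers the whole key space
        self.regions = [[]]  # sorted row keys held by each region

    def locate(self, row_key):
        # The owning region is the one with the greatest start key <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        i = self.locate(row_key)
        keys = self.regions[i]
        pos = bisect.bisect_left(keys, row_key)
        if pos < len(keys) and keys[pos] == row_key:
            return  # row already present
        keys.insert(pos, row_key)
        if len(keys) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split at the middle row key; the upper half becomes a new region.
        keys = self.regions[i]
        mid = len(keys) // 2
        self.regions[i:i + 1] = [keys[:mid], keys[mid:]]
        self.start_keys.insert(i + 1, keys[mid])


regions = ToyRegionMap(max_rows_per_region=4)
for n in range(10):
    regions.put(f"row{n:02d}")
print(regions.start_keys)  # ['', 'row02', 'row04', 'row06']
print(regions.locate("row05"))  # 2
```

Because regions are key ranges, a scan over neighbouring row keys touches only a few regions. This is why row-key design matters so much in HBase.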


Automatic Failover

HBase provides automatic region failover in case of server failures.
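Below is a toy version of region failover in Python. When a server is detected as failed, each region it was serving is reassigned to the least-loaded surviving server. In HBase, this coordination is done by the HMaster, with ZooKeeper used to detect RegionServer failure, and recovery also replays the failed server's write-ahead log. The hypothetical function below models only the reassignment step.

```python
def reassign_regions(assignment, failed_server, live_servers):
    """Return a new region -> server map with the failed server's regions moved."""
    if not live_servers:
        raise RuntimeError("no live servers available")
    new_assignment = dict(assignment)
    # Current number of regions on each surviving server.
    load = {server: 0 for server in live_servers}
    for server in assignment.values():
        if server in load:
            load[server] += 1
    for region, server in sorted(assignment.items()):
        if server == failed_server:
            # Pick the least-loaded live server; ties are broken by name.
            target = min(live_servers, key=lambda s: (load[s], s))
            new_assignment[region] = target
            load[target] += 1
    return new_assignment


assignment = {"r1": "rs1", "r2": "rs1", "r3": "rs2", "r4": "rs3"}
print(reassign_regions(assignment, failed_server="rs1", live_servers=["rs2", "rs3"]))
```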

HDFS/MapReduce Integration

HBase is built on top of HDFS and integrates with MapReduce, so HBase tables can act as both sources and sinks for MapReduce jobs.
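The source/sink relationship can be sketched in plain Python. Rows are read from one table, passed through a map step and a reduce step, and the results are written back as rows of another table. The dictionaries stand in for HBase tables, and the functions and column names are hypothetical. A real job would run on Hadoop's MapReduce framework through HBase's MapReduce integration, such as `TableMapReduceUtil` in the Java API.

```python
from collections import defaultdict

# Hypothetical source "table": row key -> {"family:qualifier": value}.
source_table = {
    "event#1": {"d:country": "US"},
    "event#2": {"d:country": "DE"},
    "event#3": {"d:country": "US"},
}


def map_phase(row_key, columns):
    # Emit (country, 1) for each event row.
    yield columns["d:country"], 1


def reduce_phase(key, values):
    return sum(values)


def run_job(source, sink):
    grouped = defaultdict(list)  # shuffle: group mapper output by key
    for row_key, columns in sorted(source.items()):
        for key, value in map_phase(row_key, columns):
            grouped[key].append(value)
    for key, values in grouped.items():
        sink[key] = {"stats:count": reduce_phase(key, values)}  # write results as rows


sink_table = {}
run_job(source_table, sink_table)
print(sink_table)
```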

Sierra Grimes


Counting Large Set of Unstructured Events with Theta Sketches

Key Takeaways

  • Model dynamic multi-dimensional data as Theta Sketches in such a way as to allow for millisecond-latency queries.
  • Downstream pipelines of services consume the user activity events both directly from Kafka as well as from an Amazon S3 raw data lake which stores the data in Parquet files.
  • HBase’s unique features as a NoSQL database were used to solve this problem.
  • Apache Spark is used to read the events from the data lake and pre-aggregate them into HBase.

AppsFlyer is a commercial SaaS attribution platform. Its clients, some of the largest mobile app companies in the world, send a large number of events daily. These are made up of the installs, uninstalls, sessions, in-app events, clicks and impressions performed by their user bases.

In this article, I will discuss a system AppsFlyer built to quickly and accurately find the approximate sizes of sets of unique users (represented by a non-PII user ID), segmented by any combination of criteria over the various dimensions of these events. This system (later referred to as “Audiences”) is used by AppsFlyer’s user segmentation product to give its users interactive feedback while they define criteria in the UI. Every action in the UI queries this system for the approximate size of the set of unique users that meet the criteria, allowing users to fine-tune their criteria until they reach a number they are happy with.

As a brief example, advertisers of an e-commerce application might want to know how many of their unique users installed the app in the last month, and also purchased products A and B, but DID NOT purchase product C; or how many unique users in the US added more than X products to their shopping cart in the past week but never checked out.
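The example query above maps onto set algebra over per-criterion user sets. You intersect "installed last month" with "bought A" and "bought B", then remove "bought C". A Theta sketch supports these operations (intersection and A-not-B) on compact approximations of each set. The Python below is a minimal, from-scratch KMV-style sketch written for illustration. It is not the Apache DataSketches API, and the value `k=4096` and the SHA-256 hashing are assumptions chosen for the example.

```python
import hashlib


def hash_unit(item):
    """Map an item to a pseudo-random float in [0, 1) using 53 bits of SHA-256."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


class ThetaSketch:
    """Minimal KMV-style Theta sketch: retains only hashes below a threshold theta."""

    def __init__(self, hashes, theta):
        self.hashes = frozenset(hashes)  # retained hashes, all < theta
        self.theta = theta

    @classmethod
    def build(cls, items, k=4096):
        distinct = sorted({hash_unit(x) for x in items})
        if len(distinct) <= k:
            return cls(distinct, 1.0)  # small sets are represented exactly
        # Keep the k smallest hashes; theta is the (k + 1)-th smallest.
        return cls(distinct[:k], distinct[k])

    def estimate(self):
        # Each retained hash stands for 1 / theta distinct items.
        return len(self.hashes) / self.theta

    def intersect(self, other):
        theta = min(self.theta, other.theta)
        return ThetaSketch({h for h in self.hashes & other.hashes if h < theta}, theta)

    def a_not_b(self, other):
        theta = min(self.theta, other.theta)
        return ThetaSketch({h for h in self.hashes - other.hashes if h < theta}, theta)


# Hypothetical user-ID ranges for each criterion.
installed = ThetaSketch.build(range(0, 80_000))
bought_a = ThetaSketch.build(range(20_000, 70_000))
bought_b = ThetaSketch.build(range(30_000, 90_000))
bought_c = ThetaSketch.build(range(60_000, 100_000))

result = installed.intersect(bought_a).intersect(bought_b).a_not_b(bought_c)
print(round(result.estimate()))  # the exact answer is 30,000 (IDs 30,000 to 59,999)
```

Each sketch is bounded by roughly `k` hashes however many users it summarizes, which is what makes precomputing and storing one per criterion practical. Accuracy drops when the result is a small fraction of the input sets, because fewer retained hashes survive the intersection.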


One of the challenges was that the events that reach AppsFlyer are schemaless: AppsFlyer clients are free to send any number of dimensions (e.g., “product_name” or “level_completed_num”) as part of their event payloads. This leads to a very high number of distinct dimensions that the multi-tenant system needs to make sense of.
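One way to picture the cardinality problem is to flatten each schemaless payload into (app, dimension, value) keys, each pointing at the set of users who sent it. Every new dimension or value a client sends then creates new keys. This Python sketch is a hypothetical illustration of the problem, not the design AppsFlyer describes, and the event fields (`app_id`, `user_id`, `payload`) are assumptions. In a system like the one described, the exact user sets are what a Theta sketch would approximate.

```python
from collections import defaultdict

# Hypothetical events: each client may send arbitrary payload dimensions.
events = [
    {"app_id": "shop", "user_id": "u1", "payload": {"product_name": "A"}},
    {"app_id": "shop", "user_id": "u2", "payload": {"product_name": "B"}},
    {"app_id": "game", "user_id": "u1", "payload": {"level_completed_num": 7}},
]


def index_by_dimension(events):
    """Group user IDs by (app, dimension, value) without a predefined schema."""
    index = defaultdict(set)
    for event in events:
        for dim, value in event["payload"].items():
            index[(event["app_id"], dim, str(value))].add(event["user_id"])
    return index


index = index_by_dimension(events)
print(len(index), "distinct (app, dimension, value) keys")
```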

This article will discuss how this system was designed and engineered to provide this approximation, with the following considerations in mind:

  • Latency: every user action in the browser should update the number in sub-second latency.
  • Accuracy: the estimate shown to the user should be accurate enough to use with confidence.
  • Multi-tenancy: the system would need to host and serve data across all of AppsFlyer’s users, requiring it to tackle the open-ended dimensional cardinality that the data inherently contains.

The core technologies used to build this system are Theta Sketches and HBase, both of which will be discussed with an overview of how they fit into the system’s architecture, and why they fit the specific problem at hand.
