1602525600
In Apache Spark/PySpark we work with abstractions, and the actual processing happens only when we materialize the result of an operation. To connect to different databases and file systems we mostly use ready-made libraries. In this story you will learn how to combine data from MySQL and MongoDB and then save it in Apache Cassandra.
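The deferred execution described above can be illustrated without a cluster. The following is a plain-Python analogy, not Spark's API: the `LazyPipeline` class and its methods are hypothetical names invented for this sketch. Transformations only record the work to do; nothing runs until `collect()` is called.

```python
class LazyPipeline:
    """Toy stand-in for a lazily evaluated dataset (NOT a Spark class)."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Record the transformation; do not execute it yet.
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Record the filter; do not execute it yet.
        return LazyPipeline(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # The "action": only now are the recorded steps applied to the data.
        data = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


calls = []  # records every element that double() actually processes


def double(x):
    calls.append(x)
    return x * 2


pipeline = LazyPipeline(range(5)).map(double).filter(lambda v: v > 4)
calls_before_collect = list(calls)  # still empty: nothing has been computed
result = pipeline.collect()         # triggers the whole chain
```

Building `pipeline` leaves `calls` empty; only `collect()` touches the data, which mirrors how Spark postpones work until an action is requested.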
Running several databases side by side is an ideal case for Docker, or more precisely Docker Compose. We will run all the databases and Jupyter with Apache Spark.
# Use root/example as user/password credentials
version: '3.1'
services:
  notebook:
    image: jupyter/all-spark-notebook
    ports:
      - 8888:8888
      - 4040:4040
    volumes:
      - ./work:/home/jovyan/work
  cassandra:
    image: 'bitnami/cassandra:latest'
  mongo:
    image: mongo
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
  mysql:
    image: mysql:5.7
    environment:
      MYSQL_DATABASE: 'school'
      MYSQL_USER: 'user'
      MYSQL_PASSWORD: 'password'
      MYSQL_ROOT_PASSWORD: 'password'
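From inside the notebook container, the other services should be reachable by their Compose service names. The sketch below builds connection strings from the credentials in the file above. It assumes the default Compose network, so that `mongo` and `mysql` resolve as hostnames, and the images' default ports (27017 and 3306). The helper names `mongo_uri` and `mysql_jdbc_url` are made up for illustration.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, host="mongo", port=27017):
    # Percent-encode the credentials so characters such as '@' or ':' are safe.
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


def mysql_jdbc_url(database, host="mysql", port=3306):
    # JDBC URL in the usual jdbc:mysql://host:port/database form.
    return f"jdbc:mysql://{host}:{port}/{database}"


MONGO_URI = mongo_uri("root", "example")
MYSQL_URL = mysql_jdbc_url("school")

print(MONGO_URI)
print(MYSQL_URL)
```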
We need some data. I wrote a simple script in Python. Let’s assume that the students’ data is stored in Mongo.
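The original script is not shown here, so the following is only a guess at what such a generator could look like. The field names (`student_id`, `name`, `year`), the name list, and the `school.students` database and collection are all assumptions for illustration. `load_into_mongo` relies on the third-party `pymongo` package and is defined but not called, so the rest of the sketch runs without a database.

```python
import random

FIRST_NAMES = ["Ada", "Grace", "Alan", "Linus"]  # placeholder names


def make_students(count, seed=0):
    """Return `count` student documents; seeded so the output is repeatable."""
    rng = random.Random(seed)
    return [
        {"student_id": i, "name": rng.choice(FIRST_NAMES), "year": rng.randint(1, 4)}
        for i in range(1, count + 1)
    ]


def load_into_mongo(docs, uri="mongodb://root:example@mongo:27017"):
    """Insert the documents into MongoDB (requires pymongo; names are assumed)."""
    from pymongo import MongoClient  # imported lazily: third-party dependency

    client = MongoClient(uri)
    client["school"]["students"].insert_many(docs)


students = make_students(5)
for doc in students:
    print(doc)
```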
#etl #mongodb #spark #pyspark #cassandra
1595905879
MySQL is the all-time number one open source database in the world, and a staple in the RDBMS space. DigitalOcean is quickly building its reputation as the developers cloud by providing an affordable, flexible and easy-to-use cloud platform for developers to work with. MySQL on DigitalOcean is a natural fit, but what’s the best way to deploy your cloud database? In this post, we compare two providers: DigitalOcean Managed Databases for MySQL and ScaleGrid MySQL hosting on DigitalOcean.
At a glance – TLDR
Compare Throughput
ScaleGrid averages almost 40% higher throughput over DigitalOcean for MySQL, with up to 46% higher throughput in write-intensive workloads.
Compare Latency
On average, ScaleGrid achieves almost 30% lower latency over DigitalOcean for the same deployment configurations.
Compare Pricing
ScaleGrid provides 30% more storage on average vs. DigitalOcean for MySQL at the same affordable price.
MySQL DigitalOcean Performance Benchmark
In this benchmark, we compare equivalent plan sizes between ScaleGrid MySQL on DigitalOcean and DigitalOcean Managed Databases for MySQL. We are going to use a common, popular plan size using the below configurations for this performance benchmark:
Comparison Overview
| | ScaleGrid | DigitalOcean |
| --- | --- | --- |
| Instance Type | Medium: 4 vCPUs | Medium: 4 vCPUs |
| MySQL Version | 8.0.20 | 8.0.20 |
| RAM | 8GB | 8GB |
| SSD | 140GB | 115GB |
| Deployment Type | Standalone | Standalone |
| Region | SF03 | SF03 |
| Support | Included | Business-level support included with account sizes over $500/month |
| Monthly Price | $120 | $120 |
As you can see above, ScaleGrid and DigitalOcean offer the same plan configurations across this plan size, apart from SSD where ScaleGrid provides over 20% more storage for the same price.
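The "over 20%" figure follows directly from the two SSD sizes listed for this plan (140GB vs. 115GB). A quick check, where `percent_more` is just a helper name chosen for this sketch:

```python
def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


extra = percent_more(140, 115)  # SSD sizes in GB for this plan size
print(f"{extra:.1f}% more storage")
```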
To ensure the most accurate results in our performance tests, we run the benchmark four times for each comparison to find the average performance across throughput and latency over read-intensive workloads, balanced workloads, and write-intensive workloads.
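The averaging step can be sketched as follows. The numbers are round placeholders chosen only to show the calculation; they are NOT results from this benchmark, and `average_runs` is a hypothetical helper name.

```python
from statistics import mean


def average_runs(runs_by_workload, expected_runs=4):
    """Average the repeated runs for each workload type."""
    averages = {}
    for workload, runs in runs_by_workload.items():
        if len(runs) != expected_runs:
            raise ValueError(f"{workload}: expected {expected_runs} runs, got {len(runs)}")
        averages[workload] = mean(runs)
    return averages


# Placeholder QPS values for illustration only, not measured data.
placeholder_qps = {
    "read-intensive": [100.0, 102.0, 98.0, 100.0],
    "balanced": [80.0, 80.0, 82.0, 78.0],
    "write-intensive": [50.0, 52.0, 48.0, 50.0],
}

averages = average_runs(placeholder_qps)
print(averages)
```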
Throughput
In this benchmark, we measure MySQL throughput in terms of queries per second (QPS) to measure our query efficiency. To quickly summarize the results, we display read-intensive, write-intensive and balanced workload averages below for 150 threads for ScaleGrid vs. DigitalOcean MySQL:
[Figure: ScaleGrid MySQL vs DigitalOcean Managed Databases, throughput performance graph]
For the common 150 thread comparison, ScaleGrid averages almost 40% higher throughput over DigitalOcean for MySQL, with up to 46% higher throughput in write-intensive workloads.
#cloud #database #developer #digital ocean #mysql #performance #scalegrid #95th percentile latency #balanced workloads #developers cloud #digitalocean droplet #digitalocean managed databases #digitalocean performance #digitalocean pricing #higher throughput #latency benchmark #lower latency #mysql benchmark setup #mysql client threads #mysql configuration #mysql digitalocean #mysql latency #mysql on digitalocean #mysql throughput #performance benchmark #queries per second #read-intensive #scalegrid mysql #scalegrid vs. digitalocean #throughput benchmark #write-intensive
1620626280
Today, we generate unprecedented volumes of data: over 2.5 quintillion bytes every day, and that number keeps growing. However, the data we produce is generally raw and unstructured: a compilation of unorganized facts that lack coherence and meaning. It must therefore be cleaned, organized, processed, analyzed, and contextualized to be converted into meaningful information. This is where databases and database management systems (DBMS) enter the picture.
Most databases in use today fall into one of two broad families, SQL and NoSQL, which take opposite approaches to modeling data. SQL served as the foundation for relational databases. Although SQL dominated the database domain for a very long time, the steady growth in data volume over the years created a need for a DBMS that could scale with it. This need resulted in the birth of the NoSQL database.
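To make the contrast concrete, here is a small sketch that stores the same record both ways. It uses Python's built-in `sqlite3` as a stand-in for a relational database and a plain dictionary serialized to JSON as a stand-in for a document store such as MongoDB. The table, column, and field names are invented for the example.

```python
import json
import sqlite3

# Relational style: a fixed schema, with related data split across tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),
        course TEXT NOT NULL
    );
    INSERT INTO students VALUES (1, 'Ada');
    INSERT INTO enrollments VALUES (1, 'Math'), (1, 'Physics');
    """
)
# Reading the full record back requires a join.
rows = conn.execute(
    "SELECT s.name, e.course FROM students s "
    "JOIN enrollments e ON e.student_id = s.id ORDER BY e.course"
).fetchall()

# Document style: one self-contained, nested record with no predefined schema.
document = {"_id": 1, "name": "Ada", "courses": ["Math", "Physics"]}
serialized = json.dumps(document)

print(rows)
print(serialized)
```

The relational version enforces structure up front and reassembles the record at query time; the document version keeps everything about the student in one place.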
#mongodb #mysql #mysql vs mongodb