Upsert

Make it easy to upsert on traditional RDBMS like MySQL, PostgreSQL, and SQLite3 (hey look, NoSQL!). Transparently creates (and re-uses) stored procedures/functions when necessary.

You pass it a bare-metal connection to the database like Mysql2::Client (from the mysql2 gem on MRI) or Java::OrgPostgresqlJdbc4::Jdbc4Connection (from jdbc-postgres on JRuby).

As databases start to natively support SQL MERGE (which is basically upsert), this library will take advantage (but you won't have to change your code).

Does not depend on ActiveRecord.

Does not use INSERT ON DUPLICATE KEY UPDATE on MySQL as this only works if you are very careful about creating unique indexes.

70–90%+ faster than emulating upsert with ActiveRecord.

Supports MRI and JRuby.

Usage

You pass a selector that uniquely identifies a row, whether it exists or not. You also pass a setter, attributes that should be set on that row.

Syntax inspired by mongo-ruby-driver's update method.

Basic

connection = Mysql2::Client.new([...])
table_name = :pets
upsert = Upsert.new connection, table_name
# N times...
upsert.row({:name => 'Jerry'}, :breed => 'beagle', :created_at => Time.now)

The created_at and created_on columns are used for inserts, but ignored on updates.
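
For example (a sketch), upserting the same selector twice:

upsert.row({:name => 'Jerry'}, :breed => 'beagle', :created_at => Time.now) # insert: created_at is set
upsert.row({:name => 'Jerry'}, :breed => 'collie', :created_at => Time.now) # update: created_at is left alone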

So, just to reiterate, you've got a selector and a setter:

selector = { :name => 'Jerry' }
setter = { :breed => 'beagle' }
upsert.row(selector, setter)

Batch mode

By organizing your upserts into a batch, we can do work behind the scenes to make them faster.

connection = Mysql2::Client.new([...])
Upsert.batch(connection, :pets) do |upsert|
  # N times...
  upsert.row({:name => 'Jerry'}, :breed => 'beagle')
  upsert.row({:name => 'Pierre'}, :breed => 'tabby')
end

Batch mode is tested to be about 80% faster on PostgreSQL, MySQL, and SQLite3 than other ways to emulate upsert (see the tests, which fail if they are not faster).

Native Postgres upsert

INSERT ... ON CONFLICT DO UPDATE is used when Postgres 9.5+ is detected and a unique constraint is in place.

**Note:** You must have a unique constraint on the column(s) you're using as a selector. A unique index won't work. See https://github.com/seamusabshere/upsert/issues/98#issuecomment-295341405 for more information and some ways to check.

If you don't have unique constraints, it will fall back to the classic Upsert gem user-defined function, which does not require a constraint.
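
If you're not sure whether a suitable constraint exists, something like this can list them (a sketch using the pg gem; assumes the pets table from the examples above):

require 'pg'
connection = PG.connect(:dbname => 'upsert_test')
# UNIQUE ('u') and PRIMARY KEY ('p') constraints on pets --
# a plain unique index won't appear here
res = connection.exec(<<-SQL)
  SELECT conname, contype
  FROM pg_constraint
  WHERE conrelid = 'pets'::regclass AND contype IN ('u', 'p')
SQL
res.each { |row| puts "#{row['conname']} (#{row['contype']})" }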

ActiveRecord helper method

require 'upsert/active_record_upsert'
# N times...
Pet.upsert({:name => 'Jerry'}, :breed => 'beagle')

Wishlist

Pull requests for any of these would be greatly appreciated:

  1. Cache JDBC PreparedStatement objects.
  2. Sanity check my three benchmarks (four if you include activerecord-import on MySQL). Do they accurately represent optimized alternatives?
  3. Provide require 'upsert/debug' that will make sure you are selecting on columns that have unique indexes.
  4. Test that Upsert instances accept arbitrary columns, even within a batch, which is what people probably expect.
  5. @antage's idea for "true" upserting: (from https://github.com/seamusabshere/upsert/issues/17)
selector = { id: 15 }
update_setter = { count: Upsert.sql('count + 1') }
insert_setter = { count: 1 }
upsert.row_with_two_setter(update_setter, insert_setter, selector)

Real-world usage


We use upsert for big data at Faraday. Originally written to speed up the data_miner data mining library.

Supported databases/drivers

|       | MySQL      | PostgreSQL    | SQLite3      |
|-------|------------|---------------|--------------|
| MRI   | mysql2     | pg            | sqlite3      |
| JRuby | jdbc-mysql | jdbc-postgres | jdbc-sqlite3 |

See below for details about what SQL MERGE trick (emulation of upsert) is used, performance, code examples, etc.

Rails / ActiveRecord

(Assuming that one of the other three supported drivers is being used under the covers).

  • add "upsert" to your Gemfile and
  • run bundle install
Upsert.new Pet.connection, Pet.table_name
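
This also combines with batch mode (assuming a Pet model):

Upsert.batch(Pet.connection, Pet.table_name) do |upsert|
  upsert.row({:name => 'Jerry'}, :breed => 'beagle')
end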

Speed

Depends on the driver being used!

SQL MERGE trick

Depends on the driver being used!

MySQL

On MRI, use the mysql2 driver.

require 'mysql2'
connection = Mysql2::Client.new(:username => 'root', :password => 'password', :database => 'upsert_test')
table_name = :pets
upsert = Upsert.new(connection, table_name)

On JRuby, use the jdbc-mysql driver.

require 'jdbc/mysql'
java.sql.DriverManager.register_driver com.mysql.jdbc.Driver.new
connection = java.sql.DriverManager.get_connection "jdbc:mysql://127.0.0.1/mydatabase?user=root&password=password"
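
The resulting JDBC connection is handed to Upsert just like the MRI one; the same pattern applies to the JDBC examples for PostgreSQL and SQLite3 below:

upsert = Upsert.new(connection, :pets)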

Speed

From the tests (updated 11/7/12):

Upsert was 82% faster than find + new/set/save
Upsert was 85% faster than find_or_create + update_attributes
Upsert was 90% faster than create + rescue/find/update
Upsert was 46% faster than faking upserts with activerecord-import (note: in question as of 3/13/15, need some expert advice)

SQL MERGE trick

Thanks to Dennis Hennen's StackOverflow response!!

CREATE PROCEDURE upsert_pets_SEL_name_A_tag_number_SET_name_A_tag_number(`name_sel` varchar(255), `tag_number_sel` int(11), `name_set` varchar(255), `tag_number_set` int(11))
BEGIN
  DECLARE done BOOLEAN;
  REPEAT
    BEGIN
      -- If there is a unique key constraint error then
      -- someone made a concurrent insert. Reset the sentinel
      -- and try again.
      DECLARE ER_DUP_UNIQUE CONDITION FOR 23000;
      DECLARE ER_INTEG CONDITION FOR 1062;
      DECLARE CONTINUE HANDLER FOR ER_DUP_UNIQUE BEGIN
        SET done = FALSE;
      END;

      DECLARE CONTINUE HANDLER FOR ER_INTEG BEGIN
        SET done = TRUE;
      END;

      SET done = TRUE;
      SELECT COUNT(*) INTO @count FROM `pets` WHERE `name` = `name_sel` AND `tag_number` = `tag_number_sel`;
      -- Race condition here. If a concurrent INSERT is made after
      -- the SELECT but before the INSERT below we'll get a duplicate
      -- key error. But the handler above will take care of that.
      IF @count > 0 THEN
        -- UPDATE table_name SET b = b_SET WHERE a = a_SEL;
        UPDATE `pets` SET `name` = `name_set`, `tag_number` = `tag_number_set` WHERE `name` = `name_sel` AND `tag_number` = `tag_number_sel`;
      ELSE
        -- INSERT INTO table_name (a, b) VALUES (k, data);
        INSERT INTO `pets` (`name`, `tag_number`) VALUES (`name_set`, `tag_number_set`);
      END IF;
    END;
  UNTIL done END REPEAT;
END

PostgreSQL

On MRI, use the pg driver.

require 'pg'
connection = PG.connect(:dbname => 'upsert_test')
table_name = :pets
upsert = Upsert.new(connection, table_name)

On JRuby, use the jdbc-postgres driver.

require 'jdbc/postgres'
java.sql.DriverManager.register_driver org.postgresql.Driver.new
connection = java.sql.DriverManager.get_connection "jdbc:postgresql://127.0.0.1/mydatabase?user=root&password=password"

If you want to use HStore, make the pg-hstore gem available and pass a Hash in setters:

gem 'pg-hstore'
require 'pg_hstore'
upsert.row({:name => 'Bill'}, :mydata => {:a => 1, :b => 2})

PostgreSQL notes

  • Upsert doesn't do any type casting, so if you attempt something like upsert.row({ :name => 'A Name' }, :tag_number => 'bob'), you'll get an error that reads something like: invalid input syntax for integer: "bob"

Speed

From the tests (updated 9/21/12):

Upsert was 72% faster than find + new/set/save
Upsert was 79% faster than find_or_create + update_attributes
Upsert was 83% faster than create + rescue/find/update
# (can't compare to activerecord-import because you can't fake it on pg)

SQL MERGE trick

Adapted from the canonical PostgreSQL upsert example:

CREATE OR REPLACE FUNCTION upsert_pets_SEL_name_A_tag_number_SET_name_A_tag_number("name_sel" character varying(255), "tag_number_sel" integer, "name_set" character varying(255), "tag_number_set" integer) RETURNS VOID AS
$$
DECLARE
  first_try INTEGER := 1;
BEGIN
  LOOP
    -- first try to update the key
    UPDATE "pets" SET "name" = "name_set", "tag_number" = "tag_number_set"
      WHERE "name" = "name_sel" AND "tag_number" = "tag_number_sel";
    IF found THEN
      RETURN;
    END IF;
    -- not there, so try to insert the key
    -- if someone else inserts the same key concurrently,
    -- we could get a unique-key failure
    BEGIN
      INSERT INTO "pets"("name", "tag_number") VALUES ("name_set", "tag_number_set");
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- seamusabshere 9/20/12 only retry once
      IF (first_try = 1) THEN
        first_try := 0;
      ELSE
        RETURN;
      END IF;
      -- Do nothing, and loop to try the UPDATE again.
    END;
  END LOOP;
END;
$$
LANGUAGE plpgsql;

I slightly modified it so that it only retries once - don't want infinite loops.

SQLite3

On MRI, use the sqlite3 driver.

require 'sqlite3'
connection = SQLite3::Database.open(':memory:')
table_name = :pets
upsert = Upsert.new(connection, table_name)

On JRuby, use the jdbc-sqlite3 driver.

# TODO somebody please verify
require 'jdbc/sqlite3'
java.sql.DriverManager.register_driver org.sqlite.Driver.new
connection = java.sql.DriverManager.get_connection "jdbc:sqlite://127.0.0.1/mydatabase?user=root&password=password"

Speed

From the tests (updated 9/21/12):

Upsert was 77% faster than find + new/set/save
Upsert was 80% faster than find_or_create + update_attributes
Upsert was 85% faster than create + rescue/find/update
# (can't compare to activerecord-import because you can't fake it on sqlite3)

SQL MERGE trick

Thanks to @dan04's answer on StackOverflow:

Please note! This will only work properly on SQLite3 if one of the columns being used as the "selector" is a primary key or has a unique index.

INSERT OR IGNORE INTO visits VALUES ('127.0.0.1', 1);
UPDATE visits SET visits = 1 WHERE ip LIKE '127.0.0.1';
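
The same two-step trick, parameterized through the sqlite3 gem (a sketch; assumes a visits table with ip and visits columns as above):

require 'sqlite3'
db = SQLite3::Database.open(':memory:')
db.execute('CREATE TABLE visits (ip TEXT PRIMARY KEY, visits INTEGER)')
# a no-op if the ip already exists...
db.execute('INSERT OR IGNORE INTO visits VALUES (?, ?)', ['127.0.0.1', 1])
# ...and the UPDATE then applies either way
db.execute('UPDATE visits SET visits = 1 WHERE ip = ?', ['127.0.0.1'])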

Features

Tested to be fast and portable

In addition to correctness, the library's tests check that it is

  1. Faster than comparable upsert techniques
  2. Compatible with supported databases

Not dependent on ActiveRecord

As below, all you need is a raw database connection like a Mysql2::Client, a PG::Connection, or a SQLite3::Database. These are equivalent:

# with activerecord
Upsert.new ActiveRecord::Base.connection, :pets
# with activerecord, prettier
Upsert.new Pet.connection, Pet.table_name
# without activerecord
Upsert.new Mysql2::Client.new([...]), :pets

For a specific use case, faster and more portable than activerecord-import

You could also use activerecord-import to upsert:

Pet.import columns, all_values, :timestamps => false, :on_duplicate_key_update => columns

activerecord-import, however, only works on MySQL and requires ActiveRecord—and if all you are doing is upserts, upsert is tested to be 40% faster. And you don't have to put all of the rows to be upserted into a single huge array - you can batch them using Upsert.batch.

Gotchas

No automatic typecasting beyond what the adapter/driver provides

We don't have any logic to convert integers into strings, strings into integers, etc. in order to satisfy the strictness of PostgreSQL and other databases on this issue.

So if you try to upsert a blank string ('') into an integer field in PostgreSQL, you will get an error.
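
One way to defuse this is to cast in Ruby before calling row (a sketch; the blank-string-to-NULL rule here is an assumption about your data, not something Upsert does):

raw_tag_number = ''
setter = {
  :breed => 'beagle',
  :tag_number => raw_tag_number.to_s.empty? ? nil : Integer(raw_tag_number)
}
upsert.row({:name => 'Jerry'}, setter)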

Dates and times are converted to UTC

Datetimes are immediately converted to UTC and sent to the database as ISO8601 strings.

If you're using MySQL, make sure server/connection timezone is UTC. If you're using Rails and/or ActiveRecord, you might want to check ActiveRecord::Base.default_timezone... it should probably be :utc.

In general, run some upserts and make sure datetimes get persisted like you expect.
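
A quick sanity check along those lines (a sketch; reuses the mysql2 connection from earlier and assumes a hypothetical last_seen_at column):

now = Time.now
upsert.row({:name => 'Jerry'}, :last_seen_at => now)
row = connection.query("SELECT last_seen_at FROM pets WHERE name = 'Jerry'").first
puts [now.utc, row['last_seen_at']].inspect # these should agree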

Clearing all library-generated functions

Place the following into a rake task (so that you don't globally redefine the NAME_PREFIX constant):

Upsert::MergeFunction::NAME_PREFIX = "upsert"

# ActiveRecord
Upsert.clear_database_functions(ActiveRecord::Base.connection)

# Sequel
DB.synchronize do |conn|
  Upsert.clear_database_functions(conn)
end
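
For example, as a rake task (a sketch; the :environment dependency assumes Rails):

namespace :upsert do
  desc 'Drop all Upsert-generated database functions'
  task :clear_functions => :environment do
    Upsert::MergeFunction::NAME_PREFIX = 'upsert'
    Upsert.clear_database_functions(ActiveRecord::Base.connection)
  end
end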

Doesn't work with transactional fixtures

Per https://github.com/seamusabshere/upsert/issues/23, you might have issues if you try to use transactional fixtures with this library.

 

Testmetrics - https://www.testmetrics.app/seamusabshere/upsert

Copyright

Copyright 2013-2019 Seamus Abshere.
Copyright 2017-2019 Philip Schalm.
Portions Copyright (c) 2019 The JRuby Team.


Author: seamusabshere
Source code: https://github.com/seamusabshere/upsert
License: MIT license
