Tips for Migrating from MySQL to Cloud Spanner

Tips for Migrating from MySQL to Cloud Spanner

In this tutorial, you'll see a few tips to pay attention to when migrating from MySQL to Cloud Spanner, and how to update your application logic to address them. Cloud spanner enforces strict data types. Using sequences in primary keys is not a Spanner best practice. Implicit casting and the need for explicit casting of field data types. Using interleaved tables to improve read performance

Since its initial release in 1995, MySQL has not only been the de-facto database for many data storage needs, but has gathered much attention over the years as one of the most well-recognized databases around when it comes to Relational Database Management Systems and transactional data processing.

Any application in retail, e-commerce, or banking has likely had its fair share of business depending on a relational database for its transactional needs. Many of these applications are built on MySQL due to its flexibility, open source nature, and strong community support.

Whether in the context of a migration from a relational database (like MySQL or PostgreSQL) , migration from a NoSQL database (like Cassandra), or a green grass workload, many companies turn to Cloud Spanner seeking a high availability SLA (99.99% for regional instances and 99.999% for multi region instances), unlimited scale, and low operational overhead - no patching required, no maintenance or other planned downtimes, just to name a few benefits.

The tooling and open source ecosystem around Spanner has evolved and grown ever since the service was introduced. HarbourBridge is a part of this ecosystem, and it’s meant to help customers port their existing MySQL or PostgreSQL schema to a Cloud Spanner schema. 

Application Migration Tips

Despite helpful tools like HarbourBridge, database migrations are never trivial. Here are a few things to pay attention to when migrating from MySQL to Spanner, and how to update your application logic to address them.

Note - The following snippets use the PHP client for Cloud Spanner. A couple of the snippets reference a partial Magento port that our friends over at Searce have been working on. Once you understand the operations, you should be able to implement the same in any of the other languages that Cloud Spanner supports.

Cloud spanner enforces strict data types  

In a MySQL query, the value of an attribute can be referenced as either a string or an integer. Example:

  • select * from catalog_eav_attribute where attribute_id = 46;
  • select * from catalog_eav_attribute where attribute_id = "46";

Both are valid and equivalent. 

In Cloud Spanner, the query will return an error if you try to reference an integer type by using a string representation.

Here is an example of a working query from Cloud Spanner:

  • select * from catalog_eav_attribute where attribute_id = 46; -- this will work
  • Select * from catalog_eav_attribute where attribute_id = “46” -- will fail, since we are supplying a string with “46” 

You may use a function like the following to assist you with such transformations.


/**
    * Formats the SQL for Cloud Spanner
    * Example
    * Input SQL : <select statement> WHERE 
    *    (`product_id` = '340') ORDER BY position  ASC
    * Output SQL : <select statement> WHERE 
    *    (`product_id` = 340) ORDER BY position  ASC
    * In the above example integer 
    *`340` is sanitized by removing single quotes.
    * Sanitization is required since Cloud Spanner
    * has strict typing
    * @param string $sql
    * @return string $sql
    */
   public function sanitizeSql(string $sql)
   {
       if (preg_match_all("/('[^']*')/", $sql, $m)) {
           $matches = array_shift($m);
           for($i = 0; $i < count($matches); $i++) {
               $curr =  $matches[$i];
               $curr = filter_var($curr, 
                   FILTER_SANITIZE_NUMBER_INT);
               if (is_numeric($curr)) {
                   $sql = str_replace($matches[$i], 
                       $curr, $sql);
               }
           }
       }
       return $sql;
   }

Using sequences in primary keys is not a Spanner best practice

Cloud Spanner does not implement a sequence generator, and it is not a Spanner best practice to use sequential IDs because doing so can cause hotspotting.

An alternative mechanism of generating a unique primary key is to use a UUID or any other similar mechanisms that result in non-sequential values. For more information, please refer to this article.

In order to convert all existing primary keys to a UUID pattern, you can change the schema using the snippet here

You may use this code snippet to modify the application code to generate the UUID for the auto increment. There are other ways to do this as well, such as using PHP built-in uniqid function.


/**
    * Generate UUID.
    * @return string
    */
   public function getAutoIncrement()
   {
       if (function_exists('com_create_guid') === true) {
           return trim(com_create_guid(), '{}');
       }

       return sprintf('%04X%04X-%04X-%04X-%04X-%04X%04X%04X',
           mt_rand(0, 65535), mt_rand(0, 65535),
           mt_rand(0, 65535), mt_rand(16384, 20479), 
           mt_rand(32768, 49151),  mt_rand(0, 65535),
           mt_rand(0, 65535), mt_rand(0, 65535));   
   }

Implicit casting and the need for explicit casting of field data types 

Both MySQL and Cloud Spanner follow the SQL standard, hence much of the query syntax is the same. One notable difference is that the Cloud Spanner field types are implicitly cast to an appropriate data type out of the ones mentioned here. When implicit casting of the field type fails, Spanner returns a read error, so it would be safer to perform the casting to the appropriate type when issuing the SELECT statement.

Modify application code to cast to respective data type before execution of query.


$con = $this->getSpannerConnection();

 /**
 * Cloud Spanner follows strict type so cast the columns appropriately
*/
$select = $con->addCast($select, "`t_d`.`value`", 'string');
$select = $con->addCast($select, "`t_s`.`value`", 'string');
$select = $con->addCast($select, "IF(t_s.value_id IS NULL, 
    t_d.value, t_s.value)", 'string');

$values = $con->fetchAll($select);

Modify the application code to cast the column to its respective type 


/**
    * Cast the column with type
    * @param string $sql
    * @param string $col
    * @param string $type
    * @return string
    */
   public function addCast(string $sql,
       string $col, string $type)
   {
      $cast = "cast(".$col." as ".$type.")";
      return str_replace($col, $cast, $sql);
   }

Using interleaved tables to improve read performance

Cloud Spanner's table interleaving is a great choice for many parent-child relationships where the child table's primary key includes the parent table's primary key columns. Interleaving ensures that child rows are collocated with their parent rows, which can significantly improve query performance.

Refer to the statements here for a few samples of creating interleaved tables. To learn more about table interleaving, visit the documentation.

Conclusion

Database migrations are complicated. Hopefully, using HarbourBridge and the list of tips in this article can make that task easier. For additional tips, please take a look at this migration guide and read more about HarbourBridge.

The Original Article can be found on https://cloud.google.com/blog

databases cloud

What is Geek Coin

What is GeekCash, Geek Token

Best Visual Studio Code Themes of 2021

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Multi-cloud Spending: 8 Tips To Lower Cost

Mismanagement of multi-cloud expense costs an arm and leg to business and its management has become a major pain point. Here we break down some crucial tips to take some of the management challenges off your plate and help you optimize your cloud spend.

What are the benefits of cloud migration? Reasons you should migrate

To move or not to move? Benefits are multifold when you are migrating to the cloud. Get the correct information to make your decision, with our cloud engineering expertise.

What Is a Cloud Native Database?

### To follow the trend and cater to the storage needs of cloud native applications, cloud native databases offer all the quality needed in the cloud. You should understand what exactly cloud native database means, how it relates to Kubernetes...

Google Cloud: Caching Cloud Storage content with Cloud CDN

In this Lab, we will configure Cloud Content Delivery Network (Cloud CDN) for a Cloud Storage bucket and verify caching of an image. Cloud CDN uses Google’s globally distributed edge points of presence to cache HTTP(S) load-balanced content close to our users.

Backup database to Google Cloud Storage

In this tutorial, we'll learn backup Database to Google Cloud storage. A process to set up periodic database backup to Google Cloud Storage.