🎓 🔥 From SQL to NoSQL, a migration path

image

Homework

Don’t forget to complete your upgrade and get your verified skill badge! Finish and submit your homework!

  1. Complete the practice steps 1-4 from this repository as described below. Make screenshots of the last step (load data with DSBulk)
  2. Complete scenario Cassandra Data Modeling at make a screenshot of the “congratulations” page.
  3. Submit your homework here

That’s it, you are done! Expect an email next week!

Table of content

  1. Create your Astra Instance
  2. Create petclinic NoSQL data model
  3. Generate your Astra application token and service account
  4. Load data into Astra with DSBulk

1. Create your Astra instance

ASTRA service is available at url https://astra.datastax.com. ASTRA is the simplest way to run Cassandra with zero operations at all - just push the button and get your cluster. No credit card or any payment required, $25.00 USD credit every month, roughly 5M writes, 30M reads, 40GB storage monthly - sufficient to run small production workloads.

✅ Step 1a. Register (if needed) and Sign In to Astra : You can use your Github, Google accounts or register with an email.

Make sure to chose a password with minimum 8 characters, containing upper and lowercase letters, at least one number and special character

Registration Image

Login Image

✅ Step 1b. Create a “pay as you go” plan

Follow this guide and use the values provided below, to set up a pay as you go database with a FREE $25 monthly credit.

Parameter Value
Database name sql_to_nosql_db
Keyspace name spring_petclinic

2. Create petclinic NoSQL data model

Ok, now that you have a database created the next step is to create a tables to work with.

✅ Step 2a. Navigate to the CQL Console and login to the database

In the Summary screen for your database, select CQL Console from the top menu in the main window. This will take you to the CQL Console and automatically log you in.

✅ Step 2b. Describe keyspaces and USE killrvideo

Ok, now we’re ready to rock. Creating tables is quite easy, but before we create one we need to tell the database which keyspace we are working with.

First, let’s DESCRIBE all of the keyspaces that are in the database. This will give us a list of the available keyspaces.

📘 Command to execute

desc KEYSPACES;

“desc” is short for “describe”, either is valid

📗 Expected output

Screen Shot 2021-04-06 at 2 11 09 PM

Depending on your setup you might see a different set of keyspaces then in the image. The one we care about for now is spring_petclinic.

From here, execute the USE command with the spring_petclinic keyspace to tell the database our context is within spring_petclinic.

📘 Command to execute

use spring_petclinic;

📗 Expected output

Screen Shot 2021-04-06 at 2 12 24 PM

Notice how the prompt displays token@cqlsh:spring_petclinic> informing us we are using the spring_petclinic keyspace. Now we are ready to create our tables.

✅ 2c. Create tables

  • Execute the following Cassandra Query Language. Copy and paste the following statements into your CQL Console

📘 Command to execute

use spring_petclinic;

DROP INDEX IF EXISTS petclinic_idx_vetname;
DROP INDEX IF EXISTS petclinic_idx_ownername;
DROP TABLE IF EXISTS petclinic_vet;
DROP TABLE IF EXISTS petclinic_vet_by_specialty;
DROP TABLE IF EXISTS petclinic_reference_lists;
DROP TABLE IF EXISTS petclinic_owner;
DROP TABLE IF EXISTS petclinic_pet_by_owner;
DROP TABLE IF EXISTS petclinic_visit_by_pet;

/** A vet can have multiple specialties. */
CREATE TABLE IF NOT EXISTS petclinic_vet (
  id          uuid,
  first_name  text,
  last_name   text,
  specialties set<text>,
  PRIMARY KEY ((id))
);

/** We could search veterinarian by their names. */
CREATE INDEX IF NOT EXISTS petclinic_idx_vetname ON petclinic_vet(last_name);

/** We may want to list all radiologists. */
CREATE TABLE IF NOT EXISTS petclinic_vet_by_specialty (
 specialty   text,
 vet_id      uuid,
 first_name  text,
 last_name   text,
 PRIMARY KEY ((specialty), vet_id)
);

/** 
 * Here we want all values on a single node, avoiding full scan. 
 * We pick am unordered set to avoid duplication, list to be sorted at ui side. 
 */
CREATE TABLE IF NOT EXISTS petclinic_reference_lists (
  list_name text,
  values set<text>,
  PRIMARY KEY ((list_name))
);

/** Expecting a combobox list references all specialties. */
INSERT INTO petclinic_reference_lists(list_name, values) 
VALUES ('vet_specialty', {'radiology', 'dentistry', 'surgery'});

CREATE TABLE IF NOT EXISTS petclinic_owner (
  id         uuid,
  first_name text,
  last_name  text,
  address    text,
  city       text,
  telephone  text,
  PRIMARY KEY ((id))
);

/** We could search veterinarians by their names. */
CREATE INDEX IF NOT EXISTS petclinic_idx_ownername ON petclinic_owner(last_name);

CREATE TABLE IF NOT EXISTS petclinic_pet_by_owner (
  owner_id   uuid,
  pet_id     uuid,
  pet_type   text,
  name       text,
  birth_date date,
  PRIMARY KEY ((owner_id), pet_id)
);
CREATE TABLE IF NOT EXISTS petclinic_visit_by_pet (
   pet_id      uuid,
   visit_id    uuid,
   visit_date  date,
   description text,
   PRIMARY KEY ((pet_id), visit_id)
);

INSERT INTO petclinic_reference_lists(list_name, values) 
VALUES ('pet_type ', {'bird', 'cat', 'dog', 'lizard','hamster','snake'});
  • You should now have a set of petclinic tables
describe tables;

📗 Expected output

Screen Shot 2021-04-07 at 9 29 25 AM

3. Generate your Astra application token

In order for you to securely connect to your Cassandra database on Astra you need to generate an application token. The cool thing once you generate this once you can then use it for any of your applications or tools to talk to your database.

✅ 3a. Generate your application token

If you don’t already have one follow the instructions HERE to generate your new token. Don’t forget to download it once created because you will not be able to see it again without generating a new one.

Once you DOWNLOAD the token if you view the contents they should look something like this:

"Client Id","Client Secret","Token","Role"
"fdsfdslKFdLFdslDFFDjf","aaaaaaadsdadasdasdasdfsadfldsjfldjdsaldjasljdasljdsaljdasljdasljdlasjdal-FLflirFdfl.lfjdfdsljfjdl+fdlffkdsslfd","AstraCS:ppppdspfdsdslfjsdlfjdlj:540524888-04384039399999999999999999","Admin User"

You’ll need to use this in a moment to authenticate with DSBulk so keep it handy.

4. Transform and load data with DSBulk

In order to use DSBulk you need to download and install it. While you can do this locally if you would like following the instructions HERE we’ve already provided it for you using GitPod. Click the button below to launch your instance.

Open in Gitpod

✅ 4a. Get your secure connect bundle

We’re going to need a secure connect bundle to talk to our Astra database with an external app. The bundle contains all of the information about where our cluster is up on the cloud and how to securly connect. After you launched GitPod you may have noticed a prompt asking you for your Token.

Screen Shot 2021-04-07 at 9 33 16 PM

Now, you will see a prompt asking for your cluster ID.

Screen Shot 2021-04-07 at 9 43 15 PM

Go back to the Astra UI dashboard screen and:

  1. choose the sql_to_nosql_db database
  2. copy the Cluster ID using the copy widget

Screen Shot 2021-04-07 at 9 15 26 AM

Finally, paste the DB ID into prompt in GitPod and hit ENTER. That’s it, you should have your bundle.

📗 Expected output Screen Shot 2021-04-07 at 9 43 32 PM

✅ 4b. Load owner table SQL export into petclinic_owner NoSQL table

Ok, we’re going to use DSBulk in this section to:

  • connect to our Astra database using the CLIENT ID and CLIENT SECRET we created earlier in step 3 and the secure connect bundle astra-creds.zip
  • load data from the owner.csv file (exported from our relational DB owner table)
  • do this using a regular INSERT statement that maps values from our CSV file while transforming data with UUID()
  • use CSV file headers to identify what data each delimited column contains
  • and finally set our delimiter to use “;”

Once this command is constructed it should look something like this: Screen Shot 2021-04-07 at 8 05 22 AM

An example of how to construct the above DSBulk command can be found HERE.

We’ve made this a little easier by constructing the command for you. Just run the dsbulk.sh script. This will ask for the CLIENT ID and CLIENT SECRET you created earlier. When it asks, just paste in your value and hit ENTER to go to the next step.

📘 Command to execute

bash dsbulk.sh

📗 Expected output Screen Shot 2021-04-07 at 8 29 07 AM

Now, go back to CQL Console in your Astra UI and view the data from the petclinic_owner table.

📘 Command to execute

SELECT * FROM petclinic_owner;

📗 Expected output Screen Shot 2021-04-07 at 8 37 54 AM

Awesome! You now have data in your new NoSQL table.

✅ 4b. Let’s break this down a bit

So great, you just ran the DSBulk command and something happened, but lets explain this a bit more.

First thing, here is the source CSV we are using generated from our SQL relational database for the owner table. Screen Shot 2021-04-07 at 8 11 49 AM

If you remember from our discussion and the slide deck when moving into something like Cassandra we don’t tend to use INTegers for IDs. We use UUIDs instead. This is to ensure IDs are truly unique when using a distributed system. With that, we need to transform the INT based IDs from the SQL data to UUIDs. We do this by using the **UUID()** function in our INSERT statement.

Screen Shot 2021-04-07 at 9 37 31 AM

This will generate UUIDs for us as data is inserted into our new Cassandra table.

Also, notice the header line in our CSV.

Screen Shot 2021-04-07 at 9 44 17 AM

If you look back to the INSERT statement we used you may have noticed something like

VALUES (:first_name,:last_name,:address,:city,:telephone,UUID())

where you see items :first_name and :last_name. These are bindings. They are binding the values passed into the INSERT statement to the each column by the name of the column. You can use this to map values all sorts of ways and this just scratches the surface, but I think you get the idea.

THE END

Whoohoo! Congrats on making it to the end. While this workshop just touches on this process from a pretty high level hopefully it gives you an idea of the kinds of things you need to do and where to start. To learn more about what you can do with DSBulk take a look at the docs HERE. Also, don’t forget that you can use Astra to experiment and play around with your data model for FREE well within the limits of the $25 monthly credit.

#sql #nosql #database

From Relational (SQL) to Columnar NoSQL (CQL)
12.80 GEEK