Originally published by  Henry Heberle, PhD at  towardsdatascience.com

This is a post for who doesn’t have time to learn the complex syntax of the modified sql used in the Wikidata system and still would like to play with their data.

The original reason for me to have a comics database was to create this visualization, which you will see again at the end of this article:

Legend of Marvel & DC visualization: https://heberleh.github.io/comics-universe/

The pipeline

The idea here is to go through the following steps to get a flexible database:

1.    Query tables from Wikidata and export as json files

2.  Create an Alasql database and insert the data from the json files

3.  Use the created database performing SQL queries

1. Extracting data from Wikipedia

Wikipedia has a great database and ontology behind it. You can access the query system by clicking here: https://w.wiki/4iP

In the above link you will find something similar to this (and you will find more information about it there):

SELECT ?name ?gender ?genderLabel
  WHERE{
      ?name wdt:P31 wd:Q1114461;
          wdt:P1080 wd:Q931597.
      ?name wdt:P21 ?gender.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

In this code I am selecting entities that are instances of comic character and are in the fictional universe Marvel Universe. It may seem strange at first, but each of those codes are part of the Wikipedia Ontology. It is how they deal with redundancy and can organize and make everything work! But how can you create your own queries?

First, will help you. Try typing wdt:<Ctrl+Space>. You notice that a helper tooltip pops up for you. Try typing instance of. You see that the code for ‘instance of’ is wdt:P31. Using the same approach, now for a value, you will type wdt:<Ctrl+Space> (no ‘t’ this time!) and comic character, enter, and voilà, you got the code wd:Q1114461.

<Ctrl+Space> opens the helper

Okay, Henry, but I have no idea about what we can use, I don’t have time to learn the Wikipedia language… and I understand you! So, to overcome this we will open this page here Juggernaut. This is the Wikipedia item code Q841372, the Juggernaut! Now look and check what is the information available for a comic character. Put your mouse pointer over the items in the Statements Section. For example, if you put your mouse over “instance of” you will get the P31 code! Since it is not a value, use wdt for this one.

By browsing that page you get a good idea about what are the properties you can use. Pause: If you want to create a database but don’t know yet about what it will be, try the Random article function to get insights. We can do the same thing for any Wikipedia page. Go to the official Wikipedia website and search for anything, let’s say… Aluminium. Now check the left-side bar and look for Wikidata item link.

Click, and voilà, it shows all the attributes for this item. We see we get the mass by using wdt:P2067, and many other attributes that may apply for other types of metal.

Each Wikidata entry has many properties. The tooltip tells you the property ID.

Now that we have the Wikidata, we will:

1.    Create a query for an attribute of interest

2.  Download the result as .json

3.  Always track the IDs (Wikidata item URL) and Labels

For instance, I created one query for gender. One query for occupation, and so on. The result is a folder with many .json files I will load in my database formed by many tables. You could perform a single query with all the information. In my case, I want it to be malleable, so I can perform JOIN, LEFT JOIN , and other SELECT commands to get different tables.

In fact, instead of getting only Marvel Universe, I discarded that filter and got a full database of instances of Fictional Characters (not only Comics Characters — for some reason this one would not query Captain America). So, in my json files, you find Marvel, DC, and others. The resulting jsons are found in my GitHub page. Next, I will show how I used the json files with an easy SQL DB in Javascript.

As I mentioned in the last paragraph, my query is actually about the class Fictional Characters, and its sub-classes. I also queried partners and spouses, so for this I used the command | to define two wdt:

SELECT DISTINCT ?char ?partner  ?partnerLabel
WHERE
{
	?char wdt:P31/wdt:P279* wd:Q95074. # instande of Fictional Character and any sub-class
    ?char wdt:P451|wdt:P26 ?partner  # partner|spouse
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

Note that in the above code I am not filtering for Marvel anymore. For more complex queries or handling lists, please refer to the Wikidata website. Another good option is the Help button at the top of the Wikidata Query Service where you can find examples and support.

2. Creating an SQL Database in Javascript

Now that we have the json files, let’s insert them into our SQL Database. For this, we will use a tool named Alasql. With Alasql, you can load data and create tables in different manners (read more here).

Here, since we have the jsons from Wikidata, we will apply a SELECT on each json (file) and insert into a new table in our database for easy access. For simplification, my code will contain only two tables; you can find the complete updated version here. It looks like this:

// Partial code - check my github to get the complete one
import alasql from 'alasql'
const character = require('../dataset/character.json');
const abilities = require('../dataset/abilities.json');
const fictionalUniverse = require('../dataset/fictionalUniverse.json');

function MarvelDB(){

// Our jsons and table names
let tables = [
    {name: 'abilities', data: abilities},
    {name: 'character', data: character}
]

// Create database
let db = new alasql.Database()

// Only characters registered in the Marvel Universe
db.exec('CREATE TABLE fictionalUniverse')
db.exec('INSERT INTO fictionalUniverse\
            SELECT char FROM ? \
            WHERE fictionalUniverseLabel LIKE "Marvel Universe"', [fictionalUniverse])

// For each Json in 'tables', create table
tables.forEach(table =&gt; {
    // Creates temporary table (Should not be necessary, 
    // but because of bugs caused by nesting INSERT INTO with SELECT WHERE IN,)
    // I kept it like this:
    db.exec(`CREATE TABLE aux_${table.name}`);
    db.exec(`INSERT INTO aux_${table.name} SELECT * FROM ?`, [table.data]);
    let filtered_data = db.exec(`SELECT * FROM aux_${table.name}\
                                    WHERE aux_${table.name}.char IN\
                                    (SELECT char FROM fictionalUniverse)`);

    // Then when can create our table
    db.exec(`CREATE TABLE ${table.name}`);
    db.exec(`INSERT INTO ${table.name} SELECT * FROM ?`, [filtered_data]);

    // and delete the temporary one
    db.exec(`DROP TABLE  aux_${table.name}`);
})

// Drop fictionalUniverse table -&gt; character table contains the same elements
db.exec('DROP TABLE fictionalUniverse')

return db

}

We are set. We have a database and can use it by calling let mydb = MarvelDB() and apply SQL commands like this: mydb.exec(‘SELECT DISTINCT abilityLabel from abilities’).

3. Final code and usage

I defined some tests to make sure everything was working. It exemplifies the usage of our database. If you are not used to testing, just consider the let result = this.db.exec(…) and console.table(result) in the code below and ignore the other parts.

// Unit test
describe(‘Select Queries’, function() {

beforeEach(() => this.db = marvelDB())

it(‘Name of the only agender with registered abilities’, () => {
let result = this.db.exec(
‘SELECT DISTINCT character.charLabel AS name
FROM character JOIN gender ON character.char = gender.char
JOIN abilities ON character.char = abilities.char
WHERE gender.genderLabel LIKE “agender”’);
//console.table(result);
expect(result[0].name).to.equal(“Phoenix Force”)
})
})

The above SELECT returns the column charLabel and renames to name. It also joins the tables character, gender and abilities, making sure the entries have the same character id (char). Finally, it filters by a specific gender named agender. For more examples on this database, please go to the GitHub page - don’t forget to star it :)

Originally published by  Henry Heberle, PhD at  towardsdatascience.com

==============================================================================

Thanks for reading :heart: If you liked this post, share it with all of your programming buddies! Follow me on Facebook | Twitter

Learn More

☞ The Ultimate MySQL Bootcamp: Go from SQL Beginner to Expert

☞ Learn MySQL - For Beginners

☞ SQL - MySQL for Data Analytics and Business Intelligence

☞ Learn Database Design with MySQL

☞ SQL for Beginners: Learn SQL using MySQL and Database Design

☞ The Complete MySQL Developer Course

☞ SQL Tutorial: Learn SQL with MySQL Database -Beginner2Expert

☞ MySQL Database: SQL- Learn MySQL & MySQL Database Management




#data-science #database #javascript

Create a Marvel Database with SQL and Javascript, the easy way
3 Likes29.10 GEEK