Level Up with Google’s Public Datasets

I remember my first data science project: analyzing a 6.5MB file on ten thousand movies. It was magical to find patterns in data with a few keystrokes, and soon I was on the hunt for other datasets to analyze. Going through Kaggle and eventually generating unique data through web crawling, I found that most downloadable datasets were small (<1GB) and that collecting and working with large datasets required a different type of skill: data engineering.

Most aspiring data scientists only know how to analyze clean, comma-separated files that fit in their RAM (usually <8GB). In practice, data science is much more unpredictable: you deal with terabytes of data in different formats from different data streams. So how does an aspiring data scientist gain exposure to big data?

To take you to big data, we will explore:

  1. Diving into Big Data with Google Cloud Public Datasets
  2. What is BigQuery?
  3. How to Connect Your Jupyter Notebook to BigQuery

Diving into Big Data with Google Cloud Public Datasets

It turns out that Google hosts a repository of most publicly available data, such as air quality, US census, Reddit, and flight data. If there’s a public dataset, Google probably has it hosted, available for you to query. What’s more, new datasets (e.g. Covid-19) are constantly being added and updated.

With the data hosted on BigQuery, you can easily query it to see if it’s interesting. For example, I can answer the question: “How many political ad campaigns on Google targeted women in Florida?”
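As a rough illustration, the question above could be asked from Python with the `google-cloud-bigquery` client library. This is only a sketch under assumptions: the table name `bigquery-public-data.google_political_ads.campaign_targeting` and the columns `geo_targeting_included` and `gender_targeting` are my guesses at the public dataset's schema and should be checked in the BigQuery console, and the helper names are hypothetical. Running the query also requires a Google Cloud project with credentials configured.

```python
# Parameterized SQL; the table and column names are ASSUMPTIONS about the
# public political ads dataset and may not match the real schema.
QUERY = """
SELECT COUNT(*) AS campaign_count
FROM `bigquery-public-data.google_political_ads.campaign_targeting`
WHERE geo_targeting_included LIKE @state
  AND gender_targeting LIKE @gender
"""


def like_pattern(term: str) -> str:
    """Wrap a search term in SQL wildcards for a LIKE comparison."""
    return f"%{term}%"


def count_campaigns(state: str = "Florida", gender: str = "Female") -> int:
    """Run the query and return the number of matching campaigns."""
    # Imported here so the rest of the module loads without the library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default project and credentials
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("state", "STRING", like_pattern(state)),
            bigquery.ScalarQueryParameter("gender", "STRING", like_pattern(gender)),
        ]
    )
    rows = client.query(QUERY, job_config=job_config).result()
    return next(iter(rows)).campaign_count
```

Calling `count_campaigns()` in a notebook cell would return a single number. Query parameters are used instead of string formatting so that the search terms never need manual escaping.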

Google's TPUs being primed for the Quantum Jump

The liquid-cooled Tensor Processing Units, built to slot into server racks, can deliver up to 100 petaflops of compute.

As the world gears towards more automation and AI, the need for quantum computing has also grown exponentially. Quantum computing lies at the intersection of quantum physics and high-end computer technology and, in more than one way, holds the key to our AI-driven future.

Quantum computing requires state-of-the-art tools to perform high-end computing. This is where TPUs come in handy. TPUs, or Tensor Processing Units, are custom-built ASICs (application-specific integrated circuits) designed to execute machine learning tasks efficiently. Google developed them specifically for neural network machine learning and customised them for its machine learning software, TensorFlow.

These TPUs power Google products like Google Search, Gmail, Google Photos and Google Cloud AI APIs.

Inside ABCD, A Dataset To Build In-Depth Task-Oriented Dialogue Systems

According to a recent study, call centre agents spend approximately 82 percent of their total time looking at step-by-step guides, customer data, and knowledge base articles.

Traditionally, dialogue state tracking (DST) has served as a way to determine what a caller wants at a given point in a conversation. DST is the core part of a spoken dialogue system: it estimates beliefs over the user’s possible goals at every dialogue turn. Unfortunately, popular DST benchmarks do not account for these aspects of an agent’s work.

To reduce the burden on call centre agents and advance the state of the art in task-oriented dialogue systems, AI-powered customer service company ASAPP recently launched an action-based conversations dataset (ABCD). The dataset is designed to help develop task-oriented dialogue systems for customer service applications. ABCD is a fully labelled dataset of over 10,000 human dialogues containing 55 distinct user intents, each requiring sequences of actions constrained by company policies to accomplish tasks.

The dataset is currently available on GitHub.

What Is Google Compute Engine? - Explained

Google Compute Engine (GCE) offers a large number of scalable virtual machines that can serve as compute clusters. GCE can be managed through a RESTful API, a command line interface, or a web console. Usage is charged with a minimum of 10 minutes per use. There is no upfront fee or time commitment. GCE competes with Amazon’s Elastic Compute Cloud (EC2) and Microsoft Azure.

Embedding your <image> in Google Colab <markdown>

This article is a quick guide to help you embed images in Google Colab markdown without mounting your Google Drive!

Just a quick intro to Google Colab

Google Colab is a cloud service that offers FREE Python notebook environments to developers and learners, along with FREE GPUs and TPUs. Users can write and execute Python code in the browser itself without any pre-configuration. It offers two types of cells: text and code. The ‘code’ cells act like a code editor; coding and execution are done in these cells. The ‘text’ cells are used to embed textual descriptions and explanations alongside the code, and they are formatted using a simple markup language called ‘markdown’.

Embedding Images in markdown

If you are a regular Colab user like me, using markdown to add details to your code is probably a habit of yours too! While working in Colab, I tried to embed images along with text in markdown, but it took me almost an hour to figure out how to do it. So here is an easy guide that will help you.


Step 1:

The first step is to get the image into your Google Drive. So upload all the images you want to embed in markdown to your Google Drive.

Step 2:

Google Drive gives you the option to share the image via a shareable link. Right-click your image and you will find an option to get a shareable link.

On selecting ‘Get shareable link’, Google will create and display a shareable link for that particular image.
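A shareable link normally opens Drive's viewer page rather than the image file itself, so it usually has to be converted before markdown can display it. The sketch below shows one common way to do that. It rests on two assumptions of mine, not on anything stated in this guide: that the link has the form `https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing`, and that Drive serves the image at `https://drive.google.com/uc?export=view&id=<FILE_ID>`. The function name is hypothetical, and the image must be shared so that anyone with the link can view it.

```python
import re


def drive_link_to_markdown(share_link: str, alt_text: str = "image") -> str:
    """Turn a Google Drive shareable link into a markdown image tag.

    Assumes the link contains '/file/d/<FILE_ID>/'; other link formats
    are not handled in this sketch.
    """
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
    if match is None:
        raise ValueError("No Drive file ID found in the link")
    file_id = match.group(1)
    # Assumed direct-view URL pattern; verify that it renders in your notebook.
    return f"![{alt_text}](https://drive.google.com/uc?export=view&id={file_id})"
```

You could run this in a code cell with your own link and paste the returned string into a text cell.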

All about Google ‘Verse-by-Verse’

Google’s Verse by Verse allows users to rekindle their old love for poetry. By offering innovative suggestions, the software aims to take users down a road less traveled.

