Reading data is the first step in any data science project. As a machine learning practitioner or a data scientist, you have almost certainly come across JSON (JavaScript Object Notation) data. JSON is a widely used format for storing and exchanging data. For example, NoSQL databases such as MongoDB store data as JSON-like documents, and most REST APIs return their responses in JSON.

Although this format works well for storing and exchanging data, it usually needs to be converted into a tabular form for further analysis. You are likely to deal with two types of JSON structure: a JSON object or a list of JSON objects. In Python terms, that means a dict or a list of dicts.

A dictionary and a list of dictionaries (Image by author)
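
To make the two structures concrete, here is a minimal sketch using Python's standard `json` module. The JSON strings and the field names (`id`, `name`) are made up for illustration and do not come from any particular API.

```python
import json

# Hypothetical payloads, invented for this illustration.
single_record = '{"id": 1, "name": "Alice"}'
record_list = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'

# A JSON object parses to a Python dict.
obj = json.loads(single_record)

# A JSON array of objects parses to a list of dicts.
objs = json.loads(record_list)

print(type(obj).__name__)   # dict
print(type(objs).__name__)  # list
```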

In this article, you’ll learn how to use the pandas built-in function json_normalize() to flatten those two types of JSON into pandas DataFrames. This article is structured as follows:

  1. Flattening a simple JSON
  2. Flattening a JSON with multiple levels
  3. Flattening a JSON with a nested list
  4. Ignoring KeyError if keys are not always present
  5. Custom separator using sep
  6. Adding prefix for meta and record data
  7. Working with a local file
  8. Working with a URL
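
As a quick preview before the detailed sections, the sketch below passes a flat dict and a list of flat dicts to `pd.json_normalize()`. The sample records are assumptions made up for this example, and it assumes pandas 1.0 or later, where `json_normalize` is available at the top level of the package.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
a_dict = {"id": 1, "name": "Alice", "grade": 90}
list_of_dicts = [
    {"id": 1, "name": "Alice", "grade": 90},
    {"id": 2, "name": "Bob", "grade": 85},
]

# A single dict becomes a one-row DataFrame.
df_one = pd.json_normalize(a_dict)

# A list of dicts becomes one row per dict.
df_many = pd.json_normalize(list_of_dicts)

print(df_one.shape)   # (1, 3)
print(df_many.shape)  # (2, 3)
```

In both cases the keys become column names. Nested keys, nested lists, and missing keys need the extra arguments covered in the sections listed above.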

Please check out the Notebook for the source code.

