Bubble plots are an extension of the scatter plot. A scatter plot shows two dimensions, x and y. A bubble plot adds a third dimension, z, which sets the size (weight) of each marker. That way, a bubble plot conveys more information at a glance than a two-dimensional scatter plot.
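A minimal sketch of the idea with matplotlib's `scatter`, using a few made-up placeholder values (not from the dataset used below). The third variable `z` is passed as the `s` argument, which matplotlib interprets as marker area in points squared. The variable names and the scale factor of 1000 are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up placeholder values, for illustration only
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 15, 25])
z = np.array([5, 40, 20, 80])  # the third dimension ("weight")

fig, ax = plt.subplots(figsize=(6, 4))
# z drives the marker area; scaled so the largest bubble has an area of 1000 points^2
bubbles = ax.scatter(x, y, s=z / z.max() * 1000, alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("bubble_sketch.png")
```

Without the `s` argument, the same call would draw an ordinary scatter plot with equally sized markers.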

For this tutorial, I will use a dataset of Canadian immigration records. It covers the years 1980 to 2013 and includes the number of immigrants from 195 countries. Import the necessary packages and the dataset:

```
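import numpy as np
import pandas as pd

# Reading .xlsx files requires an Excel engine such as openpyxl to be installed
df = pd.read_excel(
    'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Canada.xlsx',
    sheet_name='Canada by Citizenship',
    skiprows=range(20),  # skip the first 20 rows, which precede the table
    skipfooter=2,        # skip the last 2 rows, which follow the table
)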
```

The dataset is too large to show in a screenshot here, so let's look at the column names instead.

```
df.columns
#Output:
Index([ 'Type', 'Coverage', 'OdName', 'AREA', 'AreaName', 'REG',
'RegName', 'DEV', 'DevName', 1980, 1981, 1982,
1983, 1984, 1985, 1986, 1987, 1988,
1989, 1990, 1991, 1992, 1993, 1994,
1995, 1996, 1997, 1998, 1999, 2000,
2001, 2002, 2003, 2004, 2005, 2006,
2007, 2008, 2009, 2010, 2011, 2012,
2013],
dtype='object')
```
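Notice that the metadata column names are strings while the year labels are integers. Here is a small sketch with a made-up toy DataFrame (not the real data) showing why that matters when selecting columns. Using `pd.api.types.is_integer` is my own choice here; it accepts both Python and NumPy integers.

```python
import pandas as pd

# Toy frame mimicking the column layout printed above: string labels for
# metadata, integer labels for the years. Values are placeholders.
toy = pd.DataFrame(
    {"OdName": ["A", "B"], "AREA": [1, 2], 1980: [10, 20], 1981: [30, 40]}
)

# Keep only the integer (year) column labels
year_cols = [c for c in toy.columns if pd.api.types.is_integer(c)]
print(year_cols)           # [1980, 1981]

# Year columns must be selected as toy[1980]; toy["1980"] would raise a KeyError
print(toy[1980].tolist())  # [10, 20]
```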

We will not use most of these columns, so I dropped them and set the country names (‘OdName’) as the index.

```
df = df.drop(columns=['Type', 'Coverage', 'AREA', 'AreaName', 'REG', 'RegName', 'DEV', 'DevName']).set_index('OdName')
df.head()
```

I chose the data for Ireland and Brazil for this exercise. There is no special reason; I picked them arbitrarily.

```
Ireland = df.loc['Ireland']
Brazil = df.loc['Brazil']
```
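Because the country names are now the index and only the year columns remain, `df.loc['Ireland']` returns a Series indexed by year. A sketch of the same drop, `set_index`, and `.loc` steps on a made-up two-row stand-in for the table (the numbers are placeholders, not real immigration counts):

```python
import pandas as pd

# Toy stand-in for the immigration table; the numbers are placeholders
toy = pd.DataFrame({
    "Type": ["Immigrants", "Immigrants"],
    "OdName": ["Ireland", "Brazil"],
    "AREA": [1, 2],
    1980: [5, 7],
    1981: [6, 8],
})

# Drop the metadata columns and index the rows by country name
toy = toy.drop(columns=["Type", "AREA"]).set_index("OdName")

# .loc with a row label returns one country's row as a Series indexed by year
ireland = toy.loc["Ireland"]
print(ireland.index.tolist())  # [1980, 1981]
print(ireland.tolist())        # [5, 6]
```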

There are several ways to normalize data. Here the goal is to bring both series into a similar range, because the Ireland and Brazil immigration figures have different ranges. I scaled each one into the range from 0 to 1 by dividing the Ireland series by its maximum value, and did the same for the Brazil series.

```
i_normal = Ireland / Ireland.max()
b_normal = Brazil / Brazil.max()
```
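Dividing by the maximum keeps non-negative counts between 0 and 1 and preserves the ratios between years, but the smallest value only maps to 0 if it is 0 itself. For comparison, here is a sketch of that max scaling next to min-max scaling, another common choice, on a made-up Series:

```python
import pandas as pd

# Placeholder counts, not taken from the dataset
counts = pd.Series([50, 100, 200], index=[1980, 1981, 1982])

# Max scaling (what the tutorial does): ratios between values are preserved
max_scaled = counts / counts.max()

# Min-max scaling: the smallest value becomes 0 and the largest becomes 1
min_max_scaled = (counts - counts.min()) / (counts.max() - counts.min())

print(max_scaled.tolist())      # [0.25, 0.5, 1.0]
print(min_max_scaled.tolist())  # [0.0, 0.3333333333333333, 1.0]
```

For bubble sizes, min-max scaling would give the smallest year a bubble of zero area, which is one reason max scaling (or an added offset) fits this use better.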

We will plot the Ireland and Brazil data against the years, so it is useful to have the years in a list.

```
years = list(range(1980, 2014))
```
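`range` excludes its stop value, so `range(1980, 2014)` ends at 2013 and yields 34 years, one per year column. A sketch of a quick check, plus an alternative (my own suggestion, not the tutorial's approach) that reads the years straight from a Series index so they cannot drift out of sync with the data. `toy_ireland` is a hypothetical stand-in with placeholder values.

```python
import pandas as pd

years = list(range(1980, 2014))  # the stop value 2014 is excluded
print(len(years), years[0], years[-1])  # 34 1980 2013

# Hypothetical alternative: take the x values from the Series index itself.
# toy_ireland stands in for the real Ireland Series; its values are placeholders.
toy_ireland = pd.Series(range(34), index=range(1980, 2014), name="Ireland")
years_from_index = toy_ireland.index.tolist()
print(years_from_index == years)  # True
```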

To see the difference, let’s draw a plain scatter plot first.

```
import matplotlib.pyplot as plt

plt.figure(figsize=(14, 8))
plt.scatter(years, Ireland, color='blue', label='Ireland')
plt.scatter(years, Brazil, color='orange', label='Brazil')
plt.xlabel("Years", size=14)
plt.ylabel("Number of immigrants", size=14)
plt.legend()  # tell the two countries apart
plt.show()
```

Now, let’s draw the bubble plot. This time we pass the normalized values defined earlier (`i_normal` and `b_normal`) as the marker sizes.
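In `plt.scatter`, the `s` argument is the marker area in points squared, so normalized values between 0 and 1 would draw nearly invisible bubbles and need to be scaled up. A sketch of one way to do that; the helper name `to_marker_area` and the constants 2000 and 10 are illustrative assumptions, not values from the tutorial:

```python
import pandas as pd

def to_marker_area(normalized, max_area=2000.0, min_area=10.0):
    """Map weights in [0, 1] to scatter marker areas (points squared).

    Hypothetical helper: a weight of 1 gets max_area + min_area, and the
    min_area offset keeps zero-weight points visible.
    """
    return normalized * max_area + min_area

weights = pd.Series([0.0, 0.25, 1.0])  # placeholder normalized weights
areas = to_marker_area(weights)
print(areas.tolist())  # [10.0, 510.0, 2010.0]
```

With the Series defined earlier, this would be passed as, for example, `s=to_marker_area(b_normal)`. Because `s` is an area, doubling a weight doubles the bubble's area rather than its diameter; scaling the diameter instead would exaggerate the differences between years.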

