I recently had a lot of headaches caused by NaNs. Every programmer knows what they are and why they happen, but I did not know their characteristics well enough to avoid the struggle. Hoping to find solutions and spare myself another headache, I looked further into the behaviour of NaN values in Python. After playing with a few statements in a Jupyter Notebook, the results were surprising and extremely confusing. Here is what I got using np.nan from NumPy.

`np.nan in [np.nan]`

is `True`

So far so good, okay but …

`np.nan == np.nan`

is `False`

Huh? And …

`np.nan is np.nan`

is `True`

So what the hell is going on with NaNs in Python?

NaN stands for **Not a Number** and is a common representation of missing data. It is a special floating-point value, which is why it cannot be converted to an integer. It was introduced by the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) before Python even existed and is used in all systems following this standard. NaN can be seen as a sort of data virus that infects every operation it touches.
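To make the "data virus" idea concrete, here is a minimal sketch using only the standard library. The variable names are mine and purely illustrative; the point is that a single NaN is enough to turn every result that depends on it into NaN.

```python
import math

nan = float("nan")

# Arithmetic with NaN yields NaN, even when multiplying by zero.
total = 1.0 + nan
product = nan * 0.0

# One NaN is enough to contaminate an aggregate.
values = [1.0, 2.0, nan, 4.0]
mean = sum(values) / len(values)

print(math.isnan(total), math.isnan(product), math.isnan(mean))  # True True True

# Ordering comparisons that involve NaN are always False.
print(nan > 0.0, nan < 0.0)  # False False

# NaN has no integer equivalent, so the conversion fails.
try:
    int(nan)
    converted = True
except ValueError:
    converted = False
print(converted)  # False
```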

None and NaN sound similar and look similar but are actually quite different. None is a built-in Python constant that can be considered the equivalent of NULL. The [`None`](https://www.w3schools.com/python/ref_keyword_none.asp) keyword is used to define a null value, or no value at all. None is not the same as 0, False, or an empty string. It has a datatype of its own (NoneType) and only None can be … None. While missing values are NaN in numerical arrays, they are None in object arrays. It is best to check for None by using `foo is None` instead of `foo == None`, which brings us back to our previous issue with the peculiar results I found in my NaN operations.
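As a small illustration of the difference, the sketch below builds one numerical array and one object array with NumPy. The array names and contents are made up for the example.

```python
import numpy as np

# A float array stores its missing value as NaN.
numeric = np.array([1.0, np.nan, 3.0])

# Mixing in None forces NumPy to fall back to an object array.
objects = np.array([1, None, 3])

print(numeric.dtype)  # float64
print(objects.dtype)  # object

# None is a singleton, so an identity check is the reliable test.
foo = None
print(foo is None)           # True
print(type(foo).__name__)    # NoneType
```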

At first, reading that `np.nan == np.nan` is `False` can trigger a reaction of confusion and frustration. It looks weird, sounds really weird but if you give it a little bit of thought, the logic starts to appear and even starts to make some sense.

Even though we do not know what every NaN is, not every NaN is the same.

Let’s imagine that instead of nan values, we are looking at a group of people that we do not know. They are completely unknown people to us. Unknown people can be seen as all the same to us, meaning that we describe them all as unknown. However, in reality, it does not mean that one unknown person is equal to another unknown person.

To leave this strange metaphor of mine and go back to Python, **NaN cannot be equal to itself because NaN is the result of a failure**, but that failure can happen in multiple ways. The result of one failure cannot be equal to the result of any other failure and unknown values cannot be equal to each other.
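Here is a minimal sketch of that idea, using two different undefined operations on infinity as the "failures". The variable names are illustrative only.

```python
import math

inf = float("inf")

# Two different undefined operations, both of which produce NaN.
from_subtraction = inf - inf
from_multiplication = inf * 0.0

print(math.isnan(from_subtraction), math.isnan(from_multiplication))  # True True

# Neither result is equal to the other, or even to itself.
print(from_subtraction == from_multiplication)  # False
print(from_subtraction == from_subtraction)     # False
print(from_subtraction != from_subtraction)     # True
```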

Now, to understand why `np.nan in [np.nan]` is `True`, we have to look at the difference between *equality* and *identity*.

Equality refers to the concept that most Python programmers know as “==”. This is used to ask Python whether the content of the variable is the same as the content of another variable.

```
num = 1
num2 = 1
num == num2
```

The last line will result in `True`. **The content of both variables is the same**. As I said previously, the content of NaN is never equal to the content of another NaN.

Identity is when you ask Python whether a variable **is the same** as another variable, meaning whether the two names refer to the very same object and therefore share **the same identity**. Python assigns an **id** to each object that is created, and ids are compared when Python evaluates an `is` expression. However, **`np.nan` is a single object that always has the same id, no matter which variable you assign it to.**

```
import numpy as np
one = np.nan
two = np.nan
one is two
```

`np.nan is np.nan` is `True` and `one is two` is also `True`.

If you check the id of `one` and `two` using `id(one)` and `id(two)`, the same id will be displayed.

`np.nan in [np.nan]` is `True` because the list container in Python checks **identity** **before** checking **equality**. However, there are different “flavors” of nans depending on how they are created. Each call to `float('nan')` creates a new object with its own id, so `float('nan') is float('nan')` actually gives **False!!** We will mention these differences again later.
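The identity-before-equality rule can be seen without NumPy at all. The sketch below uses plain `float('nan')` objects; `shared` is just an illustrative name.

```python
shared = float("nan")

# Same object on both sides: the identity check succeeds first.
same_object = shared in [shared]

# Two distinct NaN objects: identity fails, then equality fails too.
distinct_objects = float("nan") in [float("nan")]

print(same_object)       # True
print(distinct_objects)  # False

# Other list operations apply the same shortcut.
print([shared] == [shared])              # True
print([shared].count(float("nan")))      # 0
```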

The full nan concept can be quite difficult to grasp and very annoying to deal with at first. Thankfully, **pandas** and **numpy** are fantastic when it comes to dealing with nan values and bring several functions that will easily select, replace or delete the nan values in your variables.

As I said, whenever you want to know if a value is a nan, you cannot check whether it is equal to nan. However, there are many other ways to do so, and the ones I propose are not the only ones available out there.

```
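import numpy as np
import pandas as pd
var = float('nan')
var is np.nan #results in False: float('nan') is not the same object as np.nan
#an identity check only works for np.nan itself, so prefer:
np.isnan(var) #results in True
#or
pd.isna(var) #results in True
#or
pd.isnull(var) #results in True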
```

`pd.isnull()` and `pd.isna()` behave identically. Pandas provides `.isnull()` as well because its dataframes are an adaptation of R dataframes in Python. In R, null and na are two different types with different behaviours.

Other than numpy, and as of **Python** 3.5, you can also use `math.nan`. The reason why I wrote both nan and NaN in this article (apart from my lack of consistency) is that the string passed to `float()` is not case sensitive. Both `float('nan')` and `float('NAN')` will produce a NaN.

```
import math
var = float('nan')
math.isnan(var) #results in True
```

**A little warning:**

```
import math
import numpy as np
math.nan is math.nan #results in True
math.nan is np.nan #results in False
math.nan is float('nan') #results in False
```

The last two statements give False because `math.nan`, `np.nan` and `float('nan')` all have different ids. They do not have the same identity.
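Since identity depends on which NaN object you happen to hold, a value-based check is safer. Below is a minimal sketch using only the standard library. `is_nan` is a hypothetical helper of mine, not a function from any library.

```python
import math


def is_nan(value):
    """Return True for any float NaN, regardless of which object it is.

    Hypothetical helper: the isinstance guard avoids the TypeError that
    math.isnan raises for non-numeric inputs such as None or strings.
    """
    return isinstance(value, float) and math.isnan(value)


candidates = [math.nan, float("nan"), float("NAN"), 0.0, None, "nan"]
results = [is_nan(candidate) for candidate in candidates]
print(results)  # [True, True, True, False, False, False]
```

The same check also works through the `x != x` trick, since NaN is the only float that is not equal to itself, but `math.isnan` states the intent more clearly.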

```
import numpy as np
import pandas as pd
#small made-up frame standing in for your own data
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})
df.dropna()
#returns a copy without the rows that contain nan values
#use the subset parameter to drop rows with nan values in specific columns
df.fillna(0)
#returns a copy with nan values replaced by the value of your choice (here 0)
df.isnull()
#same as pd.isnull() for dataframes
df.isna()
#same as pd.isna() for dataframes
```

Unfortunately, I do not find the pandas documentation on missing data extremely helpful. However, I really appreciate this excerpt from the *Python Data Science Handbook*, which gives a great overview of how to deal with missing data in Pandas.

TypeError: ‘float’ object is not iterable

While NoneType errors are quite clear, errors caused by nan values can be a little confusing. NaN values can often cause errors (more specifically **TypeErrors**) that involve their type ‘**float**’. The error message can be surprising, especially when you believe that your data contains absolutely no floats. Your dataframe might not seem to include any floats, but it really does. It probably has NaN values you did not know about, and you simply need to get rid of them in order to get rid of this error!
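Here is a minimal sketch of how this error can appear. The dataframe and its `tags` column are made up for the example: a column that looks like pure text still hides a float as soon as one entry is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a text column with one missing entry.
df = pd.DataFrame({"tags": ["a,b", np.nan, "c"]})

# The missing entry is a float, even though the column holds strings.
missing = df["tags"].iloc[1]
print(type(missing).__name__)  # float

# Treating every value as a string fails on the NaN.
try:
    first_letters = [list(value)[0] for value in df["tags"]]
    message = ""
except TypeError as err:
    message = str(err)
print(message)  # 'float' object is not iterable

# Dropping the NaN rows first makes the same loop work.
cleaned = df.dropna(subset=["tags"])
first_letters = [list(value)[0] for value in cleaned["tags"]]
print(first_letters)  # ['a', 'c']
```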
