Many posts out there catalog the “gotchas” of Python and/or its most popular packages. This post is yet another entry in that series, but with one difference: I’ve genuinely made every one of these mistakes myself (some of them with embarrassing frequency). The good news is that simply knowing the definition and type of each classic Python object should help you avoid most (if not all) of them in your own work!

Without any further ado, let’s begin.

1. Truthy or Falsy: numpy.nan and Missing Values in pandas

You probably know that you don’t always need an explicit comparison to check whether an object is empty or None. Python lets you test an object’s truthiness directly, like this:

lst = [1, 2, 3]
a = None

## rather than this ...
if len(lst) > 0 or a is not None:
    print('success')
## you can simply do this ...
## (note: "if a" is also False for 0, '' and other falsy values, not just None)
if lst or a:
    print('success')

That’s because empty lists (and all other empty sequences and collections), False, None, and 0 (of any numeric type) all evaluate to False in a boolean context. Values like these are therefore known as “falsy”; everything else is “truthy”.
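A quick check with the built-in bool() confirms which values are falsy. The second half is a small aside on the example above: "if a" and "if a is not None" are not interchangeable, because 0 is not None yet is still falsy. The variable names here are just for illustration.

```python
# Built-in values that Python treats as falsy: bool() returns False for each one.
falsy_values = [None, False, 0, 0.0, 0j, "", [], (), {}, set(), range(0)]
print(all(not bool(v) for v in falsy_values))  # True

# Caveat: "if a" is not the same test as "if a is not None".
# 0 is not None, yet it is falsy, so the two checks disagree.
a = 0
print(a is not None)  # True
print(bool(a))        # False
```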

Consider the following example: you have a dictionary of items and their costs, which you use to build a dataframe for some analysis.

import pandas as pd

d1 = {'item': ['foo', 'bar', 'baz'], 'cost': [100, None, 20]}
df = pd.DataFrame(d1)
## lots of analysis here ...
## if an item has a cost, print the item and its cost
for i, r in df.iterrows():
    if r['cost']:
        print(f"item = {r['item']}, cost = {r['cost']}")

You expect:

item = foo, cost = 100.0
item = baz, cost = 20.0

But you get:

item = foo, cost = 100.0
item = bar, cost = nan
item = baz, cost = 20.0

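The reason is that pandas treats None as a missing value and, in a numeric column, stores it as NaN (numpy.nan), converting the whole column to float64 along the way. NaN is a non-zero float, so it is truthy rather than falsy, and the bar row slips through the check.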
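Here is a minimal sketch of the problem and one possible fix, reusing the same data. It confirms that bool(np.nan) is True and that the None ends up as NaN in a float64 column, then uses pd.notna() to test for missing values explicitly instead of relying on truthiness. The loop variable names are just for illustration.

```python
import numpy as np
import pandas as pd

# NaN is a non-zero float, so Python's truth test treats it as True.
print(bool(np.nan))  # True

d1 = {"item": ["foo", "bar", "baz"], "cost": [100, None, 20]}
df = pd.DataFrame(d1)

# pandas stores the None as NaN, which forces the column to float64.
print(df["cost"].dtype)  # float64

# Check for missing values explicitly instead of relying on truthiness.
for _, row in df.iterrows():
    if pd.notna(row["cost"]):
        print(f"item = {row['item']}, cost = {row['cost']}")
```

Note that pd.notna() only screens out missing values: a cost of 0 would now be printed, whereas the original truthiness check skipped it. If you want to skip both, combine the tests, e.g. `if pd.notna(cost) and cost:`. For larger frames, a vectorized filter such as df[df["cost"].notna()] avoids the row-by-row loop entirely.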


Five Python Gotchas!
1.10 GEEK