This article identifies the most common mistakes prospective data scientists make and discusses how to avoid them. Without further ado, let’s jump straight into it.
Pandas is a library frequently used by data scientists to handle structured data. The most common mistake is iterating through the rows of a dataframe with a “for loop”. Pandas has built-in functions, e.g. apply or applymap (renamed DataFrame.map in pandas 2.1), that enable you to apply a function to a selection of columns, or to all columns, for every row of the dataframe or only for rows where a condition is met. Optimally, however, you could work with NumPy arrays, which are the most efficient option.
Inefficient:
my_list = []
for i in range(0, len(df)):
    l = myfunction(df.iloc[i]['A'], df.iloc[i]['B'])
    my_list.append(l)
df['newColumn'] = my_list
Time: 640 ms
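For comparison, here is a minimal sketch of the two alternatives mentioned above: a row-wise apply and a vectorized call on NumPy arrays. The body of myfunction and the sample dataframe are assumptions made for illustration only (the article does not show them), and the vectorized version works only if the function is written with operations that accept whole arrays. No timings are given here because they depend on the data and the function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's myfunction; assumed to be
# plain arithmetic so that it also works on whole arrays.
def myfunction(a, b):
    return a * 2 + b

# Small illustrative dataframe (not the article's data).
df = pd.DataFrame({"A": np.arange(5), "B": np.arange(5, 10)})

# Option 1: row-wise apply. Avoids the manual loop and list bookkeeping,
# but still calls the Python function once per row.
df["with_apply"] = df.apply(lambda row: myfunction(row["A"], row["B"]), axis=1)

# Option 2: vectorized. Pass entire columns as NumPy arrays so the
# arithmetic runs once over all rows.
df["vectorized"] = myfunction(df["A"].to_numpy(), df["B"].to_numpy())
```

Both columns hold the same values. The vectorized form is usually the faster of the two because the per-row work happens inside NumPy rather than in the Python interpreter.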