This article identifies the most common mistakes prospective data scientists make and discusses how to avoid them. Without further ado, let’s jump straight into it.

Inefficient use of pandas

Pandas is a library frequently used by data scientists to handle structured data. The most common mistake is iterating through the rows of a dataframe with a “for loop”. Pandas has built-in functions, e.g. apply or applymap, that let you apply a function to selected columns or to all columns, either for every row of the dataframe or only for rows that meet a condition. Optimally, however, you would work with NumPy arrays, which are the most efficient option.

Inefficient:

my_list = []
for i in range(len(df)):
    # Each df.iloc[i] lookup builds a new row object, which is what makes this slow
    l = myfunction(df.iloc[i]['A'], df.iloc[i]['B'])
    my_list.append(l)
df['newColumn'] = my_list
Time: 640 ms
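
For comparison, here is a minimal sketch of the two alternatives mentioned above: a row-wise apply and a vectorized call on NumPy arrays. The body of myfunction and the sample dataframe are assumptions made up for illustration, since the article does not show them. The vectorized version only works when the function is built from operations that accept whole arrays. No timings are given because they depend on the data and the function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for myfunction (not shown in the article):
# simple arithmetic that works on scalars and on whole arrays alike.
def myfunction(a, b):
    return a * 2 + b

# Small made-up dataframe with the two columns used in the loop above.
df = pd.DataFrame({"A": np.arange(5), "B": np.arange(5, 10)})

# Option 1: apply with axis=1 removes the explicit loop,
# but myfunction is still called once per row.
df["with_apply"] = df.apply(lambda row: myfunction(row["A"], row["B"]), axis=1)

# Option 2: pass the underlying NumPy arrays, so myfunction is called once
# and the arithmetic runs over entire columns.
df["vectorized"] = myfunction(df["A"].to_numpy(), df["B"].to_numpy())
```

Both new columns hold the same values. The difference is only in how many times Python has to call the function.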


Top 3 programming mistakes every data scientist makes