Just curious about the behavior of 'where' and why you would use it over 'loc'.

If I create a dataframe:

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
    'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
    'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm'],
})

And then apply the 'where' method:

df2 = df.where(df['Goals']>10)

I get the following, which keeps the rows where Goals > 10 but replaces every value in the other rows with NaN:

  Gender  Goals    ID  Run Distance
0      m   12.0   1.0         234.0
1      m   23.0   2.0          35.0
2      m   56.0   3.0          77.0
3    NaN    NaN   NaN           NaN
4    NaN    NaN   NaN           NaN
5    NaN    NaN   NaN           NaN
6    NaN    NaN   NaN           NaN
7    NaN    NaN   NaN           NaN
8    NaN    NaN   NaN           NaN
9      m   34.0  10.0         123.0

If, however, I use the 'loc' indexer:

df2 = df.loc[df['Goals']>10]

It returns only the matching rows, with no NaN values:

  Gender  Goals  ID  Run Distance
0      m     12   1           234
1      m     23   2            35
2      m     56   3            77
9      m     34  10           123
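For reference, here is a minimal sketch that puts the two calls side by side. The names mask, masked, filtered and zeroed are just illustrative labels of mine, and the last line assumes the optional 'other' argument of 'where' is the intended way to supply a replacement value instead of NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
    'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
    'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm'],
})

mask = df['Goals'] > 10          # boolean Series, one entry per row

masked = df.where(mask)          # same shape as df; rows failing the mask become NaN
filtered = df.loc[mask]          # only the rows passing the mask are returned

print(masked.shape)              # (10, 4)
print(filtered.shape)            # (4, 4)

# Dropping the all-NaN rows from the where() result leaves the same row labels
same_rows = masked.dropna(how='all').index.equals(filtered.index)
print(same_rows)                 # True

# where() can also substitute a value other than NaN while keeping the shape
zeroed = df['Goals'].where(mask, other=0)
print(zeroed.tolist())           # [12, 23, 56, 0, 0, 0, 0, 0, 0, 34]
```

So, at least in this sketch, 'where' looks like a shape-preserving mask and 'loc' looks like a row filter. The floats in the 'where' output (12.0 rather than 12) appear because NaN forces the integer columns to a float dtype.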

So essentially I am curious why you would use 'where' over 'loc/iloc' and why it returns NaN values?
