When we’re doing data analysis with Python, we might sometimes want to add a column to a pandas DataFrame based on the values in other columns of the DataFrame.

Although this sounds straightforward, it can get a bit complicated if we try to do it using an if-else conditional. Thankfully, there’s a simple way to do this using numpy!

To learn how to use it, let’s look at a specific data analysis question. We’ve got a dataset of more than 4,000 Dataquest tweets. Do tweets with attached images get more likes and retweets? Let’s do some analysis to find out!

We’ll start by importing pandas and numpy, and loading up our dataset to see what it looks like. (If you’re not already familiar with using pandas and numpy for data analysis, check out our interactive numpy and pandas course.)

import pandas as pd
import numpy as np

df = pd.read_csv('dataquest_tweets_csv.csv')
df.head()

[Image: the baseline DataFrame, as displayed by df.head()]

We can see that our dataset contains a bit of information about each tweet, including:

  • date — the date the tweet was posted
  • time — the time of day the tweet was posted
  • tweet — the actual text of the tweet
  • mentions — any other twitter users mentioned in the tweet
  • photos — the url of any images included in the tweet
  • replies_count — the number of replies on the tweet
  • retweets_count — the number of retweets of the tweet
  • likes_count — the number of likes on the tweet

We can also see that the photos data is formatted a bit oddly: each value is stored as text that looks like a list, so a tweet with no images shows up as [].
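If you don't have the CSV on hand, a small stand-in DataFrame with the same column names is enough to follow along. This is only a sketch: every row below is invented for illustration, and the exact formatting of the mentions and photos values is an assumption based on the description above, not a copy of the real dataset.

```python
import pandas as pd

# Hypothetical stand-in rows. All values are made up for illustration
# and do not come from the real Dataquest tweets dataset.
sample = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "time": ["09:00:00", "12:30:00", "18:45:00"],
    "tweet": ["First example tweet", "Second example tweet", "Third example tweet"],
    "mentions": ["[]", "['someuser']", "[]"],  # assumed list-like text
    "photos": ["[]", "['https://example.com/image.png']", "[]"],
    "replies_count": [0, 2, 1],
    "retweets_count": [1, 5, 0],
    "likes_count": [3, 12, 4],
})

# The photos values are plain strings, so "no images" is the text '[]'
print(sample.head())
```

Because photos holds strings rather than real Python lists, checking for images later comes down to a simple string comparison.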

Adding a Pandas Column with a True/False Condition Using np.where()

For our analysis, we just want to see whether tweets with images get more interactions, so we don’t actually need the image URLs. Let’s try to create a new column called hasimage that will contain Boolean values — True if the tweet included an image and False if it did not.

To accomplish this, we’ll use numpy’s built-in [where()](https://numpy.org/doc/stable/reference/generated/numpy.where.html) function. This function takes three arguments in sequence: the condition we’re testing for, the value to assign to our new column if that condition is true, and the value to assign if it is false. It looks like this:

np.where(condition, value if condition is true, value if condition is false)
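To see the three arguments in action before applying them to our tweets, here is a minimal sketch on a small array. The numbers and the "high"/"low" labels are made up purely for illustration.

```python
import numpy as np

# Made-up values, used only to show how np.where() picks between two results
values = np.array([3, 8, 1, 10])

# For each element: "high" where the condition is True, "low" where it is False
labels = np.where(values > 5, "high", "low")

print(labels)  # ['low' 'high' 'low' 'high']
```

The condition is evaluated element by element, so the result has one entry for every entry in the input.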

In our data, we can see that tweets without images always have the value [] in the photos column. We can use this information and np.where() to create our new column, hasimage, like so:

df['hasimage'] = np.where(df['photos'] != '[]', True, False)
df.head()

[Image: output of df.head() showing the new hasimage column added to the DataFrame]

Above, we can see that our new column has been appended to our dataset, with tweets that included images correctly marked True and all others marked False.
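With a Boolean column in place, one way to approach the original question is to group by it and compare averages. The sketch below uses a tiny invented DataFrame (the demo name and all of its rows are hypothetical), so its numbers say nothing about the real tweets. It also checks that, for a plain True/False result, the comparison on its own produces the same values as np.where().

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; not taken from the real dataset
demo = pd.DataFrame({
    "photos": ["[]", "['https://example.com/a.png']", "[]", "['https://example.com/b.png']"],
    "likes_count": [2, 10, 4, 6],
    "retweets_count": [0, 3, 1, 1],
})

# Same approach as above: True where photos is anything other than '[]'
demo["hasimage"] = np.where(demo["photos"] != "[]", True, False)

# The comparison by itself already yields a Boolean Series with the same values
direct = demo["photos"] != "[]"
print((demo["hasimage"] == direct).all())

# Average likes and retweets for tweets without and with images
summary = demo.groupby("hasimage")[["likes_count", "retweets_count"]].mean()
print(summary)
```

Running the same groupby on the full df would give the average likes and retweets for tweets with and without images. np.where() becomes more useful than the bare comparison when the two outcomes should be something other than True and False.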

Add a Column in a Pandas DataFrame Based on an If-Else Condition