I am part of a WhatsApp group named “Data Science Community”, and recently I decided to explore the chat of this group and do some analysis on it. In this article, I will take you through a WhatsApp group chat analysis with data science.

If you don’t know how to extract the messages from a chat, open the chat, tap the three dots at the top, select More, then select Export chat, and share the file by any means, preferably to your email.

The exported chat does not need any cleaning or preparation; it can be used directly for the task. Now let’s start with this WhatsApp group chat analysis. I will simply import the required packages and get started with the task:

import re
import regex
import pandas as pd
import numpy as np
import emoji
from collections import Counter
import matplotlib.pyplot as plt
from os import path
from PIL import Image
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
%matplotlib inline

WhatsApp Group Chat Analysis

Although the data is ready to use, we still need to change the format of the date and time of the messages, which can be done easily. For this, I will define a function that detects whether a line starts with a date and time, since that marks the start of a new message:

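def startsWithDateAndTime(s):
    # A new message starts with a timestamp such as "12/25/20, 9:41 PM - "
    # (raw string so the backslash-free pattern is passed to re unchanged)
    pattern = r'^([0-9]+)(/)([0-9]+)(/)([0-9]+), ([0-9]+):([0-9]+)[ ]?(AM|PM|am|pm)? -'
    result = re.match(pattern, s)
    if result:
        return True
    return False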
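
To see why this check matters, here is a minimal sketch of how it could be used to merge multi-line messages. It assumes each message in the export looks like "date, time - author: text" and that lines without a timestamp continue the previous message. The helper group_messages and the sample lines are made up for illustration, and the function is redefined here only so the sketch runs on its own:

```python
import re

def startsWithDateAndTime(s):
    # Same timestamp check as above, repeated so this sketch is self-contained
    pattern = r'^([0-9]+)(/)([0-9]+)(/)([0-9]+), ([0-9]+):([0-9]+)[ ]?(AM|PM|am|pm)? -'
    return bool(re.match(pattern, s))

def group_messages(lines):
    """Hypothetical helper: attach continuation lines to the message they belong to."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        if startsWithDateAndTime(line) or not messages:
            # A timestamp marks the start of a new message
            messages.append(line)
        else:
            # No timestamp: this line continues the previous message
            messages[-1] += "\n" + line
    return messages

# Made-up sample lines in the assumed export format
sample = [
    "12/25/20, 9:41 PM - Alice: Has anyone tried the new pandas release?",
    "It fixed the groupby issue for me.",
    "12/25/20, 9:43 PM - Bob: Not yet",
]
messages = group_messages(sample)
print(len(messages))
```

With a real export, the same helper could be fed the lines of the text file instead of the sample list. The exact timestamp format depends on the phone's locale, so the pattern may need adjusting.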

#by aman kharwal #data analysis
