Text Summarization for Clustering Documents

This is part 2 of a series analyzing healthcare chart notes using Natural Language Processing (NLP).

In the first part, we talked about cleaning the text and extracting sections of the chart notes that might be useful for further annotation by analysts. This reduces the time they spend manually reading an entire chart note when they are only looking for “allergies” or “social history”.
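To illustrate the kind of section extraction recapped above, here is a minimal sketch that splits a note on header lines. It assumes each section header sits on its own line and ends with a colon (for example "Allergies:"). Real MIMIC notes are messier than this, and the `extract_sections` function, the regex and the sample note are hypothetical illustrations, not the code from part 1.

```python
import re

# Hypothetical sample note. ASSUMPTION: each section header is on its own
# line and ends with a colon; real chart notes do not always follow this.
SAMPLE_NOTE = """Chief Complaint:
Shortness of breath
Allergies:
Penicillins
Social History:
Former smoker, quit 10 years ago.
Lives with wife.
"""

# A header line: letters, spaces or slashes, then a colon, then only whitespace.
HEADER_RE = re.compile(r"^([A-Za-z][A-Za-z /]*):\s*$", re.MULTILINE)


def extract_sections(note: str) -> dict[str, str]:
    """Split a note into {lowercased header: section body} (hypothetical helper)."""
    sections = {}
    matches = list(HEADER_RE.finditer(note))
    for i, m in enumerate(matches):
        # A section body runs from the end of its header to the next header
        # (or to the end of the note for the last section).
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[m.group(1).strip().lower()] = note[start:end].strip()
    return sections


if __name__ == "__main__":
    sections = extract_sections(SAMPLE_NOTE)
    print(sections["allergies"])       # Penicillins
    print(sections["social history"])
```

An analyst looking only for allergies can then read `sections["allergies"]` instead of the whole note. Headers written inline, such as "Allergies: Penicillins", would not match this pattern and would need a looser rule.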

NLP Tasks:

Pre-processing and Cleaning
Text Summarization (we are here)
Topic Modeling using Latent Dirichlet Allocation (LDA)
Clustering
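To give a feel for what the summarization step does, the sketch below uses a naive frequency-based extractive approach. It scores each sentence by how often its non-stopword terms appear across the whole note, then keeps the top-scoring sentences in their original order. This is an assumed, stdlib-only technique for illustration and not necessarily the method this series uses. The tiny stopword list, the regex sentence splitter and the `summarize` helper are all hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: a tiny illustrative stopword list; a real pipeline would use a
# fuller list (e.g. from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "with",
             "is", "was", "for", "at", "by", "no", "he", "she", "his", "her"}


def tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, n_sentences: int = 2) -> str:
    """Return the n highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokens(text))
    if not freq:
        return ""
    max_f = max(freq.values())

    def score(sentence: str) -> float:
        # Sum of word frequencies, normalized by the most frequent word.
        return sum(freq[w] / max_f for w in tokens(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    note = ("Patient admitted with chest pain. Chest pain radiated to left arm. "
            "Troponin was elevated. Family visited in the afternoon.")
    print(summarize(note, n_sentences=2))
    # Patient admitted with chest pain. Chest pain radiated to left arm.
```

Because scores are summed, longer sentences are favoured; dividing each score by the sentence's token count is a common adjustment. A shorter summary per note can then serve as a lighter input to the later topic modeling and clustering steps.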

DATA:

Source: https://mimic.physionet.org/about/mimic/

Doctors take notes on their computers, and about 80% of what they capture is unstructured, which makes the information harder to process. Interpreting healthcare jargon is not easy either; it requires a lot of context. Let’s see what we have:
