All vectors are UNequal, but some are more UNequal than the others…

Searching for meaning in Trump’s tweets. Part II.

This part continues from where we left off in Part I and dives into vectorised text. We discuss the problems of using text vector representations and set the scene for the final part, where we are going to try to fix them.

In Part I, we set up the problem and went through some basic exploratory analysis of the dataset. To recap, we are using four years’ worth of Trump’s tweet history as a testing ground for NLP analysis. The advantage is that it is a challenging dataset: quite messy, with lots of abbreviated and concatenated words and hashtags. Each tweet is a relatively short sentence, and sometimes only part of a sentence (because a thought is split across several tweets, or because the tweet is a one-word comment on an attached link or image). The brevity of the format makes topic recognition particularly challenging, as we don’t have much text to go on.

For links to the dataset and for the initial processing steps (and some extra analysis), please see Part I of the series here:

Searching for Meaning in Trump’s Tweets. Part I

It’s harder than you might think. An in-depth NLP analysis, using LDA, TSNE, Spacy, Gensim and XX-Berts for good…

Here we are going to run with the preprocessed DataFrame, in which we have already created a collection of trigrams.

Here’s a snapshot to give you a better idea of what we are up against:

[Image: snapshot of the preprocessed DataFrame]

Word2Vec

In this post, we are going to take a look at a fairly standard word2vec representation using the large corpus from spaCy (please see Part I for imports and setup). BERT representations are going to be very similar, at least in terms of the steps we need to take to get to the final answer (we are going to use precalculated weights), so I am going to leave them for Part III.
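
To make the idea of a vectorised tweet a bit more concrete, here is a minimal sketch of the usual recipe: look up a vector for every token, average them into a single tweet vector, and compare tweets by cosine similarity. The three-dimensional `TOY_VECTORS` table and the helper names `tweet_vector` and `cosine` are invented for illustration only; they are not the setup from Part I. With spaCy, the equivalent lookup would normally come from a loaded model, where `Doc.vector` by default averages the token vectors.

```python
import math

# Toy 3-dimensional "embeddings", invented purely for illustration.
# Real word2vec-style vectors have hundreds of dimensions.
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "wall": [0.1, 0.8, 0.2],
    "border": [0.2, 0.9, 0.1],
    "news": [0.0, 0.2, 0.9],
    "fake": [0.1, 0.1, 0.8],
}


def tweet_vector(tokens, vectors=TOY_VECTORS):
    """Average the vectors of known tokens; out-of-vocabulary tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # Nothing recognised (e.g. a tweet made only of hashtags): zero vector.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


wall = tweet_vector(["great", "wall"])
border = tweet_vector(["border", "wall", "maga2020"])  # hashtag is out of vocabulary
media = tweet_vector(["fake", "news"])

# The two wall/border tweets should sit closer together than wall vs. media.
print(cosine(wall, border) > cosine(wall, media))
```

Note how the concatenated hashtag silently drops out of the average. On a dataset this messy, with tweets this short, a single missing token can make up a large share of the tweet.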
