Featurization of Text data

1 — Bag of Words

Bag of Words (BoW) first constructs a dictionary (vocabulary) of all the unique words in the text. Each document is then represented as a vector over this vocabulary, and stacking these vectors gives a sparse matrix.

For each document (row), every unique word in the vocabulary is a separate dimension (column). Each cell holds the number of times that word occurs in the corresponding document.
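As a minimal sketch of the counting step described above, the following pure-Python snippet builds the vocabulary and the count vectors for a tiny corpus. The example sentences, the whitespace tokenizer, and the names `corpus`, `vocabulary`, and `vectors` are illustrative assumptions, not part of the original post.

```python
from collections import Counter

# Hypothetical toy corpus: each string is one document (one row).
corpus = [
    "this pasta is very tasty",
    "this pasta is not tasty",
    "the pasta is tasty and the sauce is tasty",
]

# Assumption: lowercasing and splitting on whitespace is enough tokenization here.
tokenized = [doc.lower().split() for doc in corpus]

# The dictionary: every unique word, sorted, one dimension per word.
vocabulary = sorted({word for doc in tokenized for word in doc})

# One count vector per document; each cell is the word's count in that row.
vectors = []
for doc in tokenized:
    counts = Counter(doc)
    vectors.append([counts[word] for word in vocabulary])

print(vocabulary)
for row in vectors:
    print(row)
```

Even with three short documents, many cells are already zero, and the share of zeros grows quickly as the vocabulary gets larger.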

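The number of dimensions d (the vocabulary size) will be very large, and most cells will be zero because any single document uses only a small fraction of the vocabulary. This is why the result is stored as a sparse matrix.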

If two documents are very similar, their vectors will be very close to each other.

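So the distance between two document vectors v1 and v2 is the length (norm) of their difference: ||v1 - v2|| = sqrt(sum over i of (v1_i - v2_i)^2), where i runs over the d dimensions. For binary (0/1) vectors, this is simply the square root of the number of words on which the two documents differ.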
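To make the distance concrete, here is a small sketch that computes the Euclidean distance between count vectors. The vectors and the helper `euclidean_distance` are hypothetical, chosen only for illustration.

```python
import math

def euclidean_distance(v1, v2):
    """Return ||v1 - v2||, the Euclidean norm of the difference."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

# Hypothetical binary vectors over the same 5-word vocabulary.
doc_a = [1, 1, 0, 1, 0]
doc_b = [1, 1, 1, 1, 0]  # differs from doc_a in one word
doc_c = [0, 0, 1, 0, 1]  # differs from doc_a in all five words

print(euclidean_distance(doc_a, doc_b))  # 1.0
print(euclidean_distance(doc_a, doc_c))  # square root of 5, about 2.236
```

The near-identical pair ends up much closer than the pair with no words in common, which is the behaviour the similarity claim relies on.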

Code:

Drawback:

BoW does not take semantic meaning into account. For example, "tasty" and "delicious" have nearly the same meaning, but BoW treats them as two unrelated dimensions.
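A small sketch of this drawback, using two made-up reviews that differ only in the synonyms "tasty" and "delicious". The helper `bow_vector` and the sample sentences are hypothetical.

```python
from collections import Counter

def bow_vector(text, vocabulary):
    """Count how often each vocabulary word appears in text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

review_a = "the pasta is tasty"
review_b = "the pasta is delicious"

vocabulary = sorted(set(review_a.split()) | set(review_b.split()))
vec_a = bow_vector(review_a, vocabulary)
vec_b = bow_vector(review_b, vocabulary)

# The synonyms land in different columns, so the vectors disagree
# in exactly two cells even though the reviews mean the same thing.
differing = [w for w, a, b in zip(vocabulary, vec_a, vec_b) if a != b]

print(vocabulary)
print(vec_a, vec_b)
print(differing)
```

Nothing in the representation tells the model that the two differing columns are related, so the reviews look as different as if the words had opposite meanings.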
