Mikayla Bashirian

Steps involved in BoW:

1. Construction of a d-dimensional dictionary

Here, we create an array of all the unique words in the document corpus.
Let there be ‘d’ unique words.
Every unique word is a dimension.

Note 1: _A text, which could be a word or a sentence, is known as a document in NLP._

Note 2: _A collection of such documents is known as a document corpus._

1.1 Example:

Let there be two documents in the document corpus as given below:

This car drives good and is expensive.
This car is not expensive and drives good.

We create a dictionary (or an array) of all the unique words in the document corpus:

[This, car, drives, good, and, is, expensive, not]
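The dictionary construction can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `build_vocabulary` is hypothetical, and the tokenization (keeping runs of letters, dropping punctuation, treating words as case-sensitive) is an assumption that the original text does not specify.

```python
import re


def build_vocabulary(corpus):
    """Return the unique words of the corpus in order of first appearance."""
    vocabulary = []
    seen = set()
    for document in corpus:
        # Assumed tokenization: runs of letters only, so "expensive." -> "expensive".
        for word in re.findall(r"[A-Za-z]+", document):
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


corpus = [
    "This car drives good and is expensive.",
    "This car is not expensive and drives good.",
]

vocabulary = build_vocabulary(corpus)
d = len(vocabulary)  # number of dimensions

print(vocabulary)
print(d)
```

For the two example documents this yields the eight words listed above, so d = 8.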

2. Creating a vector for each document

For every document, we create a d-dimensional vector.
Every dimension of a vector corresponds to a unique word.
The value of each dimension is the number of times the corresponding unique word occurs in the given document.

Note 3: _BoW generally produces sparse vectors. In a sparse vector, most of the dimensions have a value of 0._
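To make the idea of sparsity concrete, here is a small sketch that stores only the non-zero dimensions of a vector. The helper name `to_sparse` and the sample vector are hypothetical and chosen purely for illustration; real libraries use their own sparse formats.

```python
def to_sparse(vector):
    """Keep only the non-zero entries as {dimension index: count}."""
    return {index: value for index, value in enumerate(vector) if value != 0}


# Hypothetical 10-dimensional BoW vector in which most dimensions are 0.
dense = [0, 2, 0, 0, 0, 1, 0, 0, 0, 0]
sparse = to_sparse(dense)

print(sparse)
print(len(sparse), "of", len(dense), "dimensions are non-zero")
```

With a large dictionary, a single document uses only a small fraction of the d words, so storing just the non-zero entries can save a lot of memory.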

2.1 Example:

Let vectors v1 and v2 correspond to document 1 and document 2 respectively. Then these vectors are represented as:

v1 = [1 1 1 1 1 1 1 0]

v2 = [1 1 1 1 1 1 1 1]
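The same vectors can be computed with a short Python sketch. The names `tokenize` and `vectorize` are hypothetical, and the tokenization (letters only, case-sensitive) is an assumption made for this example.

```python
import re
from collections import Counter


def tokenize(document):
    # Assumed tokenization: runs of letters only, punctuation dropped.
    return re.findall(r"[A-Za-z]+", document)


def vectorize(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(tokenize(document))
    # Counter returns 0 for words that do not occur in the document.
    return [counts[word] for word in vocabulary]


vocabulary = ["This", "car", "drives", "good", "and", "is", "expensive", "not"]

v1 = vectorize("This car drives good and is expensive.", vocabulary)
v2 = vectorize("This car is not expensive and drives good.", vocabulary)

print(v1)
print(v2)
```

The two vectors differ only in the last dimension, which corresponds to the word "not". The order of the words within each document plays no role in the result.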

Bag Of Words(BoW) -Steps involved in BoW

medium.com

BoW is a text featurization technique in Natural Language Processing. It doesn’t consider the semantic meaning of words, and a bag of words contains a lot of stopwords (which are trivial).