Word Embedding: Word2Vec — Skip-gram and CBOW

Swagata Ashwani
3 min read · Jun 4, 2022

What is Word Embedding?

Word embedding is a term used to describe the numerical representation of text.

We learnt some basic text vectorization techniques, such as Bag of Words and TF-IDF, in the previous article. Now let us look at some more advanced techniques for text representation.

Sparse representations built with TF-IDF and other basic vectorization methods fall short because they do not capture the semantics and relationships of words.

Hence, we need a way to capture semantics. We want to represent words in a manner that captures their meaning the way humans do: not the exact dictionary meaning of each word, but a contextual one.

Word2Vec

Word2Vec is a technique introduced to capture the similarity between words in a text. It follows these steps:

  1. Learn an embedding vector for each word.
  2. Build a probability model.
  3. Use the dot product to measure similarity (see the sketch below).
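To illustrate step 3, here is a minimal sketch assuming two words already have learned embedding vectors (the values below are made up for illustration); similarity is then measured with a dot product, often normalized to cosine similarity.

```python
import numpy as np

# Hypothetical learned embedding vectors (illustrative values only).
vec_king = np.array([0.8, 0.1, 0.5])
vec_queen = np.array([0.7, 0.2, 0.6])

# Raw dot product as a similarity score.
dot = np.dot(vec_king, vec_queen)

# Cosine similarity normalizes for vector length.
cosine = dot / (np.linalg.norm(vec_king) * np.linalg.norm(vec_queen))

print(dot, cosine)
```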

Word2Vec comes in two flavors:

Skip-gram Model

The Skip-gram model tries to predict the source context words (surrounding words) given a target word (the center word). Consider the simple sentence “the man loves his son”. The skip-gram model aims to predict each context word from its target word, so given the target word ‘loves’, the task becomes predicting the context [the, man, his, son]. Thus the model tries to predict the context_window words based on the target_word.

Assumptions:

  1. A word can be used to generate the words that surround it.
  2. Given the center word, the context words are generated independently.

For each center word (at positions t = 1, …, T), predict the context words inside a window.

Objective: Maximize the probability of context words for given center words
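To make this concrete, here is a minimal sketch in plain Python (with an assumed window size of 2) that generates the (target, context) training pairs the skip-gram model learns from, using the example sentence above.

```python
# Generate skip-gram (target, context) training pairs for a toy sentence.
sentence = "the man loves his son".split()
window = 2  # assumed context window size

pairs = []
for t, target in enumerate(sentence):
    # Context words are those within `window` positions of the target.
    for j in range(max(0, t - window), min(len(sentence), t + window + 1)):
        if j != t:
            pairs.append((target, sentence[j]))

print(pairs)
# For the center word 'loves' this yields:
# ('loves', 'the'), ('loves', 'man'), ('loves', 'his'), ('loves', 'son')
```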

Continuous Bag of Words (CBOW)

The CBOW model architecture tries to predict the current target word (the center word) based on the source context words (surrounding words). Considering the same simple sentence, “the man loves his son”, training data can be formed as (context_window, target_word) pairs; with a context window of size 2, we get examples like ([the, man, his, son], loves). Thus the model tries to predict the target_word based on the context_window words.
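A matching sketch for CBOW, under the same assumed window size of 2: each training example pairs the surrounding context words with the center word they should predict.

```python
# Generate CBOW (context_window, target) training examples for a toy sentence.
sentence = "the man loves his son".split()
window = 2  # assumed context window size

examples = []
for t, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, t - window), min(len(sentence), t + window + 1))
               if j != t]
    examples.append((context, target))

print(examples)
# For the center word 'loves' this yields:
# (['the', 'man', 'his', 'son'], 'loves')
```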

When to use which method?

Skip-gram:

When you have a small amount of training data.
CBOW:

When training time needs to be shorter.
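In practice, libraries such as gensim let you switch between the two flavors with a single flag. The sketch below is illustrative only: the toy corpus and the parameter values (vector_size, window, min_count) are assumptions, not recommendations.

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only).
corpus = [
    "the man loves his son".split(),
    "the woman loves her daughter".split(),
]

# sg=1 trains a skip-gram model; sg=0 (the default) trains CBOW.
skipgram = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)
cbow = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)

# Cosine similarity between two learned word vectors.
print(skipgram.wv.similarity("man", "woman"))
print(cbow.wv.similarity("man", "woman"))
```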

Written by Swagata Ashwani

I love talking Data! Data Scientist with a passion for finding optimized solutions in the AI space. Follow me here: https://www.linkedin.com/in/swagata-ashwani/
