CSpell

Word2Vec Context

I. Introduction

  • Context:

    Context is important for context-based ranking. The context consists of the words surrounding the target word. The target word is the candidate word (the word to be predicted from the context). A window includes the context and the target word. The window radius is half the size of the window, that is, the number of context words on each side of the target word. An example of context and target word is shown below.

    Input text:     ... was diagnosed early onset dementia 3 years ago.
    Context Radius: 2
    Context Window: diagnosed early onset dementia 3
    Target word:    onset
    Context:        diagnosed early dementia 3
    Score:          Inner product of (Avg. of [IM] for context) and ([OM] for the target word)

  • Skip context words without word vector: true (configurable, CS_W2V_SKIP_WORD)
    Context words that do not have a word vector are skipped. The window moves on to the next adjacent context word until the number of context words on that side reaches the radius.
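
The context extraction and scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the CSpell implementation: the function names (`get_context`, `context_score`), the toy 2-dimensional vectors, the plain dictionaries standing in for [IM] and [OM], and the choice to return 0.0 when no vector is available are all assumptions made for this example.

```python
from typing import Dict, List

Vector = List[float]


def get_context(words: List[str], start: int, end: int, radius: int,
                im: Dict[str, Vector], skip_no_vector: bool = True) -> List[str]:
    """Collect up to `radius` context words on each side of words[start:end].

    When skip_no_vector is True, words missing from `im` are skipped and the
    scan continues to the next adjacent word until `radius` words are found
    or the text ends.
    """
    def usable(word: str) -> bool:
        return (not skip_no_vector) or (word in im)

    left: List[str] = []
    i = start - 1
    while i >= 0 and len(left) < radius:
        if usable(words[i]):
            left.append(words[i])
        i -= 1
    left.reverse()  # restore text order

    right: List[str] = []
    i = end
    while i < len(words) and len(right) < radius:
        if usable(words[i]):
            right.append(words[i])
        i += 1
    return left + right


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def context_score(context: List[str], target: str,
                  im: Dict[str, Vector], om: Dict[str, Vector]) -> float:
    """Inner product of (avg. of IM vectors for context) and (OM vector for target)."""
    context_vectors = [im[w] for w in context if w in im]
    if not context_vectors or target not in om:
        return 0.0  # assumption: no usable vectors means a neutral score
    return dot(average(context_vectors), om[target])


# Toy data (hypothetical vectors, for illustration only).
words = "was diagnosed early onset dementia 3 years ago".split()
im = {"was": [0.0, 2.0], "diagnosed": [1.0, 0.0], "early": [0.0, 1.0],
      "dementia": [1.0, 1.0], "3": [2.0, 2.0], "years": [2.0, 0.0]}
om = {"onset": [1.0, 2.0]}

context = get_context(words, 3, 4, radius=2, im=im)
print(context, context_score(context, "onset", im, om))
```

If the vector for "3" is removed from the toy `im`, the right-hand context becomes "dementia years" when skipping is enabled, because the scan moves past "3" to the next word that has a vector.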

II. Multiwords' Score by Context for the Merge-Split case in CBOW Model

If the target is a multiword term (words separated by spaces), there are two ways to retrieve the context in the word2vec CBOW model.

  • Multiword implementation (use avg. context score of each element word):

    Multiword with the same context: treat the target term as a single word.

    => This method is implemented in CSpell

    Context Radius: 2
    Context Window: diagnosed early on set dementia 3
    Target word:    on set
    Context:        diagnosed early dementia 3
    Score:          Inner product of (Avg. of [IM] for context) and (Avg. [OM] for words in the target term)

  • Element word implementation:

    Single word with sliding context: get the context for each single word in the target term.

    Context Radius: 2

    Sliding context-1
    Context Window: diagnosed early on set dementia
    Target word:    on
    Context:        diagnosed early set dementia
    Score:          Inner product of (Avg. of [IM] for sliding context-1) and ([OM] for target word of [on])

    Sliding context-2
    Context Window: early on set dementia 3
    Target word:    set
    Context:        early on dementia 3
    Score:          Inner product of (Avg. of [IM] for sliding context-2) and ([OM] for target word of [set])

    Final Score:    Avg. of above two scores
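
The two approaches above can be compared side by side with a small Python sketch. It is an illustration under stated assumptions, not CSpell code: the function names (`window_context`, `multiword_score`, `element_word_score`), the toy 2-dimensional vectors, and the dictionaries standing in for [IM] and [OM] are hypothetical, and every word is assumed to have a vector so the skip option is left out.

```python
from typing import Dict, List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def window_context(words: List[str], start: int, end: int, radius: int) -> List[str]:
    """Up to `radius` words on each side of the target span words[start:end]."""
    return words[max(0, start - radius):start] + words[end:end + radius]


def multiword_score(words: List[str], start: int, end: int, radius: int,
                    im: Dict[str, Vector], om: Dict[str, Vector]) -> float:
    """One shared context for the whole term; average the OM vectors of its words."""
    context = window_context(words, start, end, radius)
    context_avg = average([im[w] for w in context])
    target_avg = average([om[w] for w in words[start:end]])
    return dot(context_avg, target_avg)


def element_word_score(words: List[str], start: int, end: int, radius: int,
                       im: Dict[str, Vector], om: Dict[str, Vector]) -> float:
    """Slide the context over each element word, score each, then average."""
    scores = []
    for i in range(start, end):
        context = window_context(words, i, i + 1, radius)
        context_avg = average([im[w] for w in context])
        scores.append(dot(context_avg, om[words[i]]))
    return sum(scores) / len(scores)


# Toy data (hypothetical vectors, for illustration only).
words = "was diagnosed early on set dementia 3 years ago".split()
im = {"diagnosed": [1.0, 0.0], "early": [0.0, 1.0], "on": [2.0, 0.0],
      "set": [0.0, 2.0], "dementia": [1.0, 1.0], "3": [2.0, 2.0]}
om = {"on": [1.0, 0.0], "set": [0.0, 1.0]}

# The target term "on set" occupies words[3:5].
print(multiword_score(words, 3, 5, 2, im, om))
print(element_word_score(words, 3, 5, 2, im, om))
```

In the element word variant, the other element of the term ("set" when scoring "on", and "on" when scoring "set") becomes part of the sliding context, which is why the two methods can give different scores for the same term.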