Word2Vec Context
I. Introduction
Context is important for context-based ranking. The context consists of the words surrounding the target word. The target word is the candidate word (the word to be predicted from the context). A window comprises the context and the target word. The context radius is the number of context words taken on each side of the target word, so a window spans 2 x radius + 1 words when the target is a single word. An example of a context and target word is shown below.
Input text | ... was diagnosed early onset dementia 3 years ago. |
Context Radius | 2 |
Context Window | diagnosed early onset dementia 3 |
Target word | onset |
Context | diagnosed early dementia 3 |
Score | Inner product of (Avg. of [IM] for context) and ([OM] for the target word) |
CS_W2V_SKIP_WORD
)
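The example table above can be reproduced with a short Python sketch. The function names (`get_context`, `cbow_score`) and the dictionaries `im` and `om` are hypothetical stand-ins for the [IM] and [OM] vectors; the vectors are random toy values rather than a trained model, and every word is assumed to be in the vocabulary. This is an illustration of the scoring rule, not the CSpell implementation.

```python
import numpy as np

def get_context(tokens, target_index, radius):
    """Return up to `radius` words on each side of the target word."""
    left = tokens[max(0, target_index - radius):target_index]
    right = tokens[target_index + 1:target_index + 1 + radius]
    return left + right

def cbow_score(context, target, im, om):
    """Inner product of the averaged [IM] context vectors and the [OM] target vector."""
    context_vec = np.mean([im[w] for w in context], axis=0)
    return float(np.dot(context_vec, om[target]))

tokens = ["was", "diagnosed", "early", "onset", "dementia", "3", "years", "ago"]

# Toy vectors (assumption): a real model would supply trained matrices.
rng = np.random.default_rng(0)
im = {w: rng.standard_normal(4) for w in tokens}
om = {w: rng.standard_normal(4) for w in tokens}

target_index = tokens.index("onset")
context = get_context(tokens, target_index, radius=2)
score = cbow_score(context, "onset", im, om)

print(context)  # ['diagnosed', 'early', 'dementia', '3']
print(score)
```

Near the start or end of the text the slices simply return fewer words, so the context shrinks rather than raising an error.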
II. Multiword Score by Context for the Merge-Split Case in the CBOW Model
If the target is a multiword term (words separated by spaces), there are two ways to retrieve the context in the word2vec CBOW model.
Multiword with the same context: treat the target term as a single word and use one context for the whole term.
=> This is the method implemented in CSpell.
Context Radius | 2 |
Context Window | diagnosed early on set dementia 3 |
Target word | on set |
Context | diagnosed early dementia 3 |
Score | Inner product of (Avg. of [IM] for context) and (Avg. [OM] for words in the target term) |
Single word with sliding context: get a separate context for each single word in the target term.
Context Radius | 2 | 2 |
Context Window | diagnosed early on set dementia | early on set dementia 3 |
Target word | on | set |
Context | diagnosed early set dementia | early on dementia 3 |
Score | Inner product of (Avg. of [IM] for sliding context-1) and ([OM] for target word of [on]) | Inner product of (Avg. of [IM] for sliding context-2) and ([OM] for target word of [set]) |
Final Score | Avg. of above two scores |
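The two ways of scoring a multiword target can be compared side by side in a minimal Python sketch. All names here (`context_for_span`, `score_same_context`, `score_sliding_context`, `im`, `om`) are hypothetical, the vectors are random toy values, and every word is assumed to be in the vocabulary; the code only mirrors the two tables above and is not taken from the CSpell source.

```python
import numpy as np

def context_for_span(tokens, start, length, radius):
    """Context words around a target span of `length` words starting at `start`."""
    left = tokens[max(0, start - radius):start]
    right = tokens[start + length:start + length + radius]
    return left + right

def avg_vector(words, matrix):
    """Average the vectors of `words` looked up in `matrix`."""
    return np.mean([matrix[w] for w in words], axis=0)

def score_same_context(tokens, start, length, radius, im, om):
    """Method 1: one shared context, averaged [OM] over the words of the term."""
    term = tokens[start:start + length]
    context = context_for_span(tokens, start, length, radius)
    return float(np.dot(avg_vector(context, im), avg_vector(term, om)))

def score_sliding_context(tokens, start, length, radius, im, om):
    """Method 2: one context and one score per word, then average the scores."""
    scores = []
    for i in range(start, start + length):
        context = context_for_span(tokens, i, 1, radius)
        scores.append(float(np.dot(avg_vector(context, im), om[tokens[i]])))
    return float(np.mean(scores))

tokens = ["was", "diagnosed", "early", "on", "set", "dementia", "3", "years", "ago"]

# Toy vectors (assumption): a real model would supply trained matrices.
rng = np.random.default_rng(0)
im = {w: rng.standard_normal(4) for w in tokens}
om = {w: rng.standard_normal(4) for w in tokens}

start, length, radius = 3, 2, 2  # target term "on set"
same_context = context_for_span(tokens, start, length, radius)
sliding_contexts = [context_for_span(tokens, i, 1, radius)
                    for i in range(start, start + length)]
score_same = score_same_context(tokens, start, length, radius, im, om)
score_sliding = score_sliding_context(tokens, start, length, radius, im, om)

print(same_context)      # ['diagnosed', 'early', 'dementia', '3']
print(sliding_contexts)  # [['diagnosed', 'early', 'set', 'dementia'], ['early', 'on', 'dementia', '3']]
print(score_same, score_sliding)
```

In the sliding variant the other word of the term becomes part of each context, so the two methods generally produce different scores for the same term.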