Dual Embedding in the CBOW Model
1. Introduction
The word2vec CBOW (continuous bag-of-words) model was used for context-dependent ranking due to its simplicity and outstanding performance. CBOW is a shallow neural network model with a single hidden layer (see the figure below). It is used to predict a target word at the output layer from a given context at the input layer. Two matrices, the input matrix (IM) and the output matrix (OM), are used to compute the hidden layer and the target words, respectively:

[H]_(1xN) = [C]_(1xW) x [IM]_(WxN)
[T]_(1xW) = [H]_(1xN) x [OM]_(NxW)

where W is the total number of words in the corpus (the vocabulary size) and N is the dimension of the hidden layer. Finally, the softmax function converts the output layer into probabilities (P_w), which are used to update the OM and IM through backpropagation during training.

The IM is known as the word vectors [22-23] and is the only matrix used in almost all word2vec applications, while the OM is usually discarded. In CSpell, we applied both the IM and the OM to compute context scores of the predicted target word for a given context:

[T]_(1xW) = [C]_(1xW) x [IM]_(WxN) x [OM]_(NxW)

That is, the hidden layer and the target words are treated as the first and second embeddings, respectively. The softmax function is not applied, because backpropagation is not needed once training is complete.

The consumer health corpus established for word frequency was used to train the CBOW model and generate the IM and OM. We modified the word2vec code to emit both the IM and the OM, using a window size of 5 and an embedding size of 200. Context scores may be positive, zero, or negative. A zero context score means the target word does not have a word vector; such a word is not chosen over a word with a negative score.
2. Algorithm of CBOW Model
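The equations in the Introduction translate directly into code. Below is a minimal sketch, in Python with NumPy, of the CBOW forward pass and one full-softmax training step. The matrix names, the averaged one-hot context vector, the random initialization, and the learning rate are illustrative assumptions, not the actual word2vec or CSpell implementation (which, among other things, uses hierarchical softmax or negative sampling for efficiency).

import numpy as np

# Sketch of the CBOW model from the Introduction (assumed names, not
# the actual word2vec/CSpell code). W = vocabulary size, N = embedding
# (hidden-layer) dimension; IM is W x N, OM is N x W.
rng = np.random.default_rng(0)
W, N = 10_000, 200                          # vocabulary and embedding sizes
IM = rng.normal(scale=0.01, size=(W, N))    # input matrix (word vectors)
OM = rng.normal(scale=0.01, size=(N, W))    # output matrix

def softmax(x):
    e = np.exp(x - x.max())                 # shift for numerical stability
    return e / e.sum()

def forward(context_ids):
    """Return the context vector C, hidden layer H, and probabilities P_w."""
    C = np.zeros(W)
    C[context_ids] = 1.0 / len(context_ids) # averaged one-hot context
    H = C @ IM                              # [H]_(1xN) = [C]_(1xW) x [IM]_(WxN)
    T = H @ OM                              # [T]_(1xW) = [H]_(1xN) x [OM]_(NxW)
    return C, H, softmax(T)

def train_step(context_ids, target_id, lr=0.025):
    """One backpropagation step with the full softmax (no sampling tricks)."""
    global IM, OM                           # update the module-level matrices
    C, H, P = forward(context_ids)
    err = P.copy()
    err[target_id] -= 1.0                   # dLoss/dT for cross-entropy
    grad_H = OM @ err                       # backpropagate into the hidden layer
    OM -= lr * np.outer(H, err)             # update the output matrix
    IM[context_ids] -= lr * grad_H / len(context_ids)  # update context rows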
3. Algorithm of Dual Embedding and Prediction Score
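After training, the dual-embedding context score can be computed from the two matrices alone, with no softmax. The sketch below (again in Python, continuing the assumed names from the previous section; the exact prediction-score formula used by CSpell for candidate ranking is not reproduced here) scores one candidate target word against a context and illustrates the rule from the Introduction: a zero score, which indicates the word has no word vector, is not preferred over a negative score.

import numpy as np

def context_score(context_ids, target_id, IM, OM):
    """Dual-embedding context score for one candidate target word:
    [T]_(1xW) = [C]_(1xW) x [IM]_(WxN) x [OM]_(NxW), read off at the
    target word's column. Averaging the IM rows of the context words
    is equivalent to multiplying the averaged one-hot vector C by IM."""
    H = IM[context_ids].mean(axis=0)        # 1st embedding: hidden layer
    return float(H @ OM[:, target_id])      # 2nd embedding: target column

def prefer(score_a, score_b):
    """True if score_a should outrank score_b. A zero score means the
    word has no word vector and is not chosen over a negative score
    (an illustrative reading of the rule stated in the Introduction)."""
    if score_a == 0.0:                      # no word vector: never preferred
        return False
    if score_b == 0.0:
        return True
    return score_a > score_b

# Usage (with IM and OM loaded from training): compute context_score for
# each spelling candidate, then compare pairs of candidates with prefer().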
References: