Frequency Analysis on 5 WC ranges: 100, 1K, 10K, 100K, 1M
I. Introduction
Frequency strategy is important for LMW acquisition. It is applied to LMW candidates obtained from filters and matchers for better precision. This page describes a frequency analysis on 5 word count (WC) ranges (100, 1K, 10K, 100K, 1M).
II. Details
${MULTIWORDS}/bin/08.MatcherSpVar
${MULTIWORDS}/data/2015/outData/08.MatcherSpVar/
${MULTIWORDS}/data/2015/outData/08.MatcherSpVar/Candidates/tag.2017.good
Tag | Description |
---|---|
AUTO_YES | Automatically tagged by computer if term is in Lexicon |
AUTO_NO | Automatically tagged by computer if term is in Lexicon |
Y | Manually tagged by linguists if term is LMW, then add to Lexicon |
N | Manually tagged by linguists if term is not LMW, then add to invalid LMW List |
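As a rough illustration of how these tags could feed the precision figures in the results, the following Python sketch tallies a list of tags. It assumes that Y and AUTO_YES count as valid LMWs and that only Y and N mark new (manually reviewed) terms; the source does not state this mapping explicitly. The function name `precision_from_tags` and the sample tags are hypothetical and not part of the MatcherSpVar tooling.

```python
from collections import Counter


def precision_from_tags(tags):
    """Return (new_term_precision, total_precision) as fractions.

    Assumption (not stated in the source): Y and AUTO_YES are valid LMWs,
    and only Y/N tags mark new, manually reviewed terms.
    """
    counts = Counter(tags)
    new_valid = counts["Y"]
    new_total = counts["Y"] + counts["N"]
    all_valid = new_valid + counts["AUTO_YES"]
    all_total = new_total + counts["AUTO_YES"] + counts["AUTO_NO"]
    # Guard against empty input so the sketch never divides by zero.
    new_precision = new_valid / new_total if new_total else 0.0
    total_precision = all_valid / all_total if all_total else 0.0
    return new_precision, total_precision


# Hypothetical sample, not real tagging data.
sample_tags = ["Y", "N", "N", "AUTO_YES", "Y", "AUTO_NO", "N", "AUTO_YES"]
new_p, total_p = precision_from_tags(sample_tags)
print(f"new terms: {new_p:.2%}, total terms: {total_p:.2%}")
```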
III. Results
Frequency | Precision (New Terms) | Precision (Total Terms) |
---|---|---|
100 | 19.81% (= 104/525) | 21.60% (= 116/537) |
1K | 36.77% (= 196/533) | 42.42% (= 249/587) |
10K | 47.73% (= 263/551) | 67.56% (= 604/894) |
100K | 35.72% (= 384/1075) | 68.38% (= 1516/2217) |
1M | 36.77% (= 556/1512) | 71.16% (= 2396/3367) |
Total precision increases as frequency increases, from 21.60% at 100 to 71.16% at 1M. Thus, we should acquire LMWs from the highest-frequency n-grams.
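The percentages in the table can be rechecked from the raw counts. The Python sketch below copies the (valid, total) counts from the table, recomputes each precision, and tests whether total precision rises with every range; the names `results`, `percent`, and `is_increasing` are illustrative only.

```python
# Counts copied from the results table:
# range -> ((valid new, total new), (valid all, total all))
results = {
    "100": ((104, 525), (116, 537)),
    "1K": ((196, 533), (249, 587)),
    "10K": ((263, 551), (604, 894)),
    "100K": ((384, 1075), (1516, 2217)),
    "1M": ((556, 1512), (2396, 3367)),
}


def percent(valid, total):
    """Precision as a percentage rounded to two decimals."""
    return round(100.0 * valid / total, 2)


new_precisions = []
total_precisions = []
for freq, (new_counts, all_counts) in results.items():
    new_precisions.append(percent(*new_counts))
    total_precisions.append(percent(*all_counts))
    print(f"{freq:>4}: new {new_precisions[-1]:.2f}%, total {total_precisions[-1]:.2f}%")

# True when each range has a higher total precision than the previous one.
is_increasing = all(a < b for a, b in zip(total_precisions, total_precisions[1:]))
print("total precision strictly increasing:", is_increasing)
```

With these counts, total precision rises at every step, while new-term precision is highest at 10K (47.73%) rather than at 1M.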
Detailed data are available at:
${MULTIWORDS}/data/2015/outData/08.MatcherSpVar/Candidates/tag.2017.good/*.rpt