Exclusive Filter: A Term Ending with a Valid End-Unit (VEU) Matches the No-SpVar Pattern
The valid end-units are derived from the Lexicon. Some end-units from the invalid lead-end-unit candidate list, such as "after", "for", and "worth", are valid end-units and are used to check the no-spVar pattern. N-grams that end with any of these pattern valid end-units, and whose spelling variants (spVars) do not co-exist in the n-gram set, are most likely not valid multiwords. In 2014, the program found 37 valid end-units. Ten of them were removed, leaving 27 valid end-units for the no-spVar pattern: the terms "I", "W", "all", "bar", "may", "mine", "minus", "need", "one", and "other" have valid MWEs in the Lexicon without spVars, so they are excluded. Please refer to the design documents of End-Unit Types for details.
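The filter logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual implementation: the function name `is_trapped`, the `spvars` lookup table, and the lowercase comparison of the last token are hypothetical, and `VALID_END_UNITS` contains only the three examples named in the text rather than the full 27-unit list.

```python
# Hypothetical sketch of the no-spVar valid-end-unit filter.
# VALID_END_UNITS holds only the three examples named in the text; the real
# list (27 units in 2014) is derived from the Lexicon and is not shown here.
VALID_END_UNITS = {"after", "for", "worth"}


def is_trapped(term: str, spvars: dict[str, set[str]], ngram_set: set[str]) -> bool:
    """Return True if the exclusive filter should trap (exclude) `term`.

    `spvars` is an assumed, precomputed map from a term to its spelling-variant
    candidates; how those candidates are generated is not specified here.
    """
    tokens = term.split()
    # Only terms whose last token is a valid end-unit are candidates.
    if not tokens or tokens[-1].lower() not in VALID_END_UNITS:
        return False
    # Trap the term only when none of its spelling variants co-exist in the n-gram set.
    variants = spvars.get(term, set()) - {term}
    return not any(v in ngram_set for v in variants)
```

For example, with `ngram_set = {"look after", "look-after", "sign for"}` and `spvars = {"look after": {"look-after"}}`, "look after" passes because its variant "look-after" is in the n-gram set, while "sign for" has no co-existing variant and is trapped.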
Description | FilterType | Notes |
---|---|---|
Get invalid terms | FT_TBD | |
Check if it is an invalid unit | FT_END_TERM_INV_PAT | Use the invalid-unit list from the step above |
FilterType.FT_END_TERM_INV_PAT
Lexicon | Filter | Sample No | Pass No | Trap No | Exp No | Pass-Rate |
---|---|---|---|---|---|---|
2023 | FT_END_TERM_INV_PATTERN | 1001867 | 1001838 | 29 | 0 | 99.9971% |
2022 | FT_END_TERM_INV_PATTERN | 998845 | 998816 | 29 | 0 | 99.9971% |
2021 | FT_END_TERM_INV_PATTERN | 992545 | 992516 | 29 | 0 | 99.9971% |
2020 | FT_END_TERM_INV_PATTERN | 983420 | 983391 | 29 | 0 | 99.9971% |
2019 | FT_END_TERM_INV_PATTERN | 972721 | 972692 | 29 | 0 | 99.9970% |
2018 | FT_END_TERM_INV_PATTERN | 955564 | 955535 | 29 | 0 | 99.9970% |
2017 | FT_END_TERM_INV_PATTERN | 935276 | 935247 | 29 | 0 | 99.9969% |
2016 | FT_END_TERM_INV_PATTERN | 915583 | 915554 | 29 | 0 | 99.9968% |
2015 | FT_END_TERM_INV_PATTERN | 896213 | 896190 | 23 | 0 | 99.9974% |
2014 | FT_END_TERM_INV_PATTERN | 875090 | 875068 | 22 | 0 | 99.9975% |
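In every row above, Trap No equals Sample No minus Pass No, and Pass-Rate matches Pass No divided by Sample No, rounded to four decimal places. The sketch below recomputes both from a row under that assumption; the helper name `filter_stats` is hypothetical, and it ignores Exp No, which is 0 throughout the table.

```python
def filter_stats(sample_no: int, pass_no: int) -> tuple[int, str]:
    """Return (trap count, pass rate) for one row of the results table.

    Assumes Trap No = Sample No - Pass No and Pass-Rate = Pass No / Sample No,
    which is consistent with every row in the table.
    """
    trap_no = sample_no - pass_no
    pass_rate = 100.0 * pass_no / sample_no
    # Format to four decimal places, as in the Pass-Rate column.
    return trap_no, f"{pass_rate:.4f}%"
```

For the 2023 row, `filter_stats(1001867, 1001838)` gives 29 trapped terms and a pass rate of 99.9971%, matching the table.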