Exclusive Filter Rules: Derive Invalid Lead Units and Invalid End Units from Lexicon
I. Introduction
By definition, no known multiword in the Lexicon should begin with an invalid lead unit. The only exception is when a child word of the invalid word is the lead unit of a multiword. For example, "above" is a valid lead unit because 13 multiwords in the Lexicon begin with "above".
II. Source of invalid lead-end-units
Category | Examples | Lexicon.2014 | Lexicon.2015 |
---|---|---|---|
auxiliary | be, do, etc. | 3 (30) | 3 (30) |
complementizer | that | 1 (1) | 1 (1) |
conjunction | and, or, but, etc. | 71 (71) | 71 (71) |
determiner | a, the, some, etc. | 38 (38) | 38 (38) |
modal | may, must, can, etc. | 8 (27) | 8 (27) |
pronoun | it, he, they, etc. | 87 (87) | 87 (87) |
preposition | to, on, by, etc. | 233 (233) | 233 (233) |
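
The table above suggests that candidates are gathered by syntactic category. A minimal Python sketch of that selection follows. It assumes Lexicon entries are already available as `(term, category)` pairs; that representation, the category strings, and the name `collect_candidates` are illustrative assumptions, not the tool's actual data format or API.

```python
# Categories listed in the table above (spelled out; the real Lexicon may use other labels).
INVALID_CATEGORIES = {
    "auxiliary", "complementizer", "conjunction",
    "determiner", "modal", "pronoun", "preposition",
}

def collect_candidates(entries):
    """Group terms by category, keeping only the seven categories above."""
    by_category = {}
    for term, category in entries:
        if category in INVALID_CATEGORIES:
            by_category.setdefault(category, set()).add(term.lower())
    return by_category

# Hypothetical entries for illustration only (not taken from the Lexicon).
entries = [
    ("the", "determiner"),
    ("across from", "preposition"),
    ("and", "conjunction"),
    ("heart attack", "noun"),
    ("above", "preposition"),
]
candidates = collect_candidates(entries)
```

Note that a candidate may itself be a multiword, such as "across from".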
III. Algorithm
LeadEndUnit Candidate | Matches No. | LeadUnit No. | LT Examples | LeadUnit | EndUnit No. | ET Examples | EndUnit |
---|---|---|---|---|---|---|---|
across | 0 | 0 | | Invalid | 0 | | Invalid |
across from | 1 | 0 | | Invalid | 0 | | Invalid |
around | 0 | 0 | | Invalid | 1 | | Valid |
as | 0 | 1 | | Valid | 0 | | Invalid |
as far as | 1 | 0 | | Invalid | 0 | | Invalid |
as if | 1 | 2 | | Valid | 0 | | Invalid |
down | 0 | 35 | | Valid | 12 | | Valid |
above | 0 | 13 | | Valid | 0 | | Invalid |
on | 0 | 5 | | Valid | 3 | | Valid |
on board | 1 | 1 | | Valid | 0 | | Invalid |
out | 0 | 12 | | Valid | 43 | | |
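
The Valid/Invalid columns in the table above appear to follow from the counts: a candidate is marked a valid lead unit when at least one multiword begins with it, and a valid end unit when at least one multiword ends with it. The Python sketch below illustrates that reading under stated assumptions. The function name `classify_candidate`, the tiny multiword list, and the choice to skip multiwords that are themselves candidates are all hypothetical, and the Matches No. column is not modelled.

```python
def classify_candidate(candidate, multiwords, candidates):
    """Count multiwords that begin or end with `candidate` and label it."""
    cand_words = candidate.split()
    n = len(cand_words)
    lead_examples, end_examples = [], []
    for mw in multiwords:
        if mw in candidates:
            continue  # assumption: multiwords that are themselves candidates are not counted
        words = mw.split()
        if len(words) <= n:
            continue  # a child term must be strictly longer than the candidate
        if words[:n] == cand_words:
            lead_examples.append(mw)
        if words[-n:] == cand_words:
            end_examples.append(mw)
    return {
        "lead_no": len(lead_examples),
        "lead_unit": "Valid" if lead_examples else "Invalid",
        "end_no": len(end_examples),
        "end_unit": "Valid" if end_examples else "Invalid",
    }

# Hypothetical data for illustration only (not Lexicon counts).
candidates = {"across", "across from", "down", "above"}
multiwords = ["across from", "above mentioned", "down payment",
              "break down", "upside down"]
result = {c: classify_candidate(c, multiwords, candidates)
          for c in sorted(candidates)}
```

With this toy data, "across" has no child terms and is invalid in both positions, while "above" is a valid lead unit but an invalid end unit, which mirrors the pattern of the corresponding rows in the table.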
shell> ./3.InvalidLeadEndTerm ${YEAR}
Step | Description | IO | Notes - Examples |
---|---|---|---|
1 | Get all words and multiwords from Lexicon | Inputs: Outputs: | |
2 | Get invalid Lead-End-Unit candidates from Lexicon | Inputs: Outputs: | |
3 | Get child lead-units and end-units of invalid Lead-End-Unit candidates | Inputs: Outputs: | |
4 | Get invalid LeadTerms and invalid EndTerms from LeadEndTerm candidates | | |
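
Once the invalid lead and end units are derived, they can serve as an exclusive filter on multiword candidates. The Python sketch below shows one way such a filter might look. The function name `passes_lead_end_filter` and the test terms are hypothetical; the two sets contain only units marked Invalid in the candidate table above.

```python
# Units marked Invalid in the candidate table above.
INVALID_LEAD_UNITS = {"across", "across from", "around", "as far as"}
INVALID_END_UNITS = {"across", "across from", "as", "as far as",
                     "as if", "above", "on board"}

def passes_lead_end_filter(term, invalid_lead_units, invalid_end_units):
    """Return False if `term` begins with an invalid lead unit
    or ends with an invalid end unit; otherwise True."""
    words = term.lower().split()
    for unit in invalid_lead_units:
        u = unit.split()
        if len(words) > len(u) and words[:len(u)] == u:
            return False
    for unit in invalid_end_units:
        u = unit.split()
        if len(words) > len(u) and words[-len(u):] == u:
            return False
    return True

# Hypothetical candidate terms for illustration only.
kept = [t for t in ["across the board", "down payment",
                    "on board computer", "as far as possible",
                    "mentioned above"]
        if passes_lead_end_filter(t, INVALID_LEAD_UNITS, INVALID_END_UNITS)]
```

Units are compared word by word rather than by string prefix, so a multiword unit such as "as far as" is matched as a whole and a term like "acrossing" would not be rejected by "across".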