The SPECIALIST Lexicon

End-Unit Types

The results from the above steps (444 invalid lead-end-unit candidates) are categorized into the following end-unit types:

  • Absolute Invalid End-Unit (407)
    Units from the invalid lead-end-unit candidates that are not end units in the Lexicon. They are stored in the file invalidEndUnits.data.abs. They are used in the exclusive filter "absolute invalid end-unit" to filter out any n-gram that ends with one of these absolute invalid end-units.

  • Valid End-Unit Pattern - Without Spelling Variants (27)
    Units from the invalid lead-end-unit candidates that are end units in the Lexicon. They are stored in the file validEndUnits.data.pat. They are used in the exclusive filter "valid end-unit pattern, without spelling variants" to filter out any n-gram that ends with one of these end-units and has no spelling variant matching one of the following patterns:
    • non-spaced:
      built in|builtin
      touch down|touchdown
    • hyphenated:
      built in|built-in
      touch down|touch-down

    • capitalized:
      Capitalization is not considered a spVar pattern; for example, built in|built In is not a valid spVar. However, units in all capitals are counted with their own spVars in the program, such as BUILT IN|BUILT-IN.

    In other words, an n-gram that ends with one of these valid end-units is invalid if no spelling variant of it (in non-spaced or hyphenated form) co-exists in the n-gram set.

  • End-Unit - not used (10)
    LexRecords that end with these units do not have spVars. These units are removed from the valid end-unit patterns and need further observation.
    • I
    • W
    • all
    • bar
    • may
    • mine
    • minus
    • need
    • one
    • other
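
The two exclusive filters described above can be sketched as follows. This is a minimal illustration in Python, not the Lexicon's actual implementation. The function names are hypothetical. The units "in" and "down" are inferred from the examples above, and "of" is only a placeholder for an absolute invalid end-unit. The sketch also assumes that end-unit matching ignores case, that spVar lookup is case-sensitive, and that a spVar is formed by joining the last two words of the n-gram.

```python
def end_unit(ngram):
    """Return the last space-separated word of an n-gram ('' if empty)."""
    words = ngram.split()
    return words[-1] if words else ""


def spelling_variants(ngram):
    """Return the non-spaced and hyphenated variants of an n-gram.

    Assumption: a variant joins the last two words, either directly
    ("built in" -> "builtin") or with a hyphen ("built in" -> "built-in").
    """
    words = ngram.split()
    if len(words) < 2:
        return set()
    head, prev, last = words[:-2], words[-2], words[-1]
    return {" ".join(head + [prev + sep + last]) for sep in ("", "-")}


def is_valid(ngram, ngram_set, abs_invalid_units, valid_pattern_units):
    """Apply both end-unit filters; return False if the n-gram is filtered out."""
    unit = end_unit(ngram).lower()  # assumption: unit matching ignores case
    # Filter 1: absolute invalid end-unit.
    if unit in abs_invalid_units:
        return False
    # Filter 2: valid end-unit pattern, without spelling variants.
    # The variant lookup is case-sensitive, so "built In" is not a
    # spVar of "built in", but "BUILT-IN" is a spVar of "BUILT IN".
    if unit in valid_pattern_units:
        return any(v in ngram_set for v in spelling_variants(ngram))
    return True


# Hypothetical data, for illustration only.
abs_invalid_units = {"of"}            # placeholder, not from invalidEndUnits.data.abs
valid_pattern_units = {"in", "down"}  # inferred from the examples above
ngram_set = {"built in", "built-in", "touch down", "BUILT IN", "made of"}

for ngram in sorted(ngram_set):
    print(ngram, is_valid(ngram, ngram_set, abs_invalid_units, valid_pattern_units))
```

With this sample data only "built in" passes, because its hyphenated variant "built-in" co-exists in the n-gram set. "touch down" and "BUILT IN" are filtered out for lack of a co-existing variant, and "made of" is filtered out by the absolute invalid end-unit. "built-in" is a single word whose end unit is not in either list, so it also passes.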