The SPECIALIST Lexicon

Exclusive Filter: A Term is a Single Word

  • Description:
    If a term contains no space, it is a single word (not a multiword), for example:
    • See
    • whatever

  • Filter Algorithm:
    • Logics:

      Description                              FilterType       Notes
      Check whether the term contains a space  FT_SINGLE_WORD   Filters out single words

    • source code: FiltersingleWord.java
    • FilterType: FilterType.FT_SINGLE_WORD
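    A minimal Java sketch of the logic above, assuming that "space" means the ASCII space character and that the trapped (filtered out) terms are the single words, as the Notes column suggests. The class SingleWordFilterSketch is hypothetical; it is not the source of FiltersingleWord.java.

    ```java
    // Hypothetical sketch of the single-word check; not the actual FiltersingleWord.java.
    final class SingleWordFilterSketch {
        /** A term is a single word when it contains no space character. */
        static boolean isSingleWord(String term) {
            return term.indexOf(' ') < 0;
        }

        /**
         * Exclusive filter: a term passes only if it is NOT a single word.
         * Assumption: single words are the trapped (filtered out) terms.
         */
        static boolean passes(String term) {
            return !isSingleWord(term);
        }
    }
    ```

    For example, isSingleWord("club-feet") returns true, because a hyphen is not a space, while passes("club feet") returns true.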

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
    • Result:

      Lexicon  Filter          Sample No  Pass No  Trap No  Exp No  Pass-Rate
      2023     FT_SINGLE_WORD  1001867    519260   482607   0       51.8292%
      2022     FT_SINGLE_WORD  998845     518868   479977   0       51.9468%
      2021     FT_SINGLE_WORD  992545     513960   478585   0       51.7820%
      2020     FT_SINGLE_WORD  983420     505621   477799   0       51.4146%
      2019     FT_SINGLE_WORD  972721     495103   477618   0       50.8988%
      2018     FT_SINGLE_WORD  955564     479329   476235   0       50.1619%
      2017     FT_SINGLE_WORD  935276     462668   472608   0       49.4686%
      2016     FT_SINGLE_WORD  915583     446928   468655   0       48.8135%
      2015     FT_SINGLE_WORD  896213     431432   464781   0       48.1394%
      2014     FT_SINGLE_WORD  875090     417755   457335   0       47.7385%
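      In every row of the result table, Pass No + Trap No + Exp No equals Sample No, and the Pass-Rate column is consistent with Pass No / Sample No x 100, rounded to four decimal places. The Java sketch below reproduces that arithmetic; the class PassRateSketch and its methods are hypothetical helpers, not part of the Lexicon tools.

      ```java
      import java.util.Locale;

      // Hypothetical helper reproducing the arithmetic of the accuracy-test table.
      final class PassRateSketch {
          /** Pass rate as a percentage: passCount / sampleCount * 100. */
          static double passRate(long passCount, long sampleCount) {
              return 100.0 * passCount / sampleCount;
          }

          /** Formats a rate with four decimals and a percent sign, e.g. "51.8292%". */
          static String format(double rate) {
              return String.format(Locale.ROOT, "%.4f%%", rate);
          }

          /** Consistency check: every sampled term is counted as passed, trapped, or Exp. */
          static boolean countsAddUp(long sample, long pass, long trap, long exp) {
              return pass + trap + exp == sample;
          }
      }
      ```

      For the 2023 row, format(passRate(519260, 1001867)) gives "51.8292%", and countsAddUp(1001867, 519260, 482607, 0) is true.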

      This filter should not be applied until the very last step, because the inclusive filter for the spelling variant pattern needs the single words. For example, "clubfeet", "club-feet", and "club feet" are spelling variants; the multiword "club feet" cannot be found if both single-word spelling variants are removed first.
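      A toy Java illustration of why the single-word filter must run last. It assumes a simplified, hypothetical spelling-variant key (lowercase, with spaces and hyphens removed); the Lexicon's actual spelling variant pattern is not reproduced here, and FilterOrderSketch is a hypothetical name.

      ```java
      import java.util.ArrayList;
      import java.util.LinkedHashMap;
      import java.util.List;
      import java.util.Locale;
      import java.util.Map;

      // Hypothetical illustration only; not the Lexicon's inclusive spelling-variant filter.
      final class FilterOrderSketch {
          /** Hypothetical variant key: lowercase, with spaces and hyphens removed. */
          static String variantKey(String term) {
              return term.toLowerCase(Locale.ROOT).replace(" ", "").replace("-", "");
          }

          /** Returns multiwords (terms with a space) that share a variant key with another term. */
          static List<String> multiwordsWithVariants(List<String> terms) {
              Map<String, List<String>> groups = new LinkedHashMap<>();
              for (String term : terms) {
                  groups.computeIfAbsent(variantKey(term), k -> new ArrayList<>()).add(term);
              }
              List<String> found = new ArrayList<>();
              for (List<String> group : groups.values()) {
                  if (group.size() < 2) {
                      continue; // no spelling variant partner left to match against
                  }
                  for (String term : group) {
                      if (term.indexOf(' ') >= 0) {
                          found.add(term);
                      }
                  }
              }
              return found;
          }

          /** Removes single words (terms without a space), like the single-word filter. */
          static List<String> removeSingleWords(List<String> terms) {
              List<String> kept = new ArrayList<>();
              for (String term : terms) {
                  if (term.indexOf(' ') >= 0) {
                      kept.add(term);
                  }
              }
              return kept;
          }
      }
      ```

      With the terms "clubfeet", "club-feet", and "club feet", multiwordsWithVariants returns ["club feet"]; if removeSingleWords is applied first, only "club feet" remains, it has no variant partner, and the result is an empty list.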