The SPECIALIST Lexicon

Exclusive Filter: A term contains a pattern of disallowed characters

  • Description:
    If a term contains a pattern of disallowed characters, it is an invalid MWE. The disallowed characters are:

    ASCII disallowed chars    _@\~=|$`^{<#*}!;?>"                 Derived from the Lexicon; see details at disallowed punctuation
    Unicode disallowed chars  =⁺⁻×÷⁼⅀∀∑−≅≠≤≥≦≧±¼½¾⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞°   Derived from CSpell Consumer Health Corpus

  • Examples:
    • (ps> 0.05)
    • group (n=6) received
    • US$

    • cooked, ½ cup
    • 101°f (38.3°c)
    • 179 ± 16

  • Input Term: original term
  • Filter Algorithm:
    • Logic:

      Description                                   FilterType        Notes
      Check if word contains disallowed characters  FT_CHAR_DISALLOW  Filtered invalid terms: contain disallowed characters

    • Source code: FilterPuncDisallow.java
    • FilterType: FilterType.FT_CHAR_DISALLOW
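
    The check described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the contents of FilterPuncDisallow.java: the class name, method name, and constants are hypothetical, and the two character sets are copied exactly as they are listed in the Description.

```java
// Hypothetical sketch of a disallowed-character check.
// It does NOT reproduce the Lexicon's FilterPuncDisallow.java.
final class DisallowedCharSketch {
    // ASCII set, copied as listed in the Description (assumption: every
    // character shown there, including the leading '_', belongs to the set).
    private static final String ASCII_DISALLOWED = "_@\\~=|$`^{<#*}!;?>\"";

    // Unicode set, copied as listed in the Description.
    private static final String UNICODE_DISALLOWED =
        "=⁺⁻×÷⁼⅀∀∑−≅≠≤≥≦≧±¼½¾⅓⅔⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞°";

    // Returns true when the term contains none of the disallowed characters,
    // i.e. when it passes this exclusive filter.
    static boolean isValid(String term) {
        return term.codePoints().noneMatch(cp ->
            ASCII_DISALLOWED.indexOf(cp) >= 0
                || UNICODE_DISALLOWED.indexOf(cp) >= 0);
    }

    public static void main(String[] args) {
        // Terms taken from the Examples list, plus one ordinary term.
        String[] terms = {
            "(ps> 0.05)", "group (n=6) received", "US$",
            "cooked, ½ cup", "101°f (38.3°c)", "179 ± 16",
            "heart attack"
        };
        for (String term : terms) {
            System.out.println(term + " -> " + (isValid(term) ? "pass" : "trap"));
        }
    }
}
```

    In this sketch every term from the Examples list is trapped, and a term without any listed character passes.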

  • Accuracy Test on Lexicon:
    • InFile:
      • ${OUT_DATA}/03.LeadEndTerm/lexWords.data
    • Result:

      Lexicon  Filter            Sample No  Pass No  Trap No  Exp No  Pass-Rate
      2023     FT_CHAR_DISALLOW  1001867    1001851  16       0       99.9984%
      2022     FT_CHAR_DISALLOW  998845     998829   16       0       99.9984%
      2021     FT_CHAR_DISALLOW  992545     992529   16       0       99.9984%
      2020     FT_CHAR_DISALLOW  983420     983406   14       0       99.9986%
      2019     FT_CHAR_DISALLOW  972721     972707   14       0       99.9986%
      2018     FT_CHAR_DISALLOW  955564     955550   14       0       99.9985%
      2017     FT_CHAR_DISALLOW  935276     935263   13       0       99.9986%
      2016     FT_CHAR_DISALLOW  915583     915570   13       0       99.9986%
      2015     FT_CHAR_DISALLOW  896213     896194   19       0       99.9979%
      2014     FT_CHAR_DISALLOW  875090     875071   19       0       99.9978%
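
    The Pass-Rate values above are consistent with the number of passed terms divided by the number of sampled terms, expressed as a percentage. The following sketch reproduces the 2023 figure under that assumption; the class and method names are hypothetical and are not part of the Lexicon tooling.

```java
import java.util.Locale;

// Hypothetical helper: recomputes a pass rate from the table's counts,
// assuming Pass-Rate = Pass No / Sample No * 100.
final class PassRateSketch {
    static String passRate(long passNo, long sampleNo) {
        double rate = 100.0 * passNo / sampleNo;
        // Locale.ROOT keeps '.' as the decimal separator on every system.
        return String.format(Locale.ROOT, "%.4f%%", rate);
    }

    public static void main(String[] args) {
        // 2023 row: 1001867 sampled terms, 1001851 passed, 16 trapped.
        System.out.println(passRate(1001851L, 1001867L)); // prints 99.9984%
    }
}
```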