The SPECIALIST Lexicon

Dash-Space Spelling Variants

I. Introduction

Dash-space spVars are a common type of spelling variant (spVar) and form a subset of spVars. Accordingly, they must meet the spVar criteria (same meaning, POS, syntax, and pronunciation) and match one of the dash-space patterns:

  • Dash-space pattern:
    • [xxx-yyy] and [xxx yyy]
    • [xxx-yyy] and [xxxyyy]
    • [xxx yyy] and [xxxyyy]
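
As a rough illustration of the three patterns above, the surface check can be sketched in Python. This is a minimal sketch, not the Lexicon's own code: the function names `dash_space_key` and `matches_dash_space_pattern` are hypothetical, and the sketch assumes that removing every hyphen and space is an acceptable way to compare two terms. It tests only the dash-space pattern, not the other spVar criteria (meaning, POS, syntax, pronunciation).

```python
def dash_space_key(term: str) -> str:
    """Collapse a term by removing all hyphens and spaces (assumed normalization)."""
    return term.replace("-", "").replace(" ", "")


def matches_dash_space_pattern(a: str, b: str) -> bool:
    """True if a and b are different strings that differ only by hyphens/spaces."""
    return a != b and dash_space_key(a) == dash_space_key(b)


if __name__ == "__main__":
    print(matches_dash_space_pattern("joint-ill", "joint ill"))   # [xxx-yyy] and [xxx yyy]
    print(matches_dash_space_pattern("air bed", "airbed"))        # [xxx yyy] and [xxxyyy]
    print(matches_dash_space_pattern("reform", "reflex"))         # unrelated terms
```

Because the key drops every hyphen and space, this sketch also accepts terms with more than one separator, which is broader than the two-part patterns listed above.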

II. Algorithm

  • Dash-space spVars are identified by the SpVar model. Some of them are missing from the Lexicon by mistake. A program was developed to retrieve term pairs that match these patterns but are not recorded as spVars in the Lexicon (false positives from SpVarNorm) in order to enhance the SpVarNorm algorithm.
  • Dash Pattern
    • Match the pattern of [xxx-yyy] to [xxx yyy] or [xxxyyy]
    • Exclude if the term after the last '-' is a preposition in the Lexicon
    • Must have same POS
    • Exclude duplicates from the same EUIs (duplicated by inflections)
  • Space Pattern
    • Match the pattern of [xxx yyy] to [xxxyyy]
    • Exclude if the last word is a preposition in the Lexicon
    • Must have same POS
    • Exclude duplicates from the same EUIs (duplicated by inflections)
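
The filtering steps above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the actual retrieval program: the `Entry` record, the `candidate_pairs` function, and the small preposition set are hypothetical, and the sample EUIs beginning with "X" are placeholders. The sketch assumes that each entry is an (EUI, term, POS) triple covering inflected forms, that a pair is a candidate only when the two terms belong to different EUIs, and that all separators in a term are replaced at once.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    eui: str
    term: str
    pos: str


def candidate_pairs(entries, prepositions, sep):
    """Find candidate pairs; sep is '-' (dash pattern) or ' ' (space pattern)."""
    # Index terms by (term, POS) so only same-POS matches are found.
    index = {}
    for e in entries:
        index.setdefault((e.term, e.pos), []).append(e)

    seen = set()   # EUI pairs already reported (removes inflection duplicates)
    pairs = []
    for e in entries:
        if sep not in e.term:
            continue
        # Exclude if the part after the last separator is a preposition.
        if e.term.rsplit(sep, 1)[1] in prepositions:
            continue
        targets = [e.term.replace(sep, "")]          # [xxxyyy]
        if sep == "-":
            targets.append(e.term.replace("-", " "))  # [xxx yyy]
        for target in targets:
            for other in index.get((target, e.pos), []):
                if other.eui == e.eui or (e.eui, other.eui) in seen:
                    continue
                seen.add((e.eui, other.eui))
                pairs.append((e, other))
    return pairs


if __name__ == "__main__":
    entries = [
        Entry("E0220762", "air bed", "noun"),
        Entry("E0007871", "airbed", "noun"),
        Entry("E0220762", "air beds", "noun"),   # assumed inflected form
        Entry("E0007871", "airbeds", "noun"),    # assumed inflected form
        Entry("X0000001", "log in", "verb"),     # placeholder EUI
        Entry("X0000002", "login", "verb"),      # placeholder EUI
        Entry("E0342133", "joint-ill", "noun"),
        Entry("E0214676", "joint ill", "noun"),
    ]
    preps = {"in", "on", "up"}                   # stand-in for Lexicon prepositions
    for a, b in candidate_pairs(entries, preps, " "):
        print(a.pos, a.eui, a.term, b.eui, b.term, sep="|")
```

Tracking reported EUI pairs in a set is one simple way to drop the repeats that inflected forms would otherwise produce; the real program may handle this differently.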

III. Studies on Lexicon.2015

  • Dash Pattern:
    • 233 pairs match the dash spVar pattern from SpVarNorm (false positives) on Lexicon.2015
    • They are sent to a linguist to tag [Y|N] for valid and invalid spVars in the following format:
      POS|EUI-1|Term-1|EUI-2|Term-2|Tag
    • The linguist combines EUI-1 and EUI-2 if the tag is [Y]
    • Examples:
      	noun|E0356150|anti-treponemal|E0009764|antitreponemal|y
      	noun|E0316451|gastro-cote|E0309756|gastrocote|n
      	noun|E0342133|joint-ill|E0214676|joint ill|y
      	noun|E0228131|mule-foot|E0228130|mule foot|n
      	noun|E0588600|re-flex|E0052428|reflex|n
      	verb|E0053396|re-form|E0052452|reform|n
      	verb|E0484710|re-present|E0052856|represent|n
      	noun|E0065691|writing-paper|E0339084|writing paper|y
      	noun|E0438345|yo-yo|E0686155|yoyo|n
      	...
      	

  • Space Pattern:
    • 58 pairs match the space spVar pattern from SpVarNorm (false positives) on Lexicon.2015
    • They are sent to a linguist to tag [Y|N] for valid and invalid spVars in the following format:
      POS|EUI-1|Term-1|EUI-2|Term-2|Tag
    • The linguist combines EUI-1 and EUI-2 if the tag is [Y] (40)
    • Examples:
      	noun|E0220762|air bed|E0007871|airbed|Y
      	noun|E0525866|art glass|E0525865|artglass|N
      	noun|E0347021|bush dog|E0228170|bushdog|Y
      	noun|E0565815|crab tree|E0565814|crabtree|N
      	noun|E0509111|green stone|E0509110|greenstone|N
      	noun|E0227977|winter green|E0070850|wintergreen|N
      	...
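
The pipe-delimited example records above could be read with a short Python sketch such as the one below. It is an illustration only: the `TaggedPair` type and `parse_tagged_line` function are hypothetical names, and the sketch assumes six fields per record and treats the tag as case-insensitive, since the examples use both y/n and Y/N.

```python
from typing import NamedTuple


class TaggedPair(NamedTuple):
    pos: str
    eui1: str
    term1: str
    eui2: str
    term2: str
    valid: bool   # True when the linguist tagged the pair Y


def parse_tagged_line(line: str) -> TaggedPair:
    """Parse one 'POS|EUI-1|Term-1|EUI-2|Term-2|Tag' record."""
    fields = line.strip().split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    pos, eui1, term1, eui2, term2, tag = fields
    tag = tag.upper()
    if tag not in ("Y", "N"):
        raise ValueError(f"unexpected tag {tag!r} in {line!r}")
    return TaggedPair(pos, eui1, term1, eui2, term2, tag == "Y")


if __name__ == "__main__":
    lines = [
        "noun|E0220762|air bed|E0007871|airbed|Y",
        "noun|E0316451|gastro-cote|E0309756|gastrocote|n",
    ]
    # Keep only the pairs whose EUIs the linguist would combine.
    to_combine = [p for p in map(parse_tagged_line, lines) if p.valid]
    for p in to_combine:
        print(p.eui1, p.eui2)
```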