The SPECIALIST Lexicon

Mixed-Case Spelling Variants

I. Introduction

From the false positives in the results of the Lexicon.2015 test, we found other valid spVars that are identified by SpVarNorm but are not in the goldStd. They are categorized as mixed-case spVars. Again, these are errors in the Lexicon, resulting from the fact that the current close-match check (by gSpell) is not able to provide such information to linguists when they add new terms to the Lexicon.
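As an illustration of what "false positive" means here, the candidates can be viewed as the set of spVar pairs reported by SpVarNorm minus the pairs already in the goldStd. The sketch below is only an assumption about how such a difference could be computed; the class name, the method name, and the "term1|term2" key format are hypothetical and are not taken from the Lexicon tools.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FalsePositiveSketch {
    // Returns the pairs that were predicted but are absent from the gold standard.
    // Each pair is represented by a hypothetical "term1|term2" key.
    static Set<String> falsePositives(Set<String> predicted, Set<String> goldStd) {
        Set<String> fp = new LinkedHashSet<>(predicted); // copy, keep input order
        fp.removeAll(goldStd);                           // set difference
        return fp;
    }

    public static void main(String[] args) {
        Set<String> predicted = new LinkedHashSet<>(List.of(
            "labeled|labelled",
            "unlabeled|unlabelled"));
        // Illustrative gold standard only: assume one of the pairs is already recorded.
        Set<String> goldStd = Set.of("unlabeled|unlabelled");
        falsePositives(predicted, goldStd).forEach(System.out::println);
    }
}
```

Pairs that survive this difference are either genuine errors of the normalizer or, as in this section, valid spVars missing from the goldStd.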

II. Details

  • Program:
    shell> 08.MatcherSpVar 2016
    32E
  • Outfile:
    ${OUT_DIR}/LexTest/Lex.1.byNorm.out.spVar.FP.4.rmSpVarsByPat
  • Algorithm:
    RemoveSpVarsByPatternsFromAFile.java

    From the false positives of SpVarNorm:

    • exclude Genitive spVars
    • exclude Dash spVars
    • exclude Punctuation spVars (" '\"!&")
    • exclude terms with punctuation (".'")

    • Manually examine the remaining spVars
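The exclusion steps above can be sketched as a chain of pattern checks over each candidate pair. This is not the code of RemoveSpVarsByPatternsFromAFile.java; it is a minimal sketch under several assumptions: the class and method names are hypothetical, each record is assumed to have five "|"-separated fields with the two terms in the third and fifth fields, a genitive pair is assumed to be one that matches once "'s" is dropped, a dash pair is assumed to be one where either term contains a hyphen, and the leading space in the quoted punctuation string is left out because the surviving examples contain spaces.

```java
import java.util.ArrayList;
import java.util.List;

class SpVarPatternFilter {
    // Characters from the "Punctuation spVars" step (space omitted, see assumptions).
    static final String PAIR_PUNCTUATION = "'\"!&";
    // Characters from the "terms with punctuation" step.
    static final String TERM_PUNCTUATION = ".'";

    // True if the term contains at least one of the given characters.
    static boolean containsAny(String term, String chars) {
        for (char c : chars.toCharArray()) {
            if (term.indexOf(c) >= 0) return true;
        }
        return false;
    }

    // Assumed genitive rule: the terms differ, but match once "'s" is removed.
    static boolean isGenitive(String a, String b) {
        return !a.equals(b) && a.replace("'s", "").equals(b.replace("'s", ""));
    }

    // Assumed dash rule: either term contains a hyphen.
    static boolean isDash(String a, String b) {
        return a.contains("-") || b.contains("-");
    }

    // A pair is kept for manual examination only if no exclusion rule fires.
    static boolean keep(String a, String b) {
        return !(isGenitive(a, b)
            || isDash(a, b)
            || containsAny(a, PAIR_PUNCTUATION) || containsAny(b, PAIR_PUNCTUATION)
            || containsAny(a, TERM_PUNCTUATION) || containsAny(b, TERM_PUNCTUATION));
    }

    // Keeps the records whose two terms pass every check; malformed lines are dropped.
    static List<String> filter(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\\|");
            if (f.length == 5 && keep(f[2], f[4])) kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "noun|E0502503|methyl seleninic acid|E0502320|methylseleninic acid",
            "noun|E0000001|x-ray|E0000002|x ray"); // made-up record, not from the Lexicon
        filter(lines).forEach(System.out::println);
    }
}
```

Whatever passes these checks still needs the manual examination step, since the patterns only remove categories that were already handled elsewhere.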

III. Example.2015

    noun|E0505467|adenosine diphosphate ribosyl transferase|E0520817|adenosine diphosphate ribosyltransferase
    noun|E0335042|dihydrolipoyl acetyl transferase|E0335043|dihydrolipoyl acetyltransferase
    noun|E0333715|dilauroyl phosphatidic acid|E0503323|dilauroylphosphatidic acid
    noun|E0431494|green bottle fly|E0229339|greenbottle fly
    noun|E0502503|methyl seleninic acid|E0502320|methylseleninic acid
    verb|E0416117|mislabeled|E0416117|mislabelled
    adj|E0063266|unlabeled|E0063267|unlabelled
    verb|E0036645|labeled|E0036645|labelled
    verb|E0312403|relabeled|E0312403|relabelled
    verb|E0332756|backlabeled|E0332756|backlabelled
    ...