The SPECIALIST Lexicon

SpVar Model Applications

The new enhanced spVarNorm and M2CES are both used to find spVars in a corpus. Together, this spVar model reaches 91.05% on F1-measure and 91.77% on accuracy. Some important applications are described as follows:
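For reference, F1-measure and accuracy are usually computed from confusion-matrix counts as below. This is a minimal sketch that assumes a binary spVar / non-spVar decision over candidate pairs. The `Counts` class and the function names are hypothetical, and this section does not state the exact counting scheme behind the 91.05% and 91.77% figures.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts for a binary spVar / non-spVar decision."""
    tp: int  # spVar pairs correctly identified
    fp: int  # non-spVar pairs wrongly labeled as spVars
    fn: int  # spVar pairs that were missed
    tn: int  # non-spVar pairs correctly rejected


def precision(c: Counts) -> float:
    return c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0


def recall(c: Counts) -> float:
    return c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0


def f1(c: Counts) -> float:
    """Harmonic mean of precision and recall (true negatives are ignored)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(c: Counts) -> float:
    """Share of all decisions, positive and negative, that are correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0
```

F1 ignores true negatives while accuracy counts them. This is why the two scores can differ for the same run.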

I. Close Match

  • After reviewing the gold standard, we found that inflectional spVars, dashSpace spVars, and mixed-case spVars were not recorded as spVars. The main reason is that the current computer-aided program, close match, cannot suggest such words to our linguists for association with existing lexical records. Thus, the spVar model is planned to be implemented in LexBuild as the close match mechanism. The steps are:
    • Run spVarNorm on all inflVars
    • Run M2CES on all inflVars
    • Save to DB
      columns: key (norm) | Metaphone | Caverphone | inflVar
    • Run spVarNorm on the new input
    • Retrieve all inflVars with the same key (norm form)

      Then

    • Run Double Metaphone on the new input
    • Run Caverphone 2.0 on the new input
    • Retrieve all inflVars with the same Metaphone and Caverphone codes, within an edit distance of 2 (it is not yet decided whether sorted distance should also be used at this point; this will be evaluated after implementation)
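The close-match steps above can be sketched as follows. This is a minimal sketch under stated assumptions. `sp_var_norm_stub` is a hypothetical normalizer that is far simpler than the enhanced spVarNorm. `phonetic_key_stub` is a hypothetical placeholder that is neither Double Metaphone nor Caverphone 2.0. An in-memory SQLite table stands in for the DB. `build_db` and `close_match` are illustrative names, not LexBuild APIs, and edit distance here is plain Levenshtein distance.

```python
import re
import sqlite3
import unicodedata
from typing import Callable, Iterable, List, Tuple

Encoder = Callable[[str], str]


def sp_var_norm_stub(term: str) -> str:
    """HYPOTHETICAL stand-in for spVarNorm: strip diacritics, lowercase,
    and drop spaces, hyphens, and other punctuation."""
    ascii_only = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_only.lower())


def phonetic_key_stub(term: str) -> str:
    """HYPOTHETICAL stand-in for Double Metaphone / Caverphone 2.0 (it is
    neither): keep the first letter, drop later vowels and h/w/y, and
    collapse repeated letters."""
    s = re.sub(r"[^a-z]", "", term.lower())
    if not s:
        return ""
    rest = re.sub(r"[aeiouyhw]", "", s[1:])
    return s[0] + re.sub(r"(.)\1+", r"\1", rest)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_db(infl_vars: Iterable[str],
             metaphone: Encoder = phonetic_key_stub,
             caverphone: Encoder = phonetic_key_stub) -> sqlite3.Connection:
    """Build step: normalize and encode every inflVar, then save the rows
    (norm, Metaphone, Caverphone, inflVar)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spvar (norm TEXT, metaphone TEXT, caverphone TEXT, infl_var TEXT)")
    conn.executemany(
        "INSERT INTO spvar VALUES (?, ?, ?, ?)",
        [(sp_var_norm_stub(v), metaphone(v), caverphone(v), v) for v in infl_vars],
    )
    conn.execute("CREATE INDEX idx_norm ON spvar (norm)")
    conn.execute("CREATE INDEX idx_phon ON spvar (metaphone, caverphone)")
    return conn


def close_match(conn: sqlite3.Connection, term: str,
                metaphone: Encoder = phonetic_key_stub,
                caverphone: Encoder = phonetic_key_stub,
                max_edit: int = 2) -> Tuple[List[str], List[str]]:
    """Return (inflVars sharing the norm key, extra inflVars found phonetically)."""
    # First: inflVars with the same normalized key as the new input.
    norm_hits = {v for (v,) in conn.execute(
        "SELECT infl_var FROM spvar WHERE norm = ?", (sp_var_norm_stub(term),))}
    # Then: same Metaphone and Caverphone codes, within max_edit edits.
    rows = conn.execute(
        "SELECT infl_var FROM spvar WHERE metaphone = ? AND caverphone = ?",
        (metaphone(term), caverphone(term)))
    phon_hits = {v for (v,) in rows
                 if v not in norm_hits
                 and edit_distance(term.lower(), v.lower()) <= max_edit}
    return sorted(norm_hits), sorted(phon_hits)


conn = build_db(["color", "colour", "colors", "x-ray", "X ray", "hemoglobin", "haemoglobin"])
print(close_match(conn, "xray"))  # (['X ray', 'x-ray'], [])
print(close_match(conn, "colr"))  # ([], ['color', 'colour'])
```

Real Double Metaphone and Caverphone 2.0 encoders can be passed in as the `metaphone` and `caverphone` arguments without changing the lookup logic. If sorted distance proves useful, it would be checked next to the `edit_distance` filter.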

II. Retrieve LexMultiword Candidates

  • 1st attempt, early 2016 (AMIA.2016):
    Used spVarNorm, MES, ES, etc. for the spVar model to retrieve LMS candidates, together with other filters and matchers. See the precision, recall, and F1 analysis in the AMIA.2016 paper.
  • 2nd attempt, mid 2016 (HealthInf.2017):
    Use the enhanced spVarNorm and M2CES for the spVar model to retrieve LMS candidates.
    • Compare with the results from AMIA.2016
    • Because of the large performance improvement, we should be able to lower the frequency threshold (WC) on the distilled MEDLINE n-grams.
    • Combine with end-word patterns, CUIs, and frequency to find new LMW candidates
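The last step can be sketched as a simple candidate filter. The notes do not say how end-word patterns, CUIs, and frequency are combined. The rule below is therefore an assumption: a WC threshold plus at least `min_signals` of the other two signals. The `NGram` record, `matches_end_word`, and `lmw_candidates` are hypothetical, and the toy n-grams and end words are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class NGram:
    text: str      # n-gram string from the distilled MEDLINE n-gram set
    wc: int        # its frequency (WC)
    has_cui: bool  # HYPOTHETICAL flag: does the n-gram map to a concept (CUI)?


def matches_end_word(text: str, end_words: Set[str]) -> bool:
    """HYPOTHETICAL end-word pattern: a multiword whose last token is in end_words."""
    tokens = text.lower().split()
    return len(tokens) > 1 and tokens[-1] in end_words


def lmw_candidates(ngrams: Iterable[NGram], end_words: Set[str],
                   min_wc: int, min_signals: int = 1) -> List[str]:
    """Keep n-grams at or above the WC threshold that show at least
    min_signals of the two signals (end-word pattern, CUI)."""
    out = []
    for g in ngrams:
        if g.wc < min_wc:
            continue  # below the frequency threshold
        signals = int(matches_end_word(g.text, end_words)) + int(g.has_cui)
        if signals >= min_signals:
            out.append(g.text)
    return out


# Toy data, invented for illustration only.
ngrams = [NGram("heart disease", 120, True), NGram("of the", 5000, False),
          NGram("rare syndrome", 8, False), NGram("blood test", 3, True)]
end_words = {"disease", "syndrome"}
print(lmw_candidates(ngrams, end_words, min_wc=10))  # ['heart disease']
print(lmw_candidates(ngrams, end_words, min_wc=5))   # ['heart disease', 'rare syndrome']
```

Lowering `min_wc` from 10 to 5 admits "rare syndrome", which illustrates the effect of lowering the WC threshold. Whether a lower threshold is safe depends on how precise the upstream spVar retrieval is.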