The SPECIALIST Lexicon

Test Lead-End-Unit in Exclusive Filter

I. Introduction

This section describes the testing process and results for the exclusive filters on nonLead and nonEnd units. The Lexicon multiwords are used as the test data. Ideally, the filters should not filter out any multiwords in the Lexicon (or only a minimal number).

TBD:

  • The Java code for this section has been moved to TBD because some files are still under development. Due to time constraints, we will need to come back to complete this section.

II. Processes

  • directory: ${MULTIWORDS_DIR}/bin
  • program: 3.NonLeadEndTerm
  • Run program: shell> ./3.NonLeadEndTerm ${YEAR}
  • Processes:

    Step | Description | I/O | Notes - Examples
    15. Get ruleType on multiwords from Lexicon
    • GetLetRtForTermsInLexicon.java
    • Assign invalid Lead-End-Unit ruleTypes on Lexicon multiwords:
      • RT_INV_LEAD_TERM
      • RT_INV_END_TERM
      • RT_INV_END_ABB
      • RT_INV_LEAD_END_TERM
    Inputs:
    • ./outData/3.InvalidLeadEndTerm/lexMultiwords.data

    Outputs:

    • ./outData/3.InvalidLeadEndTerm/lexMultiwords.data.ruleType
    • ./outData/3.InvalidLeadEndTerm/lexMultiwords.data.ruleType.ilet (10)
    • 1 min.
    • Only 10 exceptions, all of which are RT_INV_END_ABB
      => The algorithm for detecting an endWord with an abbreviation pattern can be improved
    16. Analyze ruleType on multiwords from Lexicon
    • AnalyzeLetRtForTermsInLexicon.java
    • Analyze the results from the step above (10)
    • Get the precision of the exclusive filter on the Lexicon
    Inputs:
    • ./3.InvalidLeadEndTerm/lexMultiwords.data.ruleType
    • ./3.InvalidLeadEndTerm/lexMultiwords.data.ruleType.exceptions

    Outputs:

    • ./3.InvalidLeadEndTerm/lexMultiwords.data.ruleType.rpt
    • 5 sec.
    • precision: 99.9981%
    • 1 invalid ruleType: RT_INV_END_ABB
    17. Get multiwords in Lexicon by lead/end units
    • GetLexiconMultiwordsByLeadEndTerm.java
    • Find all multiwords in Lexicon by specifying lead/end word
    Inputs:
    • ./outData/3.InvalidLeadEndTerm/lexMultiwords.data.ruleType
    Outputs:
    • ./outData/3.InvalidLeadEndTerm/LexiconMw/lexMultiwords.data.ruleType.${LEAD_END_WORD}
    • 5 sec.
    • Used for case study
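
The three steps above can be illustrated with a minimal Java sketch. It is not the code of GetLetRtForTermsInLexicon.java, AnalyzeLetRtForTermsInLexicon.java, or GetLexiconMultiwordsByLeadEndTerm.java, which are not shown in this section. The class name, the method names, the RT_VALID label, the two word lists, and the sample terms are all hypothetical. The sketch also assumes that precision means the percentage of Lexicon multiwords that are not assigned an invalid ruleType. RT_INV_END_ABB is left out because the abbreviation pattern is not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LeadEndFilterSketch {
    // Hypothetical word lists; the real nonLead/nonEnd lists are not given in this section.
    static final Set<String> INVALID_LEAD = Set.of("of", "and");
    static final Set<String> INVALID_END = Set.of("of", "the");

    // Hypothetical label for a term that passes the exclusive filter.
    static final String RT_VALID = "RT_VALID";

    // Step 15 (sketch): assign a ruleType from the first and last word of a term.
    static String ruleType(String term) {
        String[] words = term.trim().toLowerCase().split("\\s+");
        boolean badLead = INVALID_LEAD.contains(words[0]);
        boolean badEnd = INVALID_END.contains(words[words.length - 1]);
        if (badLead && badEnd) return "RT_INV_LEAD_END_TERM";
        if (badLead) return "RT_INV_LEAD_TERM";
        if (badEnd) return "RT_INV_END_TERM";
        return RT_VALID;
    }

    // Step 16 (sketch): precision as the percentage of terms that are not filtered out.
    // This definition is an assumption, not taken from the source.
    static double precision(List<String> terms) {
        long valid = terms.stream().filter(t -> ruleType(t).equals(RT_VALID)).count();
        return 100.0 * valid / terms.size();
    }

    // Step 17 (sketch): collect the terms whose first or last word equals the given word.
    static List<String> byLeadOrEnd(List<String> terms, String word) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            String[] words = term.trim().toLowerCase().split("\\s+");
            if (words[0].equals(word) || words[words.length - 1].equals(word)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample terms, used only to exercise the methods.
        List<String> terms = List.of("heart attack", "ice cream", "blood pressure", "of course");
        for (String term : terms) {
            System.out.println(term + "|" + ruleType(term));
        }
        System.out.println("precision: " + precision(terms) + "%");
        System.out.println(byLeadOrEnd(terms, "heart"));
    }
}
```

In this sketch a Lexicon multiword that receives any ruleType other than RT_VALID counts as an exception, which corresponds to the entries the real process writes out for later analysis and case study.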