LexCheck

LexCrossCheck

  • Descriptions:
    Cross-checks the content of lexical records read from a text file. It runs the following checks (defined in ./CheckCont/ErrMsgUtilLexicon.java):

    Check       Flag  Condition / Action                                                    Description                                   Type     Auto-Fix
    ----------  ----  --------------------------------------------------------------------  --------------------------------------------  -------  --------
    DUP EUI     1     • EUI is duplicated                                                   Check duplicated EUIs                         error    no
    DUP REC     2     • cit|cat is duplicated                                               Check potential duplicated records            warning  no
                      • A record is represented by citation|category
                      • Potential dupRec are sent to ${outFile}.dupRec
    nominalization, abbreviations, acronyms:
    NO EUI      3     • No CC-EUI (found by cit|cat) & CR-EUI exists                        CR-EUI is not correct                         error    yes
                      • Remove CR-EUI
    NEW EUI     13    • No CC-EUI & no CR-EUI                                               cit|cat is new in Lexicon                     warning  no
                      • These cit|cat are not in Lexicon and should be added to Lexicon
    MISS EUI    6     • 1 CC-EUI & no CR-EUI                                                Missing CR-EUI is found                       warning  yes
                      • Assign CC-EUI to CR-EUI
    WRONG EUI   7     • 1 CC-EUI & it does not match CR-EUI                                 CR-EUI is different from CC-EUI               error    no
                      • Requires manual review
    MISS EUIs   8     • Multi CC-EUIs & no CR-EUI                                           Missing CR-EUI with multi candidate CC-EUIs   error    no
                      • Requires manual review
    WRONG EUIs  9     • Multi CC-EUIs & none matches CR-EUI                                 CR-EUI is not in CC-EUIs                      error    no
                      • Requires manual review
    nominalization:
    SYM CIT     10    • N-EUI found (nom found by EUI) & cit does not match                 Citation does not match in symmetric nom      error    no
                      • Requires manual review
    SYM CAT     11    • N-EUI found & cat does not match                                    Category does not match in symmetric nom      error    no
                      • Requires manual review
    SYM NONE    12    • No N-EUI                                                            Not symmetric                                 error    no
                      • Requires manual review
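
    The EUI rows of the table (flags 3, 6, 7, 8, 9 and 13) amount to a small decision table over two inputs: the CC-EUIs found by cit|cat and the CR-EUI. The sketch below restates that decision table in Python for illustration only. It is not the LexCheck implementation; the function name, the constant names, and the OK value for "no finding" are assumptions introduced here.

```python
from typing import List, Optional, Tuple

# Flag values are copied from the table; the constant names are hypothetical.
NO_EUI, MISS_EUI, WRONG_EUI, MISS_EUIS, WRONG_EUIS, NEW_EUI = 3, 6, 7, 8, 9, 13
OK = 0  # hypothetical "no finding" value, not a LexCheck flag


def classify_eui(cc_euis: List[str], cr_eui: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (flag, CR-EUI after auto-fix) for one cross-reference.

    cc_euis: the CC-EUIs, i.e. the EUIs found by looking up cit|cat.
    cr_eui:  the CR-EUI as given in the input, or None if it is absent.
    """
    if not cc_euis:
        if cr_eui is None:
            return NEW_EUI, None         # warning, no auto-fix: cit|cat is new
        return NO_EUI, None              # error, auto-fix: remove CR-EUI
    if len(cc_euis) == 1:
        if cr_eui is None:
            return MISS_EUI, cc_euis[0]  # warning, auto-fix: assign CC-EUI
        if cr_eui != cc_euis[0]:
            return WRONG_EUI, cr_eui     # error, left for manual review
        return OK, cr_eui
    # More than one candidate CC-EUI
    if cr_eui is None:
        return MISS_EUIS, None           # error, left for manual review
    if cr_eui not in cc_euis:
        return WRONG_EUIS, cr_eui        # error, left for manual review
    return OK, cr_eui
```

    Only flags 3 (NO EUI) and 6 (MISS EUI) change the returned CR-EUI, which matches the two rows marked "yes" in the Auto-Fix column.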

  • Usage:
    shell> LexCrossCheck <inFile> <autoFixFile> <prepositionFile> <particleFile> <dupRecExpFile> <notBaseFormFile> <-v: verbose>
    • inFile: lexical record in text format
    • autoFixFile: auto-fixed lexical record in text format
    • prepositionFile: the preposition file. Default: the prepositions.data included in lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
      => prepositions are used by the class Compl.CheckPreposition.java.
    • particleFile: the particle file. Default: the particles.data included in lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
      => particles are used by the class Compl.CheckParticle.java.
    • dupRecExpFile: the duplicate-record exception file. Duplicate-record reports are written to ${autoFixFile}.dupRec
    • notBaseFormFile: the not-base-form file, taken from the tag results (output of AnalyzeNewEuiFile: *cr/abb.out).
    • -v: set verbose to true, default: false
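
    Because every argument is positional, the order in the usage line matters when the tool is called from a script. The following is a minimal sketch of a wrapper that assembles the arguments in that order. It assumes a launcher named LexCrossCheck is on the PATH; the function names and the launcher parameter are hypothetical and are not part of LexCheck.

```python
import subprocess
from typing import List


def build_command(in_file: str, auto_fix_file: str, preposition_file: str,
                  particle_file: str, dup_rec_exp_file: str,
                  not_base_form_file: str, verbose: bool = False,
                  launcher: str = "LexCrossCheck") -> List[str]:
    """Assemble the argument list in the order given by the usage line."""
    cmd = [launcher, in_file, auto_fix_file, preposition_file,
           particle_file, dup_rec_exp_file, not_base_form_file]
    if verbose:
        cmd.append("-v")  # verbose is off unless -v is given
    return cmd


def run_cross_check(cmd: List[str]) -> int:
    """Run the assembled command and return its exit status.

    Assumes the launcher (hypothetical name) is installed and on the PATH.
    """
    return subprocess.run(cmd, check=False).returncode
```

    Passing the arguments as a list rather than a single shell string avoids quoting problems when file paths contain spaces.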

  • Outputs:
    • On-screen messages:
      • A confirmation message if the records are valid.
      • Otherwise, error messages.
    • autoFixFile: Auto-fixed Lexicon
    • ${autoFixFile}.dupRec: duplicated records that need to be addressed before further checking
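
    Since the .dupRec report lists records that need attention before further checking, a script that chains several steps may want to stop when that file is non-empty. The sketch below shows one way to test for that. It assumes the report is plain text with one entry per line, which this page does not specify, and the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def pending_duplicates(auto_fix_file: str) -> List[str]:
    """Return the non-blank lines of ${autoFixFile}.dupRec, or [] if absent.

    Assumption: the report is line-oriented UTF-8 text.
    """
    dup_path = Path(auto_fix_file + ".dupRec")
    if not dup_path.exists():
        return []
    lines = dup_path.read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.strip()]
```

    A caller could treat a non-empty result as a signal to resolve the listed records, or add them to the duplicate exception file, before running later checks.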

  • Notes:
    • Must include:
      • lexCheck${YEAR}dist.jar (for LVG APIs) or
      • lexCheck${YEAR}api.jar and lvg${YEAR}api.jar
    • Benchmark run time for the Lexicon: 10 to 15 seconds.

  • Examples:
    • shell> LexCrossCheck lexicon.txt lexicon.fixed ./data/Files/prepositions.data ./data/Files/particles.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data -v
    • shell> LexCrossCheck lexicon.txt lexicon.fixed ./data/Files/prepositions.data ./data/Files/particles.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data