
LexCheck

LexCrossCheck

  • Descriptions:
    Cross-checks the references in lexical records read from a text file. It performs the following checks (defined in ./CheckCont/ErrMsgUtilLexicon.java):

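    • DUP EUI (value: 1)
      • Condition/Action: if EUI is duplicated
      • Description: Check duplicated EUIs
      • Type: error
      • Auto-Fix: no
    • DUP REC (value: 2)
      • Condition/Action:
        • if cit|cat is duplicated
        • A record is represented by citation|category
        • Potential dupRec are sent to ${outFile}.dupRec
      • Description: Check potential duplicated records
      • Type: warning
      • Auto-Fix: no

    nominalization, abbreviations, acronyms:
    • NO EUI (value: 3)
      • Condition/Action:
        • No CC-EUI (found by cit|cat) & CR-EUI exists
        • remove CR-EUI
      • Description: CR-EUI is not correct
      • Type: error
      • Auto-Fix: yes
    • NEW EUI (value: 13)
      • Condition/Action:
        • No CC-EUI & no CR-EUI
        • These cit|cat are not in Lexicon and should be added into Lexicon
      • Description: cit|cat is new in Lexicon
      • Type: warning
      • Auto-Fix: no
    • MISS EUI (value: 6)
      • Condition/Action:
        • 1 CC-EUI & No CR-EUI
        • assign CC-EUI to CR-EUI
      • Description: missing CR-EUI is found
      • Type: warning
      • Auto-Fix: yes
    • WRONG EUI (value: 7)
      • Condition/Action:
        • 1 CC-EUI & not matches CR-EUI
        • requires manual review
      • Description: CR-EUI is different from CC-EUI
      • Type: error
      • Auto-Fix: no
    • MISS EUIs (value: 8)
      • Condition/Action:
        • Multi CC-EUIs & no CR-EUI
        • requires manual review
      • Description: missing CR-EUI with multi candidate CC-EUIs
      • Type: error
      • Auto-Fix: no
    • WRONG EUIs (value: 9)
      • Condition/Action:
        • Multi CC-EUIs & none matches CR-EUI
        • requires manual review
      • Description: CR-EUI is not in CC-EUIs
      • Type: error
      • Auto-Fix: no

    nominalization:
    • SYM CIT (value: 10)
      • Condition/Action:
        • N-EUI found (nom found by EUI) & cit not matches
        • requires manual review
      • Description: citation not match in symmetric nom
      • Type: error
      • Auto-Fix: no
    • SYM CAT (value: 11)
      • Condition/Action:
        • N-EUI found & cat not matches
        • requires manual review
      • Description: category not match in symmetric nom
      • Type: error
      • Auto-Fix: no
    • SYM NONE (value: 12)
      • Condition/Action:
        • No N-EUI
        • requires manual review
      • Description: Not symmetric
      • Type: error
      • Auto-Fix: no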
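
The EUI conditions listed above (NO EUI through WRONG EUIs) amount to a small decision table over two inputs: the CC-EUIs found by cit|cat and the CR-EUI. The Java sketch below shows one way that table could be expressed. It is an illustration only, not the code in ErrMsgUtilLexicon.java: the class name `EuiCrossCheckSketch`, the `Flag` enum, the `OK` value for "no finding", and the convention that a null or empty string means "no CR-EUI" are all assumptions made for this example.

```java
import java.util.List;

/** Hypothetical sketch of the EUI cross-check conditions; not the LexCheck source. */
final class EuiCrossCheckSketch {

    /** Flag names follow the check list; OK is an assumed label for "no finding". */
    enum Flag { OK, NO_EUI, NEW_EUI, MISS_EUI, WRONG_EUI, MISS_EUIS, WRONG_EUIS }

    /**
     * Classifies one cross-reference.
     *
     * @param ccEuis CC-EUIs found by cit|cat (may be empty, must not be null)
     * @param crEui  the CR-EUI; null or empty is treated as "no CR-EUI" (assumption)
     */
    static Flag classify(List<String> ccEuis, String crEui) {
        boolean hasCr = crEui != null && !crEui.trim().isEmpty();

        if (ccEuis.isEmpty()) {
            // No CC-EUI: either the CR-EUI is not correct, or the cit|cat is new.
            return hasCr ? Flag.NO_EUI : Flag.NEW_EUI;
        }
        if (!hasCr) {
            // CR-EUI is missing: one candidate can be auto-assigned, several cannot.
            return ccEuis.size() == 1 ? Flag.MISS_EUI : Flag.MISS_EUIS;
        }
        if (ccEuis.contains(crEui)) {
            // CR-EUI matches a candidate, so none of the listed conditions applies.
            return Flag.OK;
        }
        // CR-EUI exists but matches no candidate.
        return ccEuis.size() == 1 ? Flag.WRONG_EUI : Flag.WRONG_EUIS;
    }
}
```

Under these assumptions, only `NO_EUI` and `MISS_EUI` correspond to checks marked Auto-Fix: yes; every other non-OK result would be reported for manual handling.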

  • Usage:
    shell> LexCrossCheck <inFile> <autoFixFile> <prepositionFile> <particleFile> <dupRecExpFile> <notBaseFormFile> <-v: verbose>
    • inFile: lexical record in text format
    • autoFixFile: auto-fixed lexical record in text format
    • prepositionFile: the preposition file, default: use the prepositions.data included in the lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
      => prepositions are used by the class Compl.CheckPreposition.java.
    • particleFile: the particle file, default: use the particles.data included in the lexCheck${YEAR}api.jar or lexCheck${YEAR}dist.jar
      => particles are used by the class Compl.CheckParticle.java.
    • dupRecExpFile: the duplicate exception file. The duplicated reports are generated in ${autoFixFile}.dupRec
    • notBaseFormFile: the not-base-form file, taken from the tag results (output of AnalyzeNewEuiFile: *cr/abb.out).
    • -v: set verbose to true, default: false

  • Outputs:
    • On screen message:
      • Confirmation message if records are valid.
      • Otherwise, error message.
    • autoFixFile: Auto-fixed Lexicon
    • ${autoFixFile}.dupRec: duplicated records that need to be addressed before further checking
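
As a rough illustration of how the contents of the .dupRec report could be derived, the Java sketch below collects citation|category keys that occur more than once and skips keys listed as exceptions. It rests on assumptions not stated on this page: that each record is reduced to a `citation|category` string, that the duplicate exception file can be read into a set of such keys, and that exceptions suppress the report for that key. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of duplicate-record detection; not the LexCheck source. */
final class DupRecSketch {

    /**
     * Returns each citation|category key that appears more than once,
     * excluding keys in the exception set, in order of first repetition.
     *
     * @param keys       one "citation|category" key per record, in file order
     * @param exceptions keys allowed to repeat (assumed meaning of dupRecExpFile)
     */
    static List<String> findDuplicates(List<String> keys, Set<String> exceptions) {
        Set<String> seen = new HashSet<>();
        Set<String> reported = new LinkedHashSet<>(); // keeps first-repeat order, no repeats
        for (String key : keys) {
            // Set.add returns false when the key was already present.
            boolean firstTime = seen.add(key);
            if (!firstTime && !exceptions.contains(key)) {
                reported.add(key);
            }
        }
        return new ArrayList<>(reported);
    }
}
```

The same pass could be applied to EUIs instead of citation|category keys to sketch the DUP EUI check.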

  • Notes:
    • Must include:
      • lexCheck${YEAR}dist.jar (for LVG APIs) or
      • lexCheck${YEAR}api.jar and lvg${YEAR}api.jar
    • Benchmark run time for Lexicon: 10 ~ 15 sec.

  • Examples:
    • shell> LexCrossCheck lexicon.txt lexicon.fixed ./data/Files/prepositions.data ./data/Files/particles.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data -v
    • shell> LexCrossCheck lexicon.txt lexicon.fixed ./data/Files/prepositions.data ./data/Files/particles.data ./data/Files/dupRecExceptions.data ./data/Files/notBaseForm.data