LexCheck

LexCheck: File Generations

This page describes the files that are needed by the LexCheck APIs as well as by Lexicon LMW candidate list generation and annual release validation. These files are stored in ${LC_PROC}/data/Files.
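
As a minimal sketch of how a script might locate one of these files: the helper below builds a path under ${LC_PROC}/data/Files. The function name `data_file_path` is hypothetical, and the sketch assumes LC_PROC is exported as an environment variable; neither is part of the LexCheck distribution.

```python
import os
from pathlib import Path


def data_file_path(file_name, lc_proc=None):
    """Build the path to a LexCheck data file under ${LC_PROC}/data/Files."""
    # Assumption: LC_PROC is available as an environment variable
    # unless an explicit root directory is passed in.
    root = lc_proc if lc_proc is not None else os.environ["LC_PROC"]
    return Path(root) / "data" / "Files" / file_name
```

For example, `data_file_path("prepositions.data")` resolves the prepositions file relative to whatever LC_PROC points at.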

  • prepositions.data
    • Description: Prepositions are lexRecords with POS=prep.
    • Usage: Used to check the syntax (grammar) and content of LEXICON (for codes that use a preposition)
    • Source: Prepositions (by POS) retrieved from the LEXICON annual release
    • Program: ${LC_PROC}/bin/GetFilesFromLexicon 1,2 11,12
    • APIs Used:
      • LexCheck
      • LexCrossCheck
      • ToXmlFromTextFile
      • ValidateContentFromTextFile
      • ValidateSyntaxFromTextFile
      • ToJavaObjApi

  • particles.data
    • Description: Adverbial particles are lexRecords with a POS of adverb and the code modification_type=particle.
    • Usage: Used to check the syntax (grammar) and content of LEXICON (for verb-particle constructions)
    • Source: Adverbial particles retrieved from the LEXICON annual release
    • Program: ${LC_PROC}/bin/GetFilesFromLexicon 3 13
    • Program/APIs Used:
      • LexCheck
      • LexCrossCheck
      • ToXmlFromTextFile
      • ValidateContentFromTextFile
      • ValidateSyntaxFromTextFile
      • ToJavaObjApi

  • irregExceptions.data
    • Description: If a lexical record has irreg variants, all of its base forms (citation and spVars) should have irreg variants. This file lists the EUIs of the exceptions: records in which the irreg code exists but not every base form has an irreg variant.
    • Usage: Used to validate the irreg content of Lexicon
    • Source: This list is updated during the Lexicon annual release validation and generation. Please refer to Lexicon validation, 3. Check Contents: irregException.data for details.
    • APIs Used:
      • LexCheck
      • ValidateContentFromTextFile
        => Check content with irreg
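
The rule above can be sketched as follows. This is an illustration only, not the LexCheck implementation: the record layout (a dict with `eui`, `citation`, `sp_vars`, and `irreg_bases`), the function name, and the sample EUIs and words are all hypothetical, and the exception list is assumed to be loaded as a set of EUIs.

```python
def missing_irreg_bases(record, exception_euis):
    """Return the base forms of a record that lack an irreg variant.

    An empty list means the record passes the check, either because it
    has no irreg code, because every base form is covered, or because
    its EUI is a known exception.
    """
    irreg_bases = set(record["irreg_bases"])
    if not irreg_bases:
        # The rule only applies when the irreg code exists.
        return []
    if record["eui"] in exception_euis:
        # Known exception (assumed to come from irregExceptions.data).
        return []
    base_forms = [record["citation"], *record["sp_vars"]]
    return [base for base in base_forms if base not in irreg_bases]


# Hypothetical record: the spelling variant "phoo" has no irreg variant.
record = {
    "eui": "E0000001",
    "citation": "foo",
    "sp_vars": ["phoo"],
    "irreg_bases": ["foo"],
}
print(missing_irreg_bases(record, set()))         # reported as a violation
print(missing_irreg_bases(record, {"E0000001"}))  # suppressed by the exception list
```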

  • dupRecExceptions.data
    • Description: Two records with the same base form and category are potential duplicates. This file lists the known exceptions: different records (EUIs) that have the same citation/base form and POS.
    • Usage: Used to filter known exceptions out of the duplicate record check. Duplicates that are not known exceptions should be fixed in LexBuild by removing them.
    • Source: This list is updated during the Lexicon annual release validation and generation. Please refer to Lexicon validation, 4. Check Cross-Ref: 2.dup LexRecord for details.
    • APIs Used:
      • LexCrossCheck
        => Cross-reference check on potential duplicate records (those that have the same citation and POS)
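
A rough sketch of this duplicate check, under stated assumptions: records are given as hypothetical (EUI, citation, POS) tuples, the exception file is assumed to be loaded as a set of EUIs, and a group is suppressed only when every EUI in it is a known exception. The actual LexCrossCheck logic and file layout may differ.

```python
from collections import defaultdict


def find_unexpected_duplicates(records, exception_euis):
    """Group records by (citation, POS) and report groups that are not known exceptions."""
    groups = defaultdict(list)
    for eui, citation, pos in records:
        groups[(citation, pos)].append(eui)

    unexpected = {}
    for key, euis in groups.items():
        if len(euis) < 2:
            continue  # a single record cannot be a duplicate
        # Assumption: a group is a known exception only if all of its
        # EUIs appear in the exception list.
        if all(eui in exception_euis for eui in euis):
            continue
        unexpected[key] = euis
    return unexpected


# Hypothetical sample data.
records = [
    ("E0000001", "foo", "noun"),
    ("E0000002", "foo", "noun"),
    ("E0000003", "foo", "verb"),
    ("E0000004", "bar", "noun"),
    ("E0000005", "bar", "noun"),
]
known_exceptions = {"E0000004", "E0000005"}
print(find_unexpected_duplicates(records, known_exceptions))
```

In this sample, only the two "foo" noun records would be reported; the "bar" pair is filtered out as a known exception.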

  • notBaseForm.data
    • Description: This list includes invalid LMWs and inflections of LMWs. Expansions of abbreviations and acronyms in LEXICON are used as LMW candidates. During the LexBuild process, linguists also tag the expansions that are invalid LMWs or inflections of LMWs.
    • Usage: Used to filter out invalid LMWs and base forms.
    • Source: This file is updated during the Lexicon annual release validation and generation. Please refer to Lexicon validation, 4. Check Cross-Ref: 3. no EUI for details.
    • APIs Used:
      • LexCrossCheck
        => Cross-reference check on cross-referenced citation for nom|abb|acr|class_type
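
The filtering role of notBaseForm.data can be illustrated with a small sketch. It assumes the file has already been read into a list of terms and that matching is exact after trimming whitespace; the function name and the sample terms are hypothetical, and the real file format is not specified on this page.

```python
def filter_lmw_candidates(candidates, not_base_forms):
    """Drop candidates that were previously tagged as not being valid base forms."""
    # Assumption: exact string match after stripping surrounding whitespace.
    blocked = {term.strip() for term in not_base_forms}
    return [cand for cand in candidates if cand.strip() not in blocked]


# Hypothetical expansions used as LMW candidates.
candidates = ["foo bar", "foo bars", "baz qux"]
tagged_invalid = ["foo bars"]
print(filter_lmw_candidates(candidates, tagged_invalid))
```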

  • notLmw.data
    • Description: This list includes invalid LMWs. Expansions of abbreviations and acronyms in LEXICON are used as LMW candidates. During the LexBuild process, linguists also tag the expansions that are invalid LMWs.
    • Usage: Used to filter/tag invalid LMW candidates that were previously evaluated.
    • Source: This file is updated during the Lexicon annual release validation and generation. Please refer to Lexicon validation, 4. Check Cross-Ref: 13. new EUI for details.
    • APIs Used: