The SPECIALIST Lexicon

ASCII LEXICON: Reports and Review

I. Log Files

Three log files are generated when the lines of LEXICON that contain non-ASCII characters are converted to ASCII. These log files are the raw source used to generate the report files in the next step. They are:

LEXICON.asciiBaseLog
Description: Log for the ASCII conversion of citations and spVars
Tags:
  • |base|change|from-spVar|:
    citation is not ASCII; replaced by an spVar
  • |base|delete|not-Lex|:
    citation is not ASCII and is not known to LEXICON; deleted (note that the entire lexical record is deleted in this case)
  • |spVar|delete|ascii-dup-base|:
    spVar is ASCII; deleted because it duplicates the citation
  • |spVar|delete|dup-base|:
    spVar is non-ASCII; deleted because it duplicates the citation
  • |spVar|delete|dup-spVar|:
    spVar is non-ASCII; deleted because it duplicates another spVar
  • |spVar|delete|not-Lex|:
    spVar is non-ASCII and is not known to LEXICON; deleted

LEXICON.asciiLineLog
Description: Log for the other line-by-line ASCII conversions
Tags:
  • |change|ascii-base|:
    line contains non-ASCII characters; the citation is replaced by the known ASCII citation. This applies only to acronyms, abbreviations, and nominalizations with an EUI field.
  • |delete|not-Lex|:
    line contains non-ASCII characters; deleted because it is not known to LEXICON or is not used in the applications

LEXICON.asciiLog
Description: Log for the final clean-up (duplicated lines)
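Two parts of this step can be sketched in shell: picking out the LEXICON lines that need conversion, and tallying how many log lines carry each tag. This is a minimal sketch, not the actual conversion program. The helper names find_non_ascii and count_tags are hypothetical. It assumes that a line is non-ASCII when it holds any byte in the range 0x80-0xFF, and that each tagged log line contains its tag (for example |spVar|delete|dup-spVar|) as a literal substring.

```shell
# Hypothetical helpers; not part of the LEXICON build tools.

# find_non_ascii FILE
# Print "LINENO:LINE" for every line holding a byte in 0x80-0xFF,
# i.e. a line the ASCII conversion would have to handle.
find_non_ascii() {
  # LC_ALL=C makes grep compare raw bytes; printf turns the octal
  # escapes \200 and \377 into the literal bytes 0x80 and 0xFF.
  LC_ALL=C grep -n -- "$(printf '[\200-\377]')" "$1"
}

# count_tags LOGFILE TAG...
# Print "TAG<TAB>COUNT" for each TAG, where COUNT is the number of
# log lines that contain TAG verbatim (assumed log layout).
count_tags() {
  local log="$1" tag
  shift
  for tag in "$@"; do
    # -F: fixed string, so '|' is literal; -c: count matching lines.
    printf '%s\t%s\n' "$tag" "$(grep -cF -- "$tag" "$log")"
  done
}
```

For example, `find_non_ascii LEXICON` lists the candidate lines, and `count_tags LEXICON.asciiBaseLog '|base|change|from-spVar|' '|spVar|delete|not-Lex|'` gives a quick per-tag tally to compare against the reports.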

II. Reports

Reports are generated from the log files as follows:

Each report is listed with its description, the command that generates it (Action), and its counts for 2010, 2011, and 2012.

baseChange.rpt: non-ASCII citations converted to ASCII
  • Action: fgrep "|base|change|from-spVar|" LEXICON.asciiBaseLog
  • Counts (2010 / 2011 / 2012): 270 / 368 / 0
baseDeleteNotLex.rpt: non-ASCII citations deleted (not known to LEXICON)
  • Action: fgrep "|base|delete|not-Lex|" LEXICON.asciiBaseLog
  • Counts (2010 / 2011 / 2012): 29 / 42 / 8
spVarDeleteAsciiDupBase.rpt: ASCII spVars deleted (duplicates of the citation)
  • Action: fgrep "|spVar|delete|ascii-dup-base|" LEXICON.asciiBaseLog
  • Counts (2010 / 2011 / 2012): 270 / 368 / 0
spVarDeleteDupBase.rpt: non-ASCII spVars deleted (duplicates of the citation)
  • Action: fgrep "|spVar|delete|dup-base|" LEXICON.asciiBaseLog
  • Counts (2010 / 2011 / 2012): 1248 / 1522 / 2102
spVarDeleteDupSpVar.rpt: non-ASCII spVars deleted (duplicates of other spVars)
  • Action: fgrep "|spVar|delete|dup-spVar|" LEXICON.asciiBaseLog
  • Counts (2010 / 2011 / 2012): 2437 / 3384 / 3916
spVarDeleteNotLex.rpt: non-ASCII spVars deleted (not known to LEXICON)
  • Action: fgrep "|spVar|delete|not-Lex|" LEXICON.asciiBaseLog
  • Counts (2010 / 2011 / 2012): 259 / 363 / 430
baseSpVarNotLex.rpt: non-ASCII citations and spVars deleted (not known to LEXICON)
  • Action:
    • flds 6 baseDeleteNotLex.rpt > fo1
    • flds 6 spVarDeleteNotLex.rpt > fo2
    • cat fo1 fo2 > fo3
    • sort -u fo3 > baseSpVarNotLex.rpt
  • Counts (2010 / 2011 / 2012): 284 / 401 / 430
asciiLineChange.rpt: non-ASCII lines changed (known to LEXICON)
  • Action: fgrep "|change|ascii-base|" LEXICON.asciiLineLog
  • Counts (2010 / 2011 / 2012): 24 / 29 / 42
asciiLineDelete.rpt: non-ASCII lines deleted (not known to LEXICON or not used)
  • Action: fgrep "|delete|not-Lex|" LEXICON.asciiLineLog
  • Counts (2010 / 2011 / 2012): 78 / 93 / 97
abbreviationChange.rpt: non-ASCII abbreviations changed (known to LEXICON)
  • Action: fgrep "abbreviation_of=" asciiLineChange.rpt
  • Counts (2010 / 2011 / 2012): 3 / 5 / 9
abbreviationDeleteNotLex.rpt: non-ASCII abbreviations deleted (not known to LEXICON)
  • Action: fgrep "abbreviation_of=" asciiLineDelete.rpt
  • Counts (2010 / 2011 / 2012): 1 / 1 / 3
acronymChange.rpt: non-ASCII acronyms changed (known to LEXICON)
  • Action: fgrep "acronym_of=" asciiLineChange.rpt
  • Counts (2010 / 2011 / 2012): 20 / 23 / 32
acronymDeleteNotLex.rpt: non-ASCII acronyms deleted (not known to LEXICON)
  • Action: fgrep "acronym_of=" asciiLineDelete.rpt
  • Counts (2010 / 2011 / 2012): 6 / 11 / 13
nominalizationChange.rpt: non-ASCII nominalizations changed (known to LEXICON)
  • Action: fgrep "nominalization=" asciiLineChange.rpt
  • Counts (2010 / 2011 / 2012): 1 / 1 / 1
nominalizationDeleteNotLex.rpt: non-ASCII nominalizations deleted (not known to LEXICON)
  • Action: fgrep "nominalization=" asciiLineDelete.rpt
  • Counts (2010 / 2011 / 2012): 2 / 2 / 0
complDelete.rpt: non-ASCII compl lines deleted (not used)
  • Action: fgrep "compl=" asciiLineDelete.rpt
  • Counts (2010 / 2011 / 2012): 2 / 4 / 4
irregDelete.rpt: non-ASCII irreg variants deleted (duplicated or not known to LEXICON)
  • Action: fgrep "variants=irreg|" asciiLineDelete.rpt
  • Counts (2010 / 2011 / 2012): 66 / 74 / 76
trademarkDelete.rpt: non-ASCII trademarks deleted (not used)
  • Action: fgrep "trademark=" asciiLineDelete.rpt
  • Counts (2010 / 2011 / 2012): 1 / 1 / 1
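The grep-based report actions above can be run in one pass. The sketch below is hypothetical: make_reports is an illustrative name, and it assumes the log files sit in the current directory. It uses grep -F, the standard spelling of fgrep. The spVar reports select the |spVar|... tags defined for LEXICON.asciiBaseLog, and both deletion sub-reports for nominalizations and other fields are drawn from asciiLineDelete.rpt. It also substitutes cut -d'|' -f6 for flds 6, assuming that flds 6 extracts the sixth |-separated field. That assumption should be checked against the actual log format before the union report is trusted.

```shell
# Hypothetical driver; names and file layout are illustrative only.
make_reports() {
  local report src pattern
  # One report per line: REPORT SOURCE PATTERN (no spaces in patterns).
  # Order matters: asciiLineChange.rpt and asciiLineDelete.rpt are
  # created before the reports that are filtered from them.
  while read -r report src pattern; do
    # -F: fixed-string match, so the '|' characters are literal.
    grep -F -- "$pattern" "$src" > "$report"
    # Print "REPORT<TAB>LINES" for a quick comparison with past years.
    printf '%s\t%s\n' "$report" "$(grep -c '' "$report")"
  done <<'EOF'
baseChange.rpt LEXICON.asciiBaseLog |base|change|from-spVar|
baseDeleteNotLex.rpt LEXICON.asciiBaseLog |base|delete|not-Lex|
spVarDeleteAsciiDupBase.rpt LEXICON.asciiBaseLog |spVar|delete|ascii-dup-base|
spVarDeleteDupBase.rpt LEXICON.asciiBaseLog |spVar|delete|dup-base|
spVarDeleteDupSpVar.rpt LEXICON.asciiBaseLog |spVar|delete|dup-spVar|
spVarDeleteNotLex.rpt LEXICON.asciiBaseLog |spVar|delete|not-Lex|
asciiLineChange.rpt LEXICON.asciiLineLog |change|ascii-base|
asciiLineDelete.rpt LEXICON.asciiLineLog |delete|not-Lex|
abbreviationChange.rpt asciiLineChange.rpt abbreviation_of=
abbreviationDeleteNotLex.rpt asciiLineDelete.rpt abbreviation_of=
acronymChange.rpt asciiLineChange.rpt acronym_of=
acronymDeleteNotLex.rpt asciiLineDelete.rpt acronym_of=
nominalizationChange.rpt asciiLineChange.rpt nominalization=
nominalizationDeleteNotLex.rpt asciiLineDelete.rpt nominalization=
complDelete.rpt asciiLineDelete.rpt compl=
irregDelete.rpt asciiLineDelete.rpt variants=irreg|
trademarkDelete.rpt asciiLineDelete.rpt trademark=
EOF

  # baseSpVarNotLex.rpt: unique union of field 6 (assumed to be what
  # "flds 6" extracts) from the two not-Lex reports.
  { cut -d'|' -f6 baseDeleteNotLex.rpt
    cut -d'|' -f6 spVarDeleteNotLex.rpt
  } | sort -u > baseSpVarNotLex.rpt
  printf '%s\t%s\n' baseSpVarNotLex.rpt "$(grep -c '' baseSpVarNotLex.rpt)"
}
```

The line counts printed here are only assumed to be comparable to the per-year counts above. Keeping all patterns in one here-document means a new report can be added in a single line.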

III. Review

  • Both a manual review and an automatic (program) review are performed after the above reports are generated. The automatic review is a program that goes through the *.rpt files, filters out all exceptions (those already known from previous years), and writes the remaining entries to the corresponding *.out files. The review steps are described as follows:

    summary.rpt:
    • Review manually
    • Look for "==> Please review ..."
    • Look for "--- Others:" (if not 0)
    Field-specific reports (field: report):
    • citation: baseDeleteNotLex.out
    • spVar: spVarDeleteNotLex.out
    • irreg: irregDelete.out
    • abbreviation: abbreviationDeleteNotLex.out
    • acronym: acronymDeleteNotLex.out
    • nominalization: nominalizationDeleteNotLex.out
    • compl: complDelete.out
    • trademark: trademarkDelete.out
    Actions for each field-specific report:
    • Send the remaining entries to LexBuilders
    • Update the exception list
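The automatic review step can be sketched as a filter that removes known exceptions from a report and keeps the rest in the matching *.out file. In this hypothetical sketch, review_report and the exception-file layout are assumptions. Exceptions are stored one per line in a plain-text file, and a report line counts as an exception only when it matches an exception line exactly. The real review program may match on a single field instead.

```shell
# Hypothetical helper; the exception-file format is an assumption.

# review_report REPORT.rpt EXCEPTIONS
# Write REPORT.out with every line of REPORT.rpt that is not a known
# exception, then print "REPORT.out<TAB>REMAINING".
review_report() {
  local rpt="$1" exceptions="$2"
  local out="${rpt%.rpt}.out"
  # -v: keep non-matching lines; -x: whole-line match;
  # -F: fixed strings; -f: read the patterns from the exception file.
  # An empty exception file has no patterns, so nothing is removed.
  grep -vxF -f "$exceptions" -- "$rpt" > "$out"
  # The remaining entries are the ones to send to LexBuilders.
  printf '%s\t%s\n' "$out" "$(grep -c '' "$out")"
}
```

For example, `review_report irregDelete.rpt irreg.exceptions` writes irregDelete.out. Entries that LexBuilders confirm as legitimate would then be appended to the exception file, so they are filtered out automatically the next year.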