Lexical Tools

Annual Release - Data from Lexicon

The following files are derived from the Lexicon and must be installed into the Lvg database first. All of these operations are performed under the "$LVG_Components/PreDataBase/" directory.
The complete set of these data files is stored in the "$LVG_Components/PreDataBase/data/{YEAR}/data/" directory.

shell> 1.LoadLexiconFiles ${YEAR}

=> Loads the Lexicon files from the Lexicon into PreDatabase/${DATA_ORG}. These files are used to generate the DB table files for Lexical Tools.

shell>  2.GenerateLexiconFiles ${YEAR}

=> The seven files below are generated by this command, through the following steps:

  • infl.data
    • $LexBuild/Tools/Lexicon/GenerateInflVars generates $Lexicon/{YEAR}/tables/inflVars.data
    • Copy the above file to infl.data
    • The fields of infl.data are:
      Inflected form | Category (in number) | Inflection (in number) | EUI | Base form | Citation form

  • acronym.data
    • $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRABR and puts it in ./data/{YEAR}/dataOrg/.
    • Run ModifyAcronym to change the format of the above file and write the output to ./data/{YEAR}/data/acronym.data
    • The new fields of acronym.data are:
      expNpLc | exp | type | acrNpLc | acr

  • properNoun.data
    • $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRPRP and puts it in ./data/{YEAR}/dataOrg/.
    • grep "|noun|proper|" ${LRPRP} | flds 2 | sort -u > ${TAR_DIR}proper
    • Copy proper to ./data/{YEAR}/data/properNoun.data
    • The new format of properNoun.data has a single field:
      proper noun

  • nominalization.data
    • $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRNOM and puts it in ./data/{YEAR}/dataOrg/.
    • Run ModifyNominalization to change the format of the above file and write the output to ./data/{YEAR}/data/nominalization.data
    • The new fields of nominalization.data are:
      EUI 1 | term 1 | Category 1 | EUI 2 | term 2 | Category 2

  • derivation.data
    • Copy and update derivation.data from the Lexicon (DM.data)
    • The new fields of derivation.data are:
      Base 1 | POS 1 | EUI 1 | Base 2 | POS 2 | EUI 2 | Negation | Type | Prefix

  • synonyms.data
    • Copy and update synonyms.data from the Lexicon (SM.data)
    • The new fields of synonyms.data are:
      index key of Base 1 | Base 1 | POS 1 | Base 2 | POS 2 | CUI

  • antonyms.data
    • Copy and update antonyms.data from the Lexicon (AM.data)
    • The new fields of antonyms.data are:
      index key of Base 1 | Base 1 | EUI 1 | Base 2 | EUI 2 | POS | Type | Negation | Domain | Source
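
The properNoun.data extraction above relies on flds, which is not a standard Unix utility. The following is a minimal sketch of the same pipeline using only standard tools, assuming that "flds 2" selects the second pipe-delimited field. The function name extract_proper_nouns and the sample rows are hypothetical and stand in for the real LRPRP table.

```shell
#!/bin/sh
# Sketch only: assumes `flds 2` means "second pipe-delimited field".
# extract_proper_nouns is a hypothetical helper, not part of the Lvg tools.
extract_proper_nouns() {
    # $1: pipe-delimited input file (e.g. a copy of LRPRP)
    # Keep proper-noun rows, take field 2, then sort and de-duplicate.
    grep -F '|noun|proper|' "$1" | cut -d'|' -f2 | sort -u
}

# Hypothetical sample rows (not real Lexicon data).
sample=$(mktemp)
cat > "$sample" <<'EOF'
E0000001|Aaron|noun|proper|
E0000002|aspirin|noun|uncount|
E0000003|Boston|noun|proper|
E0000001|Aaron|noun|proper|
EOF

extract_proper_nouns "$sample"
rm -f "$sample"
```

With the sample rows, the function prints Aaron and Boston, one per line: the non-proper row is filtered out and the duplicate row is collapsed by sort -u.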

    After the above seven files have been generated, proceed with the following steps:

    • Copy the above files to "$LVG_DIR/data/tables/" (./bin/MoveLexiconFiles)
    • Run Analyze* to check the maximum sizes of all fields
      • java AnalyzeInflection
      • java AnalyzeAcronym
      • java AnalyzeProperNoun
      • java AnalyzeNominalization
      • java AnalyzeDerivation
      • java AnalyzeSynonym
      • java AnalyzeAntonym
    • Load these data into the Idb and MySql databases
      cd $LVG_DIR/loadDb/bin

      • LoadLexiconToMyIdb

      • LoadLexiconToMySql
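
The Analyze* step above checks the maximum size of every field before the data are loaded, presumably so that the database columns are wide enough. As a rough cross-check, the same kind of measurement can be sketched with awk. This is not the Analyze* Java programs; it assumes the data files are pipe-delimited and that "size" means character length, and the function name max_field_sizes is hypothetical.

```shell
#!/bin/sh
# Sketch only: max_field_sizes is a hypothetical helper, not an Lvg tool.
max_field_sizes() {
    # $1: pipe-delimited data file
    # Prints one line per column: "<field number> <max length>".
    # A trailing "|" on a row counts as an extra, empty last field.
    awk -F'|' '
        {
            for (i = 1; i <= NF; i++)
                if (length($i) > max[i]) max[i] = length($i)
            if (NF > cols) cols = NF
        }
        END { for (i = 1; i <= cols; i++) print i, max[i] + 0 }
    ' "$1"
}

# Hypothetical two-field sample (not real Lexicon data).
sample=$(mktemp)
printf '%s\n' 'ab|c' 'a|cdef' > "$sample"
max_field_sizes "$sample"
rm -f "$sample"
```

Running it against a file in "$LVG_DIR/data/tables/" gives a quick view of the widest value in each column, which can be compared with the column widths used when the tables are loaded.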