Lexical Tools

Annual Release - Data from Lexicon

The following files are derived from the Lexicon and must be installed into the Lvg database first. All of these operations are performed under the "$LVG_Components/PreDataBase/" directory.
The complete set of these data files is stored in the "$Lvg_Components/PreDataBase/data/{YEAR}/data/" directory.

shell> 1.LoadLexiconFiles ${YEAR}

=> Loads the Lexicon files from the Lexicon into PreDatabase/${DATA_ORG}. These files are used to generate the DB table files for Lexical Tools.

shell>  2.GenerateLexiconFiles ${YEAR}

=> The seven files below can be generated by the following steps:

infl.data
- $LexBuild/Tools/Lexicon/GenerateInflVars generates $Lexicon/{YEAR}/tables/inflVars.data
- Copy the above file to infl.data
- The format of the fields of infl.data is:
  Inflected form | Category (in number) | Inflection (in number) | EUI | Base form | Citation form
acronym.data
- $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRABR and puts it in ./data/{YEAR}/dataOrg/.
- Run ModifyAcronym to change the format of the above file and write the output to ./data/{YEAR}/data/acronym.data
- The new format of the fields of acronym.data is:
  expNpLc | exp | type | acrNpLc | acr
properNoun.data
- $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRPRP and puts it in ./data/{YEAR}/dataOrg/.
- grep "|noun|proper|" ${LRPRP} | flds 2 | sort -u > ${TAR_DIR}proper
- Copy proper to ./data/{YEAR}/data/properNoun.data
- The new format of the fields of properNoun.data is:
  proper noun
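
The grep/flds/sort pipeline above depends on flds, which may not be available on every machine. The following is a minimal sketch of the same extraction using standard tools only. It assumes that "flds 2" prints the second "|"-delimited field of each line; the function name extract_proper_nouns is hypothetical and is not part of the Lexical Tools scripts.

```shell
# Sketch only: mirrors  grep "|noun|proper|" ${LRPRP} | flds 2 | sort -u
# Assumption: `flds 2` prints the second "|"-delimited field of each line.
extract_proper_nouns() {
    # $1: path to the LRPRP table
    # -F treats the pattern as a fixed string, so "|" is matched literally.
    # LC_ALL=C makes the sort order byte-wise and reproducible; the original
    # command uses whatever locale is active, so ordering may differ.
    grep -F '|noun|proper|' "$1" | cut -d'|' -f2 | LC_ALL=C sort -u
}

# Usage (variables as in the step above):
#   extract_proper_nouns "${LRPRP}" > "${TAR_DIR}proper"
```

Only the set of extracted strings matters for properNoun.data, so a difference in sort order between this sketch and the original command should be harmless, but that has not been verified against the loaders.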
nominalization.data
- $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRNOM and puts it in ./data/{YEAR}/dataOrg/.
- Run ModifyNominalization to change the format of the above file and write the output to ./data/{YEAR}/data/nominalization.data
- The new format of the fields of nominalization.data is:
  EUI 1 | term 1 | Category 1 | EUI 2 | term 2 | Category 2

derivation.data
- Copy and update derivation.data from the LEXICON - DM.data
- The new format of the fields of derivation.data is:
  Base 1 | POS 1 | EUI 1 | Base 2 | POS 2 | EUI 2 | Negation | Type | Prefix

synonyms.data
- Copy and update synonyms.data from the LEXICON - SM.data
- The new format of the fields of synonyms.data is:
  index key of Base 1 | Base 1 | POS 1 | Base 2 | POS 2 | CUI

antonyms.data
- Copy and update antonyms.data from the LEXICON - AM.data
- The new format of the fields of antonyms.data is:
  index key of Base 1 | Base 1 | EUI 1 | Base 2 | EUI 2 | POS | Type | Negation | Domain | Source
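
Before moving on, it can help to confirm that every line of a generated file has the expected number of fields. The following is a rough sketch, not part of the Lexical Tools distribution. It assumes the files are "|"-delimited with no trailing delimiter (a trailing "|" would add one empty field to the count), and the function name check_field_count is hypothetical.

```shell
# Sketch only: report lines whose field count differs from the expected one.
# Assumption: fields are separated by "|" and lines do not end with "|".
check_field_count() {
    # $1: data file, $2: expected number of fields per line
    awk -F'|' -v want="$2" '
        NF != want {
            printf "%s:%d: %d fields, expected %d\n", FILENAME, FNR, NF, want
            bad = 1
        }
        END { exit bad + 0 }   # exit status 1 if any line was reported
    ' "$1"
}

# Usage, with the count read off the field list for infl.data above:
#   check_field_count ./data/{YEAR}/data/infl.data 6
```

The expected count for each file has to be taken from its field list above; if the real files carry a trailing delimiter, the count passed in needs to be one higher.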

After the above seven files have been generated, proceed with the following steps:

Copy above files to "$LVG_DIR/data/tables/" (./bin/MoveLexiconFiles)
Run Analyze* to check the maximum sizes of all fields
- java AnalyzeInflection
- java AnalyzeAcronym
- java AnalyzeProperNoun
- java AnalyzeNominalization
- java AnalyzeDerivation
- java AnalyzeSynonym
- java AnalyzeAntonym
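
As a quick cross-check of what the Analyze* programs report, the maximum length of each field can also be computed directly from a data file. The following is a minimal sketch that assumes "|"-delimited files. It is not the implementation of the Java Analyze* programs, and the function name max_field_lengths is hypothetical.

```shell
# Sketch only: print "<field index> <max length>" for each field of a file.
# Assumption: fields are separated by "|".
# Note: awk's length() counts characters or bytes depending on the awk
# implementation and locale, so non-ASCII data may give different numbers.
max_field_lengths() {
    # $1: data file
    awk -F'|' '
        {
            for (i = 1; i <= NF; i++)
                if (length($i) > max[i]) max[i] = length($i)
            if (NF > n) n = NF        # remember the widest line seen
        }
        END {
            for (i = 1; i <= n; i++) printf "%d %d\n", i, max[i] + 0
        }
    ' "$1"
}

# Usage:
#   max_field_lengths "$LVG_DIR/data/tables/infl.data"
```

The resulting numbers can be compared against the column sizes used by the database tables before loading.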
Load these data into the Idb and MySql databases
cd $LVG_DIR/loadDb/bin
- LoadLexiconToMyIdb
- LoadLexiconToMySql