Annual Release - Data from Lexicon
The following files are derived from the Lexicon and must be installed in the Lvg database first. All of these operations are performed under the
"$LVG_Components/PreDataBase/" directory.
The complete set of these data files is stored in the "$Lvg_Components/PreDataBase/data/{YEAR}/data/" directory.
shell> 1.LoadLexiconFiles ${YEAR}
=> Load the Lexicon files from the Lexicon into PreDatabase/${DATA_ORG}. These files are used to generate the DB table files for Lexical Tools.
shell> 2.GenerateLexiconFiles ${YEAR}
=> The seven files below are generated as follows:
- infl.data
- $LexBuild/Tools/Lexicon/GenerateInflVars generates $Lexicon/{YEAR}/tables/inflVars.data
- Copy the above file to infl.data
- The format of fields of infl.data is:
Inflected form | Category (in number) | Inflection (in number) | EUI | Base form | Citation Form |
- acronym.data
- $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRABR and puts it in ./data/{YEAR}/dataOrg/.
- Run ModifyAcronym to reformat the above file and write the result to ./data/{YEAR}/data/acronym.data
- The new format of fields of acronym.data is:
- properNoun.data
- $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRPRP and puts it in ./data/{YEAR}/dataOrg/.
- grep "|noun|proper|" ${LRPRP} | flds 2 | sort -u > ${TAR_DIR}proper
- Copy proper to ./data/{YEAR}/data/properNoun.data
- The new format of fields of properNoun.data is:
- nominalization.data
- $LexBuild/Tools/GenerateTables/GenerateTables generates $LexBuild/Lexicon/{YEAR}/tables/LRNOM and puts it in ./data/{YEAR}/dataOrg/.
- Run ModifyNominalization to reformat the above file and write the result to ./data/{YEAR}/data/nominalization.data
- The new format of fields of nominalization.data is:
EUI 1 | term 1 | Category 1 | EUI 2 | term 2 | Category 2 |
- derivation.data
- Copy and update derivation.data from the LEXICON - DM.data
- The new format of fields of derivation.data is:
Base 1 | POS 1 | EUI 1 | Base 2 | POS 2 | EUI 2 | Negation | Type | Prefix |
- synonyms.data
- Copy and update synonyms.data from the LEXICON - SM.data
- The new format of fields of synonyms.data is:
index key of Base 1 | Base 1 | POS 1 | Base 2 | POS 2 | CUI |
- antonyms.data
- Copy and update antonyms.data from the LEXICON - AM.data
- The new format of fields of antonyms.data is:
index key of Base 1 | Base 1 | EUI 1 | Base 2 | EUI 2 | POS | Type | Negation | Domain | Source |
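The field layouts listed above can be checked mechanically before the files are moved on. The following is only a minimal sketch in Python, not part of the Lvg tooling: the function names and the EXPECTED_FIELDS table are hypothetical, the counts are simply read off the layouts above, and it assumes each record is one line of "|"-separated fields that may end with a trailing "|". acronym.data and properNoun.data are left out because their layouts are not listed here.

```python
# Number of fields per record, counted from the field layouts listed above.
# Hypothetical helper table; acronym.data and properNoun.data are omitted
# because their layouts are not given.
EXPECTED_FIELDS = {
    "infl.data": 6,
    "nominalization.data": 6,
    "derivation.data": 9,
    "synonyms.data": 6,
    "antonyms.data": 10,
}


def split_record(line):
    """Split one pipe-delimited record into fields.

    Assumption: a record may end with a single trailing '|', which is a
    terminator and not an extra empty field.
    """
    line = line.rstrip("\n")
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


def bad_records(file_name, lines):
    """Return (line_number, field_count) for records with an unexpected field count."""
    expected = EXPECTED_FIELDS[file_name]
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        count = len(split_record(line))
        if count != expected:
            problems.append((number, count))
    return problems


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        "E1|term one|c1|E2|term two|c2|\n",
        "E1|term one|c1|\n",
    ]
    print(bad_records("nominalization.data", sample))
```

To check a real file, the same function could be called with an open file handle, for example bad_records("derivation.data", open(path, encoding="utf-8")), where path is wherever the file was written.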
After the above seven files have been generated, perform the following steps:
- Copy the above files to "$LVG_DIR/data/tables/" (./bin/MoveLexiconFiles)
- Run Analyze* to check the maximum sizes of all fields
- java AnalyzeInflection
- java AnalyzeAcronym
- java AnalyzeProperNoun
- java AnalyzeNominalization
- java AnalyzeDerivation
- java AnalyzeSynonym
- java AnalyzeAntonym
- Load these data into the Idb and MySql databases
cd $LVG_DIR/loadDb/bin
- LoadLexiconToMyIdb
- LoadLexiconToMySql
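
The Analyze* step above reports the maximum size of each field, which matters because the loaded table columns must be wide enough for the data. The Java Analyze* programs themselves are not shown here, so the following Python sketch only illustrates the idea under stated assumptions: the function name is hypothetical, records are taken to be "|"-separated lines with an optional trailing "|", and sizes are counted in characters, not bytes.

```python
def max_field_sizes(lines):
    """Return the maximum length (in characters) seen in each field position.

    Assumption: one record per line, fields separated by '|', and a single
    trailing '|' (if present) is a terminator, not an extra field.
    """
    sizes = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.endswith("|"):
            line = line[:-1]
        fields = line.split("|")
        # Grow the result list if this record has more fields than seen so far.
        if len(fields) > len(sizes):
            sizes.extend([0] * (len(fields) - len(sizes)))
        for index, field in enumerate(fields):
            sizes[index] = max(sizes[index], len(field))
    return sizes


if __name__ == "__main__":
    # Made-up records, for illustration only.
    print(max_field_sizes(["ab|c|\n", "a|cde|\n"]))
```

Comparing the returned sizes with the column widths defined for the target tables would show, before loading, whether any field risks being truncated.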