Derivations Procedures - prefixD
Generate prefixD pairs in derivation table:
I. Directory: ${DERIVATION}/2.prefixD
II. Input Files (./data/${YEAR}/dataOrg/):
shell> ${PREFIX_D}/bin/GetPrefixD ${YEAR}
0
III. Final files for allD (release)
IV. Summary of GetPrefixD
Step | Description and Program | Input | Output | Notes | Step
---|---|---|---|---|---
0 | | See section II. | See section II. | | 0
1 | | | | | 1
2 | | | | | 2
8 | | | | | 8
3 | | | | | 3
14 | | | | | 14
4 | | | | | 4
4a | | | | Re-run this step until: Then, rerun Steps 3~4 until the above three numbers are 0. | 4a
5 | | | | Make sure unknown dType (\|U\|) from prefixD is empty. | 5
6 | | | | | 6
15 | | | | | 15
16 | | | | | 16
7 | | | | | 7
V. Processes Details:
shell>cd ${DERIVATION}/prefixD/bin
shell>GetPrefixD ${YEAR}
1. Routine process (no new PD-Rules, no new Tag)
1: Get valid prefix base forms from LEXICON
=> generates ./data/bases.data
2: Retrieve raw prefixD pairs
or use
8: Retrieve possible raw prefixD pairs with option DONE
(for all prefixes whose tagging is done)
=> generates:
3: Add tags to prefixD meta file
=> generates ./data/prefixD.meta.data
Each entry must be tagged [yes|no], and all errors must be fixed.
Use the tag tbd to bypass an entry with tagging errors.
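The tagging rule in step 3 can be illustrated with a small check. This is only a sketch: the helper name `classify_tag` and the idea of receiving the tag as a bare string are assumptions made for illustration, not part of GetPrefixD.

```python
def classify_tag(tag: str) -> str:
    """Classify a step-3 tag (hypothetical helper, not part of GetPrefixD).

    Returns "ok" for yes/no, "bypass" for tbd, and "error" for anything else.
    """
    value = tag.strip()
    if value in ("yes", "no"):
        return "ok"
    if value == "tbd":
        return "bypass"
    return "error"


if __name__ == "__main__":
    for tag in ("yes", "no", "tbd", "maybe"):
        print(tag, "->", classify_tag(tag))
```

Entries classified as "error" correspond to the tagging errors that must be fixed before moving on; "bypass" entries are the ones deferred with tbd.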
3.1: Check conflicts by SpVars
(different dPair tags between 2 records).
=> generates ./data/prefixD.meta.data.conflict
Send to linguist to double check "[yes|no|both]"
=> Ideally, the tag of prefixD between two records should be the same
=> This file lists all inconsistent prefixD tags between two records (caused by SpVars).
=> If not empty, send it to a linguist to tag the EUI line [yes|no|both].
14: Auto-fix prefixD.tag.txt for conflicts by SpVars
=> Put the revised tagged file to: ./dataOrg/prefixD.meta.data.conflict.tag.data
=> copy ./dataOrg/prefixD.tag.txt.${YEAR}.fix to ./dataOrg/prefixD.tag.txt.${YEAR} and rerun this step.
4: Split prefixD meta file
=> generates
Make sure prefixD.tbt.data is empty. If not, send it to linguists to tag:
Tag negation: (O|N) if prefix is: a-, an-, de-, dys-, in-, under-
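As a rough illustration of the rule above, the following sketch flags prefixes whose pairs need a negation tag (O|N). The prefix list is copied from the line above; the function name `needs_negation_tag` and the normalization of the trailing hyphen are assumptions for illustration only.

```python
# Prefixes listed in step 4 as requiring a negation tag (O|N).
NEGATION_PREFIXES = ("a", "an", "de", "dys", "in", "under")


def needs_negation_tag(prefix: str) -> bool:
    """Return True if the prefix is one that must carry a negation tag.

    Hypothetical helper: accepts the prefix with or without a trailing hyphen.
    """
    normalized = prefix.strip().rstrip("-").lower()
    return normalized in NEGATION_PREFIXES


if __name__ == "__main__":
    for prefix in ("dys-", "under-", "pre-"):
        print(prefix, needs_negation_tag(prefix))
```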
5: Verify dType on prefixD.yes.data
=> generates ./data/prefixD.yes.data.type
6: Add negation tag (N|O); the output is uniquely sorted within the program (not by sort -u)
=> generates ./data/prefixD.yes.data.${YEAR}
Negation tagging errors must be fixed
=> send to linguist to tag the negation (N|O)
6.1: Check conflict (inconsistent) tags between SpVars
=> generates ./data/prefixD.yes.data.${YEAR}.conflict
=> Ideally, the tag of prefixD between two records should be the same
SpVars might also cause inconsistent negation tags on prefixD.
=> Ideally, the tag of negation between two records should be the same
=> If not empty, send it to a linguist to tag the EUI line (N|O|B).
=> The negation could have exceptions:
=> manually update this result to prefixD.yes.data.${YEAR}
=> The final prefixD file is ${DERIVATION}/prefixD/data/${YEAR}/data/prefixD.yes.data.${YEAR}
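The conflict check in step 6.1 can be sketched as grouping records by EUI pair and reporting pairs that carry more than one negation tag. This is a minimal illustration, not the GetPrefixD implementation. The record layout (word|category|EUI|base word|category|EUI|negation tag|) is inferred from the known cases listed under step 15, and the function name `find_negation_conflicts` is hypothetical. The two EUIs are sorted to form the key because the headers of the known cases appear to list them in ascending order.

```python
from collections import defaultdict


def find_negation_conflicts(lines):
    """Return {(eui_low, eui_high): set_of_tags} for inconsistently tagged pairs.

    Assumed record layout (inferred from the known cases in this document):
    derivedWord|category|derivedEUI|baseWord|category|baseEUI|negationTag|
    """
    tags_by_pair = defaultdict(set)
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        pair = tuple(sorted((fields[2], fields[5])))
        tags_by_pair[pair].add(fields[6])
    # Keep only the EUI pairs that were given more than one distinct tag.
    return {pair: tags for pair, tags in tags_by_pair.items() if len(tags) > 1}


if __name__ == "__main__":
    sample = [
        "antebrachium|noun|E0072172|brachium|noun|E0013901|O|",
        "antibrachium|noun|E0072172|brachium|noun|E0013901|N|",
        "disyllable|noun|E0523982|syllable|noun|E0059482|O|",
    ]
    for pair, tags in find_negation_conflicts(sample).items():
        print("|".join(pair), sorted(tags))
```

Each reported pair corresponds to one EUI line that a linguist would need to tag (N|O|B).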
15: Auto-fix prefixD.tag.txt for negation conflicts by SpVars
=> Put the revised tagged file to: ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data
Known cases in 2015 are:
1|E0013901|E0072172|
# 556|antebrachium|noun|E0072172|brachium|noun|E0013901|O|
# 1431|antibrachium|noun|E0072172|brachium|noun|E0013901|N|
2|E0013883|E0203565|
# 557|antebrachial|adj|E0203565|brachial|adj|E0013883|O|
# 1432|antibrachial|adj|E0203565|brachial|adj|E0013883|N|
3|E0024983|E0045258|
# 11245|empanel|verb|E0024983|panel|noun|E0045258|O|
# 15077|impanel|verb|E0024983|panel|noun|E0045258|N|
4|E0434097|E0580659|
# 11243|embower|verb|E0580659|bower|noun|E0434097|O|
# 15072|imbower|verb|E0580659|bower|noun|E0434097|N|
5|E0059482|E0523982|
# 9310|disyllable|noun|E0523982|syllable|noun|E0059482|O|
# 10500|dissyllable|noun|E0523982|syllable|noun|E0059482|N|
16: Auto-fix prefixD.tag.txt for negation conflicts by SpVars for class N and O
=> Check that the fix file exists: ./data/prefixD.negation.fix.data
=> copy ./data/prefixD.yes.${YEAR}.fixNegation to ./data/prefixD.yes.${YEAR}
7: Check affix on prefixD.yes.data.${YEAR}
=> generates ./data/prefixD.pattern3.rpt (should be empty)
11: Run steps 1 ~ 7 above (default)
2. Add new PD-Rules process
8: Retrieve possible raw prefixD pairs with options:
${PREFIX}: generate all prefixD pairs for a specified prefix (check prefixD.rawNo.rpt.${PREFIX})
DONE: retrieve all prefixes that are not TBD
3: Add tags to prefixD meta file
4: Split prefixD meta file
5: Verify dType on prefixD.yes.data
6: Add negation tag (N|O)
7: Compare original tag and result tag files
3. Add tag for new prefix dPairs (annual updates)
Update prefixD growth
Please refer to derivation design documents in Lexical Tools for details.