Step | Description and Program | Input | Output | Notes
|
---|
0 | - Prepare directories and files
| See section II. | See section II. | - 3.suffixD/data/${YEAR}/dataOrg
- LEXICON
- inflVars.data
- bases.data
- sdRules.data
- suffixD.tag.txt
|
|
1 | - Retrieve std-raw suffixD pairs
GetSuffixDRawFromBaseFile.java
|
| - suffixD.raw.data.fromBase.all
- sdRules.rawNo.rpt
| - Must complete prefixD Step-1 to get bases.data
- Need to rerun from this step if there are new Sd-Rules involved
- Add new SD-Rules to ./dataOrg/sdRules.data.${YEAR}
- Get sd-pair (TBD) for each new sdRules
- Send TBD to linguist to tag [yes|no] from the following steps
- Save new tag result to ./dataOrg/newRuleTag/
- Add new tag result to ./dataOrg/suffixD.tag.txt.${YEAR}
|
2 | - Combine with nomD.S file (raw)
CheckWithNomDFile.java
| - ${NOM_TAR_DIR}:
- ${TAR_DIR}:
- suffixD.raw.data.fromBase
| - suffixD.raw.data.fromNomD
- suffixD.raw.data
| - Must link suffixD.raw.data.fromBase to suffixD.raw.data.fromBase.all to run this step
|
3 | - Add tags to suffixD meta file
GetSuffixDMetaFile.java
- DPairTagList.java
| - ${NOM_TAR_DIR}:
- ${SRC_DIR}:
- suffixD.tag.txt (suffixD.tag.txt.${YEAR}.uSort)
- ${TAR_DIR}:
| - suffixD.meta.data
- suffixD.meta.data.conflict
|
- 1. Read and fix sdPair tags from tag file
- Remove duplicate and conflicting tags from ./dataOrg/suffixD.tag.txt
- use uSort (shell> sort -u suffixD.tag.txt > suffixD.tag.txt.usort)
- => after uSort, the duplicated tag no. should equal the conflict tag no. (exact duplicates are removed by sort -u).
- go through the duplicated tag no. and the conflict tag no. and fix them until both are 0
- conflict tag (different tag): need to be fixed, send to linguist to re-tag.
- 2. Read and add sdPair tags from nomD file
- Ignore the long list of duplicated tags (between manual tags and nomD tags) in log.3
- Check and fix the Total conflict tag no (conflict between nomD and expert's tag)
- 3. Verify and fix conflict tags from spVars
|
9 | - Auto-fix suffixD.tag.txt
FixConflictDPairTags.java
| - ${SRC_DIR}:
- suffixD.tag.txt.${YEAR}
- suffixD.meta.data.conflict.tag.data
| - ${SRC_DIR}:
- suffixD.tag.txt.${YEAR}.fixDPair
|
- Make sure the linguist's tagging result is saved to ./dataOrg/suffixD.meta.data.conflict.tag.data
- Manually examine ./dataOrg/suffixD.tag.txt.${YEAR}.fixDPair
- If suffixD.tag.txt.${YEAR}.fixDPair passes the examination, move it to suffixD.tag.txt.${YEAR}, then re-run Step-3.
4 | - Split suffixD meta file (yes|no|tbd)
SplitSuffixDMetaFile.java
|
| - suffixD.yes.data
- suffixD.no.data
- suffixD.tbd.data
- suffixD.tbd.data.sort (sent to linguists)
- suffixD.yesNo.data
|
- Make sure suffixD.tbd.data(.sort) is empty. If not, send it to linguists to tag:
- Tag suffixD: (yes|no)
- valid suffixD: yes
- invalid suffixD: no
- Append (update) these new tagged sd-pairs (to ./dataOrg/suffixD.tag.txt) and rerun steps: 3~4
- add [tbd] if tags are missing to pass step-3.
| 4a | - Clean up tags on tagged file
CleanUpDPairTagList.java
|
| - ${SRC_DIR}:
Re-run this step until:
Go to the end of the log.4a file
- duplicate = 0. If not, replace suffixD.tag.data with suffixD.tag.data.cleanUp
- conflict = 0. If not, send conflict (from log.5a) to linguists to re-tag. Do NOT replace suffixD.tbd.data with suffixD.tbd.data.cleanUp until conflict = 0
- diff = 0. If not, replace suffixD.tbd.data with suffixD.tbd.data.cleanUp
- Then, rerun Steps: 3~4 until it is empty
| 5 | - Verify dType on suffixD.yes.data
DType.java
| - ${ALL_SRC_DIR}:
- ${TAR_DIR}:
| - suffixD.yes.data.type
- suffixD.yes.data.type.Z
- suffixD.yes.data.type.S
- suffixD.yes.data.type.P
- suffixD.yes.data.type.ZS
- suffixD.yes.data.type.SS
- suffixD.yes.data.type.PS
- suffixD.yes.data.type.U
| - Make sure the unknown dType (|U|) file from suffixD is empty
- Must finish all new SD-rules (if any) before proceeding with this step
| 6 | - Automatically add negation tag [N|O] (~less$ is [N], others are [O]), then sort uniquely
AddNegationTagToFile.java
- DPairTagList.java
|
| - suffixD.yes.data.${YEAR}
- suffixD.yes.data.${YEAR}.conflict
|
- The conflict file (suffixD.yes.data.${YEAR}.conflict) lists all inconsistent suffixD tags between SpVars in two records
- Send conflicts to linguist to tag (N|O|B) on EUI lines
- In the past, there were no [B] (both) cases in suffixD
- Manually update the results to suffixD.tag.txt
- Rerun Steps: 3~6 until no unknown conflict (both) exists.
| 7 | - Check affix on suffixD.yes.data.${YEAR}
CheckDerivationByAffix6.java
| - ${ALL_SRC_DIR}:
- ${SRC_DIR}:
- ${TAR_DIR}:
|
|
- copy ${SRC_DIR}/suffixD.tagYes.txt.${PREV_YEAR} ${SRC_DIR}/suffixD.tagYes.txt.${YEAR}
- suffixD.pattern3.rpt must be empty.
- This rpt lists all potentially invalid dPairs by checking the first and last 3 characters of the affix.
- If not, send to linguists to tag (Yes|No):
- invalid dPair (No): add to suffixD.tagNo.txt (not used!). This should not happen!
- valid dPair (Yes): add to suffixD.tagYes.txt, then rerun Step: 7
8 | | See above | See above | Not recommended!
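
The duplicate/conflict check described in Step 3 (run `sort -u`, then fix whatever still carries more than one tag) can be sketched in shell. This is only an illustration: `list_conflicts` is a hypothetical helper, and it assumes each line of the tag file is pipe-delimited with the tag as the last field, which may not match the real suffixD.tag.txt layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lexical Tools distribution).
# ASSUMPTION: every line looks like "key-fields...|tag", i.e. the tag
# (yes|no|tbd) is the last pipe-delimited field.
# Prints each key that still has more than one distinct tag after sort -u.
list_conflicts() {
    sort -u "$1" | awk -F'|' '
        {
            key = $0
            sub(/\|[^|]*$/, "", key)   # drop the last field (the tag)
            count[key]++               # one hit per distinct line
        }
        END {
            for (k in count)
                if (count[k] > 1) print k
        }' | sort
}

# Example usage (file name taken from Step 3):
#   list_conflicts ./dataOrg/suffixD.tag.txt
```

Because `sort -u` has already removed exact duplicates, any key printed here has at least two different tags and would need to go back to a linguist; an empty result corresponds to a conflict count of 0.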
| Other options
|
---|
11 | - Get stats for SD-rule
ALL GetSdRuleStatsFromTaggedSuffixD.java
|
| - sdRules.stats.rpt
- sdRules.stats.detail.rpt
| Only Use for LVG SD-Rules
- Used for analysis in finding the optimal Sd-Rules set, please refer to the design documents (SD-Rules evaluation/optimization) of Lexical Tools
| 12 | - Get the HTML files
ALL GetSdRuleListHtmlFile.java
|
| - ${HTML_DIR}:
- suffixDRules.html
- SD-Examples
- SD-Exceptions
| Copy to ${LEXICON_WEB} for annual Sd-Rules updates
- SD-Examples
- SD-Exceptions
- suffixDRules.html
| |
| |
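
The default negation rule from Step 6 (`~less$` is [N], everything else is [O]) is simple enough to sketch in shell. `negation_tag` is a hypothetical helper, and which field of a suffixD record the pattern is applied to is an assumption here; AddNegationTagToFile.java remains the reference.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Lexical Tools distribution).
# ASSUMPTION: the rule is applied to a single term passed as $1.
# Prints N when the term ends in "less", O otherwise.
negation_tag() {
    case "$1" in
        *less) printf 'N\n' ;;
        *)     printf 'O\n' ;;
    esac
}

# Example usage:
#   negation_tag careless
#   negation_tag careful
```

This only covers the automatic default; tags that conflict between records still go to a linguist as described in Step 6.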