orgD Report in 2014 release
At users' request, we manually added valid dPairs from orgD (the original dPairs from the original Facts) to the derivation table for the 2014 release. The steps include adding EUIs to orgD (in the Lexicon) and adding negation|dType|prefix. Many of these orgD with EUIs duplicate entries from suffixD, prefixD, and zeroD. Only those that are not already known from previous steps need to be added. They include suffixD.TBD, prefixD.TBD, zeroD.TBD, and type-U (unknown type).
Ideally, all valid dPairs from orgD should be generated automatically by our new derivation generation processes by adding:
- more prefixes for prefixD (no prefixD found after 2015+)
- SD candidate rules for suffixD
- No new zeroD should be found because our system should cover all possible zeroD (please note that acronyms and abbreviations cannot be zeroD).
Please note that adding or deleting spVars or nominalizations may cause new or conflicting orgD.TBD from the above.
Below is the detailed breakdown:
- The orgD are from the following 5 files:
- convers.fct
- dm.fct
- etc.fct
- nomiz.fct
- pd.fct
- The above 5 files are combined into orgD.raw.data
- Total line: 10,763 (orgD.raw.data)
- Comment No: 6,229
- Empty No (empty line): 59
- dPair No: 4,475
- Duplicate No: 2
sulphurise|verb|sulfurization|noun
sulphurize|verb|sulfurization|noun
- Unique dPair No: 4,473 (orgD.yes.data, plus 1 empty line)
=> This file is used for MetaMap BDB tables
=> This file is further modified to orgD.yes.data.final (4,467) by:
- removing invalid dPairs, as follows:
apical|adj|apex|noun
lend|verb|loan|noun
neurotic|adj|nerve|noun
ovigerous|adj|ova|noun
puric|adj|pus|noun
uretic|adj|urine|noun
- modifying dPairs, as follows:
heamolyse|verb|hemolysis|noun
=> haemolyse|verb|hemolysis|noun
heamolyze|verb|hemolysis|noun
=> haemolyze|verb|hemolysis|noun
oxidize|verb|oxygen|noun
=> oxidize|verb|oxide|noun
pliable|adj|ply|noun
=> pliable|adj|ply|verb
pliant|adj|ply|noun
=> pliant|adj|ply|verb
- orgD.yes.data.final (4,467)
- add dType (P|S|Z|PS|SS|ZS|U),
- auto tag (yes|no) from tagged file
- review those that are not tagged
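The counting, de-duplication, and clean-up steps above can be sketched in Python as follows. This is only an illustration, not the actual tooling: the function names `summarize` and `finalize` are hypothetical, and the comment marker `#` is an assumption, since the report does not say how comment lines are marked in the .fct files. The removal and modification tables are copied from the lists above.

```python
from collections import Counter

# Assumption: comment lines start with '#'. The report does not state the marker.
COMMENT_PREFIX = "#"

# Invalid dPairs removed from orgD.yes.data (copied from the report).
INVALID = {
    "apical|adj|apex|noun",
    "lend|verb|loan|noun",
    "neurotic|adj|nerve|noun",
    "ovigerous|adj|ova|noun",
    "puric|adj|pus|noun",
    "uretic|adj|urine|noun",
}

# Manual corrections applied to dPairs (copied from the report).
MODIFIED = {
    "heamolyse|verb|hemolysis|noun": "haemolyse|verb|hemolysis|noun",
    "heamolyze|verb|hemolysis|noun": "haemolyze|verb|hemolysis|noun",
    "oxidize|verb|oxygen|noun": "oxidize|verb|oxide|noun",
    "pliable|adj|ply|noun": "pliable|adj|ply|verb",
    "pliant|adj|ply|noun": "pliant|adj|ply|verb",
}


def summarize(lines):
    """Count comment, empty, dPair, and duplicate lines; keep unique dPairs in order."""
    counts = Counter()
    seen = set()
    unique = []
    for raw in lines:
        line = raw.strip()
        if not line:
            counts["empty"] += 1
        elif line.startswith(COMMENT_PREFIX):
            counts["comment"] += 1
        else:
            counts["dPair"] += 1
            if line in seen:
                counts["duplicate"] += 1
            else:
                seen.add(line)
                unique.append(line)
    return counts, unique


def finalize(unique):
    """Drop invalid dPairs, then apply the manual corrections."""
    return [MODIFIED.get(pair, pair) for pair in unique if pair not in INVALID]


# Tiny made-up input that exercises each branch.
sample = [
    "# a comment line",
    "",
    "sulphurise|verb|sulfurization|noun",
    "sulphurise|verb|sulfurization|noun",
    "lend|verb|loan|noun",
    "pliable|adj|ply|noun",
]
sample_counts, sample_unique = summarize(sample)
sample_final = finalize(sample_unique)
```

On the sample input, the duplicate line is counted once and dropped, `lend|verb|loan|noun` is removed as invalid, and `pliable|adj|ply|noun` is rewritten to `pliable|adj|ply|verb`.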
The output files and their counts are:
- orgD.yes.data.final.yesEui.type.P: 4
- orgD.yes.data.final.yesEui.type.P.meta: 4
- orgD.yes.data.final.yesEui.type.P.no.data: 1
- orgD.yes.data.final.yesEui.type.P.yes.data: 0
- orgD.yes.data.final.yesEui.type.P.tbt.data: 0
- orgD.yes.data.final.yesEui.type.P.tbd.data: 3
=> manually review and tag, add "yes" dPairs to prefixD
- orgD.yes.data.final.yesEui.type.S: 3,549
- orgD.yes.data.final.yesEui.type.S.meta: 3,549
- orgD.yes.data.final.yesEui.type.S.no.data: 2
- orgD.yes.data.final.yesEui.type.S.yes.data: 1,068
- orgD.yes.data.final.yesEui.type.S.tbd.data: 2,479
=> manually review and tag, add "yes" dPairs to suffixD
- orgD.yes.data.final.yesEui.type.Z: 220
- orgD.yes.data.final.yesEui.type.Z.meta: 220
- orgD.yes.data.final.yesEui.type.Z.no.data: 16
- orgD.yes.data.final.yesEui.type.Z.yes.data: 204
- orgD.yes.data.final.yesEui.type.Z.tbd.data: 0
=> manually review and tag, add "yes" dPairs to zeroD
- orgD.yes.data.final.yesEui.type.PS: 0
- orgD.yes.data.final.yesEui.type.SS: 47
- orgD.yes.data.final.yesEui.type.ZS: 17
The above three files contain dPairs caused by SpVars without matching characters. They are excluded from the derivational tables.
- orgD.yes.data.final.yesEui.type.U: 91
=> manually review to assign dType and dTag (most of these should be suffixD with a case difference; some of them are zeroD without SpVars), then add "yes" dPairs to the associated dPair type.
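The counts in this report can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers copied from the report; the names `TYPE_TOTALS`, `BREAKDOWN`, and `check_breakdown` are hypothetical and introduced purely for illustration.

```python
# Per-type totals and tag breakdowns, copied from the report.
TYPE_TOTALS = {"P": 4, "S": 3549, "Z": 220}
BREAKDOWN = {
    "P": {"no": 1, "yes": 0, "tbt": 0, "tbd": 3},
    "S": {"no": 2, "yes": 1068, "tbd": 2479},
    "Z": {"no": 16, "yes": 204, "tbd": 0},
}


def check_breakdown(totals, breakdown):
    """Return the types whose tag counts do not add up to the type total."""
    return [
        dtype
        for dtype, total in totals.items()
        if sum(breakdown[dtype].values()) != total
    ]


# Raw-file line counts: comments + empty lines + dPairs = total lines.
raw_total_ok = 6229 + 59 + 4475 == 10763
# Unique dPairs: dPairs minus 2 duplicates.
unique_ok = 4475 - 2 == 4473
# Final file: unique dPairs minus the 6 removed invalid dPairs.
final_ok = 4473 - 6 == 4467

mismatches = check_breakdown(TYPE_TOTALS, BREAKDOWN)
print(mismatches, raw_total_ok, unique_ok, final_ok)
```

An empty `mismatches` list means every per-type total equals the sum of its tagged files.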