The SPECIALIST Lexicon

orgD Report in 2014 release

At users' request, we manually added valid dPairs from orgD (the original dPairs from the original Facts) into the derivation table for the 2014 release. The steps include adding EUIs to orgD (in the Lexicon) and adding negation|dType|prefix. Many of these orgD dPairs with EUIs duplicate dPairs already in suffixD, prefixD, and zeroD. Only those not already known from previous steps need to be added. They include suffixD.TBD, prefixD.TBD, zeroD.TBD, and type-U (unknown type). Ideally, all valid dPairs from orgD should be generated automatically by our new derivation generation processes by adding:

  • more prefixes for prefixD (no prefixD found after 2015+)
  • SD candidate rules for suffixD
  • No new zeroD should be found, because our system should cover all possible zeroD (please note that acronyms or abbreviations can't be zeroD).
Please note that adding or deleting spVars or nominalizations may create new or conflicting orgD.TBD entries from the above.
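As a minimal sketch of the "only add what is not already known" step, the Python below treats each dPair as a `term1|cat1|term2|cat2` string (the layout used in the breakdown) and keeps only orgD dPairs that are absent from the existing suffixD, prefixD, and zeroD tables. The function names `parse_dpair` and `new_orgd_pairs` are hypothetical, EUIs are ignored, and exact string matching (no case folding, no reverse-direction matching) is an assumption rather than the actual rule used for the release.

```python
from typing import Iterable

# Hypothetical representation: (term1, cat1, term2, cat2)
DPair = tuple[str, str, str, str]


def parse_dpair(line: str) -> DPair:
    """Split a 'term1|cat1|term2|cat2' line into its four fields."""
    fields = line.strip().split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    term1, cat1, term2, cat2 = fields
    return (term1, cat1, term2, cat2)


def new_orgd_pairs(orgd: Iterable[str], *known_tables: Iterable[str]) -> list[DPair]:
    """Return orgD dPairs not found in any known table, first occurrence only.

    ASSUMPTION: matching is exact on all four fields.
    """
    known: set[DPair] = set()
    for table in known_tables:  # e.g. suffixD, prefixD, zeroD
        known.update(parse_dpair(line) for line in table)

    seen: set[DPair] = set()
    result: list[DPair] = []
    for line in orgd:
        pair = parse_dpair(line)
        if pair in known or pair in seen:
            continue  # already covered by a previous step, or a repeat
        seen.add(pair)
        result.append(pair)
    return result
```

For example, `new_orgd_pairs(orgd_lines, suffix_d_lines, prefix_d_lines, zero_d_lines)` would return the candidates that still need manual review; the variable names here are illustrative.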

Below is the detailed breakdown:

  • The orgD are from the following 5 files:
    • convers.fct
    • dm.fct
    • etc.fct
    • nomiz.fct
    • pd.fct

  • The above 5 files are combined into orgD.raw.data
    • Total lines: 10,763 (orgD.raw.data)
      • Comment No: 6,229
      • Empty No (empty line): 59
      • dPair No: 4,475
        • Duplicate No: 2
          sulphurise|verb|sulfurization|noun
          sulphurize|verb|sulfurization|noun
        • Unique dPair No: 4,473 (orgD.yes.data, plus 1 empty line)
          => This file is used for the MetaMap BDB tables
          => This file is further modified into orgD.yes.data.final (4,467) by:
          • removing the following invalid dPairs:
            apical|adj|apex|noun
            lend|verb|loan|noun
            neurotic|adj|nerve|noun
            ovigerous|adj|ova|noun
            puric|adj|pus|noun
            uretic|adj|urine|noun
          • modifying the following dPairs:
            heamolyse|verb|hemolysis|noun => haemolyse|verb|hemolysis|noun
            heamolyze|verb|hemolysis|noun => haemolyze|verb|hemolysis|noun
            oxidize|verb|oxygen|noun => oxidize|verb|oxide|noun
            pliable|adj|ply|noun => pliable|adj|ply|verb
            pliant|adj|ply|noun => pliant|adj|ply|verb

    • orgD.yes.data.final (4,467)
      • add dType (P|S|Z|PS|SS|ZS|U)
      • auto-tag (yes|no) from the tagged file
      • review those that are not tagged

      The output files and their counts are:
      • orgD.yes.data.final.yesEui.type.P: 4
        • orgD.yes.data.final.yesEui.type.P.meta: 4
          • orgD.yes.data.final.yesEui.type.P.no.data: 1
          • orgD.yes.data.final.yesEui.type.P.yes.data: 0
          • orgD.yes.data.final.yesEui.type.P.tbt.data: 0
          • orgD.yes.data.final.yesEui.type.P.tbd.data: 3
            => manually review and tag, add "yes" dPairs to prefixD
      • orgD.yes.data.final.yesEui.type.S: 3,549
        • orgD.yes.data.final.yesEui.type.S.meta: 3,549
          • orgD.yes.data.final.yesEui.type.S.no.data: 2
          • orgD.yes.data.final.yesEui.type.S.yes.data: 1,068
          • orgD.yes.data.final.yesEui.type.S.tbd.data: 2,479
            => manually review and tag, add "yes" dPairs to suffixD
      • orgD.yes.data.final.yesEui.type.Z: 220
        • orgD.yes.data.final.yesEui.type.Z.meta: 220
          • orgD.yes.data.final.yesEui.type.Z.no.data: 16
          • orgD.yes.data.final.yesEui.type.Z.yes.data: 204
          • orgD.yes.data.final.yesEui.type.Z.tbd.data: 0
            => manually review and tag, add "yes" dPairs to zeroD

      • orgD.yes.data.final.yesEui.type.PS: 0
      • orgD.yes.data.final.yesEui.type.SS: 47
      • orgD.yes.data.final.yesEui.type.ZS: 17
        The above three files contain dPairs caused by SpVars without matching characters. They are excluded from the derivation tables.

      • orgD.yes.data.final.yesEui.type.U: 91
        => manually review to assign dType and dTag (most of these should be suffixD with case differences; some are zeroD without SpVars), then add "yes" dPairs to the associated dPair type.
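The line counts in the breakdown above can be reproduced with bookkeeping along these lines. This is a hedged Python sketch, not the actual tooling used for the release: the function names are hypothetical, and the `#` comment prefix is an assumption because the comment syntax of the .fct files is not stated here. It classifies the lines of a file such as orgD.raw.data, separates repeated dPairs, applies the removal and modification lists, and counts tagged dPairs per (dType, tag).

```python
from collections import Counter
from typing import Iterable


def classify_raw_lines(lines: Iterable[str], comment_prefix: str = "#"):
    """Count comment, empty, and dPair lines; collect unique dPairs and repeats.

    ASSUMPTION: comment lines begin with `comment_prefix` ("#" is a guess).
    """
    counts = Counter()
    seen = set()
    unique = []
    repeats = []
    for raw in lines:
        line = raw.strip()
        if not line:
            counts["empty"] += 1
        elif line.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["dpair"] += 1
            if line in seen:
                repeats.append(line)  # second or later occurrence
            else:
                seen.add(line)
                unique.append(line)
    return counts, unique, repeats


def apply_corrections(pairs: Iterable[str], invalid: set[str],
                      replacements: dict[str, str]) -> list[str]:
    """Drop invalid dPairs, then rewrite modified ones, preserving order."""
    return [replacements.get(p, p) for p in pairs if p not in invalid]


def count_by_type_and_tag(tagged: Iterable[tuple[str, str, str]]) -> Counter:
    """Count (dType, tag) combinations for (dPair, dType, tag) triples."""
    return Counter((d_type, tag) for _pair, d_type, tag in tagged)
```

Applied to the real data, the counts should add up as reported: 6,229 comment + 59 empty + 4,475 dPair lines = 10,763 total lines, and 4,475 dPair lines minus 2 repeats = 4,473 unique dPairs. Per-type totals can be checked the same way, for example 2 + 1,068 + 2,479 = 3,549 for type S.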