Generate Derivational Variants
Derivational variants are terms that are related by a derivational process. In linguistics, a derivational process creates new words from existing words (roots): a derivation forms a new word by adding an affix (prefix or suffix) to an existing word or by removing one from it. The meaning, the syntactic category, or both may change in the process. For example, "un-happy" and "happi-ness" are derived from "happy" by a prefix and a suffix, respectively.
In the Lexical Tools, derivational pairs define direct derivations. A derivational pair (dPair) consists of two derivationally related terms (each with its base form and category) that are exactly one derivational step apart. The relation is bi-directional (symmetrical), and only one affix is allowed in a derivational pair. For example, "un-happi-ness" is not a direct derivation of "happy"; it is a derivation of "unhappy". Accordingly, "un-happi-ness" and "happy" do not form a derivational pair. Derivations that are more than one derivational step apart, such as "happily" and "happiness" (happily -> happy -> happiness), cannot be retrieved by this flow. However, they can be retrieved by the recursive derivational flow (-f:R).
Often, a derivational variant has a different syntactic category from the original term. Derivational variants are generated either by FACTs, a case-insensitive lookup in a database table of known derivations, or by RULEs (SD-Rules), which add, change, or remove common suffixes (case sensitive).
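The difference between a direct derivation and a multi-step derivation can be illustrated with a small sketch. It is not the tool's implementation: the pair table `D_PAIRS` and the helper names are hypothetical, and the breadth-first walk only approximates the idea behind the recursive flow.

```python
from collections import deque

# Hypothetical toy table of dPairs; each pair differs by exactly one affix.
D_PAIRS = [
    ("happy", "unhappy"),        # prefix "un-"
    ("happy", "happiness"),      # suffix "-ness"
    ("happy", "happily"),        # suffix "-ly"
    ("unhappy", "unhappiness"),  # suffix "-ness"
]


def build_index(pairs):
    """Index every pair in both directions, since a dPair is symmetrical."""
    index = {}
    for a, b in pairs:
        index.setdefault(a, set()).add(b)
        index.setdefault(b, set()).add(a)
    return index


def direct_derivations(term, index):
    """Terms exactly one derivational step away from `term`."""
    return sorted(index.get(term, set()))


def recursive_derivations(term, index):
    """Terms reachable through one or more steps (breadth-first walk)."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for neighbor in index.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(term)  # the input term is not its own derivation here
    return sorted(seen)


INDEX = build_index(D_PAIRS)
```

In this toy table, `direct_derivations("happily", INDEX)` returns only "happy", whereas `recursive_derivations("happily", INDEX)` also reaches "happiness" through "happy".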
In 2012, a systematic approach was used to add prefix and zero (same spelling with a different category) derivations to FACTs. In 2013, a systematic approach was used to add suffix derivations to FACTs. In addition, filter options for derivation types (zeroD, suffixD, prefixD) and negations (negative derivational pairs, such as happy and unhappy, or effort and effortless) were added. In 2014, 63 more prefixes were added to generate prefixD pairs. All dPairs from the previous (before 2012) FACTs were also validated and added (if the dPair has EUIs, is valid, and is not a duplicate). In addition, the optimized SD-Rule set was integrated as the default in the Trie to reach precision and recall rates above 95%. In 2015, 2 new prefixes and 13 new SD-Rules were added to cover more derivations. In 2016, 2 new prefixes and 11 new SD-Rules were added to cover more derivations. For details, please refer to the derivational variants design documents. The default filter options for derivations are:
Default Filter Options | Descriptions |
---|---|
-kd:1 | restriction to FACTs only |
-kdt:ZSP | derivation types are zeroD, suffixD, and prefixD |
-kdn:O | negations are otherwise (non-negative) |
Derivational variants are generated by FACTs (a pre-computed derivational table) and morphology rules (RULEs). FACTs are stored in a database and retrieved by SQL queries. RULEs are stored in and retrieved through a Trie mechanism. Two heuristic rules are implemented in the Java version to filter out unrealistic derivational variants generated by rules. They are governed by two variables: the minimum length of a term (MIN_TERM_LENGTH) and the minimum length of the stem in the trie tree (DIR_TRIE_STEM_LENGTH).
For example,
RULE|ic$|adj|base|y$|noun|base
The values of the above two variables are configurable in the configuration file (${LVG_DIR}/data/config/lvg.properties). The default value is 3 for both the minimum length of a term (MIN_TERM_LENGTH) and the minimum length of the stem in the trie tree (DIR_TRIE_STEM_LENGTH).
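As a rough illustration of how a suffix rule and the two length thresholds might interact, the following sketch applies a rule line such as the one above in the left-to-right direction only. The exact semantics of the thresholds are an assumption here (the term must have at least MIN_TERM_LENGTH characters, and the stem left after removing the suffix at least DIR_TRIE_STEM_LENGTH characters); the function names are hypothetical, and the real tool looks rules up in a Trie rather than parsing them one by one.

```python
MIN_TERM_LENGTH = 3        # default value stated in the text
DIR_TRIE_STEM_LENGTH = 3   # default value stated in the text


def parse_rule(line):
    """Split 'RULE|in_suffix$|in_cat|in_infl|out_suffix$|out_cat|out_infl'."""
    fields = line.strip().strip("|").split("|")
    if len(fields) != 7 or fields[0] != "RULE":
        raise ValueError(f"not a RULE line: {line!r}")
    _, in_suffix, in_cat, _in_infl, out_suffix, out_cat, _out_infl = fields
    # "$" marks the end of the term; drop it to get the literal suffix.
    return in_suffix.rstrip("$"), in_cat, out_suffix.rstrip("$"), out_cat


def apply_rule(term, category, rule_line,
               min_term=MIN_TERM_LENGTH, min_stem=DIR_TRIE_STEM_LENGTH):
    """Return (derived_term, derived_category), or None if the rule is rejected."""
    in_suffix, in_cat, out_suffix, out_cat = parse_rule(rule_line)
    # Suffix matching is case sensitive, as stated for RULEs.
    if category != in_cat or not term.endswith(in_suffix):
        return None
    stem = term[: len(term) - len(in_suffix)]
    # Assumed heuristics: reject terms or stems that are too short.
    if len(term) < min_term or len(stem) < min_stem:
        return None
    return stem + out_suffix, out_cat


RULE = "RULE|ic$|adj|base|y$|noun|base"
```

Under these assumptions, `apply_rule("allergic", "adj", RULE)` yields `("allergy", "noun")`, while `apply_rule("epic", "adj", RULE)` is rejected because the stem "ep" is shorter than three characters.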
Results from both FACTs and RULEs are combined and sorted, and then results with the same output term, output category, and input category are filtered out as duplicates. Finally, the derivational-flow-specific filter option (-kd:int) is applied. Its values are: known to LEXICON only (1, the default), known to LEXICON or all (2), and all (3).
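The combine, sort, de-duplicate, and filter sequence described above could look roughly like the sketch below. Several details are assumptions rather than documented behavior: the `Derivation` record and its `in_lexicon` flag are hypothetical, FACT results are assumed to win over RULE results when both produce the same key, and option 2 is interpreted as "LEXICON-known results if any exist, otherwise all".

```python
from typing import NamedTuple


class Derivation(NamedTuple):
    in_term: str
    in_cat: int
    out_term: str
    out_cat: int
    source: str       # "FACT" or "RULE"
    in_lexicon: bool  # hypothetical flag: output term is known to the LEXICON


def combine(facts, rules, kd=1):
    """Merge FACT and RULE results, drop duplicates, then apply the -kd filter."""
    # Sorting by source last puts "FACT" before "RULE" for identical keys.
    merged = sorted(
        list(facts) + list(rules),
        key=lambda d: (d.out_term, d.out_cat, d.in_cat, d.source),
    )
    seen = set()
    unique = []
    for d in merged:
        key = (d.out_term, d.out_cat, d.in_cat)
        if key not in seen:
            seen.add(key)
            unique.append(d)

    known = [d for d in unique if d.in_lexicon]
    if kd == 1:    # known to LEXICON only
        return known
    if kd == 2:    # assumed: known results if any, otherwise everything
        return known if known else unique
    if kd == 3:    # all
        return unique
    raise ValueError(f"unsupported -kd value: {kd}")
```

The de-duplication key deliberately omits the source, so a derivation found by both a FACT and a RULE is reported once.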
The -m flag is used to display the additional information that can be retrieved with the derivation flow. The additional information consists of two parts:
FACT|D-1|CAT-1|EUI-1|D-2|CAT-2|EUI-2|D-Type|Negation|prefix|
RULE|suffix-1|CAT-1|base|suffix-2|CAT-2|base|
Please note that only suffix rules are applied in derivations.
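For downstream processing, a line of -m output can be split into named fields. The sketch below is a minimal parser under stated assumptions: the six leading fields are assumed to be input term, output term, category, inflection, flow history, and flow number, and all field names are hypothetical labels chosen to mirror the two formats listed above.

```python
HEAD_FIELDS = ["input", "output", "category", "inflection", "flow", "flow_number"]
FACT_FIELDS = ["d1", "cat1", "eui1", "d2", "cat2", "eui2",
               "d_type", "negation", "prefix"]
RULE_FIELDS = ["suffix1", "cat1", "infl1", "suffix2", "cat2", "infl2"]
DETAIL_FIELDS = {"FACT": FACT_FIELDS, "RULE": RULE_FIELDS}


def parse_mutate_line(line):
    """Parse one pipe-delimited output line produced with the -m flag."""
    fields = line.strip().split("|")
    if fields and fields[-1] == "":
        fields.pop()  # a trailing "|" leaves one empty field at the end
    if len(fields) < 7:
        raise ValueError(f"too few fields: {line!r}")
    record = dict(zip(HEAD_FIELDS, fields[:6]))
    kind, detail = fields[6], fields[7:]
    names = DETAIL_FIELDS.get(kind)
    if names is None or len(detail) != len(names):
        raise ValueError(f"unexpected detail section: {line!r}")
    record["kind"] = kind
    record["detail"] = dict(zip(names, detail))
    return record
```

Applied to the line `multiple|pseudomultiple|1|1|d|1|FACT|multiple|1|E0041326|pseudomultiple|1|E0620850|P|O|pseudo|`, the parser reports a FACT with derivation type "P", negation "O", and prefix "pseudo".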
shell> lvg -f:d -m
multiple
multiple|multiplicity|128|1|d|1|FACT|multiple|1|E0041326|multiplicity|128|E0041348|S|O|None|
multiple|multiply|2|1|d|1|FACT|multiple|1|E0041326|multiply|2|E0041350|S|O|None|
multiple|multiple|128|1|d|1|FACT|multiple|1|E0041326|multiple|128|E0041327|Z|O|None|
multiple|multiply|1024|1|d|1|FACT|multiple|1|E0041326|multiply|1024|E0041349|S|O|None|
multiple|pseudomultiple|1|1|d|1|FACT|multiple|1|E0041326|pseudomultiple|1|E0620850|P|O|pseudo|
multiple|pseudo-multiple|1|1|d|1|FACT|multiple|1|E0041326|pseudo-multiple|1|E0620850|P|O|pseudo-|
multiple|submultiple|128|1|d|1|FACT|multiple|128|E0041327|submultiple|128|E0224586|P|O|sub|
multiple|multiple|1|1|d|1|FACT|multiple|128|E0041327|multiple|1|E0041326|Z|O|None|
help
help|helper|128|1|d|1|FACT|help|128|E0031061|helper|128|E0031062|S|O|None|
help|helpful|1|1|d|1|FACT|help|128|E0031061|helpful|1|E0031066|S|O|None|
help|helper|128|1|d|1|FACT|help|1024|E0031060|helper|128|E0031062|S|O|None|
help|helping|128|1|d|1|FACT|help|1024|E0031060|helping|128|E0219271|S|O|None|
help|help|1024|1|d|1|FACT|help|128|E0031061|help|1024|E0031060|Z|O|None|
help|self-help|128|1|d|1|FACT|help|128|E0031061|self-help|128|E0055088|P|O|self-|
help|help|128|1|d|1|FACT|help|1024|E0031060|help|128|E0031061|Z|O|None|
happy
happy|happily|2|1|d|1|FACT|happy|1|E0030812|happily|2|E0218480|S|O|None|
happy|happiness|128|1|d|1|FACT|happy|1|E0030812|happiness|128|E0030811|S|O|None|
shell> lvg -f:d -m -kdn:O+N
happy
happy|unhappy|1|1|d|1|FACT|happy|1|E0030812|unhappy|1|E0063156|P|N|un|
happy|happily|2|1|d|1|FACT|happy|1|E0030812|happily|2|E0218480|S|O|None|
happy|happiness|128|1|d|1|FACT|happy|1|E0030812|happiness|128|E0030811|S|O|None|