The SPECIALIST Lexicon

Inflectional Spelling Variants

SpVars can occur in base forms or in inflectional forms. Most spVars differ in their base forms, such as [E0017903|color|colour]. This type of spVar is recorded as "spelling_variant=colour" in the LexRecord. Some spVars are spelled identically in the base form and differ only in their inflectional variants. For example, [E0036645|labeling|labelling] are both inflectional variants (presPart) of the base form "label". SpVars of this type are called inflectional spelling variants (inflSpVars). On the other hand, inflections generated by different inflectional rules from the same base are not inflectional spelling variants if they are pronounced differently, such as [E0034155|indexes|indices]. These points are summarized as follows:

  • Lexicon records spVars in base forms by using the slot of [spelling_variant=filter], such as [E0017903|color|colour].
  • The LRSPL table in the Lexicon release lists all spVars in the base form. The inflections of these spVars in base forms are also spVars, such as [E0017903|colors|colours].
  • Inflectional spVars (inflSpVars) are spVars only in their inflectional variants (there is no spVar in their base forms), such as [E0036645|labeling|labelling].
  • InflSpVars are identified by SpVar models but are not included in the Lexicon. Software was therefore developed to find all inflSpVars (and add them to the goldStd for testing). The algorithm is briefly described below:
    1. Go through all terms in inflVars.data
    2. Find terms in the same record (same EUI, category and syntax)
    3. From the terms found above, find those that have the same inflection
    4. From the terms found above, find those that have the same base form (terms with different base forms can't be inflectional spelling variants, thus [E0017903|colors|colours] is excluded)
    5. Find terms with the same phonetic codes from both Double Metaphone and Caverphone 2.0
    6. Apply the following patterns, with heuristic rules for irreg, as shown below:
      • These rules and patterns are derived from Lexicon.2015
      • There are 29,544 terms that match step 4. Of these, 14,997 are identified as valid inflSpVars and 14,547 as invalid.
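Steps 1 through 5 above can be sketched roughly as follows. This is a minimal illustration, not the Lexicon software itself. The tuple layout (term, EUI, category, inflection, base), the function names `candidate_pairs` and `same_pronunciation`, and the `encoders` parameter are all assumptions made for this example. The syntax field from step 2 is left out for brevity, and real Double Metaphone and Caverphone 2.0 implementations would have to be supplied by the caller.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(rows):
    """Steps 1-4: pair terms that share EUI, category, inflection and base form.

    `rows` is an iterable of (term, eui, category, inflection, base) tuples.
    This layout is assumed for illustration; it is not the inflVars.data format.
    """
    groups = defaultdict(set)
    for term, eui, category, inflection, base in rows:
        groups[(eui, category, inflection, base)].add(term)

    pairs = []
    for (eui, _category, _inflection, _base), terms in groups.items():
        # Every unordered pair inside one group is a candidate inflSpVar.
        for first, second in combinations(sorted(terms), 2):
            pairs.append((eui, first, second))
    return sorted(pairs)


def same_pronunciation(first, second, encoders):
    """Step 5: keep a pair only if every phonetic encoder gives equal codes.

    `encoders` is a list of callables (str -> code) supplied by the caller,
    for example Double Metaphone and Caverphone 2.0 implementations.
    """
    return all(encode(first) == encode(second) for encode in encoders)


rows = [
    ("labeling", "E0036645", "verb", "presPart", "label"),
    ("labelling", "E0036645", "verb", "presPart", "label"),
    # Different base forms (color vs. colour), so step 4 excludes this pair.
    ("colors", "E0017903", "noun", "plur", "color"),
    ("colours", "E0017903", "noun", "plur", "colour"),
]
print(candidate_pairs(rows))  # [('E0036645', 'labeling', 'labelling')]
```

Grouping on the full key (EUI, category, inflection, base) performs steps 2 through 4 in a single pass, which is why the colors/colours pair never becomes a candidate.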

      • Inflectional Spelling Variants (14,997)

        Pattern-Type (count) | Flag | Description | Examples
        SPVAR_METAREG_ONLY
        (14,313)
        true Noun: metareg generates two plural forms ([-s] and [-'s]), same pronunciation
        • [E0004911|PhDs|PhD's]
        • [E050418|trachs|trach's]
        SPVAR_REGD_REG
        (357)
        true Verb: regd ([CCing], [CCed]) and reg ([Cing], [Ced]), same pronunciation, where C is a consonant such as l, m, n, p, r, s, t.
        • [E0014888|canceling|cancelling]
        • [E0022287|diagraming|diagramming]
        • [E0538797|caravaning|caravanning]
        • [E0025609|enveloping|envelopping]
        • [E0733532|rerefering|rereferring]
        • [E0021242|defocusing|defocussing]
        • [E0012345|benefiting|benefitting]

        • [E0014888|canceled|cancelled]
        • [E0022287|diagramed|diagrammed]
        • [E0538797|caravaned|caravanned]
        • [E0025609|enveloped|envelopped]
        • [E0733532|rerefered|rereferred]
        • [E0021242|defocused|defocussed]
        • [E0012345|benefited|benefitted]
        SPVAR_IRREG
        (323)
        true Noun or Verb: with irreg, check heuristic rules of phonetic exceptions to determine same pronunciation
        • [E0002501|Februaries|Februarys]
        • [E0007741|ageing|aging]
        • [E0014307|buffaloes|buffalos]
        • [E0014446|buses|busses]
        • [E0031605|hieing|hying]
        • [E0040758|moneys|monies]
        • [E0349824|Ponsonby's|Ponsonbys]
        • [E0529791|billies|billys]
        • [E0360756|platies|platys]
        SPVAR_METAREG_RM_PUNC
        (2)
        true Noun: Metareg ([-s]) and remove punctuation
        • [E0687269|Tab.s|Tabs.]
        • [E0687269|tab.s|tabs.]
        SPVAR_METAREG_ENDING
        (2)
        true Noun: Metareg ([-s]) and reg ([-es]) on a base ending with "index"
        • [E0630091|State Trait Anxiety Indexes|State Trait Anxiety Indexs]
        • [E0630091|State-Trait Anxiety Indexes|State-Trait Anxiety Indexs]
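The spelling shapes behind the two largest patterns in the table above, SPVAR_METAREG_ONLY and SPVAR_REGD_REG, can be illustrated with a short surface-level check. This is only a sketch under stated assumptions: the function names are hypothetical, the consonant set is taken from the "such as" list in the description and may be incomplete, and the check looks at spelling alone. It does not consult the inflection rule codes (metareg, reg, regd) of a record, and it does not replace the phonetic comparison of step 5.

```python
# Consonants named in the SPVAR_REGD_REG description; the real rule may cover more.
DOUBLING_CONSONANTS = set("lmnprst")


def is_regd_reg_pair(single, doubled):
    """True if the forms differ only by a doubled consonant before -ing or -ed."""
    for suffix in ("ing", "ed"):
        if single.endswith(suffix) and doubled.endswith(suffix):
            single_stem = single[: -len(suffix)]
            doubled_stem = doubled[: -len(suffix)]
            if (
                single_stem
                and doubled_stem == single_stem + single_stem[-1]
                and single_stem[-1] in DOUBLING_CONSONANTS
            ):
                return True
    return False


def is_metareg_only_pair(plain, apostrophe):
    """True if the forms are base + "s" and base + "'s" of the same base."""
    return plain.endswith("s") and apostrophe == plain[:-1] + "'s"


def surface_pattern(first, second):
    """Guess a pattern name from spelling alone, or return None."""
    if is_regd_reg_pair(first, second):
        return "SPVAR_REGD_REG"
    if is_metareg_only_pair(first, second):
        return "SPVAR_METAREG_ONLY"
    return None


print(surface_pattern("canceling", "cancelling"))  # SPVAR_REGD_REG
print(surface_pattern("PhDs", "PhD's"))            # SPVAR_METAREG_ONLY
print(surface_pattern("ageing", "aging"))          # None
```

Pairs such as [E0007741|ageing|aging] fall through to None here, which matches the table: they are handled by the heuristic irreg rules rather than by a simple spelling shape.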

      • Not Inflectional Spelling Variants (14,547)

        Pattern-Type (count) | Flag | Description | Examples
        NOT_SPVAR_PRON
        (14,227)
        false Different phonetic codes by double Metaphone or Caverphone 2.0
        • [E0000149|Achilles bursae|Achilles bursas]
        • [E0000640|B16 melanomas|B16 melanomata]
        • [E0745849|McGoon indexes|McGoon indices]
        • [E0742176|P2X7R|P2X7R's]
        NOT_SPVAR_IRREG
        (168)
        false Noun or Verb: with irreg, check heuristic rules of phonetic exceptions to determine same pronunciation
        • [E0001026|Blackfeet|Blackfoot]
        • [E0012224|beefs|beeves]
        • [E0017300|cleaved|cleft]
        • [E0023877|drachmae|drachmai]
        • [E0027311|farther|further]
        • [E0363125|redreamed|redreamt]
        • [E0728599|Kerrison forceps|Kerrison forcipes]
        NOT_SPVAR_METAREG_BASE_S_ENDING
        (133)
        false Noun: Metareg ([-s]) and base ends with S (from plural, Invariant, Group uncount), different pronunciation

          Plural (101)

        • [E0001338|CRS|CRSs]
        • [E0733506|NSTEACS|NSTEACSs]

          Invariant (16)

        • [E0001320|CNS|CNSs]
        • [E0741411|ECRIS|ECRISs]

          Group uncount (16)

        • [E0000059|ACS|ACSs]
        • [E0724307|ASES|ASESs]
        NOT_SPVAR_GLREG
        (12)
        false Noun: glreg pattern, different pronunciation
        • [E0055224|sensilla|sensillae]
        • [E0400002|ehrlichioses|ehrlichiosis]
        NOT_SPVAR_REGD_NO_REG
        (4)
        false Verb: regd without reg (pt or pp are from irreg), different pronunciation
        • [E0057257|spat|spit]
        • [E0724977|shat|shit]
        NOT_SPVAR_METAREG_ENDING
        (2)
        false Noun: Metareg ([-s]) and irreg (ends without s), different pronunciation
        • [E0581864|secretory phospholipase A(2)s|secretory phospholipases A(2)]
        • [E0581864|secretory phospholipase A2s|secretory phospholipases A2]
        NOT_SPVAR_MODAL
        (1)
        false Modal: negative, different pronunciation
        • [E0014877|can't|cannot]
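As a consistency check, the per-pattern counts in the two tables can be summed and compared with the totals quoted for step 4. The snippet below only restates numbers that appear in this section; the dictionary and variable names are chosen for this example.

```python
# Counts copied from the "Inflectional Spelling Variants" table.
valid_counts = {
    "SPVAR_METAREG_ONLY": 14_313,
    "SPVAR_REGD_REG": 357,
    "SPVAR_IRREG": 323,
    "SPVAR_METAREG_RM_PUNC": 2,
    "SPVAR_METAREG_ENDING": 2,
}

# Counts copied from the "Not Inflectional Spelling Variants" table.
invalid_counts = {
    "NOT_SPVAR_PRON": 14_227,
    "NOT_SPVAR_IRREG": 168,
    "NOT_SPVAR_METAREG_BASE_S_ENDING": 133,
    "NOT_SPVAR_GLREG": 12,
    "NOT_SPVAR_REGD_NO_REG": 4,
    "NOT_SPVAR_METAREG_ENDING": 2,
    "NOT_SPVAR_MODAL": 1,
}

# Sub-counts listed under NOT_SPVAR_METAREG_BASE_S_ENDING.
base_s_ending = {"plural": 101, "invariant": 16, "group uncount": 16}

valid_total = sum(valid_counts.values())
invalid_total = sum(invalid_counts.values())

print(valid_total, invalid_total, valid_total + invalid_total)
# 14997 14547 29544
```

The sums agree with the stated 14,997 valid and 14,547 invalid inflSpVars, and together they account for all 29,544 terms that match step 4.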