CSpell

Informal Expression Handler: Correct Informal Expression

  • Description:
    This class corrects informal expressions, either by inserting an apostrophe (') at the right position or by converting the expression to its formal form. For example, [whos] -> [who's] and [plz] -> [please]. In general, the corrected formal expression either cannot be found in the dictionary (e.g., [who's]) or is a large edit distance away from the input, because these expressions are not typos or spelling errors. Instead, they are shorthand (e.g., [plz] is shorthand for [please], with an edit distance of 5).

  • Features:
    • Convert informal expressions to formal expressions.
      • Contractions: from the baseline (originally from Wikipedia)
      • Shorthand: [pls] -> [please], [u] -> [you], etc.
    • Conversion is driven by a configurable flat file.

  • Examples:

    File Name    Input    Output
    10138.txt    u        you
    10679.txt    b        be
    11186.txt    pls      please
    10.txt       im       i'm
    11186.txt    didnt    didn't
    16481.txt    shes     she's

  • Implementation Logic:
    • Read the conversions from a configurable flat file and store them in a local HashMap, with the informal expression as the key and the corrected expression as the value.
    • The conversion file (./data/Misc/informalExpression.data):

      informal expression    correct expression
    • Tokenize the input text into input words
    • Lowercase each input word
    • Go through all keys
      • If the input word matches a key, replace it with the corrected expression.
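
    The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual InformalExpHandler.java: the class name InformalExpSketch, the method names, the "|" field separator, the "#" comment lines, and the whitespace tokenizer are all assumptions made for the example. A direct HashMap lookup is used in place of iterating over all keys, which gives the same result for exact matches.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the lookup logic; not the real CSpell class.
public class InformalExpSketch {
    // key: informal expression (lowercase), value: corrected expression
    private final Map<String, String> informalMap = new HashMap<>();

    // Loads conversion lines. ASSUMPTION: each line is "informal|corrected"
    // and lines starting with '#' are comments; the real file format may differ.
    public InformalExpSketch(Iterable<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] fields = trimmed.split("\\|", 2);
            if (fields.length == 2) {
                informalMap.put(fields[0].trim().toLowerCase(Locale.ROOT),
                                fields[1].trim());
            }
        }
    }

    // Lowercases the word and returns its correction, or the word unchanged.
    public String correctWord(String word) {
        String corrected = informalMap.get(word.toLowerCase(Locale.ROOT));
        return corrected != null ? corrected : word;
    }

    // ASSUMPTION: simple whitespace tokenization stands in for the real tokenizer.
    public String correctText(String text) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : text.trim().split("\\s+")) {
            out.add(correctWord(token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        InformalExpSketch handler = new InformalExpSketch(
            List.of("# informal|corrected", "plz|please", "whos|who's", "u|you"));
        // prints: please tell me who's there
        System.out.println(handler.correctText("plz tell me whos there"));
    }
}
```

    In this sketch, words with no entry in the map pass through unchanged, so the handler only touches expressions listed in the flat file.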

  • Notes:
    • Baseline source code: PreProcContractions.java
    • Enhancement: reads the data from a configurable flat file (not hard-coded)
    • Action: redesigned and implemented
    • Negations might be correctable by the dictionary (e.g., isnt -> isn't)

  • Source Code: InformalExpHandler.java