Informal Expression Handler: Correct Informal Expression
- Description:
This class corrects informal expressions, either by inserting an apostrophe (') at the correct position or by converting the expression to its formal form.
For example, [whos] -> [who's] and [plz] -> [please]. In general, the corrected formal expression either is not in the dictionary (e.g. [who's]) or has a large edit-distance from the input, because these inputs are shorthand rather than typos or spelling errors
(e.g. [plz] is shorthand for [please], with an edit-distance of 5).
- Features:
- Convert informal expressions to formal expressions.
- Contractions: from the baseline (originally from Wikipedia)
- Shorthand: [pls] -> [please], [u] -> [you], etc.
- Conversion driven by a configurable flat file.
- Examples:
File Name | Input | Output
---|---|---
10138.txt | u | you
10679.txt | b | be
11186.txt | ?pls | ? please
10.txt | im | i'm
11186.txt | didnt | didn't
16481.txt | shes | she's
- Implementation Logic:
- Read the conversions from a configurable flat file and store them in a local HashMap, with the informal expression as the key and the corrected expression as the value.
- The conversion file (./data/Misc/informalExpression.data):
informal expression | correct expression
- Tokenize the input text into input words
- Lowercase each input word
- Go through all keys
- If the input word matches a key, replace it with the corrected expression.
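The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual InformalExpHandler.java: the class name InformalExpSketch and its methods are hypothetical, the `|` field separator is assumed from the file header shown above, skipping `#` comment lines is an assumption, and tokenization is simplified to splitting on whitespace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; names and file-format details are assumptions.
class InformalExpSketch {

    // Read the flat file (e.g. ./data/Misc/informalExpression.data) into a map.
    static Map<String, String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file));
    }

    // Parse "informal|corrected" lines; key = informal, value = corrected.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Assumption: blank lines and '#' comment lines are skipped.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\\|", 2);
            if (fields.length < 2) {
                continue; // ignore malformed lines
            }
            map.put(fields[0].trim().toLowerCase(Locale.ROOT), fields[1].trim());
        }
        return map;
    }

    // Lowercase the word and replace it if it is a key; otherwise keep it as-is.
    static String correctWord(String word, Map<String, String> map) {
        return map.getOrDefault(word.toLowerCase(Locale.ROOT), word);
    }

    // Simplified tokenization: split on whitespace, correct each token, rejoin.
    static String correctText(String text, Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(correctWord(token, map));
        }
        return out.toString();
    }
}
```

A HashMap lookup replaces the explicit loop over all keys with the same result, since each lowercased word either equals a key or does not. With entries such as `u|you` and `pls|please`, calling `correctText("pls call u", map)` would return `please call you`. Punctuation attached to a token (as in the [?pls] example) is not handled by this sketch.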
- Notes:
- Baseline source code: PreProcContractions.java
- Enhancement: read data from a configurable flat file (not hard-coded)
- Action: Redesigned and implemented
- Negations might be correctable by the dictionary (e.g. isnt -> isn't)
- Source Code:
InformalExpHandler.java