public class ToStripDiacritics extends Transformation implements java.lang.Cloneable
Diacritic characters appear in the ISO Latin-1 character set, that is, the Unicode Latin-1 Supplement block (U+0080 ~ U+00FF). They also appear in other Unicode blocks, such as Latin Extended-A and Latin Extended-B. The diacritics mapping list can be configured by editing the configuration file (${LVG}/data/Unicode/diacriticMap.data).
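To make the block names above concrete, the following minimal sketch uses the standard `java.lang.Character.UnicodeBlock` API to report which block a character belongs to. The class `DiacriticBlocks` and the method `inDiacriticBlocks` are hypothetical names introduced for illustration and are not part of LVG. Note that not every character in these blocks carries a diacritic, so block membership is only a rough filter; the configured mapping list is what decides which characters are actually rewritten.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper for illustration only; not part of the LVG API.
class DiacriticBlocks {

    /** Returns true if the character lies in one of the three blocks named in the class description. */
    static boolean inDiacriticBlocks(char c) {
        UnicodeBlock block = UnicodeBlock.of(c);
        return block == UnicodeBlock.LATIN_1_SUPPLEMENT
            || block == UnicodeBlock.LATIN_EXTENDED_A
            || block == UnicodeBlock.LATIN_EXTENDED_B;
    }

    public static void main(String[] args) {
        // Plain 'e', then U+00E9 (e with acute), U+0142 (l with stroke), U+0192 (f with hook).
        char[] samples = {'e', '\u00E9', '\u0142', '\u0192'};
        for (char c : samples) {
            System.out.printf("U+%04X %s %b%n", (int) c, UnicodeBlock.of(c), inDiacriticBlocks(c));
        }
    }
}
```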
History:
Fields inherited from class Transformation: NO_MUTATE_INFO, UPDATE

| Constructor and Description |
|---|
| ToStripDiacritics() |
| Modifier and Type | Method and Description |
|---|---|
| static java.util.Hashtable<java.lang.Character,java.lang.Character> | GetDiacriticMapFromFile(Configuration config): Reads the diacritics mapping list from the configuration file. |
| static void | main(java.lang.String[] args): A unit test driver for this flow component. |
| static java.util.Vector<LexItem> | Mutate(LexItem in, java.util.Hashtable<java.lang.Character,java.lang.Character> diacriticMap, boolean detailsFlag, boolean mutateFlag): Performs the mutation of this flow component. |
| static char | StripDiacritic(char inChar, java.util.Hashtable<java.lang.Character,java.lang.Character> diacriticMap): Strips the diacritic from an input character. |
| static java.lang.String | StripDiacritics(java.lang.String inStr, java.util.Hashtable<java.lang.Character,java.lang.Character> diacriticMap): Strips diacritics from an input string. |
Methods inherited from class Transformation: GetTestStr, PrintResult, PrintResults, UpdateLexItem, UpdateLexItem, UpdateLexItem

public static java.util.Vector<LexItem> Mutate(LexItem in, java.util.Hashtable<java.lang.Character,java.lang.Character> diacriticMap, boolean detailsFlag, boolean mutateFlag)
Parameters:
in - a LexItem as the input for this flow component
diacriticMap - a hash table containing the diacritics mapping
detailsFlag - a boolean flag for processing details information
mutateFlag - a boolean flag for processing mutate information

public static java.util.Hashtable<java.lang.Character,java.lang.Character> GetDiacriticMapFromFile(Configuration config)
Parameters:
config - Configuration object

public static char StripDiacritic(char inChar, java.util.Hashtable<java.lang.Character,java.lang.Character> diacriticMap)
Parameters:
inChar - input character from which to strip the diacritic
diacriticMap - user-defined diacritics mapping

public static java.lang.String StripDiacritics(java.lang.String inStr, java.util.Hashtable<java.lang.Character,java.lang.Character> diacriticMap)
Parameters:
inStr - input string from which to strip diacritics
diacriticMap - user-defined diacritics mapping

public static void main(java.lang.String[] args)
Parameters:
args - arguments
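The signatures of StripDiacritic and StripDiacritics suggest a per-character lookup in the mapping table. The sketch below shows one way such a lookup could work; it is not the LVG implementation. The class `StripSketch`, its lowercase method names, and the hand-built `sampleMap` are hypothetical, and the choice to pass unmapped characters through unchanged is an assumption. In real use the map would come from GetDiacriticMapFromFile(Configuration config) rather than being built by hand.

```java
import java.util.Hashtable;

// Hypothetical stand-in for illustration only; not the LVG implementation.
class StripSketch {

    /** Looks up one character; characters absent from the map are returned unchanged (assumed behavior). */
    static char stripDiacritic(char inChar, Hashtable<Character, Character> diacriticMap) {
        Character mapped = diacriticMap.get(inChar);
        return (mapped != null) ? mapped : inChar;
    }

    /** Applies the per-character lookup to every character of the input string. */
    static String stripDiacritics(String inStr, Hashtable<Character, Character> diacriticMap) {
        StringBuilder out = new StringBuilder(inStr.length());
        for (int i = 0; i < inStr.length(); i++) {
            out.append(stripDiacritic(inStr.charAt(i), diacriticMap));
        }
        return out.toString();
    }

    /** A tiny hand-built map; the real list lives in the configurable mapping file. */
    static Hashtable<Character, Character> sampleMap() {
        Hashtable<Character, Character> map = new Hashtable<>();
        map.put('\u00E9', 'e'); // e with acute
        map.put('\u00F1', 'n'); // n with tilde
        map.put('\u00C5', 'A'); // A with ring above
        return map;
    }

    public static void main(String[] args) {
        // "resume" written with two U+00E9 characters
        System.out.println(stripDiacritics("r\u00E9sum\u00E9", sampleMap())); // prints "resume"
    }
}
```

A one-to-one `Character` map like this can only replace a character with a single character, which matches the `Hashtable<Character,Character>` type in the documented signatures.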