Lexical Tools

Lvg: 2005~2007 strip diacritics

Strip Diacritics:

  • Strip Diacritics using Unicode Normalization D:

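    As discussed under Unicode Normalization, Normalization Form D (NFD) can be used to strip diacritics: each precomposed character is decomposed into a base character followed by combining marks, and the combining marks are then removed. A sample of the diacritics stripped by this method is listed below.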

    Numeric Entity  Unicode  Symbol  Description                   Stripped Character
    192             \u00c0   À       Capital A, grave accent       A
    193             \u00c1   Á       Capital A, acute accent       A
    194             \u00c2   Â       Capital A, circumflex accent  A
    195             \u00c3   Ã       Capital A, tilde              A
    196             \u00c4   Ä       Capital A, umlaut             A
    197             \u00c5   Å       Capital A, ring               A
    199             \u00c7   Ç       Capital C, cedilla            C
    200             \u00c8   È       Capital E, grave accent       E
    201             \u00c9   É       Capital E, acute accent       E
    202             \u00ca   Ê       Capital E, circumflex accent  E
    203             \u00cb   Ë       Capital E, umlaut             E
    204             \u00cc   Ì       Capital I, grave accent       I
    205             \u00cd   Í       Capital I, acute accent       I
    206             \u00ce   Î       Capital I, circumflex accent  I
    207             \u00cf   Ï       Capital I, umlaut             I
    209             \u00d1   Ñ       Capital N, tilde              N
    210             \u00d2   Ò       Capital O, grave accent       O
    211             \u00d3   Ó       Capital O, acute accent       O
    212             \u00d4   Ô       Capital O, circumflex accent  O
    213             \u00d5   Õ       Capital O, tilde              O
    214             \u00d6   Ö       Capital O, umlaut             O
    217             \u00d9   Ù       Capital U, grave accent       U
    218             \u00da   Ú       Capital U, acute accent       U
    219             \u00db   Û       Capital U, circumflex accent  U
    220             \u00dc   Ü       Capital U, umlaut             U
    221             \u00dd   Ý       Capital Y, acute accent       Y
    224             \u00e0   à       Small a, grave accent         a
    225             \u00e1   á       Small a, acute accent         a
    226             \u00e2   â       Small a, circumflex accent    a
    227             \u00e3   ã       Small a, tilde                a
    228             \u00e4   ä       Small a, umlaut               a
    229             \u00e5   å       Small a, ring                 a
    231             \u00e7   ç       Small c, cedilla              c
    232             \u00e8   è       Small e, grave accent         e
    233             \u00e9   é       Small e, acute accent         e
    234             \u00ea   ê       Small e, circumflex accent    e
    235             \u00eb   ë       Small e, umlaut               e
    236             \u00ec   ì       Small i, grave accent         i
    237             \u00ed   í       Small i, acute accent         i
    238             \u00ee   î       Small i, circumflex accent    i
    239             \u00ef   ï       Small i, umlaut               i
    241             \u00f1   ñ       Small n, tilde                n
    242             \u00f2   ò       Small o, grave accent         o
    243             \u00f3   ó       Small o, acute accent         o
    244             \u00f4   ô       Small o, circumflex accent    o
    245             \u00f5   õ       Small o, tilde                o
    246             \u00f6   ö       Small o, umlaut               o
    249             \u00f9   ù       Small u, grave accent         u
    250             \u00fa   ú       Small u, acute accent         u
    251             \u00fb   û       Small u, circumflex accent    u
    252             \u00fc   ü       Small u, umlaut               u
    253             \u00fd   ý       Small y, acute accent         y
    255             \u00ff   ÿ       Small y, umlaut               y
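
    The NFD approach described above can be sketched in a few lines. This is a minimal illustration using Python's standard unicodedata module, not Lvg's own implementation; the function name strip_diacritics_nfd is an assumption made for this example.

```python
import unicodedata


def strip_diacritics_nfd(text):
    """Sketch only: decompose with NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero canonical combining class
    # for combining marks such as the grave accent (U+0300) or cedilla (U+0327).
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    print(strip_diacritics_nfd("\u00c0 \u00e7 \u00ff"))  # A c y
    # U+00D8 has no canonical decomposition, so NFD leaves it unchanged.
    print(strip_diacritics_nfd("\u00d8") == "\u00d8")  # True
```

    Filtering on the combining class is one of several possible ways to identify the marks to drop; filtering on the general category "Mn" is a common alternative.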

  • Strip diacritics by user's definition:

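    Users may define their own diacritic-stripping mappings. This is useful for characters such as Ø, which have no canonical decomposition and are therefore left unchanged by Normalization D. The current default definitions in Lvg are as follows: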

    Numeric Entity  Unicode  Symbol  Description                         Stripped Character
    216             \u00d8   Ø       Latin Capital Letter O With Stroke  O
    248             \u00f8   ø       Latin Small Letter O With Stroke    o
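
    A user-defined mapping of this kind can be sketched as a simple lookup table. The names USER_STRIP_MAP and strip_by_user_definition are hypothetical and chosen for this example; how Lvg actually stores and loads its definitions is not described here, so the dictionary below only mirrors the two default entries listed above.

```python
# Hypothetical table mirroring the two default entries listed above.
USER_STRIP_MAP = {
    "\u00d8": "O",  # Latin Capital Letter O With Stroke
    "\u00f8": "o",  # Latin Small Letter O With Stroke
}


def strip_by_user_definition(text, mapping=None):
    """Sketch only: replace each character found in the mapping."""
    if mapping is None:
        mapping = USER_STRIP_MAP
    # str.maketrans accepts a dict whose keys are single characters.
    return text.translate(str.maketrans(mapping))


if __name__ == "__main__":
    print(strip_by_user_definition("S\u00f8ren \u00d8rsted"))  # Soren Orsted
```

    Keeping the mapping as plain data makes it straightforward to add further entries without changing the stripping logic.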