Lvg: 2005~2007 strip diacritics
Strip Diacritics:
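As discussed in Unicode Normalization, Normalization Form D (NFD) can be used to strip diacritics: each accented character is decomposed into a base character followed by combining marks, and the combining marks are then removed. A sample of the diacritics stripped by this method is shown below.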
Numeric Entity | Unicode | Symbol | Description | Stripped Character |
192 | \u00c0 | À | Capital A, grave accent | A |
193 | \u00c1 | Á | Capital A, acute accent | A |
194 | \u00c2 | Â | Capital A, circumflex accent | A |
195 | \u00c3 | Ã | Capital A, tilde | A |
196 | \u00c4 | Ä | Capital A, umlaut | A |
197 | \u00c5 | Å | Capital A, ring | A |
199 | \u00c7 | Ç | Capital C, cedilla | C |
200 | \u00c8 | È | Capital E, grave accent | E |
201 | \u00c9 | É | Capital E, acute accent | E |
202 | \u00ca | Ê | Capital E, circumflex accent | E |
203 | \u00cb | Ë | Capital E, umlaut | E |
204 | \u00cc | Ì | Capital I, grave accent | I |
205 | \u00cd | Í | Capital I, acute accent | I |
206 | \u00ce | Î | Capital I, circumflex accent | I |
207 | \u00cf | Ï | Capital I, umlaut | I |
209 | \u00d1 | Ñ | Capital N, tilde | N |
210 | \u00d2 | Ò | Capital O, grave accent | O |
211 | \u00d3 | Ó | Capital O, acute accent | O |
212 | \u00d4 | Ô | Capital O, circumflex accent | O |
213 | \u00d5 | Õ | Capital O, tilde | O |
214 | \u00d6 | Ö | Capital O, umlaut | O |
217 | \u00d9 | Ù | Capital U, grave accent | U |
218 | \u00da | Ú | Capital U, acute accent | U |
219 | \u00db | Û | Capital U, circumflex accent | U |
220 | \u00dc | Ü | Capital U, umlaut | U |
221 | \u00dd | Ý | Capital Y, acute accent | Y |
224 | \u00e0 | à | Small a, grave accent | a |
225 | \u00e1 | á | Small a, acute accent | a |
226 | \u00e2 | â | Small a, circumflex accent | a |
227 | \u00e3 | ã | Small a, tilde | a |
228 | \u00e4 | ä | Small a, umlaut | a |
229 | \u00e5 | å | Small a, ring | a |
231 | \u00e7 | ç | Small c, cedilla | c |
232 | \u00e8 | è | Small e, grave accent | e |
233 | \u00e9 | é | Small e, acute accent | e |
234 | \u00ea | ê | Small e, circumflex accent | e |
235 | \u00eb | ë | Small e, umlaut | e |
236 | \u00ec | ì | Small i, grave accent | i |
237 | \u00ed | í | Small i, acute accent | i |
238 | \u00ee | î | Small i, circumflex accent | i |
239 | \u00ef | ï | Small i, umlaut | i |
241 | \u00f1 | ñ | Small n, tilde | n |
242 | \u00f2 | ò | Small o, grave accent | o |
243 | \u00f3 | ó | Small o, acute accent | o |
244 | \u00f4 | ô | Small o, circumflex accent | o |
245 | \u00f5 | õ | Small o, tilde | o |
246 | \u00f6 | ö | Small o, umlaut | o |
249 | \u00f9 | ù | Small u, grave accent | u |
250 | \u00fa | ú | Small u, acute accent | u |
251 | \u00fb | û | Small u, circumflex accent | u |
252 | \u00fc | ü | Small u, umlaut | u |
253 | \u00fd | ý | Small y, acute accent | y |
255 | \u00ff | ÿ | Small y, umlaut | y |
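The NFD-based stripping described above can be sketched as follows. This is a minimal illustration in Python using the standard unicodedata module, not Lvg's own implementation; the function name strip_diacritics_nfd is hypothetical.

```python
import unicodedata


def strip_diacritics_nfd(text: str) -> str:
    """Hypothetical helper: strip diacritics via Unicode Normalization Form D."""
    # NFD splits each precomposed character into a base character
    # followed by its combining marks, e.g. U+00C0 -> "A" + U+0300.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the nonspacing combining marks (general category "Mn"),
    # keeping only the base characters.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_diacritics_nfd("À la carte, señor Ångström"))
# prints: A la carte, senor Angstrom
```

Characters that have no canonical decomposition pass through this function unchanged, because NFD has nothing to split off.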
Users may also define their own diacritic-stripping mappings. The current default definitions in Lvg are shown below:
Numeric Entity | Unicode | Symbol | Description | Stripped Character |
216 | \u00d8 | Ø | Latin Capital Letter O With Stroke | O |
248 | \u00f8 | ø | Latin Small Letter O With Stroke | o |
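Ø and ø have no canonical decomposition, so NFD alone leaves them unchanged, which is why an explicit mapping is needed. The sketch below shows one way such a mapping table might be combined with NFD stripping. It is an illustration in Python only, not Lvg's implementation or configuration format; the names EXTRA_STRIP_MAP and strip_diacritics are hypothetical.

```python
import unicodedata

# Hypothetical mapping table mirroring the two default entries listed above.
# Additional user-defined entries could be added in the same way.
EXTRA_STRIP_MAP = {
    "\u00d8": "O",  # Latin Capital Letter O With Stroke
    "\u00f8": "o",  # Latin Small Letter O With Stroke
}


def strip_diacritics(text: str, extra_map=EXTRA_STRIP_MAP) -> str:
    """Hypothetical helper: apply explicit mappings, then NFD stripping."""
    # Step 1: replace characters that NFD cannot decompose.
    mapped = text.translate(str.maketrans(extra_map))
    # Step 2: decompose the rest and drop nonspacing combining marks.
    decomposed = unicodedata.normalize("NFD", mapped)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_diacritics("Søren Ørsted, Málaga"))
# prints: Soren Orsted, Malaga
```

Keeping the explicit mappings in a separate table means new entries can be added without touching the normalization step.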