CSpell

XML/HTML Handler: Correct XML/HTML Entity

  • Description:
    This class converts HTML/XML entities to their ASCII characters.

  • Features:
    Converts the following HTML/XML entities:

    Input     Output
    &lt;      <
    &gt;      >
    &amp;     &
    &quot;    "
    &nbsp;    (space)

  • Examples:

    File Name    Input         Output
    10058.txt    &amp;         &
    10715.txt    &quot;?       "?
    12190.txt    &quot; why    " why

  • Implementation Logic:
    • Store the conversions in a local HashMap, with the XML/HTML entity as the key and the converted ASCII character as the value.
    • Go through all keys:
      • If the input text contains the key, replace it with the converted ASCII character.
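
    The steps above can be sketched roughly as follows. This is a minimal illustration and not the actual XmlHtmlHandler.java code: the class name XmlHtmlEntitySketch and the method replaceEntities are hypothetical, and the map entries are taken from the Features list. One assumption goes beyond the description: a LinkedHashMap is used instead of a plain HashMap so that &amp; is processed last, which keeps text such as &amp;lt; from being decoded twice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the entity-replacement logic; not the CSpell source.
class XmlHtmlEntitySketch {
    // Key: XML/HTML entity, value: converted ASCII character.
    // Assumption: insertion order is preserved so that "&amp;" runs last.
    private static final Map<String, String> ENTITY_MAP = new LinkedHashMap<>();
    static {
        ENTITY_MAP.put("&lt;", "<");
        ENTITY_MAP.put("&gt;", ">");
        ENTITY_MAP.put("&quot;", "\"");
        ENTITY_MAP.put("&nbsp;", " ");
        ENTITY_MAP.put("&amp;", "&");
    }

    // Go through all keys; if the text contains a key, replace every
    // occurrence with the converted ASCII character.
    static String replaceEntities(String inText) {
        String outText = inText;
        for (Map.Entry<String, String> entry : ENTITY_MAP.entrySet()) {
            if (outText.contains(entry.getKey())) {
                outText = outText.replace(entry.getKey(), entry.getValue());
            }
        }
        return outText;
    }

    public static void main(String[] args) {
        // Inputs taken from the Examples table.
        System.out.println(replaceEntities("&amp;"));
        System.out.println(replaceEntities("&quot;?"));
        System.out.println(replaceEntities("&quot; why"));
    }
}
```

    With an unordered HashMap the iteration order is not guaranteed, so whether &amp;lt; ends up as &lt; or as < would depend on which key happens to be visited first. The fixed ordering here is one way to make that case deterministic.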

  • Notes:
    • Baseline source code: PreProcXml.java
    • Bug fixes:
      • [& X] -> [&X]
      • [&....I] -> [&...I]
    • Action: Redesigned and implemented
    • Entities of the form [&#ddd;] are not converted to ASCII. This conversion might be needed if such entities appear in the input text.

  • Source Code: XmlHtmlHandler.java