XML/HTML Handler: Correct XML/HTML Entity
- Description:
This class converts HTML/XML entities to their ASCII characters.
- Features:
Converts the following HTML/XML entities:
In | Out
---|---
&lt; | <
&gt; | >
&amp; | &
&quot; | "
- Examples:
File Name | Input | Output
---|---|---
10058.txt | &amp; | &
10715.txt | &quot;? | "?
12190.txt | &quot; why | " why
- Implementation Logic:
- Store the conversions in a local HashMap, with the XML/HTML entity as the key and the converted ASCII character as the value.
- Iterate over all keys.
- If the input text contains a key, replace each occurrence with the corresponding ASCII character.
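The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of XmlHtmlHandler.java: the class name `XmlHtmlHandlerSketch` and the method name `convert` are hypothetical. It also makes one assumption beyond the description: it uses a `LinkedHashMap` instead of a plain `HashMap` so that `&amp;` is processed last. With an unordered map, an input such as `&amp;lt;` could be decoded twice (first to `&lt;`, then to `<`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from XmlHtmlHandler.java.
final class XmlHtmlHandlerSketch {

    // Entity -> ASCII character. LinkedHashMap preserves insertion order,
    // so "&amp;" is deliberately handled last (an assumption of this sketch).
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&amp;", "&");
    }

    // Go through all keys; if the text contains a key, replace every occurrence.
    static String convert(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : ENTITIES.entrySet()) {
            if (result.contains(entry.getKey())) {
                result = result.replace(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(convert("&quot; why")); // prints: " why
    }
}
```

The `contains` check mirrors the described logic; `String.replace` alone would give the same result, since it returns the text unchanged when the key is absent.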
- Notes:
- Baseline source code: PreProcXml.java
- Bug fixes:
- [& X] -> [&X]
- [&....I] -> [&...I]
- Action: Redesigned and implemented
- Entities of the form [ddd;] are not converted to ASCII. This conversion might be needed if such entities appear in the input text.
- Source Code:
XmlHtmlHandler.java