Lexical Tools

Sort Words by Order

  • Short Description: Sort the words of the input term in ascending ASCII order.

  • Full Description:

    This flow sorts the words of the input term in ascending ASCII order (not dictionary order) and strips punctuation. It is useful for terms and vocabularies in which word order may be inverted, with or without a comma at the point of inversion. For example, "lung cancer", "Cancer, Lung", and "Cancer Lung" all refer to the same term. Word order sort reduces all three to the same word sequence, "cancer lung" (apart from case, which the Java version preserves).
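
The distinction between ASCII order and dictionary order can be seen in a small sketch. The class and method names here are hypothetical and are not part of Lexical Tools, and case-insensitive comparison is used only as a rough stand-in for dictionary order:

```java
import java.util.Arrays;

class AsciiOrderDemo {
    // ASCII order: String's natural ordering compares character codes,
    // so every uppercase letter sorts before every lowercase letter.
    static String[] asciiSorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Rough stand-in for dictionary order (an assumption for illustration):
    // compare words while ignoring case.
    static String[] dictionarySorted(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        // prints [Banana, Zebra, apple]
        System.out.println(Arrays.toString(asciiSorted("apple", "Zebra", "Banana")));
        // prints [apple, Banana, Zebra]
        System.out.println(Arrays.toString(dictionarySorted("apple", "Zebra", "Banana")));
    }
}
```

Because the comparison is case sensitive, two inputs that differ only in case can come out in different word orders.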

    The -m option has no effect on this flow: "none" is appended to the end of the output.

  • Difference: The Java version keeps the original case of each word from the input term after word sorting.

  • Features:
    1. Replace punctuation with spaces.
    2. Sort words in ascending ASCII order.


  • Symbol: w

  • Examples:
    
    shell> lvg -f:w
    Cancer, Lung
    Cancer, Lung|Cancer Lung|2047|16777215|w|1|
    
    Lung Cancer
    Lung Cancer|Cancer Lung|2047|16777215|w|1|
    

  • Implementation Logic:
    1. Tokenize all words from the input term.
    2. Replace punctuation with spaces.
    3. Sort words in ascending ASCII (case-sensitive) order.
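
The steps above can be approximated in a few lines of Java. This is a minimal sketch and not the actual ToSortWordsByOrder.java source: the class and method names are hypothetical, punctuation is assumed to mean the ASCII characters matched by \p{Punct}, and punctuation replacement is folded in before tokenizing for brevity.

```java
import java.util.Arrays;

class SortWordsSketch {
    // Hypothetical helper illustrating the logic; not the Lexical Tools API.
    static String sortWords(String term) {
        // Replace punctuation with spaces (assumption: ASCII punctuation only).
        String cleaned = term.replaceAll("\\p{Punct}", " ").trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        // Tokenize on runs of whitespace.
        String[] words = cleaned.split("\\s+");
        // Natural String ordering is case sensitive and follows ASCII codes
        // for ASCII input; the original case of each word is kept.
        Arrays.sort(words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(sortWords("Cancer, Lung")); // prints Cancer Lung
        System.out.println(sortWords("Lung Cancer"));  // prints Cancer Lung
        System.out.println(sortWords("lung cancer"));  // prints cancer lung
    }
}
```

With these assumptions, the first two inputs reproduce the second output field shown in the examples.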

  • Source Code: ToSortWordsByOrder.java

  • Hierarchy: Object -> Transformation -> ToSortWordsByOrder