Text Categorization

STRI Legal Word Filter Options

The legal word filter is one of the steps in the STRI input filter. The input filter uses it to remove words that are not legal. The legal word options let users set the minimum word length, restrictwords, stopwords, document count, word count, and normalized signal criteria according to their needs. These options are intended for advanced users with special requirements. General users should skip them and use the default settings.
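The criteria named above can be pictured as a chain of independent checks. The following is a minimal Python sketch, not STRI's actual implementation: the class names, field names, and the `is_legal` function are hypothetical, the default values are taken from the option table in this document, and the exact semantics (inclusive bounds, restrictwords acting as an allow-list, what the normalized signal measures) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class LegalWordOptions:
    """Hypothetical container mirroring the documented -lw options."""
    min_length: int = 3             # -lw:wl~INT
    use_min_length: bool = True     # switched off by -lw:wl~n
    remove_stopwords: bool = True   # switched off by -lw:s
    restrictwords_only: bool = True # switched off by -lw:r
    low_signal: int = 2             # -lw:ls~INT
    use_low_signal: bool = True     # switched off by -lw:ls~n
    high_signal: int = 754648       # -lw:hs~INT
    use_high_signal: bool = True    # switched off by -lw:hs~n
    min_word_count: int = 2         # -lw:wc~INT
    use_min_word_count: bool = False  # switched on by -lw:wc~u
    min_doc_count: int = 2          # -lw:dc~INT
    use_min_doc_count: bool = False   # switched on by -lw:dc~u


@dataclass
class WordStats:
    """Hypothetical per-word corpus statistics."""
    signal: int      # normalized signal (meaning assumed)
    word_count: int
    doc_count: int


def is_legal(word, stats, opts, stopwords, restrictwords):
    """Return True if the word passes every enabled criterion."""
    if opts.use_min_length and len(word) < opts.min_length:
        return False
    if opts.remove_stopwords and word in stopwords:
        return False
    # Assumption: restrictwords is an allow-list of permitted words.
    if opts.restrictwords_only and word not in restrictwords:
        return False
    # Assumption: both signal bounds are inclusive.
    if opts.use_low_signal and stats.signal < opts.low_signal:
        return False
    if opts.use_high_signal and stats.signal > opts.high_signal:
        return False
    if opts.use_min_word_count and stats.word_count < opts.min_word_count:
        return False
    if opts.use_min_doc_count and stats.doc_count < opts.min_doc_count:
        return False
    return True
```

With the defaults, only the length, stopword, restrictword, and signal checks are active, which matches the "true"/"false" pattern shown under "Legal words selected options" in the example output.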

The table below lists all STRI legal word filter options:

  • Legal word filter options

    option flag    feature description
    -lw:d          Show legal word filter option details
    -lw:dc~INT     Set value of min. document count (default: 2)
    -lw:dc~u       Use min. document count criterion (default: not used)
    -lw:h          Show legal word filter help menu
    -lw:hs~INT     Set value of high signal (default: 754648)
    -lw:hs~n       Do not use high signal criterion (default: used)
    -lw:ls~INT     Set value of low signal (default: 2)
    -lw:ls~n       Do not use low signal criterion (default: used)
    -lw:r          Remove restrictwords filter
    -lw:s          Keep stopwords (do not use stopword filter)
    -lw:wc~INT     Set value of min. word count (default: 2)
    -lw:wc~u       Use min. word count criterion (default: not used)
    -lw:wl~INT     Set value of min. word length (default: 3)
    -lw:wl~n       Do not use min. word length criterion

  • Examples:
    • Index input and show prompt with input filter and legal word filter details

      > stri -if:d -lw:d -p
      - Please input a term (type "Ctl-d" to quit)
      > virus
      --> Input: [virus]
      ------ Input Filter Details ------
      --> Input text: [virus]
      -- Words after Acronym filter [virus], Acronym filter is not used.
      -- W.E. filtered words (1): [virus], W.E. filter is used
      -- Legal words (1): [virus]
      --- Legal words selected options:
      - Min. length: true (3)
      - Remove stopwords: true
      - Restrictwords only: true
      - Min. normalized count: true (2)
      - Max. normalized count: true (792054)
      - Min. WC: false (2)
      - Min. DC: false (2)
      - Illegal words details:
      -- Unique words (1): [virus], unique word filter is not used
      -- Final words (1): [virus]
      -- Number of scores: 130
      -- Total final words used: 1
      --- ST scores (x 1) and rank based on word count ---
      virs|T005|Virus
      1|0.9690|virs|T005|Virus
      2|0.4725|nnon|T114|Nucleic Acid, Nucleoside, or Nucleotide
      3|0.3859|amas|T087|Amino Acid Sequence
      4|0.3246|mamm|T015|Mammal
      5|0.3133|lbpr|T059|Laboratory Procedure
      6|0.3073|genf|T045|Genetic Function
      7|0.2846|imft|T129|Immunologic Factor
      8|0.2519|aapp|T116|Amino Acid, Peptide, or Protein
      9|0.2516|clas|T185|Classification
      10|0.2502|acty|T052|Activity
      --- ST scores (x 1) and rank based on document count ---
      virs|T005|Virus
      1|0.9402|virs|T005|Virus
      2|0.5831|nnon|T114|Nucleic Acid, Nucleoside, or Nucleotide
      3|0.4692|amas|T087|Amino Acid Sequence
      4|0.4003|imft|T129|Immunologic Factor
      5|0.3846|genf|T045|Genetic Function
      6|0.3760|lbpr|T059|Laboratory Procedure
      7|0.3696|mamm|T015|Mammal
      8|0.3552|acty|T052|Activity
      9|0.3372|clas|T185|Classification
      10|0.3327|aapp|T116|Amino Acid, Peptide, or Protein
      --- Overall ST rank ---
      virs|T005|Virus|dc
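
The ranked rows in the sample output above appear to follow a rank|score|abbreviation|ID|name layout. For readers who want to post-process such output, here is a minimal Python sketch of a row parser. The field names and the `parse_score_row` helper are hypothetical, chosen from a reading of the sample rather than from any documented STRI output specification.

```python
from typing import NamedTuple, Optional


class StScore(NamedTuple):
    """One ranked row; field names are assumptions based on the sample."""
    rank: int
    score: float
    abbreviation: str
    type_id: str
    name: str


def parse_score_row(line: str) -> Optional[StScore]:
    """Parse a row such as '1|0.9690|virs|T005|Virus'.

    Returns None for lines that are not ranked rows, for example
    section headers or the unranked summary line 'virs|T005|Virus'.
    """
    # Split on the first four bars only, so the name is kept whole.
    parts = line.strip().split("|", 4)
    if len(parts) != 5 or not parts[0].isdigit():
        return None
    try:
        score = float(parts[1])
    except ValueError:
        return None
    return StScore(int(parts[0]), score, parts[2], parts[3], parts[4])
```

Names that contain commas, such as "Nucleic Acid, Nucleoside, or Nucleotide", are preserved because the parser splits only on the bar character.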