Text Categorization

WORD_ST_SCORES Table

  • I. Db Table

    WORD_ST_SCORES
    Name           Type         Properties        Notes
    word           VARCHAR(50)  Index, NOT NULL   Words from Journals (lower case)
    tui            VARCHAR(5)   Index, NOT NULL   Semantic Types ID
    wordScore      FLOAT                          scores based on word frequency
    documentScore  FLOAT                          scores based on document count for words

  • II. SQL Examples
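    • Find the scores for a word, e.g. "XXX"
      1. Convert the word to lower case, since the table stores lower-case words: "xxx" = "XXX".toLowerCase( );
      2. Query the table with the lower-cased word: SELECT tui, wordScore, documentScore FROM WORD_ST_SCORES WHERE word = 'xxx';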
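
    The two steps above can be combined in a small JDBC helper. This is only a minimal sketch: the class name WordStScoresLookup, the Score holder, and the method names are hypothetical and not part of the original tools, and the caller is assumed to supply an open java.sql.Connection to the database that holds WORD_ST_SCORES. A parameterized query is used in place of the literal 'xxx' so that the word is passed safely.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the original tools) that looks up
// the scores of one word in WORD_ST_SCORES.
class WordStScoresLookup {

    // One result row: a semantic type ID and its two scores.
    static final class Score {
        final String tui;
        final double wordScore;
        final double documentScore;

        Score(String tui, double wordScore, double documentScore) {
            this.tui = tui;
            this.wordScore = wordScore;
            this.documentScore = documentScore;
        }
    }

    // Same query as in step 2, with the word as a bind parameter.
    static final String QUERY =
        "SELECT tui, wordScore, documentScore FROM WORD_ST_SCORES WHERE word = ?";

    // Step 1: the table stores lower-case words, so lower-case the input.
    static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    // Step 2: run the query on a caller-supplied connection (assumed open).
    static List<Score> findScores(Connection conn, String word) throws SQLException {
        List<Score> scores = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, normalize(word));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    scores.add(new Score(
                        rs.getString("tui"),
                        rs.getDouble("wordScore"),
                        rs.getDouble("documentScore")));
                }
            }
        }
        return scores;
    }
}
```

    Calling WordStScoresLookup.findScores(conn, "XXX") would return one Score per semantic type stored for "xxx", or an empty list if the word is not in the table.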

  • III. Notes
    • When the data for this table is loaded into HSqlDb, the cache data file exceeds the default size limit (2 GB). To fix this problem, raise the cache file scale in the HSqlDb configuration file as follows:
      • Edit HSqlDb/tc${YEAR}.properties
      • Change hsqldb.cache_file_scale=1 (default, 2 GB limit) to hsqldb.cache_file_scale=8 (8 GB limit)