Text Categorization

Load To TC Database

Loading data from files into the database for STI involves setting up the input data, loading it, and backing up the result. The steps are detailed as follows:

  • Setup input data:
    • Directory:
      • Source: /nfsvol/crfiler-lex/Development/TC/2008a/data/2008X
    • Files:
      • ${TC_DIR}/data/Jdi/MhJdTable.txt
      • ${TC_DIR}/data/Jdi/ShJdTable.txt
      • ${TC_DIR}/data/Jdi/WordJdTable.txt

      • ${TC_DIR}/data/Sti/WordStTable.txt

      • ${TC_DIR}/data/Config/tc.properties
        Use DB_NAME to specify the database name
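
    Before loading, it can help to confirm that every input file listed above is in place. The following is a minimal sketch, not part of the TC distribution: check_tc_inputs is a hypothetical helper, and it assumes the file layout under ${TC_DIR} is exactly as listed.

```shell
#!/bin/sh
# Sketch only: check_tc_inputs is a hypothetical helper, not a TC script.
# It reports every listed input file that is missing or empty.
check_tc_inputs() {
    tc_dir="$1"
    missing=0
    for f in \
        data/Jdi/MhJdTable.txt \
        data/Jdi/ShJdTable.txt \
        data/Jdi/WordJdTable.txt \
        data/Sti/WordStTable.txt \
        data/Config/tc.properties
    do
        # -s is true only for files that exist and have a size above zero
        if [ ! -s "${tc_dir}/${f}" ]; then
            echo "missing or empty: ${tc_dir}/${f}"
            missing=1
        fi
    done
    return "${missing}"
}

# Example call:
#   check_tc_inputs "${TC_DIR}" || exit 1
```

    The function returns a non-zero status if any file is absent, so it can guard the load steps in a wrapper script.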

  • Load data from files to Database:
    • Directory:
      • cd ${TC_DIR}/bin/loadDb
    • Procedures:
      • Step 1: Create database
        1.CreateDb
      • Step 2: Increase cache size
        Modify "hsqldb.cache_file_scale=8" in ${TC_DIR}/data/HSqlDb/tc${YEAR}.properties
      • Step 3: Load Data to database
        3.LoadDb ${YEAR}
        • 1: 10 min.
        • 2: 1 min.
        • 3: 30 sec.
        • 4: 90 min.
      • Step 4: Change to read only
        Modify "readonly=true" in ${TC_DIR}/data/HSqlDb/tc${YEAR}.properties
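
    Steps 2 and 4 both edit a key=value line in the same properties file, so they can be scripted. The following is a minimal sketch: set_prop is a hypothetical helper (not a TC or HSQLDB tool), and it assumes the file uses plain "key=value" lines with no spaces around "=".

```shell
#!/bin/sh
# Sketch only: set_prop is a hypothetical helper.
# set_prop FILE KEY VALUE replaces an existing "KEY=..." line in FILE,
# or appends "KEY=VALUE" if the key is not present.
set_prop() {
    file="$1"; key="$2"; value="$3"
    tmp="${file}.tmp.$$"
    # The key is used as a regular expression, so "." in a key matches any
    # character; that is acceptable here because the match is anchored.
    if grep -q "^${key}=" "${file}"; then
        sed "s|^${key}=.*|${key}=${value}|" "${file}" > "${tmp}" \
            && mv "${tmp}" "${file}"
    else
        echo "${key}=${value}" >> "${file}"
    fi
}

# Example calls:
#   props="${TC_DIR}/data/HSqlDb/tc${YEAR}.properties"
#   set_prop "${props}" hsqldb.cache_file_scale 8   # Step 2, before loading
#   set_prop "${props}" readonly true               # Step 4, after loading
```

    Writing to a temporary file and renaming it avoids relying on "sed -i", whose syntax differs between platforms.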

  • Back up database:
    • Create archive:
      • cd ${TC_DIR}/data
      • gtar -czvf /nfsvol/crfiler-lex/Development/TC/2008a/HSqlDbs/HSqlDb.${YEAR}.tgz HSqlDb.${YEAR}
    • Move Data Files:
      • cd ${TC_DIR}/data/HSqlDb.${YEAR}
      • mv * /export/home/lu/Projects/TC/tc2009/data/HSqlDb/.
      • cd ..
      • rm -rf HSqlDb.${YEAR}
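
    Because the last step removes the original directory, it may be worth checking that the archive is readable first. The following is a minimal sketch: backup_db is a hypothetical helper, the TAR variable is an assumption to cover systems where GNU tar is installed as gtar, and the archive path must be absolute because the function changes directory.

```shell
#!/bin/sh
# Sketch only: backup_db is a hypothetical helper, not a TC script.
# The source uses gtar; set TAR=gtar where GNU tar is installed under that name.
TAR="${TAR:-tar}"

# backup_db DATA_DIR DB_DIR_NAME ARCHIVE
# Archives DATA_DIR/DB_DIR_NAME into ARCHIVE (absolute path) and confirms
# the archive can be read back before anything is moved or removed.
backup_db() {
    data_dir="$1"; db_name="$2"; archive="$3"
    ( cd "${data_dir}" && "${TAR}" -czf "${archive}" "${db_name}" ) || return 1
    # Listing the archive reads it end to end; a truncated file fails here.
    "${TAR}" -tzf "${archive}" > /dev/null || return 1
    echo "archive verified: ${archive}"
}

# Example call:
#   backup_db "${TC_DIR}/data" "HSqlDb.${YEAR}" \
#       "/nfsvol/crfiler-lex/Development/TC/2008a/HSqlDbs/HSqlDb.${YEAR}.tgz"
```

    Running the move and remove commands only after this function succeeds keeps a verified copy of the database on hand.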