Installation Instructions
Thank you for downloading the Java Text Categorization Tools. This package consists of one compressed file: tc2011.tgz.
Download the tc2011.tgz file from the Text Categorization Tools web site.
Uncompress and unarchive this file into the location where you intend to install it. With GNU tar (installed as gtar on some systems and as plain tar on most Linux distributions), this is a single command:
> gtar -xzvf tc2011.tgz
Alternatively, on Linux, gunzip and tar may be used to uncompress and unarchive the file in two steps:
> gunzip tc2011.tgz
> tar -xvf tc2011.tar
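As an optional check on Linux, the archive's contents can be listed without unpacking it, to confirm that everything sits under the single top-level directory tc2011 described below. This is a minimal sketch, not part of the supplied installation procedure: the function name list_top_level is hypothetical, and it assumes a tar that supports the -t, -z and -f options (GNU tar and most modern tar implementations do).

```shell
#!/bin/sh
# Hypothetical helper: print the distinct top-level entries of a
# gzip-compressed tar archive without extracting it.
list_top_level() {
  # -t lists the contents, -z handles gzip, -f names the archive file.
  # Strip a leading "./", keep the first path component, drop blanks,
  # and print each distinct name once.
  tar -tzf "$1" | sed 's|^\./||' | cut -d/ -f1 | grep -v '^$' | sort -u
}

# Example use; assumes tc2011.tgz is in the current directory.
if [ -f tc2011.tgz ]; then
  list_top_level tc2011.tgz
fi
```

For the tc2011.tgz download this should print tc2011; anything else suggests an incomplete or different file.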
If you are on Windows, PKZIP or WinZip may be used to uncompress and unarchive this file. Make sure the downloaded file has the right extension (.tgz) before you unzip it. WinZip looks inside the file for a tar file and asks whether it should create a temporary file containing that tar file; allow it to do so. Once the temporary file has been created, WinZip reads it and displays the archive as it would any other zip file. Extract the contents to the location where you want to install the tools, making sure to preserve the directory structure.
Once the files are in place, change to the top-level directory, which should be tc2011. On Windows, this involves opening a command prompt window (Start->Run->cmd) and changing to the directory where you put the files. This top-level directory is referred to below as TC_DIR.
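Because the rest of this guide writes paths as ${TC_DIR}/..., it can be convenient on Linux to define TC_DIR as a shell variable so those paths can be pasted as-is. This is an assumption of convenience only: the guide does not say that the installation scripts read a TC_DIR environment variable, so treat it purely as shorthand for your own commands.

```shell
#!/bin/sh
# Run from inside the extracted top-level directory (tc2011).
# Record its absolute path under the name this guide uses, TC_DIR.
TC_DIR=$(pwd)
# Export it so that child shells and scripts you start can also see it.
export TC_DIR
echo "TC_DIR=$TC_DIR"
```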
From TC_DIR, run the installation script for your platform. The following scripts are provided for the supported platforms:
Platform   | Installation script
Linux i586 | install/bin/install_linux.sh
Windows    | install\bin\install_win.bat
Each script installs a copy of the JRE and then updates the configuration file settings with the installation location. If you install tc on Windows, choose NOT to restart your computer when the JRE installation finishes; you may restart after the tc installation is complete.
The script also creates, in the ${TC_DIR}/bin directory, a script for each of the text categorization tools (shell scripts on Linux, batch files on Windows) with the proper environment set up, and writes the tc configuration file ${TC_DIR}/data/config/tc.properties.
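After the script finishes, a quick manual check for the two outputs named above can help. The sketch below assumes only what this guide states (tool scripts in ${TC_DIR}/bin and the file ${TC_DIR}/data/config/tc.properties); the function name check_tc_install is hypothetical, and it checks only that these files exist, not that the tools run.

```shell
#!/bin/sh
# Hypothetical post-install check for a Linux tc installation.
# Returns 0 if both expected outputs are present, 1 otherwise.
check_tc_install() {
  dir="$1"
  status=0
  # The bin directory should exist and contain at least one entry.
  if [ -d "$dir/bin" ] && [ -n "$(ls -A "$dir/bin" 2>/dev/null)" ]; then
    echo "ok: $dir/bin is not empty"
  else
    echo "missing or empty: $dir/bin"
    status=1
  fi
  # The generated configuration file should exist.
  if [ -f "$dir/data/config/tc.properties" ]; then
    echo "ok: $dir/data/config/tc.properties"
  else
    echo "missing: $dir/data/config/tc.properties"
    status=1
  fi
  return "$status"
}

# Example use, only if TC_DIR has been set in this shell.
if [ -n "${TC_DIR:-}" ]; then
  check_tc_install "$TC_DIR" || echo "installation looks incomplete"
fi
```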
Finally, the script attempts to verify that the installation was successful.
If it was, the script finishes with a congratulatory message.
If it was not, the script displays a message saying so. A complete transcript of the process is kept in the ${TC_DIR}/logs directory; it will most likely contain error messages indicating the source of the failure.
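To find those messages quickly on Linux, the log directory can be searched for lines mentioning "error". The log file names are not documented here, so the hypothetical helper below (show_install_errors) searches every file under ${TC_DIR}/logs, case-insensitively, printing the file name and line number of each match.

```shell
#!/bin/sh
# Hypothetical helper: list lines containing "error" (any case)
# in all files under <install dir>/logs.
show_install_errors() {
  logdir="$1/logs"
  if [ ! -d "$logdir" ]; then
    echo "no log directory at $logdir" >&2
    return 1
  fi
  # -r recurses into the directory, -i ignores case, -n adds line numbers.
  grep -rin "error" "$logdir" || echo "no lines containing 'error' in $logdir"
}

# Example use, only if TC_DIR is set and has a logs directory.
if [ -n "${TC_DIR:-}" ] && [ -d "$TC_DIR/logs" ]; then
  show_install_errors "$TC_DIR"
fi
```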
After a successful installation
The text categorization tools are now ready to use. They are located in the ${TC_DIR}/bin directory. On Linux, they include the following shell scripts:
On the Windows platform, they include the following batch files:
Each tool can be run from any directory on the machine. If ${TC_DIR}/bin is in your PATH environment variable, the tools can be invoked by name instead of by their full path.
For example, the following session runs jdi and enters the term "heart valve" at its prompt. Each numbered line of the results gives rank|score|JD code|JD name:
> jdi -p -
Please input a term (type "Ctl-d" to quit)
> heart valve
--> Input: [heart valve]
--- JD scores (x 1) and rank based on word count ---
JD018|Cardiology
1|0.0858526|JD018|Cardiology
2|0.0624434|JD148|Pulmonary Medicine
3|0.0495025|JD124|Vascular Diseases
4|0.0251979|JD144|General Surgery
5|0.0209033|JD030|Diagnostic Imaging
6|0.0108041|JD120|Transplantation
7|0.0090153|JD005|Anesthesiology
8|0.0086425|JD014|Biomedical Engineering
9|0.0067363|JD100|Radiology
10|0.0064961|JD118|Therapeutics
--- JD scores (x 1) and rank based on document count ---
JD018|Cardiology
1|0.1564322|JD018|Cardiology
2|0.0979494|JD148|Pulmonary Medicine
3|0.0891969|JD124|Vascular Diseases
4|0.0438102|JD030|Diagnostic Imaging
5|0.0400007|JD144|General Surgery
6|0.0236169|JD005|Anesthesiology
7|0.0187880|JD120|Transplantation
8|0.0158293|JD014|Biomedical Engineering
9|0.0151241|JD092|Physiology
10|0.0133293|JD118|Therapeutics
--- Overall JD rank ---
JD018|Cardiology|dc
Altering your environment to use the text categorization tools, once they have been installed
The tools are invoked from a command line and are located in the ${TC_DIR}/bin directory. Adding ${TC_DIR}/bin to your PATH environment variable lets you find and run them from any location.
On Linux, add this path to PATH in your shell startup script: ~/.cshrc for csh or tcsh, or ~/.profile for sh-compatible shells.
On Windows, append this path to the PATH variable through the Control Panel\System\Advanced\Environment Variables\Edit menus.
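On Linux, the startup-script change might look like the lines below for ~/.profile (sh-compatible shells), with the csh/tcsh equivalent for ~/.cshrc given as comments. The default location $HOME/tc2011 is an assumption; replace it with the directory you actually installed to. As in the Windows instructions, the directory is appended to PATH.

```shell
#!/bin/sh
# Lines to add to ~/.profile. The default install location is an
# assumption; replace $HOME/tc2011 with your actual TC_DIR.
TC_DIR="${TC_DIR:-$HOME/tc2011}"
export TC_DIR
# Append ${TC_DIR}/bin to PATH unless it is already there.
case ":$PATH:" in
  *":$TC_DIR/bin:"*) ;;  # already on PATH, nothing to do
  *) PATH="$PATH:$TC_DIR/bin"; export PATH ;;
esac

# csh/tcsh equivalent for ~/.cshrc:
#   setenv TC_DIR $HOME/tc2011
#   setenv PATH ${PATH}:${TC_DIR}/bin
```

Open a new shell (or source the startup script) for the change to take effect.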
Manual installation
The Manual installation page describes in detail how the installation script configures and installs the Text Categorization tools. You may skip this section if you use the installation script to install the tools.