Quick Guide:
SUPPORTED FILE FORMATS:
The SKR/MetaMap system requires an ASCII file as input, formatted in one of the
formats listed below. For best results, we recommend the first format, MEDLINE:
the SKR/MetaMap program was originally built around it, and it remains the best
supported of all the formats.
It is always better to combine many items into a single file, submit that file
to the Scheduler, and let the Scheduler distribute the work for you. Submitting
a large number of small files with few entries each forces the Scheduler to swap
more and slows processing down.
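A minimal Python sketch of combining items into one submission file, assuming the blank-line-separated layout used by the formats below and input files that are already ASCII. The names `combine_items` and `combine_files` are hypothetical helpers, not part of SKR/MetaMap.

```python
from pathlib import Path
from typing import Iterable


def combine_items(items: Iterable[str]) -> str:
    """Join items into one submission body with one blank line between items.

    Surrounding whitespace is stripped and empty items are skipped, so stray
    blank lines do not produce empty records. Blank lines *inside* an item
    (e.g. a file that already holds several items) are kept as they are.
    """
    cleaned = [item.strip() for item in items]
    return "\n\n".join(item for item in cleaned if item) + "\n"


def combine_files(paths: Iterable[Path], out_path: Path) -> None:
    """Read several small ASCII files and write them as a single input file."""
    items = [Path(p).read_text(encoding="ascii") for p in paths]
    out_path.write_text(combine_items(items), encoding="ascii")
```

Reading and writing with `encoding="ascii"` also makes Python raise an error early if a file contains non-ASCII characters.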
Note: The Scheduler does not support non-ASCII characters. If your file
contains any non-ASCII Unicode characters (for example, UTF-8 encoded accented
letters or symbols), it will likely cause an error.
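Before submitting, it can help to find exactly where the offending characters are. The sketch below assumes the file can be decoded as UTF-8; `find_non_ascii` is a hypothetical helper, not an SKR/MetaMap tool.

```python
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, character) for every non-ASCII character.

    Line and column numbers are 1-based, matching what most editors display.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if not ch.isascii():
                problems.append((line_no, col, ch))
    return problems
```

For example, `find_non_ascii(Path("input.txt").read_text(encoding="utf-8"))` (with a hypothetical file name) lists every position to fix; an empty list means the text is plain ASCII.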
Note: If you are going to send free-format text, please break it into smaller
chunks before running it through the Scheduler, because large chunks of text
take too long to process. As a rule of thumb, we break free-format text into
chunks of about 2,000 to 3,000 characters.
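A sketch of splitting free-format text into chunks in that size range. The 2,500-character default (`MAX_CHARS`) is an assumed midpoint of the suggested range, the sentence split is a naive regular expression, and `chunk_free_text` is a hypothetical helper. Whitespace is collapsed so that no chunk contains a blank line, which would otherwise split it into two items.

```python
import re
from typing import List

# Assumed default: the guide suggests roughly 2,000-3,000 characters per chunk.
MAX_CHARS = 2500


def chunk_free_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split free text into chunks of at most max_chars characters.

    Whitespace (including newlines) is collapsed to single spaces. Chunks
    break between sentences when a sentence fits, and between words inside
    sentences that are too long. A single word longer than max_chars is
    kept whole, so that chunk may exceed the limit.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", " ".join(text.split()))

    # Break any oversized sentence into individual words.
    pieces: List[str] = []
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces.append(sentence)
        else:
            pieces.extend(sentence.split(" "))

    # Greedily pack pieces into chunks without exceeding max_chars.
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Joining the result with `"\n\n".join(chunks)` gives one blank-line-separated item per chunk, ready to submit as free format.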
- MEDLINE format with a blank line separating each item to be processed.
Use of "PMID-" as an identifier tag is supported by all applications.
- Free format with a blank
line separating each item to be processed.
- Single Line Delimited Input
NOTE: You MUST select "Single Line Delimited
Input" from the list of "Scheduler Specific Options" on the various
submission pages for this to work.
- Single Line Delimited Input w/ ID
NOTE: You MUST select "Single Line Delimited
Input w/ ID" from the list of "Scheduler Specific Options" on the
various submission pages for this to work. This option
expects each input line to have two fields separated by a vertical bar:
"ID|text to be processed". The ID can be any combination of alphanumeric
characters and the underscore character ("_"), for example "001_title" or "00001".
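For the "Single Line Delimited Input w/ ID" option, a sketch of producing `ID|text` lines. It assumes "alphanumeric" means ASCII letters and digits, and it does not check for a "|" inside the text, since this guide does not say how that case is handled. `format_id_lines` is a hypothetical helper.

```python
import re
from typing import Iterable, Tuple

# IDs may contain only letters, digits and underscores (assumed ASCII only).
ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def format_id_lines(records: Iterable[Tuple[str, str]]) -> str:
    """Build one 'ID|text' line per (ID, text) record.

    Raises ValueError for an ID with disallowed characters. Whitespace in the
    text, including newlines, is collapsed so each record stays on one line.
    """
    lines = []
    for record_id, text in records:
        if not ID_PATTERN.fullmatch(record_id):
            raise ValueError(f"invalid ID {record_id!r}: use letters, digits or '_'")
        lines.append(f"{record_id}|{' '.join(text.split())}")
    return "\n".join(lines) + "\n"
```

Collapsing newlines matters here because, in a single-line format, a line break inside the text would start a new (malformed) record.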
NOTES:
- You are only allowed to submit batch jobs as
"Normal" priority.
- We currently support the 1999, 2006AA, 2009AA, 2009AB, 2010AA, and 2010AB
releases of the UMLS Knowledge Sources. Any of these releases can be selected
in both interactive and batch mode via the "Knowledge Source Options"
pull-down menu.
- If one of your jobs starts producing a large number of errors, please
suspend it and investigate the problem offline. This frees up the scheduling
queue so other jobs can run.
- The tagger/parser only supports ASCII files with
blank lines separating the phrases to be parsed.
- None of the programs currently available within the Scheduler support
UTF-8. Input files must be converted to ASCII before submission.
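One possible way to do that conversion, sketched in Python: Unicode NFKD normalization splits accented letters into a base letter plus combining marks, the marks are dropped, and anything still non-ASCII is replaced with a placeholder. The conversion is lossy (for example, Greek letters become "?"), the small punctuation table is an assumption, and `to_ascii` is a hypothetical helper rather than an NLM tool.

```python
import unicodedata

# Assumed substitutions for common typographic punctuation that NFKD
# normalization leaves as non-ASCII (curly quotes, en and em dashes).
PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "-",
})


def to_ascii(text: str, replacement: str = "?") -> str:
    """Convert text to plain ASCII for submission.

    Accented letters are decomposed (NFKD) and their combining marks are
    dropped, so "café" becomes "cafe". Any character that is still not ASCII
    is replaced with `replacement` so the loss remains visible.
    """
    decomposed = unicodedata.normalize("NFKD", text.translate(PUNCTUATION))
    out = []
    for ch in decomposed:
        if ch.isascii():
            out.append(ch)
        elif unicodedata.combining(ch):
            continue  # drop accent marks split off by decomposition
        else:
            out.append(replacement)
    return "".join(out)
```

Searching the output for the replacement character shows which spots may need a manual fix, such as spelling out "alpha" for "α".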