PreProcess: JDs (Journal Descriptors)
- Description:
Journal Descriptors (JDs) are preferred MeSH terms that describe journals. Each journal has an ID, called a JID, and each JID is associated with one or more JDs.
In the Lisp system, the 122 Journal Descriptors (JDs) are stored in jd-abbr-table (as preferred MeSH terms). This information is also included in the List of Serials Indexed file (lsi${YEAR}.xml). Since the 2007 release, jds.txt has been derived from lsi${YEAR}.xml.
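The JID-to-JD relationship described above can be held in a map from each JID to its list of JDs. The sketch below is a minimal in-memory illustration; the class `JournalDescriptorIndex` and its methods are hypothetical and not part of the original code. It enforces the "one or more JDs per JID" rule when a journal is added.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory index: each journal ID (JID) maps to one or more JDs.
class JournalDescriptorIndex {
    private final Map<String, List<String>> jdsByJid = new LinkedHashMap<>();

    // Records the JDs of one journal; a JID must have at least one JD.
    void put(String jid, List<String> jds) {
        if (jds == null || jds.isEmpty()) {
            throw new IllegalArgumentException("JID " + jid + " has no JDs");
        }
        jdsByJid.put(jid, new ArrayList<>(jds));
    }

    // Returns the JDs of a journal, or an empty list if the JID is unknown.
    List<String> jdsOf(String jid) {
        return Collections.unmodifiableList(jdsByJid.getOrDefault(jid, List.of()));
    }

    // Distinct JDs used across all journals, sorted alphabetically.
    Set<String> distinctJds() {
        Set<String> all = new TreeSet<>();
        for (List<String> jds : jdsByJid.values()) {
            all.addAll(jds);
        }
        return all;
    }
}
```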
- Input:
- ftp://ftp.nlm.nih.gov/online/journals/lsi2007.xml
- jds.txt (from previous version)
- Java File & Algorithm:
- GenerateJidTaJdsFromLsi.java
- Parse the lsi.xml file:
- Find the XML tag <NlmUniqueID> for the Journal ID (JID)
- Find the XML tag <MedlineTA> for the journal title abbreviation (TA)
- Find the XML tag <BroadJournalHeadingList> for the beginning of the JDs
- Find the XML tag <BroadJournalHeading> for each Journal Descriptor (JD)
- Print the information in the new format to jidTaJds.out
- Print the information in the new format to jds.txt
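The parsing steps above can be sketched with the JDK's StAX streaming API. This is not the actual GenerateJidTaJdsFromLsi.java: the class `LsiJdParser`, the record `JidTaJds`, and the pipe-delimited line format are hypothetical. The sketch also assumes that each journal record contains exactly one <NlmUniqueID>, listed before its <MedlineTA> and <BroadJournalHeadingList>, so a new <NlmUniqueID> marks the start of the next journal. Journals without any JD are skipped, which is also an assumption.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of the lsi.xml parsing step using StAX.
// Assumes one <NlmUniqueID> per journal record, appearing before <MedlineTA>
// and <BroadJournalHeadingList> in that record.
class LsiJdParser {

    // One journal: ID (JID), title abbreviation (TA), and descriptors (JDs).
    record JidTaJds(String jid, String ta, List<String> jds) {
        // Hypothetical pipe-delimited output line: JID|TA|JD1|JD2|...
        String toLine() {
            return jid + "|" + ta + "|" + String.join("|", jds);
        }
    }

    static List<JidTaJds> parse(Reader in) throws XMLStreamException {
        XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<JidTaJds> out = new ArrayList<>();
        try {
            String jid = null;
            String ta = "";
            List<String> jds = new ArrayList<>();
            boolean inJdList = false;
            while (xr.hasNext()) {
                int event = xr.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (xr.getLocalName()) {
                        case "NlmUniqueID" -> {
                            addIfComplete(out, jid, ta, jds); // a new JID starts a new journal
                            jid = xr.getElementText().trim();
                            ta = "";
                            jds = new ArrayList<>();
                        }
                        case "MedlineTA" -> ta = xr.getElementText().trim();
                        case "BroadJournalHeadingList" -> inJdList = true; // JDs begin here
                        case "BroadJournalHeading" -> {
                            if (inJdList) {
                                jds.add(xr.getElementText().trim());
                            }
                        }
                        default -> { }
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && xr.getLocalName().equals("BroadJournalHeadingList")) {
                    inJdList = false;
                }
            }
            addIfComplete(out, jid, ta, jds); // flush the last journal
        } finally {
            xr.close();
        }
        return out;
    }

    // Keeps only journals that have a JID and at least one JD (an assumption).
    private static void addIfComplete(List<JidTaJds> out, String jid, String ta, List<String> jds) {
        if (jid != null && !jds.isEmpty()) {
            out.add(new JidTaJds(jid, ta, jds));
        }
    }
}
```

Writing each `toLine()` result to jidTaJds.out would approximate the first output step. The real layouts of jidTaJds.out and jds.txt are not documented here, so the pipe-delimited format is only a placeholder.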
- Output File:
jds.txt, used in TC.JDI and TC.STRI
- Notes:
- Journal descriptors change every year.
- The file is sorted by JD ID: by version first, then alphabetically.
- Status: Active, Inactive
- JDs differ between versions:
- Susanne's file (used in the 2004 training set) & lsi2006.xml:
| jd-abbr-table | lsi2006.xml | Notes |
|---|---|---|
| Anthropology, Physical | Anthropology | |
| Antibiotics | Anti-Bacterial Agents | |
| Behavior | Behavioral Sciences | |
| Delivery of Health Care | Health Services | |
| Family Planning | Family Planning Services | |
| Genetics, Behavioral | Behavioral Sciences; Genetics | |
| | Library Science | |
| | Research | Not a valid JD; should be removed |
| | Tuberculosis | Not a valid JD; should be removed |
- lsi2006.xml & lsi2007.xml:
| lsi2006.xml | lsi2007.xml |
|---|---|
| Nutrition | Nutritional Sciences |
- Different JDs generate different JDI training sets and results, so we compare results using the similarity on the JDs common to both versions.
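Restricting a comparison to the common JDs requires first translating old JD names to their new names, using mappings like those in the tables above; one old JD can map to several new ones (for example, "Genetics, Behavioral" maps to "Behavioral Sciences" and "Genetics"). The sketch below is a minimal illustration under that assumption. The class `CommonJds`, its methods, and the idea of holding per-JD results in a `Map<String, Double>` are hypothetical; it does not reproduce the similarity measure itself.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: restricts a cross-version comparison to the JDs both versions share.
class CommonJds {

    // Translates old JD names to new ones; a JD may map to several new JDs.
    // JDs absent from the rename map are assumed unchanged.
    static Set<String> translate(Set<String> oldJds, Map<String, List<String>> renames) {
        Set<String> out = new TreeSet<>();
        for (String jd : oldJds) {
            out.addAll(renames.getOrDefault(jd, List.of(jd)));
        }
        return out;
    }

    // JDs present in both versions after translation.
    static Set<String> common(Set<String> oldJds, Set<String> newJds,
                              Map<String, List<String>> renames) {
        Set<String> result = translate(oldJds, renames);
        result.retainAll(newJds);
        return result;
    }

    // Keeps only the per-JD results whose JD is in the common set.
    static Map<String, Double> restrict(Map<String, Double> scores, Set<String> commonJds) {
        Map<String, Double> out = new TreeMap<>(scores);
        out.keySet().retainAll(commonJds);
        return out;
    }
}
```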