Word Tokenizer Rules (requirements)
The Word Tokenizer tokenizes the TI (title) and AB (abstract) fields of citations and filters out unwanted words and characters. The following rules (requirements) were captured from the training set of 2004 data.
Pattern | Case | Action | Examples | Exceptions |
---|---|---|---|---|
..."CopyrightCopyright.." | Yes | remove "CopyrightCopyright.." | 0284-10559555 | None |
..."Copyright Copyright.." | Yes | remove "Copyright Copyright.." | 0285-10567770 | None |
..."Copyright Crown copyright..." | Yes | remove "Copyright Crown copyright..." | 0294-10878556 | None |
..." Copyright.." | Yes | remove " Copyright.." | | |
...".Copyright.." | Yes | remove ".Copyright.." | None | |
...")Copyright.." | Yes | remove ")Copyright.." | None | |
..."? Copyright.." | Yes | remove "? Copyright.." | None | |
...") Copyright.." | Yes | remove ") Copyright.." | None | |
..."(Japanese Association of Intellectual Copyright #130,591)"... | Yes | remove "(Japanese Association of Intellectual Copyright #130,591)" | 0306-11276498 | None |
..."Copyright 2001 Wiley-Liss, Inc." | Yes | remove "Copyright 2001 Wiley-Liss, Inc." | 0310-11391771 | None |
..."Copyright -Copyright 2000 John Wiley & Sons, Ltd." | Yes | remove "Copyright -Copyright 2000 John Wiley & Sons, Ltd." | 0301-11114061 | None |
..."GRASPCopyright workload system"... | Yes | Do nothing | 0299-11049704 | None |
..."PortfolioCopyright, a tool"... | Yes | Do nothing | 0298-10995616 | None |
..."(Copyright P<0.001)"... | Yes | Do nothing | 0411-10373290 | None |
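
The copyright rows above can be approximated with one regular expression plus a short list of literal phrases. This is a minimal sketch, not the tokenizer's actual implementation. It assumes that ".." in a pattern means "everything through the end of the field" and that "Yes" in the Case column means the match is case-sensitive. The names `LITERAL_PHRASES`, `COPYRIGHT_TRAILER`, and `strip_copyright` are hypothetical.

```python
import re

# Assumption: phrases that the table removes in place, keeping the text after them.
LITERAL_PHRASES = ["(Japanese Association of Intellectual Copyright #130,591)"]

# Assumption: ".." means "through the end of the field".
# "Copyright" must start the text or follow whitespace, ".", ")" or "?".
# After a letter or "(" it is left alone, which covers the "Do nothing" rows
# ("GRASPCopyright", "PortfolioCopyright", "(Copyright P<0.001)").
COPYRIGHT_TRAILER = re.compile(r"(?:^|(?<=[\s.)?]))Copyright.*\Z", re.DOTALL)


def strip_copyright(text: str) -> str:
    """Remove copyright notices, then collapse runs of whitespace."""
    for phrase in LITERAL_PHRASES:
        text = text.replace(phrase, " ")
    text = COPYRIGHT_TRAILER.sub("", text)
    return " ".join(text.split())
```

For example, `strip_copyright("Results were good.Copyright 2001 Wiley-Liss, Inc.")` returns `"Results were good."`, while a string containing only `GRASPCopyright` is returned unchanged. Removing the literal phrases first matters: otherwise the trailer pattern would match " Copyright #130,591)" and delete the rest of the field.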

Pattern | Case | Action | Examples | Exceptions |
---|---|---|---|---|
..."(abstract.." | Yes | remove "(abstract.." | 0314-11530280 | |
..."(ABSTRACT.." | Yes | remove "(ABSTRACT.." | | None |
..."(abstract trucated..)" | No | remove "(abstract trucated..)" | None | |
..."(abstracts were not included)"... | No | remove "(abstracts were not included)" | 0289-10695616 | None |
..."(abstracts presented at recent scientific meetings, manufacturers' package inserts)"... | Yes | remove "(abstracts presented at recent scientific meetings, manufacturers' package inserts)" | 0306-11261533 | None |
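
One way to carry the Case column into code is to store each rule with a per-rule flag and compile it accordingly. The sketch below uses the two fully literal rows from the table above. It assumes that "Yes" means case-sensitive and "No" means case-insensitive, which the source does not state explicitly. `Rule`, `RULES`, and `apply_rules` are hypothetical names.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    literal: str          # exact phrase to remove
    case_sensitive: bool  # assumed meaning of the "Case" column


RULES = [
    Rule("(abstracts were not included)", False),
    Rule(
        "(abstracts presented at recent scientific meetings, "
        "manufacturers' package inserts)",
        True,
    ),
]


def compile_rule(rule: Rule) -> "re.Pattern[str]":
    # re.escape keeps parentheses and other metacharacters literal.
    flags = 0 if rule.case_sensitive else re.IGNORECASE
    return re.compile(re.escape(rule.literal), flags)


COMPILED = [compile_rule(rule) for rule in RULES]


def apply_rules(text: str) -> str:
    """Remove every matching phrase, then collapse runs of whitespace."""
    for pattern in COMPILED:
        text = pattern.sub(" ", text)
    return " ".join(text.split())
```

With these rules, `apply_rules("Reviews (Abstracts Were Not Included) were searched")` returns `"Reviews were searched"`, whereas an all-uppercase variant of the second phrase is left in place because that rule is treated as case-sensitive.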
Pattern | Case | Action | Example | Exceptions |
---|---|---|---|---|
..."[see comments]" | No | remove "[see comments]" | None | |
..."(see comments)" | No | remove "(see comments)" | None | |
..."[seecomments]" | No | remove "[seecomments]" | None | |
..."[ see comments]" | No | remove "[ see comments]" | None | |
..."(comments..)]" | No | remove "(comments..)]" | None |
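
The first four "see comments" variants differ only in bracket type and spacing, so a single whitespace-tolerant pattern can cover them. This is a sketch under the assumption that "No" in the Case column means case-insensitive. It does not handle the "(comments..)]" row, and it also accepts mismatched delimiters such as "[see comments)". `SEE_COMMENTS` and `strip_see_comments` are hypothetical names.

```python
import re

# "[" or "(", optional spaces, "see", optional spaces, "comments", then "]" or ")".
SEE_COMMENTS = re.compile(r"[\[(]\s*see\s*comments\s*[\])]", re.IGNORECASE)


def strip_see_comments(text: str) -> str:
    """Remove "see comments" annotations, then collapse runs of whitespace."""
    return " ".join(SEE_COMMENTS.sub(" ", text).split())
```

For instance, `strip_see_comments("Aspirin and stroke [see comments]")` returns `"Aspirin and stroke"`.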

Pattern | Case | Action | Examples | Exceptions |
---|---|---|---|---|
..."[correction..]"... | No | remove "[correction..]" | | None |
..."[Key words:"... | No | remove "[key words:" | 0290-10751293 | None |
..."[..]" | No | remove "[..]" | | |
..."[.. .]" | No | remove "[.. .]" | | |
"["..."]" | No | remove "[" and "]" | None | |
..."[published erratum..]" | No | remove "[published erratum..]" | None | |
..."[forensic science international..]" | No | remove "[forensic science international..]" | None | |
..."(published erratum..)" | No | remove "(published erratum..)" | None | |
..."[in process citation]" | No | remove "[in process citation]" | None | |
..."(in process citation)" | No | remove "(in process citation)" | None | |
..."[corrrected]" | No | remove "[corrrected]" | None | |
..."[correction of artistic]" | No | remove "[correction of artistic]" | None | |
..."(letter)" | No | remove "(letter)" | None | |
..."(letter)]" | No | remove "(letter)]" | None | |
..."(editorial)]" | No | remove "(editorial)]" | None | |
..."2-[substituted acetyl]-amino-5-alkyl-1,3,4-thiadiazoles" | No | Do nothing | 0404-9868551 | None |
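
Several of the bracket rows above name an annotation by its opening words, while others are exact phrases. The sketch below separates the two kinds and leaves any other bracketed text alone, so the chemical name in the last row passes through untouched. It assumes ".." means "any text up to the closing bracket" and that "No" means case-insensitive. It does not attempt the generic "[..]", "[.. .]", "[Key words:", or "[forensic science international..]" rows, and a closing parenthesis inside an open-ended annotation will end the match early. All names are hypothetical.

```python
import re

# Open-ended annotations: the listed words, then anything up to the closer.
OPEN_ENDED = r"(?:published erratum|correction)\b[^\])]*"
# Exact annotations: nothing but optional spaces before the closer.
# "corrrected" is copied verbatim from the table.
EXACT = r"(?:in process citation|corrrected|letter|editorial)"

ANNOTATION = re.compile(
    r"[\[(]\s*(?:" + OPEN_ENDED + "|" + EXACT + r")\s*[\])]+",
    re.IGNORECASE,
)


def strip_annotations(text: str) -> str:
    """Remove bracketed annotations, then collapse runs of whitespace."""
    return " ".join(ANNOTATION.sub(" ", text).split())
```

Here `strip_annotations("A reply (letter)]")` returns `"A reply"`, and `"2-[substituted acetyl]-amino-5-alkyl-1,3,4-thiadiazoles"` is returned unchanged because "[substituted" does not begin a listed annotation.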
Pattern | Case | Action | Examples |
---|---|---|---|
..."[J. Neuroimmunol. 104, 85-91]"... | Yes | remove "[J. Neuroimmunol. 104, 85-91]" | 0295-10900360 |

Pattern | Case | Action | Examples |
---|---|---|---|
..."didn't"... | No | expand to "did not" | 0404-9875250 |
..."don't"... | No | expand to "do not" | |
..."who'd"... | No | expand to "who would" | 0311-11425141 |
..."can't"... | No | expand to "cannot" | 0316-11673724 |
..."won't"... | No | expand to "will not" | 0314-11519969 |
..."wouldn't"... | No | expand to "would not" | 0405-9892548 |
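
The contraction rows map naturally onto a lookup table applied with a single regular expression. A minimal sketch follows, assuming "No" in the Case column means the match ignores case. The expansion is always emitted in lowercase, and only the straight apostrophe (') is handled. `CONTRACTIONS` and `expand_contractions` are hypothetical names.

```python
import re

# Contraction -> expansion, taken from the table above.
CONTRACTIONS = {
    "didn't": "did not",
    "don't": "do not",
    "who'd": "who would",
    "can't": "cannot",
    "won't": "will not",
    "wouldn't": "would not",
}

# Whole-word match on any listed contraction, ignoring case.
CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each listed contraction with its expansion."""
    return CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(1).lower()], text)
```

For example, `expand_contractions("Patients who'd benefit didn't enroll")` returns `"Patients who would benefit did not enroll"`.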