Word Tokenizer Rules (requirements)
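The Word Tokenizer tokenizes the TI (title) and AB (abstract) fields of citations and filters out unwanted words and characters. The following rules/requirements were captured from the training set of 2004 data. In each table, the Case column states whether the pattern is matched case-sensitively.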

| Pattern | Case | Action | Examples | Exceptions |
|---|---|---|---|---|
| ..."CopyrightCopyright.." | Yes | remove "CopyrightCopyright.." | 0284-10559555 | None |
| ..."Copyright Copyright.." | Yes | remove "Copyright Copyright.." | 0285-10567770 | None |
| ..."Copyright Crown copyright..." | Yes | remove "Copyright Crown copyright..." | 0294-10878556 | None |
| ..." Copyright.." | Yes | remove " Copyright.." | | |
| ...".Copyright.." | Yes | remove ".Copyright.." | None | |
| ...")Copyright.." | Yes | remove ")Copyright.." | None | |
| ..."? Copyright.." | Yes | remove "? Copyright.." | None | |
| ...") Copyright.." | Yes | remove ") Copyright.." | None | |
| ..."(Japanese Association of Intellectual Copyright #130,591)"... | Yes | remove "(Japanese Association of Intellectual Copyright #130,591)" | 0306-11276498 | None |
| ..."Copyright 2001 Wiley-Liss, Inc." | Yes | remove "Copyright 2001 Wiley-Liss, Inc." | 0310-11391771 | None |
| ..."Copyright -Copyright 2000 John Wiley & Sons, Ltd." | Yes | remove "Copyright -Copyright 2000 John Wiley & Sons, Ltd." | 0301-11114061 | None |
| ..."GRASPCopyright workload system"... | Yes | Do nothing | 0299-11049704 | None |
| ..."PortfolioCopyright, a tool"... | Yes | Do nothing | 0298-10995616 | None |
| ..."(Copyright P<0.001)"... | Yes | Do nothing | 0411-10373290 | None |
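
The copyright rules above can be sketched in Python as follows. This is a minimal illustration, not the tokenizer's actual implementation: the names `strip_copyright`, `_JAIC_NOTE` and `_COPYRIGHT_TAIL` are hypothetical, and the sketch assumes that a copyright statement always runs to the end of the field. It also keeps the delimiter in front of "Copyright" (".", ")", "?"), although the Action column quotes the delimiter together with the statement, so a stricter reading would drop it as well.

```python
import re

# Specific parenthetical listed in the table; removed wherever it occurs.
_JAIC_NOTE = "(Japanese Association of Intellectual Copyright #130,591)"

# Assumption: a copyright statement starts at a case-sensitive "Copyright"
# that is not glued to a preceding letter or "(" and runs to the end of
# the field. This leaves "GRASPCopyright", "PortfolioCopyright" and
# "(Copyright P<0.001)" untouched, as the "Do nothing" rows require.
_COPYRIGHT_TAIL = re.compile(r"(?<![A-Za-z(])Copyright.*", re.DOTALL)


def strip_copyright(text: str) -> str:
    """Remove copyright statements from a TI or AB field (hypothetical helper)."""
    text = text.replace(_JAIC_NOTE, "")
    return _COPYRIGHT_TAIL.sub("", text).rstrip()
```

For example, `strip_copyright("Results improved. Copyright 2001 Wiley-Liss, Inc.")` yields `"Results improved."`, while a sentence containing "GRASPCopyright workload system" is returned unchanged.
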

| Pattern | Case | Action | Examples | Exceptions |
|---|---|---|---|---|
| ..."(abstract.." | Yes | remove "(abstract.." | 0314-11530280 | |
| ..."(ABSTRACT.." | Yes | remove "(ABSTRACT.." | | None |
| ..."(abstract truncated..)" | No | remove "(abstract truncated..)" | None | |
| ..."(abstracts were not included)"... | No | remove "(abstracts were not included)" | 0289-10695616 | None |
| ..."(abstracts presented at recent scientific meetings, manufacturers' package inserts)"... | Yes | remove "(abstracts presented at recent scientific meetings, manufacturers' package inserts)" | 0306-11261533 | None |

| Pattern | Case | Action | Examples | Exceptions |
|---|---|---|---|---|
| ..."[see comments]" | No | remove "[see comments]" | None | |
| ..."(see comments)" | No | remove "(see comments)" | None | |
| ..."[seecomments]" | No | remove "[seecomments]" | None | |
| ..."[ see comments]" | No | remove "[ see comments]" | None | |
| ..."(comments..)]" | No | remove "(comments..)]" | None | |
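
The first four "see comments" variants differ only in bracket type and spacing, so a single case-insensitive pattern can cover them. The sketch below is one possible reading of the table; `strip_see_comments` is a hypothetical name, and the "(comments..)]" row is deliberately left out because its exact extent is not specified.

```python
import re

# Case-insensitive, per the "No" in the Case column. Optional whitespace
# covers "[see comments]", "(see comments)", "[seecomments]" and
# "[ see comments]".
_SEE_COMMENTS = re.compile(r"[\[(]\s*see\s*comments\s*[\])]", re.IGNORECASE)


def strip_see_comments(text: str) -> str:
    """Drop "see comments" annotations and tidy leftover spacing (hypothetical helper)."""
    text = _SEE_COMMENTS.sub("", text)
    # Collapse the double space left behind when an annotation sat mid-sentence.
    return re.sub(r"\s{2,}", " ", text).strip()
```

For example, `strip_see_comments("Aspirin in stroke [see comments]")` returns `"Aspirin in stroke"`.
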

| Pattern | Case | Action | Examples | Exceptions |
|---|---|---|---|---|
| ..."[correction..]"... | No | remove "[correction..]" | | None |
| ..."[Key words:"... | No | remove "[key words:" | 0290-10751293 | None |
| ..."[..]" | No | remove "[..]" | | |
| ..."[.. .]" | No | remove "[.. .]" | | |
| "["..."]" | No | remove "[" and "]" | None | |
| ..."[published erratum..]" | No | remove "[published erratum..]" | None | |
| ..."[forensic science international..]" | No | remove "[forensic science international..]" | None | |
| ..."(published erratum..)" | No | remove "(published erratum..)" | None | |
| ..."[in process citation]" | No | remove "[in process citation]" | None | |
| ..."(in process citation)" | No | remove "(in process citation)" | None | |
| ..."[corrected]" | No | remove "[corrected]" | None | |
| ..."[correction of artistic]" | No | remove "[correction of artistic]" | None | |
| ..."(letter)" | No | remove "(letter)" | None | |
| ..."(letter)]" | No | remove "(letter)]" | None | |
| ..."(editorial)]" | No | remove "(editorial)]" | None | |
| ..."2-[substituted acetyl]-amino-5-alkyl-1,3,4-thiadiazoles" | No | do nothing | 0404-9868551 | None |
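
The square-bracket rules above interact: annotations such as "[published erratum..]" are removed, a field wrapped entirely in brackets keeps its content, and brackets inside chemical names are left alone. The following sketch shows one way these three cases might be told apart. It rests on an assumption that is not stated in the table, namely that a bracket glued to a letter, digit or hyphen belongs to a chemical name. The name `strip_brackets` is hypothetical, and the parenthesized rows ("(letter)", "(published erratum..)") and the unclosed "[Key words:" row are not covered.

```python
import re

# Assumption: a bracketed span is an annotation unless it is glued to a
# letter, digit or hyphen, as in "2-[substituted acetyl]-amino-...".
_ANNOTATION = re.compile(r"(?<![\w-])\[[^\[\]]*\](?![\w-])")


def strip_brackets(text: str) -> str:
    """Apply the square-bracket rules to one field (hypothetical helper)."""
    text = text.strip()
    # A field wrapped in exactly one pair of brackets: drop only "[" and "]".
    if (text.startswith("[") and text.endswith("]")
            and "[" not in text[1:] and "]" not in text[:-1]):
        return text[1:-1].strip()
    # Otherwise remove free-standing bracketed annotations.
    text = _ANNOTATION.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()
```

With this sketch, `strip_brackets("[Treatment of asthma in children]")` returns the title without its brackets, and a title containing "2-[substituted acetyl]-amino-5-alkyl-1,3,4-thiadiazoles" passes through unchanged.
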

| Pattern | Case | Action | Examples |
|---|---|---|---|
| ..."[J. Neuroimmunol. 104, 85-91]"... | Yes | remove "[J. Neuroimmunol. 104, 85-91]" | 0295-10900360 |

| Pattern | Case | Action | Examples |
|---|---|---|---|
| ..."didn't"... | No | expand to "did not" | 0404-9875250 |
| ..."don't"... | No | expand to "do not" | |
| ..."who'd"... | No | expand to "who would" | 0311-11425141 |
| ..."can't"... | No | expand to "cannot" | 0316-11673724 |
| ..."won't"... | No | expand to "will not" | 0314-11519969 |
| ..."wouldn't"... | No | expand to "would not" | 0405-9892548 |
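
The contraction rules map naturally onto a lookup table. The sketch below is a minimal illustration using only the six contractions listed above; the names `_CONTRACTIONS` and `expand_contractions` are hypothetical. Matching is case-insensitive, as the Case column requires, and the expansion is always emitted in lowercase, which is an assumption about the desired output.

```python
import re

# Expansions taken from the table; keys are lowercase.
_CONTRACTIONS = {
    "didn't": "did not",
    "don't": "do not",
    "who'd": "who would",
    "can't": "cannot",
    "won't": "will not",
    "wouldn't": "would not",
}

# One alternation over all keys, anchored on word boundaries.
_CONTRACTION_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in _CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each listed contraction with its expansion (hypothetical helper)."""
    return _CONTRACTION_RE.sub(lambda m: _CONTRACTIONS[m.group(0).lower()], text)
```

For example, `expand_contractions("Patients who didn't respond")` returns `"Patients who did not respond"`.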