The SPECIALIST Lexicon

Process Walk Through, 2014

This page describes the filtering steps applied to the MEDLINE n-gram set to generate the candidate multiword list:

Columns: ID/Program; In No.; Filtered No. (%); Out No.; Pass Rate; Acc. Pass Rate; Filter example and notes
Generate the MEDLINE n-gram set
Generate MEDLINE n-gram set
In No.: 3,890,891,877
  • N=1: 21,530,469
  • N=2: 205,868,398
  • N=3: 703,148,136
  • N=4: 1,295,096,308
  • N=5: 1,665,248,566
Filtered No. (%): 3,873,868,058 (99.56%)
  • N=1: 20,726,087 (96.26%)
  • N=2: 201,281,049 (97.77%)
  • N=3: 696,860,600 (99.11%)
  • N=4: 1,291,296,931 (99.71%)
  • N=5: 1,663,703,391 (99.91%)
Out No.: 17,023,819
  • N=1: 804,382
  • N=2: 4,587,349
  • N=3: 6,287,536
  • N=4: 3,799,377
  • N=5: 1,545,175
Pass Rate: 0.4375%
Acc. Pass Rate: N/A
Notes: From MEDLINE titles and abstracts (TI & AB) to the MEDLINE n-gram set
  • filter out n-grams with length > 50
  • filter out n-grams with word count < 30
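The two thresholds above can be written as a single predicate. This is a minimal sketch, not the project's actual code: the names MAX_TERM_LENGTH, MIN_WORD_COUNT, and keep_ngram are hypothetical, and it assumes that "length" means the number of characters in the n-gram.

```python
# Hypothetical constants mirroring the two thresholds listed above.
MAX_TERM_LENGTH = 50   # assumption: length is counted in characters
MIN_WORD_COUNT = 30    # n-grams seen fewer than 30 times are dropped


def keep_ngram(term: str, word_count: int) -> bool:
    """Return True if the n-gram survives both generation-time thresholds."""
    if len(term) > MAX_TERM_LENGTH:
        return False   # filtered: n-gram is too long
    if word_count < MIN_WORD_COUNT:
        return False   # filtered: n-gram is too rare
    return True
```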
Basic operation: Sort nGrams by DC|WC|Terms
ID-01
  • NGramFilter: SortNGramByDcWcTerm
  • Run Time: 2 Min.
In No. 17,023,819; Filtered No. 0; Out No. 17,023,819; Pass Rate 100.0000%; Acc. Pass Rate 100.0000%
  • N/A
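The example lines on this page use a DC|WC|Term layout. The sketch below parses such lines and sorts them, assuming that DC is a document count, WC is a word count, and that the order is descending by DC, then descending by WC, then alphabetical by term. The example lists on this page start in descending DC order, but the tie-breaking rule is not stated, so that part is a guess. NGram, parse_line, and sort_ngrams are illustrative names.

```python
from typing import Iterable, List, NamedTuple


class NGram(NamedTuple):
    doc_count: int    # DC (assumed: number of documents containing the term)
    word_count: int   # WC (assumed: total occurrences of the term)
    term: str


def parse_line(line: str) -> NGram:
    """Split on the first two pipes only, so a term may itself contain '|'."""
    dc, wc, term = line.rstrip("\n").split("|", 2)
    return NGram(int(dc), int(wc), term)


def sort_ngrams(lines: Iterable[str]) -> List[NGram]:
    """Sort by DC descending, then WC descending, then term ascending."""
    records = [parse_line(line) for line in lines]
    return sorted(records, key=lambda r: (-r.doc_count, -r.word_count, r.term))
```

Splitting with a limit of two keeps terms such as the ones listed under ID-10, which contain pipe characters, intact.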
Apply General Exclusive Filters
ID-10: In No. 17,023,819; Filtered No. 6; Out No. 17,023,813; Pass Rate 100.0000%; Acc. Pass Rate 100.0000%
  • 148|231||
  • 40|66|||
  • 38|44|(|r|
  • 33|37|Ag|AgCl
  • 27|40||D|
  • 20|31||E|
ID-11: In No. 17,023,813; Filtered No. 386; Out No. 17,023,427; Pass Rate 99.9977%; Acc. Pass Rate 99.9977%
  • 1259147|3690494|=
  • 712839|1551254|<
  • 604567|2377864|+/-
  • 248213|410274|>
  • 164938|263574|-

  • 11123|24882|-->
  • 6893|12640|(+)
  • 3973|4872|(%)
  • 84|119|"+"
  • 66|70|((-/-))
  • 30|70|==>
  • 23|30|[...]
  • 10|33|*}
ID-12
  • Filter: Digit
  • InTerm: core-term.lc
  • Run time: 2 Min (norm - strip punc and space)
In No. 17,023,427; Filtered No. 116,772; Out No. 16,906,655; Pass Rate 99.3141%; Acc. Pass Rate 99.3118%
  • 1404799|2062240|2
  • 1392936|2100673|1
  • 1121071|1557044|3
  • 888645|1142363|10
  • 861188|1136742|4

  • 239725|499064|95%
  • 179943|213064|2,
  • 85357|101143|2000
  • 12158|13244|3-5
  • 1573|1629|+/-0.5
  • 60|66|(+/-0.05)
  • 53|55|$1,500
  • 28|37|"3 + 1"
  • 3|32|55834

  • 192.168.1.1
  • [192, 168]
  • (+15%),
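The run-time note for ID-12 says terms are normalized by stripping punctuation and spaces. One reading that matches every ID-12 example above is: drop a term if nothing but digits remains after that normalization. The sketch below implements that reading only; is_digit_term is a hypothetical name, and the exact normalization used by the real Digit filter is not given on this page.

```python
import string

# Translation table that deletes ASCII punctuation and whitespace.
_STRIP_TABLE = str.maketrans("", "", string.punctuation + string.whitespace)


def is_digit_term(term: str) -> bool:
    """Return True if only ASCII digits remain after stripping
    punctuation and whitespace (assumed reading of the ID-12 note)."""
    core = term.translate(_STRIP_TABLE)
    return core != "" and core.isascii() and core.isdigit()
```

Under this reading, "95%", "$1,500", and "192.168.1.1" are all filtered, while a term such as "IN-1130" is kept because letters remain after normalization.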
ID-13: In No. 16,906,655; Filtered No. 4,056; Out No. 16,902,599; Pass Rate 99.9760%; Acc. Pass Rate 99.2879%
  • 15357675|88271268|and
  • 2463066|3359594|two
  • 1794535|2269228|one
  • 1338090|1734879|three
  • 1322452|1637776|first

  • 18246|20674|first and second
  • 3028|3137|twenty-eight
  • 18405|19387|one third
  • 78|78|NINE
  • 33|35|zeroth and
  • 31|31|Four hundred and forty-seven
  • 24|30|zero-one
  • 22|30|'half'
ID-14: In No. 16,902,599; Filtered No. 142,067; Out No. 16,760,532; Pass Rate 99.1595%; Acc. Pass Rate 98.4534%
  • 10244280|24068902|of the
  • 8338601|16722249|in the
  • 4092528|5625227|to the
  • 4021188|5255407|and the
  • 3155416|4125616|on the

  • 1004206|1172415|In the
  • 421241|481834|and/or
  • 79309|82972|50% of
  • 11180|12722|1, 2, and
  • 9418|9992|2003 to
  • 451|489|2003 to 2007
  • 18|31|for >=50%
  • 15|38|the 8:2
  • 9|30|-196 to -174
  • 7|37|IN-1130
Apply Exclusive Filters - pattern
ID-20: In No. 16,760,532; Filtered No. 163,714; Out No. 16,596,818; Pass Rate 99.0232%; Acc. Pass Rate 97.4917%
  • 36921|37086|tomography (CT)
  • 35422|35828|oxide (NO)
  • 34554|34808|imaging (MRI)
  • 34178|34427|resonance imaging (MRI)
  • 33222|33487|reaction (PCR)

  • 33117|33381|chain reaction (PCR)
  • 30095|30315|polymerase chain reaction (PCR)
  • 28716|28882|magnetic resonance imaging (MRI)
  • 27247|27338|computed tomography (CT)
  • 16349|16641|enzyme-linked immunosorbent assay (ELISA)
  • 6823|6833|magnetic resonance (MR) imaging
  • 9101|9160|single nucleotide polymorphisms (SNPs)

  • 57|57|"Standards, Options and Recommendations" (SOR)
  • 54|54|(CREB)-binding protein (CBP)
  • 23|30|kinase (ASK)
ID-21: In No. 16,596,818; Filtered No. 303,679; Out No. 16,293,139; Pass Rate 98.1703%; Acc. Pass Rate 95.7079%
  • 564464|635702|a significant
  • 413574|474177|a single
  • 334224|364472|a high
  • 295612|339709|a novel
  • 270384|292590|a case

  • 122026|126365|a very
  • 116893|128316|a group
  • 79293|86669|a dose-dependent

  • 40271|40512|A series
  • 24817|32011|A and B
  • 15936|19307|a meta-analysis

  • 18|30|a SIF
  • 17|37|A alpha C
  • 17|30|A nonseminomatous
  • 6|33|a beta ab
ID-22: In No. 16,293,139; Filtered No. 92,841; Out No. 16,200,298; Pass Rate 99.4302%; Acc. Pass Rate 95.1625%
  • 2069343|2070116|RESULTS:
  • 1610241|1610468|METHODS:
  • 1206683|1206734|CONCLUSIONS:
  • 922219|922261|CONCLUSION:
  • 821539|821597|BACKGROUND:
  • 702402|702465|OBJECTIVE:
  • 397897|397911|OBJECTIVE: To

  • 138158|138161|MATERIALS AND METHODS:
  • 126479|126492|SETTING:
  • 114480|114481|PURPOSE: To
  • 100963|100970|INTRODUCTION:
  • 18015|18016|AIM: The

  • 16|51|L: -DOPA
  • 11|34|(95% PI:
  • 10|42|PHPT:
  • 9|45|(mode MIC:
  • 9|30|vs N:
ID-23: In No. 16,200,298; Filtered No. 113,073; Out No. 16,087,225; Pass Rate 99.3020%; Acc. Pass Rate 94.4983%
  • 324405|719011|(n =
  • 261122|514412|(P <
  • 172833|321303|(p <
  • 168415|354006|(P =
  • 156642|308140|P <

  • 86525|133350|(P < 0.05)
  • 12716|31896|95% CI =
  • 4526|6030|(P<0.001),
  • 393|393|{CI},
  • 243|427|(US$
  • 101|366|VSL#3
  • 23|30|N^N
  • 20|31|group (n=6) received
  • 8|30|CYP3A7*1C
  • 1|37|sp. n.; Trichoteleia
ID-24: In No. 16,087,225; Filtered No. 290,421; Out No. 15,796,804; Pass Rate 98.1947%; Acc. Pass Rate 92.7924%
  • 154905|181001|two groups
  • 124279|164955|6 months
  • 114464|150543|24 h
  • 106209|106209|(ABSTRACT TRUNCATED AT 250 WORDS)

  • 67193|84099|5 years
  • 42591|51200|at 37 degrees
  • 15046|19190|100 mg
  • 14981|16156|3 times
  • 12713|13701|January 1,
  • 12160|15197|10 mg/kg

  • 5869|6634|12-year-old
  • 2930|3471|at -20 degrees C
  • 1662|1685|September 2006

  • 177|184|65 years or older with
  • 176|202|20 cigarettes per day
  • 48|57|3 - 6 months
  • 7|33|6 hours plus
  • 7|33|minutes) per day, 5 days
  • 6|30|3 mg/EE
  • 4|33|317615 x
ID-25: In No. 15,796,804; Filtered No. 340,109; Out No. 15,456,695; Pass Rate 97.8470%; Acc. Pass Rate 90.7945%
  • 482021|1107869|(P
  • 347977|760007|(p
  • 328145|725703|(n
  • 291892|291913|[The
  • 189679|224395|years)

  • 31601|43019|group (P
  • 26291|29725|significant (P
  • 26156|33540|significantly (P
  • 25347|25992|years) with
  • 23830|24521|years) and

  • 2016|2018|interval [95%
  • 1754|1762|see text] The
  • 1149|1499|kg/m(2)),
  • 1148|1175|< 0.05) lower
  • 1028|1139|(2) A]
  • 804|804|(CENTRAL) (The

  • 7|31|nM (SD
  • 6|33|pOGH (ANG
  • 3|30|cB72.3(gamma
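The rules behind the pattern filters are not spelled out on this page, but two of them can be approximated from their examples. Every ID-21 example begins with "a" or "A" followed by a space, and every ID-22 example contains an uppercase letter immediately followed by a colon. The sketch below encodes just those two observations; the function names are hypothetical and the real filters may apply further conditions.

```python
import re

# An uppercase ASCII letter directly followed by a colon, e.g. "RESULTS:".
_UPPER_COLON = re.compile(r"[A-Z]:")


def starts_with_a(term: str) -> bool:
    """Observation from the ID-21 examples: the term starts with 'a ' or 'A '."""
    return term[:2].lower() == "a "


def has_uppercase_colon(term: str) -> bool:
    """Observation from the ID-22 examples: an uppercase letter precedes a colon."""
    return _UPPER_COLON.search(term) is not None
```

Both checks are deliberately narrow. For instance, the one-word term "a" and the term "{systematic name:" appear among later examples on this page, and neither check matches them.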
Apply Exclusive Filters - Lead-End-Terms
ID-30: In No. 15,456,695; Filtered No. 4,158,702; Out No. 11,297,993; Pass Rate 73.0945%; Acc. Pass Rate 66.3658%
  • 2780043|3451203|of a
  • 2305063|2816278|that the
  • 1958082|2444178|from the
  • 1757426|1974661|is a
  • 1675949|1842091|of this

  • 716162|752402|The results
  • 552776|620320|was observed
  • 432921|434591|this study was
  • 12539|12801|about 50%
  • 271|275|- but not
  • 142|157|"what is
  • 49|49|AND COURSE
  • 2|32|or spinal or conduction block,
ID-31: In No. 11,297,993; Filtered No. 2,384,059; Out No. 8,913,934; Pass Rate 78.8984%; Acc. Pass Rate 52.3615%
  • 1878109|3534031|patients with
  • 1696757|2353703|associated with
  • 1443317|1851781|at the
  • 1134560|1177671|suggest that
  • 1062545|1261445|between the

  • 742670|1055627|in patients with
  • 396327|399451|results suggest that
  • 139638|139642|MATERIALS AND
  • 3734|4063|cross-reacted with

  • 103|127|(ST 36) and
  • 59|59|determine whether this could
  • 58|69|Zusanli (ST 36) and
  • 33|33|primarily composed of the
  • 3|30|tilt-in-space and
ID-32: In No. 8,913,934; Filtered No. 2,312; Out No. 8,911,622; Pass Rate 99.9741%; Acc. Pass Rate 52.3480%
  • 2578756|3106139|in a
  • 2318378|2779050|to be
  • 2100327|2579735|with a
  • 1745284|2040054|as a
  • 1235421|1383886|may be

  • 223386|233138|In a
  • 6917|8934|in A.
  • 1733|1744|For one
  • 1706|1930|on NO
  • 17|36|anti-NOR
  • 10|36|AS-ON
  • 10|31|anti-OF
ID-33: In No. 8,911,622; Filtered No. 1,277,229; Out No. 7,634,393; Pass Rate 85.6678%; Acc. Pass Rate 44.8454%
  • 658430|708246|to determine
  • 533913|554628|In addition,
  • 413028|435484|to evaluate
  • 366579|394035|to assess
  • 357914|440750|in the presence

  • 106243|106243|AT 250
  • 35010|35283|As a result,

  • 551|551|ON THE TREATMENT
  • 215|216|as a possible treatment for
  • 140|140|in details,
  • 120|121|- for example,
  • 63|67|within working memory
  • 49|50|in 0.1% trifluoroacetic
  • 45|45|for various chronic
  • 27|33|in threatened preterm labor
  • 4|40|plus LHRH-A
  • 4|35|with the MIC90S of
ID-34: In No. 7,634,393; Filtered No. 1,283,001; Out No. 6,351,392; Pass Rate 83.1945%; Acc. Pass Rate 37.3089%
  • 1130970|1453306|effects of
  • 1118705|1403199|effect of
  • 1009451|1295670|number of
  • 1005978|1254873|presence of
  • 987576|1216189|use of

  • 133044|135220|Comparison of
  • 726|734|(HPV) in
  • 726|733|low cost of
  • 279|282|NUMBER OF
  • 98|99|zymography was used to
  • 43|44|loss of two or more
  • 3|37|microg/mmol of
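The section title suggests that ID-30 to ID-34 drop n-grams according to their leading or ending terms, though the term lists themselves are not shown here. A minimal sketch of such a check follows. LEAD_TERMS and END_TERMS are small hypothetical sets picked from words visible in the examples, not the project's actual lists, and the sketch assumes whitespace tokenization and case-insensitive matching.

```python
# Hypothetical term sets, chosen only from words that appear in the examples.
LEAD_TERMS = {"of", "that", "from", "is", "the", "was", "this"}
END_TERMS = {"a", "the", "this", "with", "that", "of"}


def has_invalid_lead_or_end(term: str,
                            lead_terms=LEAD_TERMS,
                            end_terms=END_TERMS) -> bool:
    """Return True if the first word is a listed lead term
    or the last word is a listed end term."""
    words = term.lower().split()
    if not words:
        return False
    return words[0] in lead_terms or words[-1] in end_terms
```

With these stand-in sets, "of a", "The results", "patients with", and "effects of" are flagged, while "magnetic resonance imaging" is not.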
Apply Exclusive Filters - Project domain
ID-40: In No. 6,351,392; Filtered No. 568,592; Out No. 5,782,800; Pass Rate 91.0478%; Acc. Pass Rate 33.9689%
  • 17804182|125085304|of
  • 15719615|119808656|the
  • 15495583|70413258|in
  • 12532576|44859301|to
  • 12292418|37885905|a

  • 10461414|22176886|The
  • 3074188|4145575|We
  • 7816|8301|"The
  • 5619|6370|"normal"
  • 5619|5870|linear,

  • 134|137|{systematic name:
  • 44|44|systematic name
  • 9|32|anterior intermeniscal ligament
  • 9|30|regional low-flow perfusion
  • 1|62|Penneys &
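The Pass Rate and Acc. Pass Rate columns can be reproduced from the counts, assuming Pass Rate is Out No. divided by In No. for a step, and Acc. Pass Rate is Out No. divided by the 17,023,819 n-grams that enter ID-01. Both assumptions agree with the figures reported for ID-30 and ID-40. The function names are illustrative.

```python
# Number of n-grams entering the first filter step (ID-01).
INITIAL_NGRAMS = 17_023_819


def pass_rate(in_no: int, out_no: int) -> float:
    """Percentage of a step's input n-grams that the step lets through."""
    return 100.0 * out_no / in_no


def acc_pass_rate(out_no: int, initial: int = INITIAL_NGRAMS) -> float:
    """Percentage of the initial n-gram set still present after a step."""
    return 100.0 * out_no / initial


if __name__ == "__main__":
    # Counts copied from the ID-30 and ID-40 rows; the table reports
    # 73.0945% / 66.3658% and 91.0478% / 33.9689% for these steps.
    steps = [("ID-30", 15_456_695, 11_297_993),
             ("ID-40", 6_351_392, 5_782_800)]
    for step_id, in_no, out_no in steps:
        print(f"{step_id}: pass {pass_rate(in_no, out_no):.4f}%, "
              f"accumulated {acc_pass_rate(out_no):.4f}%")
```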