The SPECIALIST Lexicon

Process Walk Through, 2015

This page describes the filters applied to the MEDLINE n-gram set to generate the candidate multiword list:

ID/Program | In No. | Filtered No. (%) | Out No. | Pass Rate | Acc. Pass Rate | Filter example and notes
Generate the MEDLINE n-gram set
Generate MEDLINE n-gram set
In No.: 4,133,736,858
  • N=1: 22,779,973
  • N=2: 217,447,811
  • N=3: 744,721,406
  • N=4: 1,375,850,664
  • N=5: 1,772,937,004
Filtered No.: 4,115,588,166 (99.56%)
  • N=1: 21,936,767 (96.30%)
  • N=2: 212,601,846 (97.77%)
  • N=3: 738,019,212 (99.10%)
  • N=4: 1,371,768,052 (99.70%)
  • N=5: 1,771,262,289 (99.91%)
Out No.: 18,148,692
  • N=1: 843,206
  • N=2: 4,845,965
  • N=3: 6,702,194
  • N=4: 4,082,612
  • N=5: 1,674,715
Pass Rate: 0.4390%
Acc. Pass Rate: N/A
Notes: From MEDLINE TI & AB to the MEDLINE n-gram set
  • filter out n-grams with length > 50
  • filter out n-grams with word count < 30
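The two reduction rules above can be sketched as a single predicate. This is a minimal illustration, not the program actually used: the function name `keep_ngram`, the constant names, and the reading of "length" as a character count are assumptions.

```python
# Thresholds taken from the notes above; the names are hypothetical.
MAX_LENGTH = 50      # assumed to mean length in characters
MIN_WORD_COUNT = 30  # minimum word count (WC) needed to keep an n-gram


def keep_ngram(term: str, word_count: int) -> bool:
    """Return True if the n-gram survives both reduction rules."""
    if len(term) > MAX_LENGTH:
        return False  # filtered out: length > 50
    if word_count < MIN_WORD_COUNT:
        return False  # filtered out: word count < 30
    return True
```

For example, `keep_ngram("magnetic resonance imaging", 31740)` keeps the term, while any term whose word count is 29 or lower is dropped.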
Basic operation: Sort nGrams by DC|WC|Terms
ID-01
  • NGramFilter: SortNGramByDcWcTerm
  • Run Time: 1 Min.
In No. 18,148,692; Filtered No. 0; Out No. 18,148,692; Pass Rate 100.0000%; Acc. Pass Rate 100.0000%
  • N/A
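A minimal sketch of the sort step, assuming each record is a `DC|WC|Term` line (document count, word count, term) and that the order is descending DC, then descending WC, then ascending term. The sort direction and the function names are assumptions. Because a term can itself contain `|` (for example `47|61|(|r|`), the line is split on the first two separators only.

```python
from typing import List, Tuple

Record = Tuple[int, int, str]  # (document count, word count, term)


def parse_record(line: str) -> Record:
    """Parse one 'DC|WC|Term' line; the term itself may contain '|'."""
    dc, wc, term = line.split("|", 2)
    return int(dc), int(wc), term


def sort_ngrams(lines: List[str]) -> List[Record]:
    """Sort by DC (descending), WC (descending), then term (ascending).

    The sort direction is an assumption made for this sketch.
    """
    records = [parse_record(line) for line in lines]
    return sorted(records, key=lambda r: (-r[0], -r[1], r[2]))
```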
Apply General Exclusive Filters
ID-10: In No. 18,148,692; Filtered No. 6; Out No. 18,148,686; Pass Rate 100.0000%; Acc. Pass Rate 100.0000%
  • 177|287||
  • 47|61|(|r|
  • 44|75|||
  • 37|41|Ag|AgCl
  • 29|42||D|
  • 22|33||E|
ID-11: In No. 18,148,686; Filtered No. 403; Out No. 18,148,283; Pass Rate 99.9978%; Acc. Pass Rate 99.9977%
  • 1359637|4019269|=
  • 770031|1673904|<
  • 640968|2508566|+/-
  • 261743|43214|>
  • 185166|294773|-

  • 11237|25176|-->
  • 7589|14028|(+)
  • 4319|5303|(%)
  • 92|129|"+"
  • 69|76|((-/-))

  • 30|70|==>
  • 25|32|[...]
  • 10|33|*}
  • 7|35|*//
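The two rate columns can be reproduced from the counts. In this sketch the pass rate is taken as Out No. divided by the filter's own In No., and the accumulated pass rate as Out No. divided by the 18,148,692 n-grams entering ID-01. Both definitions are inferred from the figures rather than stated on this page, and the function names are hypothetical.

```python
def pass_rate(n_in: int, n_out: int) -> float:
    """Percentage of a filter's own input that passes (assumed definition)."""
    return 100.0 * n_out / n_in


def accumulated_pass_rate(n_start: int, n_out: int) -> float:
    """Percentage of the starting n-gram set still present (assumed definition)."""
    return 100.0 * n_out / n_start


N_START = 18_148_692  # n-grams entering ID-01

# ID-11: 18,148,686 in, 403 filtered, 18,148,283 out
id11_rate = f"{pass_rate(18_148_686, 18_148_283):.4f}%"
id11_acc = f"{accumulated_pass_rate(N_START, 18_148_283):.4f}%"
```

With the ID-11 counts, `id11_rate` is `99.9978%` and `id11_acc` is `99.9977%`, matching the row above.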
ID-12
  • Filter: Digit
  • InTerm: core-term.lc
  • Run time: 2 Min (norm - strip punc and space)
In No. 18,148,283; Filtered No. 124,667; Out No. 18,023,616; Pass Rate 99.3131%; Acc. Pass Rate 99.3108%
  • 1487302|2186880|2
  • 1473237|2221550|1
  • 1184011|1645974|3
  • 933377|1199189|10
  • 906030|1196189|4

  • 266867|564771|95%
  • 191025|226218|2,
  • 90921|107808|2000
  • 12903|14077|3-5
  • 1656|1713|+/-0.5
  • 64|70|(+/-0.05)
  • 54|56|$1,500
  • 29|38|"3 + 1"
  • 3|32|55834

  • 192.168.1.1
  • [192, 168]
  • (+15%),
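The examples above are consistent with a rule that strips punctuation and spaces and then tests whether only digits remain. The sketch below follows that reading of "norm - strip punc and space". The function name and the use of Python's ASCII punctuation set are assumptions, not the filter's actual implementation.

```python
import string

# Translation table that deletes ASCII punctuation and whitespace.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)


def is_digit_term(term: str) -> bool:
    """Return True if only ASCII digits remain after normalization."""
    core = term.translate(_STRIP)
    # An empty core (for example "+/-") is not a digit term.
    return core.isascii() and core.isdigit()
```

Under this rule `95%`, `$1,500`, and `192.168.1.1` are all filtered, while `24 h` is kept for later filters.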
ID-13: In No. 18,023,616; Filtered No. 4,176; Out No. 18,019,440; Pass Rate 99.9768%; Acc. Pass Rate 99.2878%
  • 6215701|94663348|and
  • 2606764|3557102|two
  • 1900947|2401439|one
  • 1417581|1838738|three
  • 1413262|1750278|first

  • 19411|22010|first and second
  • 19340|20365|one third
  • 3212|3332|twenty-eight
  • 81|81|NINE
  • 34|36|zeroth and
  • 34|34|Four hundred and forty-seven

  • 24|30|zero-one
  • 23|31|'half'
  • 21|32|One"
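The ID-13 examples suggest a filter that removes terms made up entirely of number words, optionally joined by "and". The sketch below is an approximation under that assumption. The word list is illustrative and far from complete, and the function name is hypothetical.

```python
import re
import string

# Illustrative word list only; the real filter's list is not given on this page.
NUMBER_WORDS = set(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "twenty thirty forty fifty hundred thousand "
    "first second third zeroth half and".split()
)


def is_number_word_term(term: str) -> bool:
    """Return True if every token is a number word (or 'and')."""
    # Split on whitespace and hyphens, then drop surrounding punctuation.
    tokens = [t.strip(string.punctuation).lower() for t in re.split(r"[\s-]+", term)]
    tokens = [t for t in tokens if t]
    return bool(tokens) and all(t in NUMBER_WORDS for t in tokens)
```

A term such as `two groups` is not caught here, because `groups` is not a number word.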
ID-14: In No. 18,019,440; Filtered No. 149,759; Out No. 17,869,681; Pass Rate 99.1689%; Acc. Pass Rate 98.4626%
  • 10788709|25297882|of the
  • 8815978|17681431|in the
  • 4341383|5963167|to the
  • 4275276|5595157|and the
  • 3348659|4382031|on the

  • 1065081|124224|In the
  • 449304|514376|and/or
  • 82801|86589|50% of
  • 11903|13569|1, 2, and
  • 10424|1112|2003 to
  • 483|52|2003 to 2007
  • 22|35|for >=50%
  • 17|41|the 8:2
  • 12|41|-196 to -174

  • 12|32|OR-462
  • 9|45|AND-34
  • 8|44|IN-1130
Apply Exclusive Filters - pattern
ID-20: In No. 17,869,681; Filtered No. 179,455; Out No. 17,690,226; Pass Rate 98.9958%; Acc. Pass Rate 97.4738%
  • 40159|40347|tomography (CT)
  • 37934|38215|imaging (MRI)
  • 37536|37811|resonance imaging (MRI)
  • 37136|37570|oxide (NO)
  • 34892|35164|reaction (PCR)

  • 34779|35050|chain reaction (PCR)
  • 31588|31815|polymerase chain reaction (PCR)
  • 31559|31740|magnetic resonance imaging (MRI)
  • 29756|29859|computed tomography (CT)
  • 22000|22423|enzyme-linked immunosorbent assay (ELISA)
  • 10430|10495|single nucleotide polymorphisms (SNPs)
  • 7170|7181|magnetic resonance (MR) imaging

  • 57|57|"Standards, Options and Recommendations" (SOR)
  • 54|54|(CREB)-binding protein (CBP)

  • 24|30|proline-rich polypeptide (PRP)
  • 24|30|semi-permeable membrane devices (SPMDs)
  • 23|30|kinase (ASK)
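Every ID-20 example contains a parenthesized acronym that follows some expansion text, as in `tomography (CT)`. A rough way to detect that shape is a regular expression. The pattern and the function name below are assumptions for illustration. The real filter may well check that the acronym matches its expansion, which this sketch does not do.

```python
import re

# A parenthesized token that starts with an uppercase letter, contains only
# letters, and is preceded by some non-space text (the assumed expansion).
_PAREN_ACRONYM = re.compile(r"\S\s*\([A-Z][A-Za-z]+\)")


def has_parenthetic_acronym(term: str) -> bool:
    """Return True if the term looks like 'expansion (ACRONYM)'."""
    return _PAREN_ACRONYM.search(term) is not None
```

The pattern deliberately ignores a parenthesized token at the very start of a term, so `(CENTRAL) (The`, which is only removed later by ID-25, is not matched.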
ID-21: In No. 17,690,226; Filtered No. 322,634; Out No. 17,367,592; Pass Rate 98.1762%; Acc. Pass Rate 95.6961%
  • 604583|680911|a significant
  • 438065|502248|a single
  • 356461|388743|a high
  • 325369|373547|a novel
  • 289669|313959|a case

  • 129697|134307|a very
  • 122796|134755|a group
  • 82457|90092|a dose-dependent
  • 42749|43010|A series
  • 26150|33730|A and B
  • 19312|23481|a meta-analysis
  • 18|30|a SIF
  • 17|37|A alpha C
  • 17|30|A nonseminomatous

  • 8|40|a delivery rate per
  • 8|30|A beta 2m
  • 6|33|a beta ab
ID-22: In No. 17,367,592; Filtered No. 103,513; Out No. 17,264,079; Pass Rate 99.4040%; Acc. Pass Rate 95.1257%
  • 2308559|2309480|RESULTS:
  • 1808941|1809233|METHODS:
  • 1344616|1344681|CONCLUSIONS:
  • 1031625|1031682|CONCLUSION:
  • 927762|927831|BACKGROUND:

  • 771706|771790|OBJECTIVE:
  • 434190|434210|OBJECTIVE: To
  • 160972|160978|MATERIALS AND METHODS:
  • 135306|135323|SETTING:
  • 125448|125449|PURPOSE: To
  • 117655|117663|INTRODUCTION:
  • 21224|21225|AIM: The
  • 16|51|L: -DOPA
  • 13|39|(95% PI:

  • 12|44|PHPT:
  • 12|33|months [95% CI:
  • 9|45|(mode MIC:
  • 9|30|vs N:
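All ID-22 examples contain a token that ends with a colon, such as `RESULTS:` or `(95% PI:`. A minimal sketch of such a check follows. The rule is inferred from the examples only, and the function name is hypothetical.

```python
def has_colon_ended_token(term: str) -> bool:
    """Return True if any whitespace-separated token ends with ':'."""
    return any(token.endswith(":") for token in term.split())
```

A colon inside a token, as in `8:2`, does not trigger this check. Whether the real filter treats that case the same way is not stated here.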
ID-23: In No. 17,264,079; Filtered No. 123,961; Out No. 17,140,118; Pass Rate 99.2820%; Acc. Pass Rate 94.4427%
  • 350284|773827|(n =
  • 282236|555463|(P <
  • 186324|345583|(p <
  • 185061|390634|(P =
  • 171218|336564|P <

  • 93817|145119|(P < 0.05)
  • 14434|36516|95% CI =
  • 5062|6780|(P<0.001),
  • 434|434|{CI},
  • 263|465|(US$
  • 116|425|VSL#3
  • 29|36|N^N
  • 22|33|group (n=6) received
  • 9|35|CYP3A7*1C

  • 4|31|studies; average
  • 1|37|n.; Trichoteleia
  • 1|37|sp. n.; Trichoteleia
ID-24: In No. 17,140,118; Filtered No. 306,118; Out No. 16,834,000; Pass Rate 98.2140%; Acc. Pass Rate 92.7560%
  • 166003|194923|two groups
  • 133239|177412|6 months
  • 120001|157789|24 h
  • 106209|106209|(ABSTRACT TRUNCATED AT 250 WORDS)
  • 90083|103829|the two groups

  • 71720|90022|5 years
  • 43788|52607|at 37 degrees
  • 15832|1707|3 times
  • 15652|19993|100 mg
  • 14079|15140|January 1,
  • 12628|15821|10 mg/kg
  • 6268|7121|12-year-old
  • 3055|3618|at -20 degrees C
  • 1773|1798|September 2006
  • 198|20|65 years or older with
  • 187|213|20 cigarettes per day
  • 53|62|3 - 6 months

  • 7|33|6 hours plus
  • 7|33|minutes) per day, 5 days
  • 6|30|3 mg/EE
  • 4|33|317615 x
ID-25: In No. 16,834,000; Filtered No. 360,629; Out No. 16,473,371; Pass Rate 97.8577%; Acc. Pass Rate 90.7689%
  • 517241|1194153|(P
  • 372493|817448|(p
  • 354413|781204|(n
  • 296584|296606|[The
  • 204833|242169|years)

  • 34948|47940|group (P
  • 27815|31481|significant (P
  • 27236|34925|significantly (P
  • 27053|2773|years) with
  • 25860|26611|years) and
  • 2268|2270|interval [95%
  • 1775|1783|see text] The
  • 1291|1692|kg/m(2)),
  • 1200|1318|< 0.05) lower
  • 1110|1233|(2) A]
  • 915|915|(CENTRAL) (The
  • 7|31|nM (SD
  • 6|33|pOGH (ANG

  • 3|30|cB72.3(gamma
  • 3|30|new species (type
  • 2|30|(type locality: Chiapas,
  • 1|30|% (month
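The ID-25 examples all have unmatched parentheses or brackets, such as `(P`, `years)`, or `(2) A]`. A standard stack-based balance check captures that property. The sketch below assumes the filter treats `()`, `[]`, and `{}` as the relevant pairs, and the function name is hypothetical.

```python
# Map each closing bracket to its opener (assumed set of bracket pairs).
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def has_unbalanced_brackets(term: str) -> bool:
    """Return True if any bracket in the term is unmatched or mismatched."""
    stack = []
    for ch in term:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with no opener, or with the wrong opener, is unbalanced.
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    return bool(stack)  # leftover openers are unbalanced too
```

A balanced term such as `tomography (CT)` passes this check, which fits its removal by the earlier ID-20 filter instead.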
Apply Exclusive Filters - Lead-End-Terms
ID-30: In No. 16,473,371; Filtered No. 4,417,639; Out No. 12,055,732; Pass Rate 73.1832%; Acc. Pass Rate 66.4276%
  • 2934993|3645875|of a
  • 2438055|2977984|that the
  • 2075842|2589457|from the
  • 1905114|2142832|is a
  • 1802394|1980469|of this

  • 759962|798777|The results
  • 585346|656927|was observed
  • 477131|47900|this study was
  • 12894|1315|about 50%
  • 310|314|- but not
  • 163|18|"what is
  • 49|49|AND COURSE

  • 5|31|iT reg
  • 2|34|or spinal or conduction
  • 2|32|or spinal or conduction block,
ID-31: In No. 12,055,732; Filtered No. 2,537,518; Out No. 9,518,214; Pass Rate 78.9518%; Acc. Pass Rate 52.4457%
  • 1995649|3766489|patients with
  • 1836959|2567099|associated with
  • 1529123|1961949|at the
  • 1195832|1241361|suggest that
  • 1131685|1346050|between the

  • 793679|1131667|in patients with
  • 416751|420043|results suggest that
  • 162678|162685|MATERIALS AND
  • 3790|4126|cross-reacted with
  • 128|154|(ST 36) and
  • 72|84|Zusanli (ST 36) and
  • 61|61|determine whether this could
  • 34|34|primarily composed of the

  • 4|40|tilt-in-space and
  • 4|38|ppm Cu as
  • 3|35|epidural or spinal or
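The ID-30 and ID-31 examples suggest two complementary checks: one on the first word of a term and one on the last. The sketch below illustrates that idea. The word sets are small samples read off the examples, not the actual lead and end word lists, and all names are hypothetical.

```python
import string

# Illustrative samples only; the real lead/end word lists are not given here.
LEAD_WORDS = {"of", "that", "from", "is", "the", "was", "this", "about", "and", "or"}
END_WORDS = {"with", "the", "that", "and", "as", "or"}


def _normalize(token: str) -> str:
    """Lowercase a token and drop surrounding punctuation."""
    return token.strip(string.punctuation).lower()


def starts_with_lead_word(term: str) -> bool:
    """Return True if the first word is in the (assumed) lead word set."""
    words = term.split()
    return bool(words) and _normalize(words[0]) in LEAD_WORDS


def ends_with_end_word(term: str) -> bool:
    """Return True if the last word is in the (assumed) end word set."""
    words = term.split()
    return bool(words) and _normalize(words[-1]) in END_WORDS
```

With these samples, `The results` is caught by the lead check and `patients with` by the end check, while `magnetic resonance imaging` passes both.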
ID-32: In No. 9,518,214; Filtered No. 2,420; Out No. 9,515,794; Pass Rate 99.9746%; Acc. Pass Rate 52.4324%
  • 2738055|3301906|in a
  • 2454141|2940752|to be
  • 2237967|2750972|with a
  • 1870198|2189325|as a
  • 1307265|1463833|may be

  • 235585|245831|In a
  • 7592|9827|in A.
  • 1809|1823|For one
  • 1784|2017|on NO

  • 17|36|anti-NOR
  • 17|36|plus AT
  • 12|39|I/a
  • 10|36|AS-ON
  • 10|31|anti-OF
ID-33: In No. 9,515,794; Filtered No. 1,358,183; Out No. 8,157,611; Pass Rate 85.7271%; Acc. Pass Rate 44.9488%
  • 705902|759885|to determine
  • 574073|596464|In addition,
  • 453786|478898|to evaluate
  • 402617|433415|to assess
  • 378687|388308|to investigate

  • 372398|457884|in the presence
  • 106243|106243|AT 250
  • 38300|38602|As a result,
  • 552|552|ON THE TREATMENT
  • 233|234|as a possible treatment for
  • 152|153|in details,
  • 130|131|- for example,
  • 68|75|within working memory
  • 50|51|for various chronic
  • 49|50|in 0.1% trifluoroacetic
  • 28|34|in threatened preterm labor

  • 5|38|with the MIC90S
  • 4|40|plus LHRH-A
  • 4|35|with the MIC90S of
ID-34: In No. 8,157,611; Filtered No. 1,364,050; Out No. 6,793,561; Pass Rate 83.2788%; Acc. Pass Rate 37.4328%
  • 1198032|1540017|effects of
  • 1180790|1481392|effect of
  • 1075320|1380676|number of
  • 1057722|1318606|presence of
  • 1052297|1298995|use of

  • 139743|142035|Comparison of
  • 793|800|low cost of
  • 789|797|(HPV) in
  • 289|292|NUMBER OF
  • 107|108|zymography was used to
  • 44|45|loss of two or more

  • 6|33|1 goes to
  • 5|35|active with the MIC90s of
  • 3|37|microg/mmol of
Apply Exclusive Filters - Project domain
ID-40: In No. 6,793,561; Filtered No. 598,770; Out No. 6,194,791; Pass Rate 91.1862%; Acc. Pass Rate 34.1335%
  • 18688515|132624142|of
  • 16578464|127210139|the
  • 16331010|74804668|in
  • 13320037|47977695|to
  • 13050536|40419605|a

  • 11077342|23542516|The
  • 3317802|4494756|We
  • 8219|873|"The
  • 5951|6211|linear,
  • 5833|6602|"normal"
  • 158|163|{systematic name:
  • 52|52|systematic name
  • 10|33|anterior intermeniscal ligament
  • 9|30|regional low-flow perfusion

  • 2|32|Neo.
  • 2|31|Cannon &
  • 1|62|Penneys &