The SPECIALIST Lexicon

Distilled Process Log - Walk Through, 2020

This page describes the filter processes applied to the MEDLINE n-gram set to generate the distilled MEDLINE n-gram set and the candidate multiword list:

Columns: ID/Program, In No., Filtered No. (%), Out No., Pass Rate, Acc. Pass Rate, Filter example and notes
Generate the MEDLINE n-gram set
Generate MEDLINE n-gram set
In No.: 5,917,545,090
  • N=1: 33,884,283
  • N=2: 306,580,281
  • N=3: 1,052,247,625
  • N=4: 1,966,182,302
  • N=5: 2,558,650,539
Filtered No. (%): 5,891,234,222 (99.56%)
  • N=1: 32,757,517 (96.67%)
  • N=2: 299,877,583 (97.81%)
  • N=3: 1,042,569,925 (99.08%)
  • N=4: 1,960,027,982 (99.69%)
  • N=5: 2,556,001,215 (99.90%)
Out No.: 26,310,808
  • N=1: 1,126,766
  • N=2: 6,702,698
  • N=3: 9,677,700
  • N=4: 6,154,320
  • N=5: 2,649,324
Pass Rate: 0.44%; Acc. Pass Rate: N/A
From MEDLINE TI & AB to the MEDLINE n-gram set:
  • filter out n-grams with length > 50
  • filter out n-grams with word count < 30
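The two thresholds above can be sketched as a single predicate. This is only an illustration: it assumes "length" means the character length of the n-gram and "word count" means the n-gram's occurrence count (WC); the function and constant names are hypothetical and not taken from the Lexicon tools.

```python
# Hypothetical names; thresholds are the ones stated in the notes above.
MAX_LENGTH = 50       # assumption: character length of the n-gram
MIN_WORD_COUNT = 30   # assumption: WC, the n-gram's occurrence count


def keep_ngram(term: str, word_count: int) -> bool:
    """Return True if the n-gram survives both threshold filters."""
    if len(term) > MAX_LENGTH:
        return False  # filtered out: length > 50
    if word_count < MIN_WORD_COUNT:
        return False  # filtered out: word count < 30
    return True
```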

  • Calculated in Excel (In and Out No. entered manually)
  • Used data from n-gram set log file (not distilled)
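The rate columns can be reproduced from the In and Out numbers. The sketch below assumes that Pass Rate is Out No. divided by In No. for the same step, and that Acc. Pass Rate is Out No. divided by the size of the MEDLINE n-gram set (26,310,808); this reading matches the figures reported for the filter steps in this log, but the function name is hypothetical.

```python
def pass_rates(in_no: int, out_no: int, initial_no: int) -> tuple[float, float]:
    """Return (pass rate, accumulated pass rate) as percentages.

    Assumption: pass rate = Out / In for one step,
    accumulated pass rate = Out / size of the starting n-gram set.
    """
    pass_rate = 100.0 * out_no / in_no
    acc_pass_rate = 100.0 * out_no / initial_no
    return pass_rate, acc_pass_rate


INITIAL = 26_310_808  # Out No. of the n-gram set generation step

# Numbers of the ID-12 (Digit) step from this log
rate, acc = pass_rates(26_310_127, 26_146_912, INITIAL)
print(f"{rate:.4f}% {acc:.4f}%")
```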
Basic operation: Sort nGrams by DC|WC|Terms
ID-01
  • NGramFilter: SortNGramByDcWcTerm
  • Param: 1, 01
  • Run Time: 1 Min.
In No.: 26,310,808; Filtered No.: 0; Out No.: 26,310,808; Pass Rate: 100.0000%; Acc. Pass Rate: 100.0000%
  • Create link: ./05.ApplyFilters/nGram.${YEAR}
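A minimal sketch of such a sort is shown below. It assumes each record is a line of the form DC|WC|Term (document count, word count, n-gram) and that the order is descending by DC, then descending by WC, then ascending by term; the actual order used by SortNGramByDcWcTerm is not stated in this log, and the counts in the usage lines are made up for illustration.

```python
def sort_ngrams(lines):
    """Sort 'DC|WC|Term' records by DC, then WC (both descending), then term."""
    records = []
    for line in lines:
        # Split on the first two '|' only, so terms that contain '|' stay intact.
        dc, wc, term = line.rstrip("\n").split("|", 2)
        records.append((int(dc), int(wc), term))
    # Assumed ordering: higher counts first, ties broken alphabetically by term.
    records.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [f"{dc}|{wc}|{term}" for dc, wc, term in records]


# Illustrative counts only, not taken from the n-gram set
sample = ["3|10|of the", "5|7|Ag|AgCl", "5|9|in the"]
for record in sort_ngrams(sample):
    print(record)
```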
Apply General Exclusive Filters
ID-10
In No.: 26,310,808; Filtered No.: 19; Out No.: 26,310,789; Pass Rate: 99.9999%; Acc. Pass Rate: 99.9999%
  • |
  • (|r|
  • ||
  • Ag|AgCl
  • |D|
  • |E|
  • lambda(||)
ID-11
In No.: 26,310,789; Filtered No.: 662; Out No.: 26,310,127; Pass Rate: 99.9975%; Acc. Pass Rate: 99.9974%
  • =
  • <
  • +/-
  • >
  • -

  • -->
  • (+)
  • (%)
  • "+"
  • ((-/-))

  • ==>
  • [...]
  • *}
  • *//
ID-12
  • Filter: Digit
  • InTerm: core-term.lc
  • Param: 2, 12
  • Run time: 2 Min (norm - strip punc and space)
In No.: 26,310,127; Filtered No.: 163,215; Out No.: 26,146,912; Pass Rate: 99.3796%; Acc. Pass Rate: 99.3771%
  • 2
  • 1
  • 3
  • 10
  • 4

  • 95%
  • 2,
  • 2000
  • 3-5
  • +/-0.5
  • (+/-0.05)
  • $1,500
  • "3 + 1"
  • 55834

  • 192.168.1.1
  • [192, 168]
  • (+15%),
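The note "norm - strip punc and space" suggests that a term is normalized before the digit test. The sketch below is one possible reading, not the actual Digit filter: it strips ASCII punctuation and whitespace and then checks whether only digits remain. The function name is hypothetical.

```python
import re
import string

# Translation table that deletes ASCII punctuation and whitespace
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)


def is_digit_term(term: str) -> bool:
    """Return True if the term consists only of digits after normalization."""
    core = term.translate(_STRIP)
    return re.fullmatch(r"[0-9]+", core) is not None


# Examples taken from the filtered terms listed above
for term in ["95%", "+/-0.5", "$1,500", "192.168.1.1", "[192, 168]", "(+15%),"]:
    print(term, is_digit_term(term))
```

Under this reading, a term such as "24 h" is kept by this step because a letter remains after stripping.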
ID-13
  • Filter: Number
  • InTerm: core-term.lc
  • Create link: ./inData/NRVAR
  • Param: 2, 13
  • Run time: 2 Min
In No.: 26,146,912; Filtered No.: 5,087; Out No.: 26,141,825; Pass Rate: 99.9805%; Acc. Pass Rate: 99.3577%
  • and
  • two
  • one
  • first
  • three

  • first and second
  • one third
  • twenty-eight
  • NINE
  • zeroth and
  • Four hundred and forty-seven

  • zero-one
  • 'half'
  • One"
ID-14
In No.: 26,141,825; Filtered No.: 197,435; Out No.: 25,944,390; Pass Rate: 99.2448%; Acc. Pass Rate: 98.6073%
  • of the
  • in the
  • to the
  • and the
  • on the

  • In the
  • and/or
  • 50% of
  • 1, 2, and
  • 2003 to
  • 2003 to 2007
  • for >=50%
  • the 8:2
  • -196 to -174

  • OR-462
  • AND-34
  • IN-1130
  • And-1
Apply Exclusive Filters - pattern
ID-20
In No.: 25,944,390; Filtered No.: 307,967; Out No.: 25,636,423; Pass Rate: 98.8130%; Acc. Pass Rate: 97.4369%
  • tomography (CT)
  • imaging (MRI)
  • resonance imaging (MRI)
  • oxide (NO)
  • reaction (PCR)

  • chain reaction (PCR)
  • polymerase chain reaction (PCR)
  • magnetic resonance imaging (MRI)
  • computed tomography (CT)
  • enzyme-linked immunosorbent assay (ELISA)
  • single nucleotide polymorphisms (SNPs)
  • magnetic resonance (MR) imaging

  • "Standards, Options and Recommendations" (SOR)
  • (CREB)-binding protein (CBP)

  • kinase (ASK)
  • proline-rich polypeptide (PRP)
  • semi-permeable membrane devices (SPMDs)
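The ID-20 examples all contain an acronym in parentheses. A hedged sketch of a matching test is given below; the regular expression and the function name are assumptions based only on the listed examples, not the filter's actual rule.

```python
import re

# Assumed pattern: a parenthesized single token (letters, digits, hyphens)
# that contains at least one uppercase letter, e.g. "(CT)" or "(SNPs)".
_PAREN_ACRONYM = re.compile(r"\((?=[^)]*[A-Z])[A-Za-z0-9-]+\)")


def has_parenthetic_acronym(term: str) -> bool:
    """Return True if the term contains a parenthesized acronym-like token."""
    return _PAREN_ACRONYM.search(term) is not None


# Examples taken from the filtered terms listed above
for term in ["tomography (CT)", "magnetic resonance (MR) imaging",
             "single nucleotide polymorphisms (SNPs)"]:
    print(term, has_parenthetic_acronym(term))
```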
ID-21
In No.: 25,636,423; Filtered No.: 464,713; Out No.: 25,171,710; Pass Rate: 98.1873%; Acc. Pass Rate: 95.6706%
  • a significant
  • a single
  • a high
  • a novel
  • a case

  • a very
  • a group
  • a dose-dependent
  • A series
  • A and B
  • a meta-analysis
  • a SIF
  • A alpha C
  • A nonseminomatous

  • a delivery rate per
  • A beta 2m
  • a beta ab
ID-22
In No.: 25,171,710; Filtered No.: 168,786; Out No.: 25,002,924; Pass Rate: 99.3295%; Acc. Pass Rate: 95.0291%
  • RESULTS:
  • METHODS:
  • CONCLUSIONS:
  • CONCLUSION:
  • BACKGROUND:

  • OBJECTIVE:
  • OBJECTIVE: To
  • MATERIALS AND METHODS:
  • SETTING:
  • PURPOSE: To
  • INTRODUCTION:
  • AIM: The
  • L: -DOPA
  • 95% PI:

  • PHPT:
  • months [95% CI:
  • vs N:
  • mode MIC:
  • [95 % CI:
ID-23
In No.: 25,002,924; Filtered No.: 190,757; Out No.: 24,812,167; Pass Rate: 99.2371%; Acc. Pass Rate: 94.3041%
  • (n =
  • (P <
  • (P =
  • (p <
  • P <

  • (P < 0.05)
  • 95% CI =
  • P<0.001),
  • CI},
  • US$
  • VSL#3
  • N^N
  • group (n=6) received
  • CYP3A7*1C

  • studies; average
  • n.; Trichoteleia
  • sp. n.; Trichoteleia
ID-24
In No.: 24,812,167; Filtered No.: 419,591; Out No.: 24,392,576; Pass Rate: 98.3089%; Acc. Pass Rate: 92.7093%
  • two groups
  • 6 months
  • 24 h
  • (ABSTRACT TRUNCATED AT 250 WORDS)
  • the two groups

  • 5 years
  • at 37 degrees
  • 3 times
  • 100 mg
  • January 1,
  • 10 mg/kg
  • 12-year-old
  • at -20 degrees C
  • September 2006
  • 65 years or older with
  • 20 cigarettes per day
  • 3 - 6 months

  • 6 hours plus
  • minutes) per day, 5 days
  • MMR + V
  • 3 mg/EE
  • 317615 x
ID-25
In No.: 24,392,576; Filtered No.: 231,521; Out No.: 24,161,055; Pass Rate: 99.0509%; Acc. Pass Rate: 91.8294%
  • group (P
  • significant (P
  • years) with
  • significantly (P
  • years) and

  • interval [95%
  • see text] The
  • lt; 0.05) lower
  • CENTRAL) (The
  • nM (SD
  • pOGH (ANG

  • cB72.3(gamma
  • new species (type
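Every ID-25 example contains a bracket without its partner. One way to detect that, offered as a sketch under the assumption that unmatched brackets are what this filter targets, is a stack-based balance check; the function name is hypothetical.

```python
# Map each closing bracket to its opening partner
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def has_unbalanced_brackets(term: str) -> bool:
    """Return True if any bracket in the term lacks a matching partner."""
    stack = []
    for ch in term:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with no opener, or with the wrong opener, is unbalanced.
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    # Any opener left on the stack was never closed.
    return bool(stack)


# Examples taken from the filtered terms listed above
for term in ["group (P", "years) with", "interval [95%", "see text] The"]:
    print(term, has_unbalanced_brackets(term))
```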
ID-26
In No.: 24,161,055; Filtered No.: 89,152; Out No.: 24,071,903; Pass Rate: 99.6310%; Acc. Pass Rate: 91.4906%
  • °C
  • ≥3
  • (mean ±
  • × 3
  • (IC 95%;
  • TGF-&beta;
Apply Exclusive Filters - Lead-End-Terms
ID-30
In No.: 24,071,903; Filtered No.: 6,082,285; Out No.: 17,989,618; Pass Rate: 74.7328%; Acc. Pass Rate: 68.3735%
  • of a
  • that the
  • from the
  • is a
  • of this

  • The results
  • was observed
  • this study was
  • about 50%
  • - but not
  • "what is
  • AND COURSE

  • iT reg
  • of FoxM1b
  • or spinal or conduction
  • or spinal or conduction block,
ID-31
In No.: 17,989,618; Filtered No.: 3,697,295; Out No.: 14,292,323; Pass Rate: 79.4476%; Acc. Pass Rate: 54.3211%
  • patients with
  • associated with
  • at the
  • suggest that
  • between the

  • in patients with
  • results suggest that
  • MATERIALS AND
  • cross-reacted with
  • (ST 36) and
  • Zusanli (ST 36) and
  • determine whether this could
  • primarily composed of the

  • tilt-in-space and
  • systems, assays and
  • ppm Cu as
  • epidural or spinal or
ID-32
In No.: 14,292,323; Filtered No.: 3,560; Out No.: 14,288,763; Pass Rate: 99.9751%; Acc. Pass Rate: 54.3076%
  • in a
  • to be
  • with a
  • as a
  • may be

  • In a
  • in A.
  • For one
  • on NO

  • anti-NOR
  • plus AT
  • I/a
  • AS-ON
  • anti-OF
ID-33
In No.: 14,288,763; Filtered No.: 1,952,243; Out No.: 12,336,520; Pass Rate: 86.3372%; Acc. Pass Rate: 46.8877%
  • to determine
  • In addition,
  • to evaluate
  • to assess
  • to investigate

  • in the presence
  • AT 250
  • As a result,
  • ON THE TREATMENT
  • as a possible treatment for
  • in details,
  • - for example,
  • within working memory
  • for various chronic
  • in 0.1% trifluoroacetic
  • in threatened preterm labor

  • with the MIC90S
  • On PTD
  • plus LHRH-A
  • with the MIC90S of
ID-34
In No.: 12,336,520; Filtered No.: 1,982,499; Out No.: 10,354,021; Pass Rate: 83.9298%; Acc. Pass Rate: 39.3527%
  • effects of
  • number of
  • use of
  • presence of
  • used to

  • Comparison of
  • low cost of
  • HPV) in
  • NUMBER OF
  • zymography was used to
  • loss of two or more

  • 1 goes to
  • active with the MIC90s of
  • syn. nov. of
  • microg/mmol of
The final result of the steps above is used as the distilled MEDLINE n-gram set
Apply Exclusive Filters - Project domain
ID-40
In No.: 10,354,021; Filtered No.: 859,060; Out No.: 9,494,961; Pass Rate: 91.7031%; Acc. Pass Rate: 36.0877%
  • of
  • the
  • in
  • to
  • a

  • The
  • We
  • "The
  • linear,
  • "normal"
  • {systematic name:
  • systematic name
  • anterior intermeniscal ligament
  • regional low-flow perfusion

  • Neo.
  • Cannon &
  • Polycentropus
  • Penneys &
  • % (month