The SPECIALIST Lexicon

Distilled Process Log - Walk Through, 2022

This page describes the filter processes applied to the MEDLINE n-gram set to generate the distilled MEDLINE n-gram set and the candidate multiword list:

ID/Program | In No. | Filtered No. (%) | Out No. | Pass Rate | Acc. Pass Rate | Filter example and notes
Generate the MEDLINE n-gram set
Generate MEDLINE n-gram set
In No.: 6,725,249,089
  • N=1: 38,591,469
  • N=2: 345,555,538
  • N=3: 1,189,180,839
  • N=4: 2,233,423,690
  • N=5: 2,918,497,553
Filtered No.: 6,695,158,318 (99.55%)
  • N=1: 37,342,742 (96.76%)
  • N=2: 338,036,116 (97.82%)
  • N=3: 1,178,134,206 (99.07%)
  • N=4: 2,226,281,606 (99.68%)
  • N=5: 2,915,363,648 (99.89%)
Out No.: 30,090,771
  • N=1: 1,248,727
  • N=2: 7,519,422
  • N=3: 11,046,633
  • N=4: 7,142,084
  • N=5: 3,133,905
Pass Rate: 0.36%; Acc. Pass Rate: N/A
From MEDLINE TI & AB to the MEDLINE n-gram set
  • filter out n-grams with length > 50
  • filter out n-grams with word count < 30

  • Calculated in Excel (In and Out No. entered manually)
  • Used data from the n-gram set log file (not distilled), keyed into a spreadsheet for calculation.
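The two cut-offs listed for this step can be sketched as a simple line filter. This is a minimal sketch, not the Lexicon's actual program: it assumes each input line has the form `DC|WC|term` (document count, word count, n-gram), reads "length" as the character length of the term, and uses the hypothetical names `MAX_LENGTH`, `MIN_WORD_COUNT`, and `keep_ngram`. The sample lines are made up for illustration.

```python
MAX_LENGTH = 50       # n-grams longer than this many characters are dropped
MIN_WORD_COUNT = 30   # n-grams with a word count below this are dropped


def keep_ngram(line: str) -> bool:
    """Return True if a 'DC|WC|term' line passes both cut-offs (assumed format)."""
    # Split on the first two pipes only, so a term containing '|' stays whole.
    doc_count, word_count, term = line.rstrip("\n").split("|", 2)
    return len(term) <= MAX_LENGTH and int(word_count) >= MIN_WORD_COUNT


# Made-up sample lines, not taken from the real n-gram set.
sample = [
    "120|150|magnetic resonance imaging",  # kept
    "5|12|rare phrase",                    # word count below 30
    "40|45|" + "x" * 51,                   # term longer than 50 characters
]
kept = [line for line in sample if keep_ngram(line)]
```

Splitting with a maximum of two splits matters here because some n-grams themselves contain the `|` character.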
Basic operation: Sort nGrams by DC|WC|Terms
ID-01
  • NGramFilter: SortNGramByDcWcTerm
  • Param: 1, 01
  • Run Time: 1 Min.
In No.: 30,090,771; Filtered No.: 0; Out No.: 30,090,771; Pass Rate: 100.0000%; Acc. Pass Rate: 100.0000%
  • Create link: ./05.ApplyFilters/nGram.${YEAR}
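The sort in ID-01 can be sketched as follows. This is an illustration only: it assumes `DC|WC|term` lines and assumes the order is descending document count, then descending word count, then the term alphabetically. The page does not state the sort direction, and `sort_ngrams` is a hypothetical name rather than the `SortNGramByDcWcTerm` program itself.

```python
def sort_ngrams(lines):
    """Sort 'DC|WC|term' lines by DC (desc), WC (desc), then term (asc)."""

    def key(line):
        doc_count, word_count, term = line.split("|", 2)
        # Negate the counts so that larger counts sort first.
        return (-int(doc_count), -int(word_count), term)

    return sorted(lines, key=key)


# Made-up sample lines, not taken from the real n-gram set.
unsorted_lines = ["10|50|in the", "10|80|of the", "3|4|zebra", "10|50|and the"]
sorted_lines = sort_ngrams(unsorted_lines)
```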
Apply General Exclusive Filters
ID-10: In No.: 30,090,771; Filtered No.: 28; Out No.: 30,090,743; Pass Rate: 99.9999%; Acc. Pass Rate: 99.9999%
  • |
  • (|r|
  • ||
  • Ag|AgCl
  • |D|
  • |E|
  • lambda(||)
ID-11: In No.: 30,090,743; Filtered No.: 740; Out No.: 30,090,003; Pass Rate: 99.9975%; Acc. Pass Rate: 99.9974%
  • =
  • <
  • +/-
  • >
  • -

  • -->
  • (+)
  • (%)
  • "+"
  • ((-/-))

  • ==>
  • [...]
  • *}
  • *//
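The figures in this log are consistent with a pass rate of Out No. divided by In No. for each filter, and an accumulated pass rate of Out No. divided by the 30,090,771 n-grams that enter ID-01. The sketch below reproduces the ID-10 and ID-11 rows under that reading. The function name `rates` and the constant `START` are illustrative, not part of the original tooling.

```python
START = 30_090_771  # n-grams entering the filter chain (Out No. of the generation step)


def rates(n_in: int, n_filtered: int, start: int = START):
    """Return (out count, pass rate %, accumulated pass rate %) for one filter."""
    n_out = n_in - n_filtered
    pass_rate = 100.0 * n_out / n_in
    acc_pass_rate = 100.0 * n_out / start
    return n_out, pass_rate, acc_pass_rate


# Filtered counts for the first two general exclusive filters, from the log.
steps = [("ID-10", 28), ("ID-11", 740)]
n = START
for step_id, filtered in steps:
    n_out, p, a = rates(n, filtered)
    print(f"{step_id}: out={n_out:,} pass={p:.4f}% acc={a:.4f}%")
    n = n_out  # the output of one filter is the input of the next
```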
ID-12
  • Filter: Digit
  • InTerm: core-term.lc
  • Param: 2, 12
  • Run time: 2 Min (norm - strip punc and space)
In No.: 30,090,003; Filtered No.: 185,625; Out No.: 29,904,378; Pass Rate: 99.3831%; Acc. Pass Rate: 99.3806%
  • 2
  • 1
  • 3
  • 10
  • 4

  • 95%
  • 2,
  • 2000
  • 3-5
  • +/-0.5
  • (+/-0.05)
  • $1,500
  • "3 + 1"
  • 55834

  • 192.168.1.1
  • [192, 168]
  • (+15%),
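The run-time note for ID-12 ("norm - strip punc and space") suggests the term is normalized by removing punctuation and spaces before the digit test. A minimal sketch of that idea follows, assuming ASCII punctuation only. The name `is_digit_term` is hypothetical, and the role of `core-term.lc` is not modeled here.

```python
import string

# Translation table that deletes ASCII punctuation and whitespace.
_STRIP = str.maketrans("", "", string.punctuation + string.whitespace)


def is_digit_term(term: str) -> bool:
    """True if only digits remain after stripping punctuation and spaces."""
    core = term.translate(_STRIP)
    return core.isdigit()  # an empty string is not a digit term


examples = ["2", "95%", "3-5", "+/-0.5", "$1,500", '"3 + 1"', "192.168.1.1", "[192, 168]", "(+15%),"]
flagged = [t for t in examples if is_digit_term(t)]
```

Under this sketch a term such as `24 h` is not flagged, because a letter survives normalization.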
ID-13
  • Filter: Number
  • InTerm: core-term.lc
  • Create link: ./inData/NRVAR
  • Param: 2, 13
  • Run time: 2 Min
In No.: 29,904,378; Filtered No.: 5,444; Out No.: 29,898,934; Pass Rate: 99.9818%; Acc. Pass Rate: 99.3625%
  • and
  • two
  • one
  • first
  • three

  • first and second
  • one third
  • twenty-eight
  • NINE
  • zeroth and
  • Four hundred and forty-seven

  • zero-one
  • 'half'
  • One"
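The ID-13 examples all consist of spelled-out numbers, optionally joined by "and" or hyphens and wrapped in stray punctuation. A rough sketch of such a check is shown below. The word list `NUMBER_WORDS` is a small hypothetical sample chosen to cover the examples on this page. The real filter's vocabulary is not shown here, and `is_number_term` is an illustrative name.

```python
import re

# Hypothetical, deliberately incomplete vocabulary of number words.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "twenty", "thirty", "forty", "hundred", "thousand",
    "zeroth", "first", "second", "third", "half", "and",
}


def is_number_term(term: str) -> bool:
    """True if every alphanumeric token of the term is a number word."""
    # Lowercase, then treat punctuation and hyphens as token separators.
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return bool(tokens) and all(tok in NUMBER_WORDS for tok in tokens)


examples = ["and", "first and second", "twenty-eight", "NINE",
            "Four hundred and forty-seven", "zero-one", "'half'", 'One"']
flagged = [t for t in examples if is_number_term(t)]
```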
ID-14: In No.: 29,898,934; Filtered No.: 219,435; Out No.: 29,679,499; Pass Rate: 99.2661%; Acc. Pass Rate: 98.6332%
  • of the
  • in the
  • to the
  • and the
  • on the

  • In the
  • and/or
  • 50% of
  • 1, 2, and
  • 2003 to
  • 2003 to 2007
  • for >=50%
  • the 8:2
  • -196 to -174

  • OR-462
  • AND-34
  • IN-1130
  • And-1
Apply Exclusive Filters - pattern
ID-20: In No.: 29,679,499; Filtered No.: 372,388; Out No.: 29,307,111; Pass Rate: 98.7453%; Acc. Pass Rate: 97.3957%
  • tomography (CT)
  • imaging (MRI)
  • resonance imaging (MRI)
  • oxide (NO)
  • reaction (PCR)

  • chain reaction (PCR)
  • polymerase chain reaction (PCR)
  • magnetic resonance imaging (MRI)
  • computed tomography (CT)
  • enzyme-linked immunosorbent assay (ELISA)
  • single nucleotide polymorphisms (SNPs)
  • magnetic resonance (MR) imaging

  • "Standards, Options and Recommendations" (SOR)
  • (CREB)-binding protein (CBP)

  • kinase (ASK)
  • proline-rich polypeptide (PRP)
  • semi-permeable membrane devices (SPMDs)
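Every ID-20 example contains a parenthesized acronym such as `(MRI)` or `(SNPs)`. The page does not state the exact rule, so the regular expression below is only a guess that matches the listed examples: two or more capital letters, an optional plural `s`, enclosed in parentheses. The name `has_parenthetic_acronym` is hypothetical.

```python
import re

# Assumed pattern: "(" + 2 or more capitals + optional "s" + ")".
_ACRONYM = re.compile(r"\([A-Z]{2,}s?\)")


def has_parenthetic_acronym(term: str) -> bool:
    """True if the term contains something shaped like '(ABC)' or '(ABCs)'."""
    return _ACRONYM.search(term) is not None


examples = [
    "tomography (CT)",
    "magnetic resonance (MR) imaging",
    "single nucleotide polymorphisms (SNPs)",
    "(CREB)-binding protein (CBP)",
    '"Standards, Options and Recommendations" (SOR)',
]
flagged = [t for t in examples if has_parenthetic_acronym(t)]
```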
ID-21: In No.: 29,307,111; Filtered No.: 528,355; Out No.: 28,778,756; Pass Rate: 98.1972%; Acc. Pass Rate: 95.6398%
  • a significant
  • a single
  • a high
  • a novel
  • a case

  • a very
  • a group
  • a dose-dependent
  • A series
  • A and B
  • a meta-analysis
  • a SIF
  • A alpha C
  • A nonseminomatous

  • a delivery rate per
  • A beta 2m
  • a beta ab
ID-22: In No.: 28,778,756; Filtered No.: 195,840; Out No.: 28,582,916; Pass Rate: 99.3195%; Acc. Pass Rate: 94.9890%
  • RESULTS:
  • METHODS:
  • CONCLUSIONS:
  • CONCLUSION:
  • BACKGROUND:

  • OBJECTIVE:
  • OBJECTIVE: To
  • MATERIALS AND METHODS:
  • SETTING:
  • PURPOSE: To
  • INTRODUCTION:
  • AIM: The
  • L: -DOPA
  • 95% PI:

  • PHPT:
  • months [95% CI:
  • vs N:
  • mode MIC:
  • [95 % CI:
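The ID-22 examples share one visible trait: at least one space-separated token ends with a colon, as in abstract section labels like `RESULTS:`. The check below is a sketch inferred from those examples rather than the documented rule, and `has_colon_token` is an illustrative name.

```python
def has_colon_token(term: str) -> bool:
    """True if any whitespace-separated token of the term ends with ':'."""
    return any(token.endswith(":") for token in term.split())


examples = ["RESULTS:", "OBJECTIVE: To", "MATERIALS AND METHODS:",
            "L: -DOPA", "months [95% CI:", "vs N:"]
flagged = [t for t in examples if has_colon_token(t)]
```

A colon inside a token, as in `the 8:2`, does not trigger this check.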
ID-23: In No.: 28,582,916; Filtered No.: 350,584; Out No.: 28,232,332; Pass Rate: 98.7734%; Acc. Pass Rate: 93.8239%
  • °C
  • ≥3
  • (mean ±
  • × 3
  • (IC 95%;
  • TGF-&beta;
ID-24: In No.: 28,232,332; Filtered No.: 0; Out No.: 28,232,332; Pass Rate: 100.0000%; Acc. Pass Rate: 93.8239%
  • (n =
  • (P <
  • (P =
  • (p <
  • P <

  • (P < 0.05)
  • 95% CI =
  • P<0.001),
  • CI},
  • US$
  • VSL#3
  • N^N
  • group (n=6) received
  • CYP3A7*1C

  • studies; average
  • n.; Trichoteleia
  • sp. n.; Trichoteleia
ID-25: In No.: 28,232,332; Filtered No.: 461,302; Out No.: 27,771,030; Pass Rate: 98.3661%; Acc. Pass Rate: 92.2909%
  • two groups
  • 6 months
  • 24 h
  • (ABSTRACT TRUNCATED AT 250 WORDS)
  • the two groups

  • 5 years
  • at 37 degrees
  • 3 times
  • 100 mg
  • January 1,
  • 10 mg/kg
  • 12-year-old
  • at -20 degrees C
  • September 2006
  • 65 years or older with
  • 20 cigarettes per day
  • 3 - 6 months

  • 6 hours plus
  • minutes) per day, 5 days
  • MMR + V
  • 3 mg/EE
  • 317615 x
ID-26: In No.: 27,771,030; Filtered No.: 261,979; Out No.: 27,509,051; Pass Rate: 99.0566%; Acc. Pass Rate: 91.4202%
  • group (P
  • significant (P
  • years) with
  • significantly (P
  • years) and

  • interval [95%
  • see text] The
  • lt; 0.05) lower
  • CENTRAL) (The
  • nM (SD
  • pOGH (ANG

  • cB72.3(gamma
  • new species (type
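The ID-26 examples all contain an opening or closing bracket without its partner. One way to detect this is a stack-based balance check, sketched below. This is an inference from the examples, not the documented filter logic, and `has_unbalanced_brackets` is a hypothetical name.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def has_unbalanced_brackets(term: str) -> bool:
    """True if round, square, or curly brackets in the term do not pair up."""
    stack = []
    for ch in term:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with no opener, or with the wrong opener, is unbalanced.
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    return bool(stack)  # leftover openers are unbalanced too


examples = ["group (P", "years) with", "interval [95%", "see text] The",
            "CENTRAL) (The", "cB72.3(gamma"]
flagged = [t for t in examples if has_unbalanced_brackets(t)]
```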
Apply Exclusive Filters - Lead-End-Terms
ID-30: In No.: 27,509,051; Filtered No.: 6,845,072; Out No.: 20,663,979; Pass Rate: 75.1170%; Acc. Pass Rate: 68.6721%
  • of a
  • that the
  • from the
  • is a
  • of this

  • The results
  • was observed
  • this study was
  • about 50%
  • - but not
  • "what is
  • AND COURSE

  • iT reg
  • of FoxM1b
  • or spinal or conduction
  • or spinal or conduction block,
ID-31: In No.: 20,663,979; Filtered No.: 4,229,181; Out No.: 16,434,798; Pass Rate: 79.5336%; Acc. Pass Rate: 54.6174%
  • patients with
  • associated with
  • at the
  • suggest that
  • between the

  • in patients with
  • results suggest that
  • MATERIALS AND
  • cross-reacted with
  • (ST 36) and
  • Zusanli (ST 36) and
  • determine whether this could
  • primarily composed of the

  • tilt-in-space and
  • systems, assays and
  • ppm Cu as
  • epidural or spinal or
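The ID-30 examples begin with a function word and the ID-31 examples end with one, which fits the section title "Lead-End-Terms". The sketch below illustrates that kind of check. The word sets `LEAD_TERMS` and `END_TERMS` are hypothetical samples inferred from the examples on this page, not the actual lists used by the filters, and the function names are illustrative.

```python
import string

# Hypothetical word lists inferred from the listed examples only.
LEAD_TERMS = {"of", "that", "from", "is", "the", "was", "this", "about",
              "but", "what", "and", "it", "or"}
END_TERMS = {"with", "the", "that", "and", "as", "or"}


def _words(term: str) -> list:
    """Lowercased tokens with surrounding punctuation removed, empties dropped."""
    stripped = (tok.strip(string.punctuation).lower() for tok in term.split())
    return [w for w in stripped if w]


def starts_with_lead_term(term: str) -> bool:
    words = _words(term)
    return bool(words) and words[0] in LEAD_TERMS


def ends_with_end_term(term: str) -> bool:
    words = _words(term)
    return bool(words) and words[-1] in END_TERMS


lead_hits = [t for t in ["of a", "The results", "- but not", '"what is', "iT reg"]
             if starts_with_lead_term(t)]
end_hits = [t for t in ["patients with", "MATERIALS AND", "(ST 36) and", "ppm Cu as"]
            if ends_with_end_term(t)]
```

Stripping punctuation and case before the lookup is what lets entries such as `"what is` and `AND COURSE` match lowercase list entries.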
ID-32: In No.: 16,434,798; Filtered No.: 3,898; Out No.: 16,430,900; Pass Rate: 99.9763%; Acc. Pass Rate: 54.6044%
  • in a
  • to be
  • with a
  • as a
  • may be

  • In a
  • in A.
  • For one
  • on NO

  • anti-NOR
  • plus AT
  • I/a
  • AS-ON
  • anti-OF
ID-33: In No.: 16,430,900; Filtered No.: 2,219,743; Out No.: 14,211,157; Pass Rate: 86.4904%; Acc. Pass Rate: 47.2276%
  • to determine
  • In addition,
  • to evaluate
  • to assess
  • to investigate

  • in the presence
  • AT 250
  • As a result,
  • ON THE TREATMENT
  • as a possible treatment for
  • in details,
  • - for example,
  • within working memory
  • for various chronic
  • in 0.1% trifluoroacetic
  • in threatened preterm labor

  • with the MIC90S
  • On PTD
  • plus LHRH-A
  • with the MIC90S of
ID-34: In No.: 14,211,157; Filtered No.: 2,261,437; Out No.: 11,949,720; Pass Rate: 84.0869%; Acc. Pass Rate: 39.7122%
  • effects of
  • number of
  • use of
  • presence of
  • used to

  • Comparison of
  • low cost of
  • HPV) in
  • NUMBER OF
  • zymography was used to
  • loss of two or more

  • 1 goes to
  • active with the MIC90s of
  • syn. nov. of
  • microg/mmol of
The final result of the filters above is used as the distilled MEDLINE n-gram set
Apply Exclusive Filters - Project domain
ID-40: In No.: 11,949,720; Filtered No.: 933,050; Out No.: 11,016,670; Pass Rate: 92.1919%; Acc. Pass Rate: 36.6115%
  • of
  • the
  • in
  • to
  • a

  • The
  • We
  • "The
  • linear,
  • "normal"
  • {systematic name:
  • systematic name
  • anterior intermeniscal ligament
  • regional low-flow perfusion

  • Neo.
  • Cannon &
  • Polycentropus
  • Penneys &
  • % (month