The SPECIALIST Lexicon

Distilled Process Log - Walk Through, 2019

This page describes the filter processes applied to the MEDLINE n-gram set to generate the distilled MEDLINE n-gram set and the candidate multiword list:

Columns: ID/Program; In No.; Filtered No. (%); Out No.; Pass Rate; Acc. Pass Rate; Filter example and notes
Generate the MEDLINE n-gram set
Generate MEDLINE n-gram set
In No.: 5,567,578,926
  • N=1: 32,469,239
  • N=2: 290,801,313
  • N=3: 993,227,476
  • N=4: 1,849,662,679
  • N=5: 2,401,418,219
Filtered No.: 5,542,912,110 (99.56%)
  • N=1: 31,394,012 (96.69%)
  • N=2: 284,464,615 (97.82%)
  • N=3: 984,148,940 (99.09%)
  • N=4: 1,843,933,089 (99.69%)
  • N=5: 2,398,971,454 (99.90%)
Out No.: 24,666,816
  • N=1: 1,075,227
  • N=2: 6,336,698
  • N=3: 9,078,536
  • N=4: 5,729,590
  • N=5: 2,446,765
Pass Rate: 0.4430%; Acc. Pass Rate: N/A
From MEDLINE TI & AB to the MEDLINE n-gram set
  • filter out n-grams with length > 50
  • filter out n-grams with word count < 30

  • Calculated in Excel (In and Out No. entered manually)
  • Used data from n-gram set log file (not distilled)
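The Pass Rate and Acc. Pass Rate columns can also be reproduced outside a spreadsheet. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the accumulated pass rate is a step's Out No. divided by the In No. of the first filter step (24,666,816), which agrees with the figures listed for ID-10 and ID-11.

```python
def pass_rate(in_no: int, out_no: int) -> float:
    """Percentage of n-grams that survive one step."""
    return 100.0 * out_no / in_no


def report(steps, base_in):
    """Return (id, filtered, pass rate, accumulated pass rate) per step.

    steps   -- list of (step_id, in_no, out_no) tuples, entered by hand
    base_in -- In No. of the first filter step; assumed denominator for
               the accumulated pass rate
    """
    rows = []
    for step_id, in_no, out_no in steps:
        rows.append((
            step_id,
            in_no - out_no,                        # Filtered No.
            f"{pass_rate(in_no, out_no):.4f}%",    # Pass Rate
            f"{pass_rate(base_in, out_no):.4f}%",  # Acc. Pass Rate (assumed definition)
        ))
    return rows


# Counts copied from this page.
ngram_set = pass_rate(5_567_578_926, 24_666_816)
rows = report(
    [("ID-10", 24_666_816, 24_666_803), ("ID-11", 24_666_803, 24_666_206)],
    base_in=24_666_816,
)
for row in rows:
    print(*row)
print(f"n-gram set generation: {ngram_set:.4f}%")
```

Only the In and Out counts are needed as input; the Filtered No. and both rates are derived from them.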
Basic operation: Sort nGrams by DC|WC|Terms
ID-01
  • NGramFilter: SortNGramByDcWcTerm
  • Param: 1, 01
  • Run Time: 1 Min.
In No.: 24,666,816; Filtered No.: 0; Out No.: 24,666,816; Pass Rate: 100.0000%; Acc. Pass Rate: 100.0000%
  • Create link: ./05.ApplyFilters/nGram.${YEAR}
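The sort step can be pictured with a short sketch. It is not the SortNGramByDcWcTerm program itself: it assumes each record is a line of the form DC|WC|term (document count, word count, term) and that the order is descending by DC, then descending by WC, then ascending by term. Neither the record layout nor the sort direction is stated on this page, and the function names are hypothetical.

```python
def parse_record(line: str):
    """Split an assumed 'DC|WC|term' record into (dc, wc, term).

    The term itself may contain '|' (e.g. 'Ag|AgCl'), so only the
    first two separators are treated as field delimiters.
    """
    dc, wc, term = line.rstrip("\n").split("|", 2)
    return int(dc), int(wc), term


def sort_ngrams(lines):
    """Sort records by DC (descending), WC (descending), term (ascending)."""
    records = [parse_record(line) for line in lines]
    records.sort(key=lambda r: (-r[0], -r[1], r[2]))
    return [f"{dc}|{wc}|{term}" for dc, wc, term in records]


# Made-up counts, for illustration only.
sample = ["3|40|of the", "5|60|in the", "5|60|and the", "5|70|Ag|AgCl"]
sorted_sample = sort_ngrams(sample)
print("\n".join(sorted_sample))
```

Limiting the split to two separators keeps terms that contain a pipe character intact.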
Apply General Exclusive Filters
ID-10
In No.: 24,666,816; Filtered No.: 13; Out No.: 24,666,803; Pass Rate: 99.9999%; Acc. Pass Rate: 99.9999%
  • |
  • (|r|
  • ||
  • Ag|AgCl
  • |D|
  • |E|
  • lambda(||)
ID-11
In No.: 24,666,803; Filtered No.: 597; Out No.: 24,666,206; Pass Rate: 99.9976%; Acc. Pass Rate: 99.9975%
  • =
  • <
  • +/-
  • >
  • -

  • -->
  • (+)
  • (%)
  • "+"
  • ((-/-))

  • ==>
  • [...]
  • *}
  • *//
ID-12
  • Filter: Digit
  • InTerm: core-term.lc
  • Param: 2, 12
  • Run time: 2 Min (norm - strip punc and space)
In No.: 24,666,206; Filtered No.: 150,806; Out No.: 24,515,400; Pass Rate: 99.3886%; Acc. Pass Rate: 99.3872%
  • 2
  • 1
  • 3
  • 10
  • 4

  • 95%
  • 2,
  • 2000
  • 3-5
  • +/-0.5
  • (+/-0.05)
  • $1,500
  • "3 + 1"
  • 55834

  • 192.168.1.1
  • [192, 168]
  • (+15%),
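The note "norm - strip punc and space" suggests that a term is normalized before the digit test. A minimal sketch of that idea follows. It assumes the filter removes a term when only digits remain after punctuation and whitespace are stripped; the function name is hypothetical, and the actual Digit filter may differ.

```python
import re

# Anything that is not a letter or digit (punctuation, symbols, spaces, '_').
_NON_ALNUM = re.compile(r"[\W_]+")


def is_digit_term(term: str) -> bool:
    """True if only ASCII digits remain after stripping punctuation and spaces."""
    core = _NON_ALNUM.sub("", term)
    return core != "" and core.isascii() and core.isdigit()


# Examples listed for ID-12 on this page.
examples = ["2", "95%", "2,", "2000", "3-5", "+/-0.5", "(+/-0.05)",
            "$1,500", '"3 + 1"', "55834", "192.168.1.1", "[192, 168]", "(+15%),"]
print(all(is_digit_term(t) for t in examples))
print(is_digit_term("10 mg/kg"))
```

Terms that keep a letter after normalization, such as "10 mg/kg" or "24 h", are not caught by this test.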
ID-13
  • Filter: Number
  • InTerm: core-term.lc
  • Create link: ./inData/NRVAR
  • Param: 2, 13
  • Run time: 2 Min
In No.: 24,515,400; Filtered No.: 4,907; Out No.: 24,510,493; Pass Rate: 99.9800%; Acc. Pass Rate: 99.3663%
  • and
  • two
  • one
  • first
  • three

  • first and second
  • one third
  • twenty-eight
  • NINE
  • zeroth and
  • Four hundred and forty-seven

  • zero-one
  • 'half'
  • One"
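The examples above suggest that the Number filter removes terms made up entirely of number words, with "and" counted as a number word. The sketch below illustrates that reading. The word set is a small hypothetical sample chosen to cover the listed examples; it is not the content of the NRVAR data, and the function name is an assumption.

```python
import re

# Hypothetical sample of number words; the real list is not given on this page.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "seven", "eight", "nine",
    "twenty", "forty", "hundred", "half",
    "zeroth", "first", "second", "third",
    "and",
}


def is_number_term(term: str) -> bool:
    """True if every alphanumeric token of the term is a number word."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    return bool(tokens) and all(tok in NUMBER_WORDS for tok in tokens)


# Examples listed for ID-13 on this page.
examples = ["and", "two", "one", "first", "three", "first and second",
            "one third", "twenty-eight", "NINE", "zeroth and",
            "Four hundred and forty-seven", "zero-one", "'half'", 'One"']
print(all(is_number_term(t) for t in examples))
print(is_number_term("two groups"))
```

Lowercasing and splitting on non-alphanumeric characters lets hyphenated, quoted, and upper-case variants match the same word set.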
ID-14
In No.: 24,510,493; Filtered No.: 185,640; Out No.: 24,324,853; Pass Rate: 99.2426%; Acc. Pass Rate: 98.6137%
  • of the
  • in the
  • to the
  • and the
  • on the

  • In the
  • and/or
  • 50% of
  • 1, 2, and
  • 2003 to
  • 2003 to 2007
  • for >=50%
  • the 8:2
  • -196 to -174

  • OR-462
  • AND-34
  • IN-1130
  • And-1
Apply Exclusive Filters - pattern
ID-20
In No.: 24,324,853; Filtered No.: 281,106; Out No.: 24,043,747; Pass Rate: 98.8444%; Acc. Pass Rate: 97.4741%
  • tomography (CT)
  • imaging (MRI)
  • resonance imaging (MRI)
  • oxide (NO)
  • reaction (PCR)

  • chain reaction (PCR)
  • polymerase chain reaction (PCR)
  • magnetic resonance imaging (MRI)
  • computed tomography (CT)
  • enzyme-linked immunosorbent assay (ELISA)
  • single nucleotide polymorphisms (SNPs)
  • magnetic resonance (MR) imaging

  • "Standards, Options and Recommendations" (SOR)
  • (CREB)-binding protein (CBP)

  • kinase (ASK)
  • proline-rich polypeptide (PRP)
  • semi-permeable membrane devices (SPMDs)
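This page does not name the ID-20 filter, but every example above contains an acronym in parentheses. One way to detect that pattern is sketched below. The regular expression and the function name are assumptions made for illustration, not the filter's actual rule.

```python
import re

# Two or more capital letters in parentheses, optionally followed by a
# plural 's', e.g. (CT), (ELISA), (SNPs).
_PAREN_ACRONYM = re.compile(r"\([A-Z]{2,}s?\)")


def has_parenthetic_acronym(term: str) -> bool:
    """True if the term contains a parenthesized upper-case acronym."""
    return _PAREN_ACRONYM.search(term) is not None


# A selection of the ID-20 examples on this page.
examples = ["tomography (CT)", "oxide (NO)", "magnetic resonance (MR) imaging",
            "single nucleotide polymorphisms (SNPs)",
            "(CREB)-binding protein (CBP)",
            "semi-permeable membrane devices (SPMDs)"]
print(all(has_parenthetic_acronym(t) for t in examples))
print(has_parenthetic_acronym("(P < 0.05)"))
```

The pattern does not check that the acronym agrees with the preceding words; it only looks at the parenthesized token.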
ID-21
In No.: 24,043,747; Filtered No.: 437,164; Out No.: 23,606,583; Pass Rate: 98.1818%; Acc. Pass Rate: 95.7018%
  • a significant
  • a single
  • a high
  • a novel
  • a case

  • a very
  • a group
  • a dose-dependent
  • A series
  • A and B
  • a meta-analysis
  • a SIF
  • A alpha C
  • A nonseminomatous

  • a delivery rate per
  • A beta 2m
  • a beta ab
ID-22
In No.: 23,606,583; Filtered No.: 156,895; Out No.: 23,449,688; Pass Rate: 99.3354%; Acc. Pass Rate: 95.0657%
  • RESULTS:
  • METHODS:
  • CONCLUSIONS:
  • CONCLUSION:
  • BACKGROUND:

  • OBJECTIVE:
  • OBJECTIVE: To
  • MATERIALS AND METHODS:
  • SETTING:
  • PURPOSE: To
  • INTRODUCTION:
  • AIM: The
  • L: -DOPA
  • 95% PI:

  • PHPT:
  • months [95% CI:
  • vs N:
  • mode MIC:
  • [95 % CI:
ID-23
In No.: 23,449,688; Filtered No.: 170,551; Out No.: 23,279,137; Pass Rate: 99.2727%; Acc. Pass Rate: 94.3743%
  • (n =
  • (P <
  • (P =
  • (p <
  • P <

  • (P < 0.05)
  • 95% CI =
  • P<0.001),
  • CI},
  • US$
  • VSL#3
  • N^N
  • group (n=6) received
  • CYP3A7*1C

  • studies; average
  • n.; Trichoteleia
  • sp. n.; Trichoteleia
ID-24
In No.: 23,279,137; Filtered No.: 389,964; Out No.: 22,889,173; Pass Rate: 98.3248%; Acc. Pass Rate: 92.7934%
  • two groups
  • 6 months
  • 24 h
  • (ABSTRACT TRUNCATED AT 250 WORDS)
  • the two groups

  • 5 years
  • at 37 degrees
  • 3 times
  • 100 mg
  • January 1,
  • 10 mg/kg
  • 12-year-old
  • at -20 degrees C
  • September 2006
  • 65 years or older with
  • 20 cigarettes per day
  • 3 - 6 months

  • 6 hours plus
  • minutes) per day, 5 days
  • MMR + V
  • 3 mg/EE
  • 317615 x
ID-25
In No.: 22,889,173; Filtered No.: 209,979; Out No.: 22,679,194; Pass Rate: 99.0826%; Acc. Pass Rate: 91.9421%
  • group (P
  • significant (P
  • years) with
  • significantly (P
  • years) and

  • interval [95%
  • see text] The
  • lt; 0.05) lower
  • CENTRAL) (The
  • nM (SD
  • pOGH (ANG

  • cB72.3(gamma
  • new species (type
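Every ID-25 example above has an opening bracket without its closing partner, or the reverse. The filter's rule is not stated on this page; the sketch below shows a conventional stack-based balance check that flags all of the listed examples. The function name is hypothetical.

```python
# Map each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def has_unbalanced_brackets(term: str) -> bool:
    """True if any bracket in the term is unmatched or mismatched."""
    stack = []
    for ch in term:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with no opener, or with the wrong opener, is unbalanced.
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    return bool(stack)  # leftover openers were never closed


# Examples listed for ID-25 on this page.
examples = ["group (P", "years) with", "interval [95%", "see text] The",
            "lt; 0.05) lower", "CENTRAL) (The", "nM (SD", "cB72.3(gamma",
            "new species (type"]
print(all(has_unbalanced_brackets(t) for t in examples))
print(has_unbalanced_brackets("tomography (CT)"))
```

A term such as "CENTRAL) (The" is flagged even though it has one bracket of each kind, because the closing bracket arrives before any opener.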
Apply Exclusive Filters - Lead-End-Terms
ID-30
In No.: 22,679,194; Filtered No.: 5,944,446; Out No.: 16,734,748; Pass Rate: 73.7890%; Acc. Pass Rate: 67.8432%
  • of a
  • that the
  • from the
  • is a
  • of this

  • The results
  • was observed
  • this study was
  • about 50%
  • - but not
  • "what is
  • AND COURSE

  • iT reg
  • of FoxM1b
  • or spinal or conduction
  • or spinal or conduction block,
ID-31
In No.: 16,734,748; Filtered No.: 3,447,546; Out No.: 13,287,202; Pass Rate: 79.3989%; Acc. Pass Rate: 53.8667%
  • patients with
  • associated with
  • at the
  • suggest that
  • between the

  • in patients with
  • results suggest that
  • MATERIALS AND
  • cross-reacted with
  • (ST 36) and
  • Zusanli (ST 36) and
  • determine whether this could
  • primarily composed of the

  • tilt-in-space and
  • systems, assays and
  • ppm Cu as
  • epidural or spinal or
ID-32
In No.: 13,287,202; Filtered No.: 3,233; Out No.: 13,283,969; Pass Rate: 99.9757%; Acc. Pass Rate: 53.8536%
  • in a
  • to be
  • with a
  • as a
  • may be

  • In a
  • in A.
  • For one
  • on NO

  • anti-NOR
  • plus AT
  • I/a
  • AS-ON
  • anti-OF
ID-33
In No.: 13,283,969; Filtered No.: 1,840,951; Out No.: 11,443,018; Pass Rate: 86.1416%; Acc. Pass Rate: 46.3903%
  • to determine
  • In addition,
  • to evaluate
  • to assess
  • to investigate

  • in the presence
  • AT 250
  • As a result,
  • ON THE TREATMENT
  • as a possible treatment for
  • in details,
  • - for example,
  • within working memory
  • for various chronic
  • in 0.1% trifluoroacetic
  • in threatened preterm labor

  • with the MIC90S
  • On PTD
  • plus LHRH-A
  • with the MIC90S of
ID-34
In No.: 11,443,018; Filtered No.: 1,847,412; Out No.: 9,595,606; Pass Rate: 83.8556%; Acc. Pass Rate: 38.9009%
  • effects of
  • number of
  • use of
  • presence of
  • used to

  • Comparison of
  • low cost of
  • HPV) in
  • NUMBER OF
  • zymography was used to
  • loss of two or more

  • 1 goes to
  • active with the MIC90s of
  • syn. nov. of
  • microg/mmol of
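The lead-end-term filters (ID-30 to ID-34) appear to remove n-grams whose first or last word is a function word such as "of", "the", or "with". A rough sketch of that idea follows. The two word lists are hypothetical samples drawn from the examples on this page, not the lists the filters actually use, and the function names are assumptions.

```python
import string

# Hypothetical sample lists; the real lead and end terms are not given here.
LEAD_WORDS = {"of", "that", "from", "is", "the", "to", "in", "as", "on", "with", "for"}
END_WORDS = {"of", "to", "in", "with", "the", "that", "and", "as", "or"}


def _words(term: str):
    """Lower-case the words of a term and trim surrounding punctuation."""
    return [w.strip(string.punctuation).lower() for w in term.split()]


def has_lead_word(term: str) -> bool:
    """True if the first word of the term is in LEAD_WORDS."""
    words = _words(term)
    return bool(words) and words[0] in LEAD_WORDS


def has_end_word(term: str) -> bool:
    """True if the last word of the term is in END_WORDS."""
    words = _words(term)
    return bool(words) and words[-1] in END_WORDS


def keep(term: str) -> bool:
    """Keep a term only if neither its first nor its last word is listed."""
    return not (has_lead_word(term) or has_end_word(term))


print(keep("effects of"), keep("In addition,"), keep("regional low-flow perfusion"))
```

Comparing lower-cased words with punctuation trimmed lets variants such as "MATERIALS AND" and "In addition," match the same lists as their plain lower-case forms.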
The final result of the filters above is used as the distilled MEDLINE n-gram set.
Apply Exclusive Filters - Project domain
ID-40
In No.: 9,595,606; Filtered No.: 851,813; Out No.: 8,743,793; Pass Rate: 91.1229%; Acc. Pass Rate: 35.4476%
  • of
  • the
  • in
  • to
  • a

  • The
  • We
  • "The
  • linear,
  • "normal"
  • {systematic name:
  • systematic name
  • anterior intermeniscal ligament
  • regional low-flow perfusion

  • Neo.
  • Cannon &
  • Polycentropus
  • Penneys &
  • % (month