ID/Program | In No. | Filtered No. (%) | Out No. | Pass Rate | Acc. Pass Rate | Filter example and notes
|
---|
Generate the MEDLINE n-gram set
|
---|
Generate MEDLINE n-gram set | 6,725,249,089
- N=1: 38,591,469
- N=2: 345,555,538
- N=3: 1,189,180,839
- N=4: 2,233,423,690
- N=5: 2,918,497,553
| 6,695,158,318 (99.55%)
- N=1: 37,342,742 (96.76%)
- N=2: 338,036,116 (97.82%)
- N=3: 1,178,134,206 (99.07%)
- N=4: 2,226,281,606 (99.68%)
- N=5: 2,915,363,648 (99.89%)
| 30,090,771
- N=1: 1,248,727
- N=2: 7,519,422
- N=3: 11,046,633
- N=4: 7,142,084
- N=5: 3,133,905
| 0.45% | N/A
| From MEDLINE titles (TI) and abstracts (AB) to the MEDLINE n-gram set
- filter out n-grams with length > 50
- filter out n-grams with word count < 30
- Calculated in Excel (In and Out No. entered manually)
- Uses data from the n-gram set log file (not distilled), keyed into a spreadsheet for calculation.
|
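The percentages in this row follow from the In and Out counts alone: Filtered No. = In - Out, Filtered % = Filtered No. / In, and Pass Rate = Out / In. As a cross-check of the spreadsheet calculation, the Python sketch below recomputes them from the per-N counts copied from the row. The names `IN_COUNTS`, `OUT_COUNTS` and `filter_stats` are illustrative only and are not part of the project's tools.

```python
# In and Out counts per n-gram size N, copied from the row above.
IN_COUNTS = {1: 38_591_469, 2: 345_555_538, 3: 1_189_180_839,
             4: 2_233_423_690, 5: 2_918_497_553}
OUT_COUNTS = {1: 1_248_727, 2: 7_519_422, 3: 11_046_633,
              4: 7_142_084, 5: 3_133_905}


def filter_stats(n_in: int, n_out: int) -> tuple[int, float, float]:
    """Return (filtered count, filtered %, pass rate %) for one row."""
    filtered = n_in - n_out
    return filtered, 100.0 * filtered / n_in, 100.0 * n_out / n_in


if __name__ == "__main__":
    for n in sorted(IN_COUNTS):
        filtered, f_pct, p_pct = filter_stats(IN_COUNTS[n], OUT_COUNTS[n])
        print(f"N={n}: filtered {filtered:,} ({f_pct:.2f}%), pass {p_pct:.2f}%")
    total_in, total_out = sum(IN_COUNTS.values()), sum(OUT_COUNTS.values())
    filtered, f_pct, p_pct = filter_stats(total_in, total_out)
    print(f"Total: filtered {filtered:,} ({f_pct:.2f}%), pass {p_pct:.2f}%")
```

Summing the per-N counts also confirms the row totals (6,725,249,089 in, 30,090,771 out).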
Basic operation: Sort nGrams by DC|WC|Terms
|
---|
ID-01- NGramFilter: SortNGramByDcWcTerm
- Param: 1, 01
- Run time: 1 Min
| 30,090,771 | 0 | 30,090,771 | 100.0000% | 100.0000% |
- Create link: ./05.ApplyFilters/nGram.${YEAR}
|
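A minimal sketch of a DC|WC|Terms sort, assuming each n-gram record is a pipe-delimited line `DC|WC|term` with integer DC and WC values, sorted by DC descending, then WC descending, then term ascending. Both the record layout and the sort directions are assumptions, not taken from `SortNGramByDcWcTerm` itself; `NGram`, `parse_line` and `sort_by_dc_wc_term` are hypothetical names, and the sample lines are made up.

```python
from typing import NamedTuple


class NGram(NamedTuple):
    """One n-gram record (hypothetical layout: DC|WC|term)."""
    dc: int    # DC value
    wc: int    # WC value
    term: str


def parse_line(line: str) -> NGram:
    # Split at most twice: a term may itself contain "|" (see the ID-10 examples).
    dc, wc, term = line.rstrip("\n").split("|", 2)
    return NGram(int(dc), int(wc), term)


def sort_by_dc_wc_term(ngrams: list[NGram]) -> list[NGram]:
    # Assumed order: DC descending, then WC descending, then term ascending.
    return sorted(ngrams, key=lambda g: (-g.dc, -g.wc, g.term))


if __name__ == "__main__":
    sample = ["120|300|in the", "120|450|of the", "95|200|a novel"]  # made-up data
    for g in sort_by_dc_wc_term([parse_line(s) for s in sample]):
        print(f"{g.dc}|{g.wc}|{g.term}")
```

Negating the counts in the key lets a single ascending `sorted` call produce the mixed descending/ascending order.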
Apply General Exclusive Filters
|
---|
ID-10
| 30,090,771 | 28 | 30,090,743 | 99.9999% | 99.9999% |
- |
- (|r|
- ||
- Ag|AgCl
- |D|
- |E|
- lambda(||)
|
ID-11
| 30,090,743 | 740 | 30,090,003 | 99.9975% | 99.9974% |
- =
- <
- +/-
- >
- -
- -->
- (+)
- (%)
- "+"
- ((-/-))
- ==>
- [...]
- *}
- *//
|
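Every ID-11 example above is built only from symbols, with no letter or digit. Assuming that is the rule (the table itself does not state it), a predicate for it could look like the hypothetical `is_symbol_only` below.

```python
def is_symbol_only(term: str) -> bool:
    """True if the term contains no letter or digit at all.

    Assumed rule inferred from the ID-11 examples; not the project's code.
    """
    return not any(ch.isalnum() for ch in term)


if __name__ == "__main__":
    for t in ["-->", "((-/-))", "[...]", "(%)", "IL-2"]:
        print(repr(t), is_symbol_only(t))
```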
ID-12
- Filter: Digit
- InTerm: core-term.lc
- Param: 2, 12
- Run time: 2 Min (normalization: strip punctuation and spaces)
| 30,090,003 | 185,625 | 29,904,378 | 99.3831% | 99.3806% |
- 2
- 1
- 3
- 10
- 4
- 95%
- 2,
- 2000
- 3-5
- +/-0.5
- (+/-0.05)
- $1,500
- "3 + 1"
- 55834
- 192.168.1.1
- [192, 168]
- (+15%),
|
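The run-time note ("strip punc and space") suggests that ID-12 normalizes a term before testing it. A minimal sketch, assuming the rule is "after deleting ASCII punctuation and whitespace, only digits remain", which reproduces every example above; `normalize` and `is_digit_term` are hypothetical names, and the real filter may normalize differently.

```python
import re
import string

# Deletion table: ASCII punctuation plus whitespace. This approximates the
# "strip punc and space" normalization noted for ID-12; the real one may differ.
_STRIP_TABLE = str.maketrans("", "", string.punctuation + string.whitespace)
_DIGITS = re.compile(r"[0-9]+")


def normalize(term: str) -> str:
    """Remove ASCII punctuation and whitespace from a term."""
    return term.translate(_STRIP_TABLE)


def is_digit_term(term: str) -> bool:
    """True if nothing but ASCII digits remains after normalization."""
    return _DIGITS.fullmatch(normalize(term)) is not None


if __name__ == "__main__":
    for t in ["95%", "$1,500", "192.168.1.1", "(+/-0.05)", "10 mg"]:
        print(repr(t), normalize(t), is_digit_term(t))
```

For example, "$1,500" normalizes to "1500" and is filtered, while "10 mg" keeps its letters and passes.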
ID-13
- Filter: Number
- InTerm: core-term.lc
- Create link: ./inData/NRVAR
- Param: 2, 13
- Run time: 2 Min
| 29,904,378 | 5,444 | 29,898,934 | 99.9818% | 99.3625% |
- and
- two
- one
- first
- three
- first and second
- one third
- twenty-eight
- NINE
- zeroth and
- Four hundred and forty-seven
- zero-one
- 'half'
- One"
|
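ID-13 appears to drop terms made up entirely of number words. Note that "and" itself is filtered, so "Four hundred and forty-seven" is caught as a whole. The sketch below assumes that rule and uses a small hypothetical word set in place of the NRVAR list linked above, whose real contents and format are not shown here; `NUMBER_WORDS` and `is_number_term` are illustrative names.

```python
import re

# Hypothetical stand-in for the NRVAR number-variant list linked above;
# only the words needed for the examples are included here.
NUMBER_WORDS = {
    "zero", "zeroth", "one", "first", "two", "second", "three", "third",
    "four", "seven", "eight", "nine", "twenty", "forty", "hundred",
    "half", "and",
}


def is_number_term(term: str) -> bool:
    """True if every word in the term is a number word (case-insensitive)."""
    # Any run of non-letters (spaces, hyphens, quotes) acts as a separator.
    words = [w for w in re.split(r"[^a-z]+", term.lower()) if w]
    return bool(words) and all(w in NUMBER_WORDS for w in words)


if __name__ == "__main__":
    for t in ["twenty-eight", "'half'", "two groups"]:
        print(repr(t), is_number_term(t))
```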
ID-14
| 29,898,934 | 219,435 | 29,679,499 | 99.2661% | 98.6332% |
- of the
- in the
- to the
- and the
- on the
- In the
- and/or
- 50% of
- 1, 2, and
- 2003 to
- 2003 to 2007
- for >=50%
- the 8:2
- -196 to -174
- OR-462
- AND-34
- IN-1130
- And-1
|
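Pass Rate is Out / In for a single filter, while Acc. Pass Rate is Out divided by the 30,090,771 n-grams that entered the filter chain at ID-01. The small sketch below reproduces the ID-10 to ID-14 rows from their In and Filtered counts; the `ROWS` list copies the table, and `pass_rates` is an illustrative name.

```python
# (ID, In No., Filtered No.) copied from the ID-10 to ID-14 rows above.
ROWS = [
    ("ID-10", 30_090_771, 28),
    ("ID-11", 30_090_743, 740),
    ("ID-12", 30_090_003, 185_625),
    ("ID-13", 29_904_378, 5_444),
    ("ID-14", 29_898_934, 219_435),
]


def pass_rates(rows, start):
    """Yield (id, out, pass rate %, accumulated pass rate %) for each filter."""
    for row_id, n_in, n_filtered in rows:
        n_out = n_in - n_filtered
        yield row_id, n_out, 100.0 * n_out / n_in, 100.0 * n_out / start


if __name__ == "__main__":
    for row_id, n_out, rate, acc in pass_rates(ROWS, start=30_090_771):
        print(f"{row_id}: out {n_out:,}  pass {rate:.4f}%  acc {acc:.4f}%")
```

Because each filter's In equals the previous filter's Out, the accumulated rate is also the product of the individual pass rates.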
Apply Exclusive Filters - pattern
|
---|
ID-20
| 29,679,499 | 372,388 | 29,307,111 | 98.7453% | 97.3957% |
- tomography (CT)
- imaging (MRI)
- resonance imaging (MRI)
- oxide (NO)
- reaction (PCR)
- chain reaction (PCR)
- polymerase chain reaction (PCR)
- magnetic resonance imaging (MRI)
- computed tomography (CT)
- enzyme-linked immunosorbent assay (ELISA)
- single nucleotide polymorphisms (SNPs)
- magnetic resonance (MR) imaging
- "Standards, Options and Recommendations" (SOR)
- (CREB)-binding protein (CBP)
- kinase (ASK)
- proline-rich polypeptide (PRP)
- semi-permeable membrane devices (SPMDs)
|
ID-21
| 29,307,111 | 528,355 | 28,778,756 | 98.1972% | 95.6398% |
- a significant
- a single
- a high
- a novel
- a case
- a very
- a group
- a dose-dependent
- A series
- A and B
- a meta-analysis
- a SIF
- A alpha C
- A nonseminomatous
- a delivery rate per
- A beta 2m
- a beta ab
|
ID-22
| 28,778,756 | 195,840 | 28,582,916 | 99.3195% | 94.9890% |
- RESULTS:
- METHODS:
- CONCLUSIONS:
- CONCLUSION:
- BACKGROUND:
- OBJECTIVE:
- OBJECTIVE: To
- MATERIALS AND METHODS:
- SETTING:
- PURPOSE: To
- INTRODUCTION:
- AIM: The
- L: -DOPA
- 95% PI:
- PHPT:
- months [95% CI:
- vs N:
- mode MIC:
- [95 % CI:
|
ID-23
| 28,582,916 | 350,584 | 28,232,332 | 98.7734% | 93.8239% |
- °C
- ≥3
- (mean ±
- × 3
- (IC 95%;
- TGF-β
|
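Five of the six ID-23 examples contain a character outside 7-bit ASCII (°, ≥, ±, ×, β). Assuming the filter removes any term with a non-ASCII character, a check could look like the hypothetical `is_non_ascii_term` below. In this copy "(IC 95%;" shows no such character, so treat the rule as a guess.

```python
def non_ascii_chars(term: str) -> list[str]:
    """Characters of `term` outside 7-bit ASCII, in order of appearance."""
    return [ch for ch in term if ord(ch) > 0x7F]


def is_non_ascii_term(term: str) -> bool:
    """True if the term has at least one non-ASCII character."""
    # str.isascii() (Python 3.7+) is True only when every character is < U+0080.
    return not term.isascii()


if __name__ == "__main__":
    for t in ["°C", "TGF-β", "RESULTS:"]:
        print(repr(t), is_non_ascii_term(t), non_ascii_chars(t))
```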
ID-24
| 28,232,332 | 0 | 28,232,332 | 100.0000% | 93.8239% |
- (n =
- (P <
- (P =
- (p <
- P <
- (P < 0.05)
- 95% CI =
- P<0.001),
- CI},
- US$
- VSL#3
- N^N
- group (n=6) received
- CYP3A7*1C
- studies; average
- n.; Trichoteleia
- sp. n.; Trichoteleia
|
ID-25
| 28,232,332 | 461,302 | 27,771,030 | 98.3661% | 92.2909% |
- two groups
- 6 months
- 24 h
- (ABSTRACT TRUNCATED AT 250 WORDS)
- the two groups
- 5 years
- at 37 degrees
- 3 times
- 100 mg
- January 1,
- 10 mg/kg
- 12-year-old
- at -20 degrees C
- September 2006
- 65 years or older with
- 20 cigarettes per day
- 3 - 6 months
- 6 hours plus
- minutes) per day, 5 days
- MMR + V
- 3 mg/EE
- 317615 x
|
ID-26
| 27,771,030 | 261,979 | 27,509,051 | 99.0566% | 91.4202% |
- group (P
- significant (P
- years) with
- significantly (P
- years) and
- interval [95%
- see text] The
- lt; 0.05) lower
- CENTRAL) (The
- nM (SD
- pOGH (ANG
- cB72.3(gamma
- new species (type
|
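All ID-26 examples contain an unmatched parenthesis or square bracket. The sketch below assumes that is the rule and checks () and [] with a stack. Braces are deliberately ignored because the ID-40 example "{systematic name:" passed this filter. `is_balanced` and `is_unbalanced_term` are hypothetical names.

```python
# Closing bracket -> matching opening bracket. Braces are left out on purpose
# because "{systematic name:" passed ID-26; add "}": "{" to check them too.
PAIRS = {")": "(", "]": "["}
OPENERS = set(PAIRS.values())


def is_balanced(term: str) -> bool:
    """True if every ( ) and [ ] in the term is matched and properly nested."""
    stack: list[str] = []
    for ch in term:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            # A closer with no opener, or with the wrong opener, fails at once.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack  # leftover openers are unmatched too


def is_unbalanced_term(term: str) -> bool:
    return not is_balanced(term)


if __name__ == "__main__":
    for t in ["group (P", "CENTRAL) (The", "tomography (CT)"]:
        print(repr(t), is_unbalanced_term(t))
```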
Apply Exclusive Filters - Lead-End-Terms
|
---|
ID-30
| 27,509,051 | 6,845,072 | 20,663,979 | 75.1170% | 68.6721% |
- of a
- that the
- from the
- is a
- of this
- The results
- was observed
- this study was
- about 50%
- - but not
- "what is
- AND COURSE
- iT reg
- of FoxM1b
- or spinal or conduction
- or spinal or conduction block,
|
ID-31
| 20,663,979 | 4,229,181 | 16,434,798 | 79.5336% | 54.6174% |
- patients with
- associated with
- at the
- suggest that
- between the
- in patients with
- results suggest that
- MATERIALS AND
- cross-reacted with
- (ST 36) and
- Zusanli (ST 36) and
- determine whether this could
- primarily composed of the
- tilt-in-space and
- systems, assays and
- ppm Cu as
- epidural or spinal or
|
ID-32
| 16,434,798 | 3,898 | 16,430,900 | 99.9763% | 54.6044% |
- in a
- to be
- with a
- as a
- may be
- In a
- in A.
- For one
- on NO
- anti-NOR
- plus AT
- I/a
- AS-ON
- anti-OF
|
ID-33
| 16,430,900 | 2,219,743 | 14,211,157 | 86.4904% | 47.2276% |
- to determine
- In addition,
- to evaluate
- to assess
- to investigate
- in the presence
- AT 250
- As a result,
- ON THE TREATMENT
- as a possible treatment for
- in details,
- - for example,
- within working memory
- for various chronic
- in 0.1% trifluoroacetic
- in threatened preterm labor
- with the MIC90S
- On PTD
- plus LHRH-A
- with the MIC90S of
|
ID-34
| 14,211,157 | 2,261,437 | 11,949,720 | 84.0869% | 39.7122% |
- effects of
- number of
- use of
- presence of
- used to
- Comparison of
- low cost of
- HPV) in
- NUMBER OF
- zymography was used to
- loss of two or more
- 1 goes to
- active with the MIC90s of
- syn. nov. of
- microg/mmol of
|
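The ID-30 to ID-34 filters are grouped as lead-end-term filters. Judging from the examples, each removes n-grams whose first word (lead term) or last word (end term) is on an exclusion list. The sketch below works under that assumption. `LEAD_TERMS` and `END_TERMS` are hypothetical subsets picked from the ID-33 and ID-34 examples, not the project's actual lists, and words are compared case-insensitively after stripping surrounding punctuation.

```python
import string

# Hypothetical exclusion lists, drawn from the ID-33 (lead) and ID-34 (end)
# examples above; the real lists are not shown in this table.
LEAD_TERMS = {"to", "in", "at", "as", "on", "for", "with", "within", "plus"}
END_TERMS = {"of", "to", "in"}


def words_of(term: str) -> list[str]:
    """Lower-cased words with surrounding punctuation removed; empties dropped."""
    words = (w.strip(string.punctuation).lower() for w in term.split())
    return [w for w in words if w]


def is_lead_end_term(term: str, lead_terms: set[str] = LEAD_TERMS,
                     end_terms: set[str] = END_TERMS) -> bool:
    """True if the first word is a lead term or the last word is an end term."""
    words = words_of(term)
    if not words:
        return False
    return words[0] in lead_terms or words[-1] in end_terms


if __name__ == "__main__":
    for t in ["to determine", "effects of", "HPV) in", "polymerase chain reaction"]:
        print(repr(t), is_lead_end_term(t))
```

Passing different lead and end sets to the same function is one way to model the separate ID-30 to ID-34 passes.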
The final result of the filters above (ID-34 output: 11,949,720 n-grams) is used as the distilled MEDLINE n-gram set
|
---|
Apply Exclusive Filters - Project domain
|
---|
ID-40
| 11,949,720 | 933,050 | 11,016,670 | 92.1919% | 36.6115% |
- of
- the
- in
- to
- a
- The
- We
- "The
- linear,
- "normal"
- {systematic name:
- systematic name
- anterior intermeniscal ligament
- regional low-flow perfusion
- Neo.
- Cannon &
- Polycentropus
- Penneys &
- % (month
|