ID/Program | In No. | Filtered No. (%) | Out No. | Pass Rate | Acc. Pass Rate | Filter example and notes
|
---|
Generate the MEDLINE n-gram set
|
---|
Generate MEDLINE n-gram set | 4,388,787,832
- N=1: 24,121,470
- N=2: 229,691,126
- N=3: 788,417,523
- N=4: 1,460,588,176
- N=5: 1,885,969,537
| 4,369,462,494 (99.56%)
- N=1: 23,238,183 (96.34%)
- N=2: 224,576,579 (97.77%)
- N=3: 781,282,716 (99.10%)
- N=4: 1,456,207,702 (99.70%)
- N=5: 1,884,157,314 (99.90%)
| 19,325,338
- N=1: 883,287
- N=2: 5,114,547
- N=3: 7,134,807
- N=4: 4,380,474
- N=5: 1,812,223
| 0.4403% | N/A
| From MEDLINE TI & AB to the MEDLINE n-gram set
- filter out n-grams with length > 50
- filter out n-grams with word count < 30
- Pass rate calculated in Excel (In and Out No. entered manually)
|
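The two thresholds noted in this row can be sketched in Python. This is only an illustration, not the program that produced the numbers above: it assumes "length" means the character length of the n-gram and that each record uses the `DC|WC|term` layout seen in the filter examples. The function name `keep_ngram`, its parameters, and the third sample record are hypothetical.

```python
def keep_ngram(record: str, max_len: int = 50, min_wc: int = 30) -> bool:
    """Return True if a 'DC|WC|term' record passes both thresholds."""
    # Split on the first two pipes only; the term itself may contain '|'.
    _dc, wc, term = record.split("|", 2)
    # Drop terms longer than max_len characters or with a word count below min_wc.
    return len(term) <= max_len and int(wc) >= min_wc


records = [
    "216|357||",       # example taken from the table
    "3|32|55834",      # example taken from the table
    "2|12|rare term",  # made-up record with WC < 30, for illustration only
]
kept = [r for r in records if keep_ngram(r)]
```

With these inputs, only the made-up third record is dropped.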
Basic operation: Sort n-grams by DC|WC|Term
|
---|
ID-01- NGramFilter: SortNGramByDcWcTerm
- Param: 1, 01
- Run Time: 1 Min.
| 19,325,338 | 0 | 19,325,338 | 100.0000% | 100.0000% |
- Create link: ./05.ApplyFilters/nGram.${YEAR}
|
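As a rough sketch of the sort step, the following Python parses `DC|WC|term` records and orders them by the three fields. The sort direction is an assumption (DC descending, then WC descending, then term ascending); the source only names the keys. `NGram`, `parse_record`, and `sort_ngrams` are hypothetical names, not part of the NGramFilter program.

```python
from typing import Iterable, List, NamedTuple


class NGram(NamedTuple):
    dc: int    # first field of the record
    wc: int    # second field of the record
    term: str  # remainder of the record; may itself contain '|'


def parse_record(line: str) -> NGram:
    # Split on the first two pipes only so that terms such as "Ag|AgCl" survive.
    dc, wc, term = line.rstrip("\n").split("|", 2)
    return NGram(int(dc), int(wc), term)


def sort_ngrams(lines: Iterable[str]) -> List[NGram]:
    # Assumed ordering: DC descending, then WC descending, then term ascending.
    return sorted((parse_record(l) for l in lines),
                  key=lambda g: (-g.dc, -g.wc, g.term))


sample = ["40|44|Ag|AgCl", "216|357||", "50|85|||"]
ordered = sort_ngrams(sample)
```

Splitting with a limit of two keeps any pipe characters inside the term intact, which matters for records like the ID-10 examples.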
Apply General Exclusive Filters
|
---|
ID-10
| 19,325,338 | 7 | 19,325,331 | 100.0000% | 100.0000% |
- 216|357||
- 58|75|(|r|
- 50|85|||
- 40|44|Ag|AgCl
- 34|47||D|
- 27|41||E|
- 16|38|lambda(||)
|
ID-11
| 19,325,331 | 425 | 19,324,906 | 99.9978% | 99.9978% |
- 1468508|4379867|=
- 830526|1804458|<
- 679245|2645584|+/-
- 275852|455400|>
- 206249|327350|-
- 11339|25445|-->
- 8168|15079|(+)
- 4721|5819|(%)
- 97|136|"+"
- 73|80|((-/-))
- 30|70|==>
- 39|48|[...]
- 10|33|*}
- 7|35|*//
|
ID-12
- Filter: Digit
- InTerm: core-term.lc
- Param: 2, 12
- Run time: 2 Min (norm - strip punc and space)
| 19,324,906 | 132,650 | 19,192,256 | 99.3136% | 99.3114% |
- 1573415|2318062|2
- 1557465|2348937|1
- 1249600|1739034|3
- 980112|1258360|10
- 952500|1257959|4
- 298055|643298|95%
- 202590|239870|2,
- 96654|114627|2000
- 13712|15014|3-5
- 1657|1714|+/-0.5
- 71|77|(+/-0.05)
- 56|58|$1,500
- 31|40|"3 + 1"
- 3|32|55834
- 192.168.1.1
- [192, 168]
- (+15%),
|
ID-13
- Filter: Number
- InTerm: core-term.lc
- Create link: ./inData/NRVAR
- Param: 2, 13
- Run time: 2 Min
| 19,192,256 | 4,326 | 19,187,930 | 99.9775% | 99.2890% |
- 17110403|101408848|and
- 2755149|3759973|two
- 2009841|2536674|one
- 1509215|1868858|first
- 1499457|1945263|three
- 20573|23342|first and second
- 20272|21359|one third
- 3383|3507|twenty-eight
- 81|81|NINE
- 40|42|zeroth and
- 36|36|Four hundred and forty-seven
- 27|34|zero-one
- 24|32|'half'
- 21|32|One"
|
ID-14
| 19,187,930 | 157,786 | 19,030,144 | 99.1777% | 98.4725% |
- 11350676|26564037|of the
- 9309757|18668800|in the
- 4600972|6314812|to the
- 4541887|5954054|and the
- 3553488|4655141|on the
- 1128696|1314980|In the
- 477963|547704|and/or
- 86317|90215|50% of
- 12642|14399|1, 2, and
- 11527|12363|2003 to
- 506|548|2003 to 2007
- 29|43|for >=50%
- 17|41|the 8:2
- 13|42|-196 to -174
- 12|32|OR-462
- 9|45|AND-34
- 8|44|IN-1130
- 8|31|And-1
|
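The Digit filter (ID-12) is described only as "norm - strip punc and space". One plausible reading, sketched below in Python, is that a term is filtered when nothing but digits remains after punctuation and whitespace are removed. This is an assumption about the rule, not the actual implementation; `is_digit_term` is a hypothetical name, and ASCII punctuation from `string.punctuation` stands in for whatever punctuation set the real filter uses.

```python
import string


def is_digit_term(term: str) -> bool:
    """Return True if only digits remain after stripping punctuation and spaces."""
    core = "".join(
        ch for ch in term
        if ch not in string.punctuation and not ch.isspace()
    )
    # An empty core (pure punctuation) is not treated as a digit term here.
    return core != "" and core.isdigit()


examples = ["95%", "3-5", "$1,500", '"3 + 1"', "192.168.1.1", "[192, 168]", "(+15%),"]
flags = [is_digit_term(t) for t in examples]
```

All of the ID-12 examples listed above are flagged under this reading, while terms such as `24 h` or `of the` are not.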
Apply Exclusive Filters - Pattern
|
---|
ID-20
| 19,030,144 | 197,022 | 18,833,122 | 98.9647% | 97.4530% |
- 43723|43924|tomography (CT)
- 41512|41818|imaging (MRI)
- 41087|41387|resonance imaging (MRI)
- 38888|39331|oxide (NO)
- 36558|36838|reaction (PCR)
- 36441|36720|chain reaction (PCR)
- 33100|33333|polymerase chain reaction (PCR)
- 34608|34807|magnetic resonance imaging (MRI)
- 32581|32691|computed tomography (CT)
- 18344|18650|enzyme-linked immunosorbent assay (ELISA)
- 11876|11950|single nucleotide polymorphisms (SNPs)
- 7501|7512|magnetic resonance (MR) imaging
- 57|57|"Standards, Options and Recommendations" (SOR)
- 58|58|(CREB)-binding protein (CBP)
- 24|31|kinase (ASK)
- 24|30|proline-rich polypeptide (PRP)
- 24|30|semi-permeable membrane devices (SPMDs)
|
ID-21
| 18,833,122 | 344,403 | 18,488,719 | 98.1713% | 95.6709% |
- 645982|727397|a significant
- 463569|531423|a single
- 380448|415096|a high
- 357730|410233|a novel
- 308795|335142|a case
- 137642|142602|a very
- 128611|141101|a group
- 85739|93650|a dose-dependent
- 45415|45699|A series
- 27489|35407|A and B
- 23369|28504|a meta-analysis
- 19|31|a SIF
- 17|37|A alpha C
- 17|30|A nonseminomatous
- 9|41|a delivery rate per
- 8|30|A beta 2m
- 6|33|a beta ab
|
ID-22
| 18,488,719 | 113,936 | 18,374,783 | 99.3838% | 95.0813% |
- 555284|2556344|RESULTS:
- 2018160|2018508|METHODS:
- 1485141|1485210|CONCLUSIONS:
- 1142888|1142951|CONCLUSION:
- 1037474|1037552|BACKGROUND:
- 841859|841956|OBJECTIVE:
- 470496|470522|OBJECTIVE: To
- 183998|184006|MATERIALS AND METHODS:
- 142817|142837|SETTING:
- 137418|137419|PURPOSE: To
- 137969|137977|INTRODUCTION:
- 24100|24101|AIM: The
- 16|51|L: -DOPA
- 19|56|(95% PI:
- 12|44|PHPT:
- 14|36|months [95% CI:
- 10|31|vs N:
- 9|45|(mode MIC:
- 7|33|% [95 % CI:
|
ID-23
| 18,374,783 | 135,508 | 18,239,275 | 99.2625% | 94.3801% |
- 377909|832392|(n =
- 305362|600169|(P <
- 203048|431029|(P =
- 200192|370403|(p <
- 187425|368477|P <
- 101928|158559|(P < 0.05)
- 16323|41573|95% CI =
- 5620|7573|(P<0.001),
- 472|473|{CI},
- 282|500|(US$
- 140|510|VSL#3
- 46|57|N^N
- 25|37|group (n=6) received
- 9|35|CYP3A7*1C
- 4|31|studies; average
- 1|37|n.; Trichoteleia
- 1|37|sp. n.; Trichoteleia
|
ID-24
| 18,239,275 | 336,112 | 17,903,163 | 98.1572% | 92.6409% |
- 177940|210038|two groups
- 142624|190341|6 months
- 125652|165359|24 h
- 106208|106208|(ABSTRACT TRUNCATED AT 250 WORDS)
- 96569|111790|the two groups
- 76377|96164|5 years
- 44885|53892|at 37 degrees
- 16786|18097|3 times
- 16270|20791|100 mg
- 15591|16755|January 1,
- 13116|16443|10 mg/kg
- 6637|7563|12-year-old
- 3199|3806|at -20 degrees C
- 1871|1896|September 2006
- 228|237|65 years or older with
- 194|220|20 cigarettes per day
- 54|63|3 - 6 months
- 7|33|6 hours plus
- 7|33|minutes) per day, 5 days
- 7|31|MMR + V
- 6|30|3 mg/EE
- 4|33|317615 x
|
ID-25
| 17,903,163 | 166,356 | 17,736,807 | 99.0708% | 91.7801% |
- 38680|53429|group (P
- 29553|33483|significant (P
- 28932|29667|years) with
- 28420|36452|significantly (P
- 27946|28749|years) and
- 2573|257|interval [95%
- 1776|1784|see text] The
- 1262|1390|< 0.05) lower
- 980|980|(CENTRAL) (The
- 8|32|nM (SD
- 6|33|pOGH (ANG
- 3|30|cB72.3(gamma
- 3|30|new species (type
|
Apply Exclusive Filters - Lead-End-Terms
|
---|
ID-30
| 17,736,807 | 4,712,162 | 13,024,645 | 73.4329% | 67.3967% |
- 3094375|3846276|of a
- 2579661|3150375|that the
- 2197093|2738637|from the
- 2059855|2318833|is a
- 1932233|2122291|of this
- 807264|849049|The results
- 619048|694688|was observed
- 523214|525272|this study was
- 13190|13456|about 50%
- 338|344|- but not
- 172|190|"what is
- 49|49|AND COURSE
- 5|31|iT reg
- 5|31|of FoxM1b
- 2|34|or spinal or conduction
- 2|32|or spinal or conduction block,
|
ID-31
| 13,024,645 | 2,710,470 | 10,314,175 | 79.1897% | 53.3713% |
- 2119774|4013252|patients with
- 1983358|2792617|associated with
- 1619127|2077643|at the
- 1256833|1304678|suggest that
- 1206088|1437403|between the
- 847370|1211964|in patients with
- 437150|440606|results suggest that
- 186025|186034|MATERIALS AND
- 3820|4157|cross-reacted with
- 143|174|(ST 36) and
- 80|93|Zusanli (ST 36) and
- 61|61|determine whether this could
- 38|38|primarily composed of the
- 5|41|tilt-in-space and
- 5|30|systems, assays and
- 4|38|ppm Cu as
- 3|35|epidural or spinal or
|
ID-32
| 10,314,175 | 2,687 | 10,311,488 | 99.9739% | 53.3573% |
- 2904059|3505586|in a
- 2592576|3105056|to be
- 2383010|2931652|with a
- 2002275|2347123|as a
- 1380216|1544681|may be
- 248186|258942|In a
- 8273|10744|in A.
- 1903|191|For one
- 1874|212|on NO
- 17|36|anti-NOR
- 17|36|plus AT
- 12|39|I/a
- 10|36|AS-ON
- 10|31|anti-OF
|
ID-33
| 10,311,488 | 1,450,394 | 8,861,094 | 85.9342% | 45.8522% |
- 754541|812741|to determine
- 616952|641216|In addition,
- 498080|526046|to evaluate
- 441062|475504|to assess
- 417985|428942|to investigate
- 387143|475128|in the presence
- 106242|106242|AT 250
- 42061|42399|As a result,
- 552|552|ON THE TREATMENT
- 255|256|as a possible treatment for
- 165|166|in details,
- 144|145|- for example,
- 71|78|within working memory
- 58|59|for various chronic
- 49|50|in 0.1% trifluoroacetic
- 28|34|in threatened preterm labor
- 5|38|with the MIC90S
- 5|32|On PTD
- 4|40|plus LHRH-A
- 4|35|with the MIC90S of
|
ID-34
| 8,861,094 | 1,458,246 | 7,402,848 | 83.5433% | 38.3064% |
- 1266941|1629133|effects of
- 1143004|1468193|number of
- 1119157|1384237|use of
- 1110577|1383390|presence of
- 1095221|1237508|used to
- 146690|149087|Comparison of
- 887|894|low cost of
- 848|857|(HPV) in
- 294|29|NUMBER OF
- 112|113|zymography was used to
- 45|45|loss of two or more
- 6|33|1 goes to
- 5|35|active with the MIC90s of
- 5|32|syn. nov. of
- 3|37|microg/mmol of
|
The final result of the filters above is used as the distilled MEDLINE n-gram set
|
---|
Apply Exclusive Filters - Project domain
|
---|
ID-40
| 7,402,848 | 714,896 | 6,687,952 | 90.3430% | 34.6072% |
- 19606267|140460643|of
- 17471706|134983756|the
- 17200048|79351393|in
- 14143686|51239396|to
- 13841799|43070808|a
- 11720760|24982979|The
- 3580301|487296|We
- 8691|9239|"The
- 6254|6529|linear,
- 6001|6796|"normal"
- 179|185|{systematic name:
- 56|56|systematic name
- 10|33|anterior intermeniscal ligament
- 9|30|regional low-flow perfusion
- 2|32|Neo.
- 2|31|Cannon &
- 2|30|Polycentropus
- 1|62|Penneys &
- 1|30|% (month
|
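The Out No., Pass Rate, and Acc. Pass Rate columns can be reproduced with simple arithmetic. The sketch below assumes, consistent with the figures in the table, that the pass rate is Out No. divided by In No. for the same filter, and that the accumulated pass rate is Out No. divided by the In No. of ID-01 (19,325,338). The function name `pass_rates` is hypothetical.

```python
def pass_rates(first_in: int, in_no: int, filtered_no: int):
    """Return (out_no, pass_rate_percent, acc_pass_rate_percent) for one filter."""
    out_no = in_no - filtered_no
    pass_rate = 100.0 * out_no / in_no         # relative to this filter's input
    acc_pass_rate = 100.0 * out_no / first_in  # relative to the ID-01 input
    return out_no, pass_rate, acc_pass_rate


BASE = 19_325_338  # In No. of ID-01

# ID-12: In No. 19,324,906 and Filtered No. 132,650
out_no, rate, acc = pass_rates(BASE, 19_324_906, 132_650)
print(f"{out_no:,} | {rate:.4f}% | {acc:.4f}%")
# prints: 19,192,256 | 99.3136% | 99.3114%
```

The same function applied to ID-40 (In No. 7,402,848, Filtered No. 714,896) gives the final 6,687,952 n-grams and the 34.6072% accumulated pass rate.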