| unigrams | n=1 | < 1.0 hr. (from ./bin/02.NGramGenAll/logData/${YEAR}/N-gram/*.log) |  
 | - param: 10,1, (150000000)
 - 45 min.
 from ./bin/Log.${YEAR}/02-NGramGen/log.heap.1.50:
  - Documents: 34,960,700
 - Sentences: 238,939,832
 - Tokens: 5,001,000,732
  - split: 1, no split 
 - 1-gram (not unique, from log.heap.1.50): 41,049,611
 (already unique because there is no split; verify with wc -l)
  - Files:
- nGram.out.1.heap.50.s01.0001-1166 (673 MB, use ls -alh)
  
- param:
	
 - 2 min.
	
 - Group Alphabetically
 - 1-gram (unique): 41,049,611
	
 - Files: 
- ${NGram}.g01.NO-NO (673 MB|41MB, from ./logData/${YEAR}/1-gram/11-1.log)
	
     
 | - param: 12, 1, 30
 - 1 min 
	
 - 1-gram (WC >= 30): 1,311,936
	
 - File: 
- 1-gram.${YEAR}.30 (22 MB)
  
- param: 13, 1, 30
 - 1 min. 
	
 - 1-gram (sorted): 1,311,936
	
 - File: 
- 1-gram.${YEAR}.30.dwt (22 MB)
  
| bigrams | n=2 | 3.5 hr. |  
 | - param: 10,2, (150000000)
 - 2.5 hr.
	
 - split: 4
 - 2-gram (not unique, from log.heap.2.50): 477,925,459
	
 - Files:
	
	- s01.0001-0583 (3.1 GB) 
 - s02.0584-0890 (3.0 GB)
	
 - s03.0891-1140 (3.1 GB) 
 - s04.1141-1166 (537 MB) 
	
    
 | - param: see file names below
	
- 11,2,01,NO,M
 - 11,2,02,M,k
 - 11,2,03,k,NO
 
  - 50 min.
	
 - Group Alphabetically
 - 2-gram (unique, use wc -l): 365,653,326 
	
 - Files: 
- ${NGram}.g01.NO-M (2.0GB|106MB)
	
 - ${NGram}.g02.M-k (3.0GB|142MB)
	
 - ${NGram}.g03.k-NO (2.5GB|116MB)
 
   
 | - param: 12, 2, 30
 - 7 min. 
	
 - 2-gram (WC >= 30): 7,943,407
	
 - File:
- 2-gram.${YEAR}.30 (170 MB)
   
- param: 13, 2, 30
 - 2 min. 
	
 - 2-gram (sorted): 7,943,407
	
 - File:
- 2-gram.${YEAR}.30.dwt (170 MB)
  
| trigrams | n=3 | 13 hr. |  
 | - param: 10,3, (150000000)
 - 4.2 hr.
	
 - split: 14
 - 3-gram (not unique - from log.heap.3.50): 1,922,187,885
	
 - Files:
	
	- s01.0001-0151 (3.5 GB)
 - s02.0152-0309 (3.5 GB)
	
 - s03.0310-0400 (3.5 GB)
 - s04.0401-0532 (3.5 GB)
	
 - s05.0533-0614 (3.5 GB)
 - s06.0615-0692 (3.5 GB)
	
 - s07.0693-0760 (3.5 GB)
 - s08.0761-0826 (3.5 GB)
	
 - s09.0827-0890 (3.5 GB)
 - s10.0891-0954 (3.6 GB)
	
 - s11.0955-1014 (3.5 GB)
 - s12.1015-1073 (3.5 GB)
	
 - s13.1074-1131 (3.5 GB)
 - s14.1132-1166 (2.3 GB)
	
    
 | - param: see file names below
	
 - 8.0 hr.
	
 - Group Alphabetically 
 - 3-gram (unique, use wc -l): 1,260,314,546
	
 - Files:
	
	- g01.NO-E (4.1GB|175MB)
 - g02.E-Z (3.6GB|145MB)
	
 - g03.Z-c (3.9GB|151MB)
 - g04.c-f (4.0GB|146MB)
	
 - g05.f-j (3.7GB|140MB)
 - g06.j-o (2.5GB|93MB)
	
 - g07.o-r (3.5GB|133MB)
 - g08.r-th (3.2GB|120MB)
	
 - g09.th-NO (3.8GB|153MB)
	
    
 | - param: 12, 3, 30
 - 30 min. 
	
 - 3-gram (WC >= 30): 11,773,385
	
 - File: 
- 3-gram.${YEAR}.30 (303 MB)
  
- param: 13, 3, 30
 - 2 min. 
	
 - 3-gram (sorted): 11,773,385
	
 - File: 
- 3-gram.${YEAR}.30.dwt (303 MB)
  
	  
 | fourgrams | n=4 | 31 hr. |  
 | - param: 10,4, (130000000)
 - 5.5 hr.
	
 - split: 25
	
 - 4-gram (not unique - from log.heap.4.50): 3,156,725,405
	
	  - Files:
	
	- s01.0001-0077 (4.0 GB) 
 - s02.0078-0204 (3.9 GB)
	
 - s03.0205-0272 (3.9 GB) 
 - s04.0273-0319 (3.9 GB)
	
 - s05.0320-0372 (4.0 GB) 
 - s06.0373-0419 (4.0 GB) 
	
 - s07.0420-0517 (4.0 GB) 
 - s08.0518-0559 (3.9 GB)
	
 - s09.0560-0605 (3.9 GB) 
 - s10.0606-0649 (4.0 GB)
	
 - s11.0650-0696 (4.0 GB) 
 - s12.0697-0735 (4.0 GB)
	
 - s13.0736-0773 (4.0 GB) 
 - s14.0774-0811 (4.0 GB)
	
 - s15.0812-0848 (4.0 GB) 
 - s16.0849-0885 (4.0 GB)
	
 - s17.0886-0921 (4.0 GB) 
 - s18.0922-0959 (4.1 GB)
	
 - s19.0960-0995 (4.1 GB) 
 - s20.0996-1030 (4.1 GB)
	
 - s21.1031-1065 (4.1 GB) 
 - s22.1066-1099 (4.0 GB)
	
 - s23.1100-1132 (4.0 GB) 
 - s24.1132-1166 (4.0 GB)
	
 - s25.1132-1166 (0.0 GB)
	
  
	  
 | - param: see file names below
	
 - 25 hr.
	
 - Group Alphabetically
 - 4-gram (unique): 2,233,423,690
	
 - Files:
	
	- g01.NO-9 (4.1GB|148MB) 
 - g02.9-I (4.2GB|136MB)
	
 - g03.I-T (3.9GB|126MB)  
 - g04.T-am (4.4GB|143MB)
	
 - g05.am-at (4.3GB|141MB) 
 - g06.at-c (2.5GB|81MB)
	
 - g07.c-d (4.3GB|132MB)  
 - g08.d-es (3.9GB|119MB)
	
 - g09.es-h (4.5GB|142MB) 
 - g10.h-ini (4.2GB|132MB)
	
 - g11.ini-m (2.9GB|92MB) 
 - g12.m-o (4.0GB|124MB)
	
 - g13.o-p (4.4GB|150MB) 
 - g14.p-r (4.2GB|126MB)
	
 - g15.r-sh (3.6GB|113MB) 
 - g16.sh-th (3.8GB|119MB)
	
 - g17.th-to (3.9GB|130MB) 
 - g18.to-v (2.9GB|94MB) 
	
 - g19.v-NO (3.5GB|115MB)
	
  
	  
 | - param: 12, 4, 30
 - 30 min.
	
 - 4-gram (WC >= 30): 7,677,188
	
 - File:
- 4-gram.${YEAR}.30 (234 MB)
  
	  
 | - param: 13, 4, 30
 - 2 min. 
	
 - 4-gram (sorted): 7,677,188
	
 - File:
- 4-gram.${YEAR}.30.dwt (234 MB)
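
The "Group Alphabetically" steps above turn the non-unique totals into unique ones (for 4-grams, 3,156,725,405 down to 2,233,423,690), presumably because the same n-gram can appear in more than one split file. The following is a minimal Python sketch of that idea, not the tool's actual implementation. It assumes each split file is already sorted by n-gram and stores one `<n-gram>\t<count>` pair per line; that layout, the function names, and the half-open range test are all hypothetical.

```python
import heapq
from itertools import groupby


def parse(line):
    # Assumed line layout: "<n-gram>\t<count>" (hypothetical, not from the logs).
    ngram, count = line.rstrip("\n").rsplit("\t", 1)
    return ngram, int(count)


def merge_counts(*sorted_streams):
    """Merge streams sorted by n-gram, summing the counts of equal n-grams."""
    parsed = (map(parse, stream) for stream in sorted_streams)
    merged = heapq.merge(*parsed, key=lambda pair: pair[0])
    for ngram, group in groupby(merged, key=lambda pair: pair[0]):
        yield ngram, sum(count for _, count in group)


def in_group(ngram, start=None, end=None):
    """True if start <= ngram < end; None plays the role of an open "NO" bound."""
    return (start is None or ngram >= start) and (end is None or ngram < end)


# Two tiny stand-ins for split files such as s01 and s02.
s01 = ["a b\t3\n", "c d\t1\n"]
s02 = ["a b\t2\n", "b c\t5\n"]
merged = list(merge_counts(s01, s02))
```

Here `merged` holds three unique entries instead of four input lines, with the two `a b` counts summed. A call such as `in_group(ngram, "M", "k")` would select the entries for a group like `g02.M-k` under plain byte-order string comparison, where uppercase letters sort before lowercase ones.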
  
	  
 | 
|---|
 | fivegrams | n=5 | 37.5 hr. |  
 | - param: 10,5, (120000000)
 - 4.0 hr.
 - split: 30
 - 5-gram (not unique): 3,620,801,104
 - Files:
 
- s01.0001-0064 (4.3 GB)
 - s02.0065-0112 (4.3 GB)
 - s03.0113-0233 (4.3 GB)
 - s04.0234-0279 (4.3 GB)
 - s05.0280-0316 (4.4 GB)
 - s06.0317-0360 (4.4 GB)
 - s07.0361-0398 (4.4 GB)
 - s08.0399-0482 (4.4 GB)
 - s09.0483-0524 (4.4 GB)
 - s10.0525-0558 (4.4 GB)
 - s11.0559-0597 (4.4 GB)
 - s12.0598-0633 (4.4 GB)
 - s13.0634-0671 (4.4 GB)
 - s14.0672-0705 (4.4 GB)
 - s15.0706-0736 (4.4 GB)
 - s16.0737-0767 (4.4 GB)
 - s17.0768-0798 (4.4 GB)
 - s18.0799-0828 (4.4 GB)
 - s19.0829-0858 (4.4 GB)
 - s20.0859-0889 (4.5 GB)
 - s21.0890-0918 (4.4 GB)
 - s22.0919-0950 (4.5 GB)
 - s23.0951-0980 (4.5 GB)
 - s24.0981-1008 (4.4 GB)
 - s25.1009-1036 (4.4 GB)
 - s26.1037-1064 (4.5 GB)
 - s27.1065-1092 (4.5 GB)
 - s28.1093-1119 (4.4 GB)
 - s29.1120-1146 (4.4 GB)
 - s30.1147-1166 (3.2 GB)
  
  
 | - param: see file names below
 - 40.5 hr.
 - Group Alphabetically
 - 5-gram (unique): 3,108,592,548
 - Files:
	
	- g01.NO-4 (5.0GB|133MB) 
 - g02.4-D (4.3GB|121MB)
	
 - g03.D-M (3.9GB|105MB) 
 - g04.M-T (4.1GB|109MB)
	
 - g05.T-a (2.9GB|79MB) 
 - g06.a-am (4.3GB|118MB)
	
 - g07.am-ann (4.9GB|136MB) 
 - g08.ann-bo (4.8GB|131MB)
	
 - g09.bo-ch (3.1GB|84MB) 
 - g10.ch-ct (4.5GB|114MB)
	
 - g11.ct-ef (4.7GB|122MB) 
 - g12.ef-for (4.7GB|121MB)
	
 - g13.for-h (3.7GB|102MB) 
 - g14.h-in (3.2GB|82MB)
	
 - g15.in-io (4.9GB|134MB) 
 - g16.io-me (4.3GB|118MB)
	
 - g17.me-mm (1.7GB|44MB) 
 - g18.mm-of (3.6GB|96MB)
	
 - g19.of-og (4.7GB|139MB) 
 - g20.og-pl (4.6GB|124MB)
	
 - g21.pl-re (4.4GB|114MB) 
 - g22.re-s (3.4GB|89MB)
	
 - g23.s-st (3.8GB|99MB) 
 - g24.st-the (4.5GB|122MB) 
	
 - g25.the-then (4.7GB|132MB) 
 - g26.then-un (4.9GB|135MB) 
	
 - g27.un-w (2.3GB|60MB) 
 - g28.w-NO (4.7GB|133MB)
	
  
  
 | - param: 12, 5, 30
 - 1.2 hr. 
	
 - 5-gram (WC >= 30): 3,401,145
	
 - File: 
- 5-gram.${YEAR}.30 (121 MB) 
  
	  
 | - param: 13, 5, 30
 - 2 min.
	
 - 5-gram (sorted): 3,401,145
	
 - File: 
- 5-gram.${YEAR}.30.dwt (121 MB)
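
The `param: 12, n, 30` steps keep only n-grams whose count is at least 30 (`WC >= 30`). For 5-grams that is 3,401,145 of 3,108,592,548 unique entries, roughly 0.1%. A minimal Python sketch of such a threshold filter follows. The `<n-gram>\t<count>` line layout and the function name are assumptions for illustration, not taken from the pipeline itself.

```python
def filter_by_count(lines, min_count=30):
    """Yield (ngram, count) pairs whose count is at least min_count."""
    for line in lines:
        # Assumed line layout: "<n-gram>\t<count>" (hypothetical).
        ngram, count = line.rstrip("\n").rsplit("\t", 1)
        if int(count) >= min_count:
            yield ngram, int(count)


sample = ["of the\t120\n", "rare pair\t29\n", "in a\t30\n"]
kept = list(filter_by_count(sample))
```

With the default threshold, `kept` retains `of the` and `in a` and drops `rare pair`, because the comparison is inclusive at 30.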
  
	  
 | 
|---|
  