| unigrams | n=1 | < 1.0 hr. (from ./bin/02.NGramGenAll/logData/${YEAR}/N-gram/*.log) |  
 | - param: 10,1, (150000000)
 - 50 min.
 from ./bin/Log.${YEAR}/02-NGramGen/log.heap.1.50:
  - Documents: 38,201,553
 - Sentences: 270,098,242
 - Tokens: 5,676,864,905
  - split: 1, no split 
 - 1-gram (from log.heap.1.50): 46,147,938
 (already unique because there is no split; verify with wc -l)
 - Files:
- nGram.out.1.heap.50.s01.0001-1274 (760 MB, from ls -alh)
  
- param:
	
 - 2 min.
	
 - Group Alphabetically
 - 1-gram (unique): 46,147,938
	
 - Files: 
- ${NGram}.g01.NO-NO (760 MB|46MB, from ./logData/${YEAR}/1-gram/11-1.log)
	
     
 | - param: 12, 1, 30
 - 1 min 
	
 - 1-gram (WC >= 30): 1,441,038
	
 - File: 
- 1-gram.${YEAR}.30 (24 MB)
  
- param: 13, 1, 30
 - 1 min. 
	
 - 1-gram (sorted): 1,441,038
	
 - File: 
- 1-gram.${YEAR}.30.dwt (24 MB)
  
| bigrams | n=2 | 7.1 hr. |  
 | - param: 10,2, (150000000)
 - 3.0 hr.
	
 - split: 4
 - 2-gram (not unique, from log.heap.2.50): 548,125,974
	
 - Files:
	
	- s01.0001-0583 (3.1 GB) 
 - s02.0584-0890 (3.0 GB)
	
 - s03.0891-1139 (3.0 GB) 
 - s04.1140-1274 (2.0 GB) 
	
    
 | - param: see file names below
	
- 11,2,01,NO,M
 - 11,2,02,M,k
 - 11,2,03,k,NO
 
  - 1.0 hr.
	
 - Group Alphabetically
 - 2-gram (unique, use wc -l): 407,154,719 
	
 - Files: 
- ${NGram}.g01.NO-M (2.3GB|120MB)
	
 - ${NGram}.g02.M-k (3.3GB|157MB)
	
 - ${NGram}.g03.k-NO (2.8GB|128MB)
 
   
 | - param: 12, 2, 30
 - 3.0 hr. 
	
 - 2-gram (WC >= 30): 8,825,402
	
 - File:
- 2-gram.${YEAR}.30 (189 MB)
   
- param: 13, 2, 30
 - 2 min. 
	
 - 2-gram (sorted): 8,825,402
	
 - File:
- 2-gram.${YEAR}.30.dwt (189 MB)
  
| trigrams | n=3 | 14.0 hr. |  
 | - param: 10,3, (150000000)
 - 4.5 hr.
	
 - split: 14
 - 3-gram (not unique - from log.heap.3.50): 2,190,857,442
	
 - Files:
	
	- s01.0001-0151 (3.5 GB)
 - s02.0152-0309 (3.5 GB)
	
 - s03.0310-0400 (3.5 GB)
 - s04.0401-0532 (3.5 GB)
	
 - s05.0533-0614 (3.5 GB)
 - s06.0615-0692 (3.5 GB)
	
 - s07.0693-0760 (3.5 GB)
 - s08.0761-0826 (3.5 GB)
	
 - s09.0827-0890 (3.5 GB)
 - s10.0891-0954 (3.6 GB)
	
 - s11.0955-1014 (3.5 GB)
 - s12.1015-1073 (3.5 GB)
	
 - s13.1074-1131 (3.6 GB)
 - s14.1132-1189 (3.6 GB)
	
 - s15.1190-1246 (3.6 GB)
 - s16.1247-1274 (2.0 GB)
	
    
 | - param: see file names below
	
 - 9.0 hr.
	
 - Group Alphabetically 
 - 3-gram (unique, use wc -l): 1,407,583,824
	
 - Files:
	
	- g01.NO-E (4.7GB|199MB)
 - g02.E-Z (4.1GB|165MB)
	
 - g03.Z-c (4.3GB|168MB)
 - g04.c-f (4.4GB|162MB)
	
 - g05.f-j (4.1GB|156MB)
 - g06.j-o (2.8GB|103MB)
	
 - g07.o-r (3.8GB|147MB)
 - g08.r-th (3.6GB|133MB)
	
 - g09.th-NO (4.3GB|170MB)
	
    
 | - param: 12, 3, 30
 - 0.5 hr. 
	
 - 3-gram (WC >= 30): 13,303,488
	
 - File: 
- 3-gram.${YEAR}.30 (344 MB)
  
- param: 13, 3, 30
 - 3 min. 
	
 - 3-gram (sorted): 13,303,488
	
 - File: 
- 3-gram.${YEAR}.30.dwt (344 MB)
  
	  
| fourgrams | n=4 | 31.5 hr. |  
 | - param: 10,4, (130000000)
 - 5.5 hr.
	
 - split: 25
	
 - 4-gram (not unique - from log.heap.4.50): 3,587,660,090
	
	  - Files:
	
	- s01.0001-0077 (4.0 GB) 
 - s02.0078-0204 (3.9 GB)
	
 - s03.0205-0272 (3.9 GB) 
 - s04.0273-0319 (3.9 GB)
	
 - s05.0320-0372 (4.0 GB) 
 - s06.0373-0419 (4.0 GB) 
	
 - s07.0420-0517 (4.0 GB) 
 - s08.0518-0559 (3.9 GB)
	
 - s09.0560-0605 (3.9 GB) 
 - s10.0606-0649 (4.0 GB)
	
 - s11.0650-0695 (3.9 GB) 
 - s12.0696-0734 (4.0 GB)
	
 - s13.0735-0772 (4.0 GB) 
 - s14.0773-0810 (4.0 GB)
	
 - s15.0811-0847 (4.0 GB) 
 - s16.0848-0884 (4.0 GB)
	
 - s17.0885-0920 (4.0 GB) 
 - s18.0921-0958 (4.1 GB)
	
 - s19.0959-0994 (4.1 GB) 
 - s20.0995-1029 (4.1 GB)
	
 - s21.1030-1064 (4.1 GB) 
 - s22.1065-1098 (4.0 GB)
	
 - s23.1099-1131 (4.0 GB) 
 - s24.1132-1165 (4.1 GB)
	
 - s25.1166-1199 (4.1 GB) 
 - s26.1200-1233 (4.1 GB)
	
 - s27.1234-1266 (4.1 GB) 
 - s28.1267-1274 (1.1 GB)
	
  
	  
 | - param: see file names below
	
 - 25 hr.
	
 - Group Alphabetically
 - 4-gram (unique): 2,664,791,419
	
 - Files:
	
	- g01.NO-8 (4.6GB|164MB) 
 - g02.8-H (4.4GB|144MB)
	
 - g03.H-S (4.2GB|135MB)  
 - g04.S-ad (4.4GB|143MB)
	
 - g05.ad-anl (4.6GB|151MB) 
 - g06.anl-c (4.3GB|139MB)
	
 - g07.c-d (4.9GB|148MB)  
 - g08.d-es (4.4GB|134MB)
	
 - g09.es-gm (4.6GB|145MB) 
 - g10.gm-ine (4.7GB|151MB)
	
 - g11.ine-m (3.7GB|115MB) 
 - g12.m-o (4.5GB|139MB)
	
 - g13.o-p (4.9GB|167MB) 
 - g14.p-r (4.7GB|142MB)
	
 - g15.r-sh (4.1GB|127MB) 
 - g16.sh-th (4.3GB|133MB)
	
 - g17.th-to (4.3GB|144MB) 
 - g18.to-w (4.1GB|134MB) 
	
 - g19.w-NO (3.0GB|101MB)
	
  
	  
 | - param: 12, 4, 30
 - 1.0 hr.
	
 - 4-gram (WC >= 30): 8,817,816
	
 - File:
- 4-gram.${YEAR}.30 (270 MB)
  
	  
 | - param: 13, 4, 30
 - 3 min. 
	
 - 4-gram (sorted): 8,817,816
	
 - File:
- 4-gram.${YEAR}.30.dwt (270 MB)
  
	  
| fivegrams | n=5 | 44.7 hr. |  
 | - param: 10,5, (120000000)
 - 6.0 hr.
 - split: 30
 - 5-gram (not unique): 4,108,847,218
 - Files:
 
- s01.0001-0064 (4.3 GB)
 - s02.0065-0112 (4.3 GB)
 - s03.0113-0233 (4.3 GB)
 - s04.0234-0279 (4.3 GB)
 - s05.0280-0316 (4.4 GB)
 - s06.0317-0360 (4.4 GB)
 - s07.0361-0398 (4.4 GB)
 - s08.0399-0482 (4.4 GB)
 - s09.0483-0524 (4.4 GB)
 - s10.0525-0558 (4.4 GB)
 - s11.0559-0597 (4.4 GB)
 - s12.0598-0633 (4.4 GB)
 - s13.0634-0671 (4.4 GB)
 - s14.0672-0705 (4.4 GB)
 - s15.0706-0736 (4.4 GB)
 - s16.0737-0767 (4.4 GB)
 - s17.0768-0798 (4.4 GB)
 - s18.0799-0828 (4.4 GB)
 - s19.0829-0858 (4.4 GB)
 - s20.0859-0889 (4.5 GB)
 - s21.0890-0918 (4.4 GB)
 - s22.0919-0950 (4.5 GB)
 - s23.0951-0979 (4.4 GB)
 - s24.0980-1007 (4.4 GB)
 - s25.1008-1035 (4.4 GB)
 - s26.1036-1063 (4.5 GB)
 - s27.1064-1091 (4.5 GB)
 - s28.1092-1118 (4.4 GB)
 - s29.1119-1145 (4.4 GB)
 - s30.1146-1172 (4.4 GB)
 - s31.1173-1199 (4.4 GB)
 - s32.1200-1226 (4.4 GB)
 - s33.1227-1253 (4.6 GB)
 - s34.1254-1274 (3.5 GB)
  
  
 | - param: see file names below
 - 40.5 hr.
 - Group Alphabetically
 - 5-gram (unique): 3,504,818,804
 - Files:
	
	- g01.NO-2 (4.2GB|121MB) 
 - g02.2-C (4.3GB|127MB)
	
 - g03.C-I (4.3GB|115MB) 
 - g04.I-R (4.6GB|124MB)
	
 - g05.R-a (5.0GB|136MB) 
 - g06.a-an (5.1GB|139MB)
	
 - g07.an-ane (5.1GB|144MB) 
 - g08.ane-b (3.4GB|93MB)
	
 - g09.b-c (3.7GB|101MB) 
 - g10.c-com (3.6GB|95MB)
	
 - g11.com-d (3.7GB|94MB) 
 - g12.d-ef (4.9GB|127MB)
	
 - g13.ef-fol (5.1GB|132MB) 
 - g14.fol-h (4.4GB|120MB)
	
 - g15.h-in (3.6GB|93MB) 
 - g16.in-int (4.5GB|123MB)
	
 - g17.int-m (4.8GB|129MB) 
 - g18.m-n (4.7GB|123MB)
	
 - g19.n-of (2.5GB|64MB) 
 - g20.of-ofa (5.2GB|152MB)
	
 - g21.ofa-pl (5.3GB|142MB) 
 - g22.pl-re (5.0GB|129MB)
	
 - g23.re-s (3.9GB|101MB) 
 - g24.s-st (4.2GB|111MB) 
	
 - g25.st-the (5.1GB|137MB) 
 - g26.the-thea (5.0GB|142MB) 
	
 - g27.thea-toa (3.8GB|108MB) 
 - g28.toa-w (4.5GB|119MB)
	
 - g29.w-NO (5.3GB|150MB)
	
  
  
 | - param: 12, 5, 30
 - 1.0 hr. 
	
 - 5-gram (WC >= 30): 3,982,724
	
 - File: 
- 5-gram.${YEAR}.30 (142 MB) 
  
	  
 | - param: 13, 5, 30
 - 2 min.
	
 - 5-gram (sorted): 3,982,724
	
 - File: 
- 5-gram.${YEAR}.30.dwt (142 MB)
  
	  
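
The four steps logged for each n (param 10: generate per-split counts; param 11: group alphabetically into unique n-grams; param 12: keep WC >= 30; param 13: sort) can be illustrated with a small in-memory sketch. This is not the pipeline's actual code, which is not shown here. The function names, the rule that n-grams stay within one sentence, and the alphabetical sort order are all assumptions made for illustration. The reading of "not unique" as the sum of entries over all split files is also an inference from the numbers above.

```python
from collections import Counter
from typing import Iterable


def count_ngrams(sentences: Iterable[list[str]], n: int) -> Counter:
    """Count n-grams within each sentence (assumed: no n-gram spans two sentences)."""
    counts: Counter = Counter()
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def merge_partial_counts(parts: Iterable[Counter]) -> Counter:
    """Merge per-split counts so that every n-gram appears exactly once."""
    merged: Counter = Counter()
    for part in parts:
        merged.update(part)  # Counter.update adds counts for repeated keys
    return merged


def filter_and_sort(counts: Counter, min_count: int) -> list[tuple[str, int]]:
    """Keep n-grams with count >= min_count; sort alphabetically (assumed order)."""
    return sorted((gram, c) for gram, c in counts.items() if c >= min_count)


# Two hypothetical split files, each holding its own partial bigram counts.
split_a = count_ngrams([["the", "cat", "sat"], ["the", "cat", "ran"]], 2)
split_b = count_ngrams([["the", "cat", "sat"]], 2)

not_unique = len(split_a) + len(split_b)  # entries summed over all splits
merged = merge_partial_counts([split_a, split_b])
unique = len(merged)                      # entries after grouping
kept = filter_and_sort(merged, min_count=3)

print(not_unique, unique, kept)
```

With a single split, as in the unigram run, the "not unique" and "unique" figures coincide, which matches the 46,147,938 reported for both. The real runs use a threshold of 30 rather than the 3 used in this toy example.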