| unigrams | n=1 | < 1.0 hr. (from ./bin/02.NGramGenAll/logData/${YEAR}/N-gram/*.log) |  
 | - param: 10,1, (150000000)
 - 45 min.
 from log.heap.1.50:
  - Documents: 33,405,863
 - Sentences: 224,228,682
 - Tokens: 4,680,725,429
  - split: 1, no split 
 - 1-grams (not unique, from log.heap.1.50): 38,591,469
 (in fact unique because there is no split; verify with wc -l)
  - Files:
- nGram.out.1.heap.50.s01.0001-1114 (631 MB, from ls -alh)
  
- param:
	
 - 2 min.
	
 - Group Alphabetically
 - 1-gram (unique): 38,591,469
	
 - Files: 
- ${NGram}.g01.NO-NO (631 MB|39MB, from ./logData/${YEAR}/1-gram/11-1.log)
	
     
 | - param: 12, 1, 30
 - 1 min 
	
 - 1-gram (WC >= 30): 1,248,727
	
 - File: 
- 1-gram.${YEAR}.30 (21 MB)
  
- param: 13, 1, 30
 - 1 min. 
	
 - 1-gram (sorted): 1,248,727
	
 - File: 
- 1-gram.${YEAR}.30.dwt (21 MB)
  
| bigrams | n=2 | 3.2 hr. |  
 | - param: 10,2, (150000000)
 - 2.3 hr.
	
 - split: 3
 - 2-gram (not unique, from log.heap.2.50): 438,120,386
	
 - Files:
	
	- s01.0001-0583 (3.1 GB) 
 - s02.0584-0890 (3.0 GB)
	
 - s03.0891-1114 (2.8 GB) 
	
    
 | - param: see file names below
	
- 11,2,01,NO,M
 - 11,2,02,M,k
 - 11,2,03,k,NO
 
  - 50 min.
	
 - Group Alphabetically
 - 2-gram (unique, use wc -l): 345,555,538
	
 - Files: 
- ${NGram}.g01.NO-M (1.9GB|100MB)
	
 - ${NGram}.g02.M-k (2.8GB|135MB)
	
 - ${NGram}.g03.k-NO (2.4GB|111MB)
 
   
 | - param: 12, 2, 30
 - 7 min. 
	
 - 2-gram (WC >= 30): 7,519,422
	
 - File:
- 2-gram.${YEAR}.30 (161 MB)
   
- param: 13, 2, 30
 - 2 min. 
	
 - 2-gram (sorted): 7,519,422
	
 - File:
- 2-gram.${YEAR}.30.dwt (161 MB)
  
| trigrams | n=3 | 16 hr. |  
 | - param: 10,3, (150000000)
 - 5.8 hr.
	
 - split: 12
 - 3-gram (not unique - from log.heap.3.50): 1,643,134,783
	
 - Files:
	
	- s01.0001-0209 (3.8 GB)
 - s02.0210-0324 (3.7 GB)
	
 - s03.0325-0422 (3.8 GB)
 - s04.0423-0559 (3.8 GB)
	
 - s05.0560-0646 (3.7 GB)
 - s06.0647-0728 (3.8 GB)
	
 - s07.0729-0800 (3.7 GB)
 - s08.0801-0870 (3.8 GB)
	
 - s09.0871-0938 (3.8 GB)
 - s10.0939-1005 (3.8 GB)
	
 - s11.1006-1069 (3.8 GB)
 - s12.1070-1114 (2.8 GB)
	
    
 | - param: see file names below
	
 - 10.0 hr.
	
 - Group Alphabetically 
 - 3-gram (unique): 1,189,180,839
	
 - Files:
	
	- g01.NO-G (4.2GB|178MB)
 - g02.G-Z (3.0GB|121MB)
	
 - g03.Z-c (3.7GB|143MB)
 - g04.c-f (3.8GB|138MB)
	
 - g05.f-j (3.5GB|133MB)
 - g06.j-n (1.8GB|68MB)
	
 - g07.n-r (3.8GB|146MB)
 - g08.r-th (3.0GB|113MB)
	
 - g09.th-NO (3.6GB|145MB)
	
    
 | - param: 12, 3, 30
 - 20 min. 
	
 - 3-gram (WC >= 30): 11,046,633
	
 - File: 
- 3-gram.${YEAR}.30 (284 MB)
  
- param: 13, 3, 30
 - 2 min. 
	
 - 3-gram (sorted): 11,046,633
	
 - File: 
- 3-gram.${YEAR}.30.dwt (284 MB)
  
	  
| fourgrams | n=4 | 25 hr. |  
 | - param: 10,4, (130000000)
 - 5.5 hr.
	
 - split: 23
	
 - 4-gram (not unique - from log.heap.4.50): 2,956,113,231
	
	  - Files:
	
	- s01.0001-0077 (4.0 GB) 
 - s02.0078-0204 (3.9 GB)
	
 - s03.0205-0272 (3.9 GB) 
 - s04.0273-0319 (3.9 GB)
	
 - s05.0320-0372 (4.0 GB) 
 - s06.0373-0419 (4.0 GB) 
	
 - s07.0420-0517 (4.0 GB) 
 - s08.0518-0559 (3.9 GB)
	
 - s09.0560-0605 (3.9 GB) 
 - s10.0606-0649 (4.0 GB)
	
 - s11.0650-0696 (4.0 GB) 
 - s12.0697-0735 (4.0 GB)
	
 - s13.0736-0773 (4.0 GB) 
 - s14.0774-0811 (4.0 GB)
	
 - s15.0812-0848 (4.0 GB) 
 - s16.0849-0885 (4.0 GB)
	
 - s17.0886-0921 (4.0 GB) 
 - s18.0922-0959 (4.1 GB)
	
 - s19.0960-0995 (4.1 GB) 
 - s20.0996-1030 (4.1 GB)
	
 - s21.1031-1065 (4.1 GB) 
 - s22.1066-1099 (4.0 GB)
	
 - s23.1100-1114 (1.9 GB)
	
  
	  
 | - param: see file names below
	
 - 18.7 hr.
	
 - Group Alphabetically
 - 4-gram (unique): 2,233,423,690
	
 - Files:
	
	- g01.NO-9 (3.9GB|138MB) 
 - g02.9-L (4.5GB|149MB)
	
 - g03.L-Z (4.4GB|143MB) 
 - g04.Z-ane (5.3GB|175MB)
	
 - g05.ane-bs (3.2GB|102MB) 
 - g06.bs-d (4.6GB|143MB)
	
 - g07.d-f (4.6GB|142MB) 
 - g08.f-h (3.2GB|104MB)
	
 - g09.h-ini (3.9GB|125MB) 
 - g10.ini-m (2.8GB|87MB)
	
 - g11.m-o (3.7GB|117MB) 
 - g12.o-p (4.1GB|142MB)
	
 - g13.p-r (3.9GB|119MB) 
 - g14.r-sh (3.4GB|106MB)
	
 - g15.sh-th (3.6GB|112MB) 
 - g16.th-to (3.7GB|123MB)
	
 - g17.to-v (2.7GB|89MB) 
 - g18.v-NO (3.3GB|108MB)
	
  
	  
 | - param: 12, 4, 30
 - 52 min.
	
 - 4-gram (WC >= 30): 7,142,084
	
 - File:
- 4-gram.${YEAR}.30 (217 MB)
  
	  
 | - param: 13, 4, 30
 - 2 min. 
	
 - 4-gram (sorted): 7,142,084
	
 - File:
- 4-gram.${YEAR}.30.dwt (217 MB)
  
	  
 | fivegrams | n=5 | 36.7 hr. |  
 | - param: 10,5, (120000000)
 - 6.0 hr.
 - split: 28
 - 5-gram (not unique): 3,388,345,780
 - Files:
 
- s01.0001-0064 (4.3 GB)
 - s02.0065-0112 (4.3 GB)
 - s03.0113-0233 (4.3 GB)
 - s04.0234-0279 (4.3 GB)
 - s05.0280-0316 (4.4 GB)
 - s06.0317-0360 (4.4 GB)
 - s07.0361-0398 (4.4 GB)
 - s08.0399-0482 (4.4 GB)
 - s09.0483-0524 (4.4 GB)
 - s10.0525-0558 (4.4 GB)
 - s11.0559-0597 (4.4 GB)
 - s12.0598-0633 (4.4 GB)
 - s13.0634-0671 (4.4 GB)
 - s14.0672-0705 (4.4 GB)
 - s15.0706-0736 (4.4 GB)
 - s16.0737-0767 (4.4 GB)
 - s17.0768-0798 (4.4 GB)
 - s18.0799-0828 (4.4 GB)
 - s19.0829-0858 (4.4 GB)
 - s20.0859-0889 (4.5 GB)
 - s21.0890-0918 (4.4 GB)
 - s22.0919-0950 (4.5 GB)
 - s23.0951-0979 (4.4 GB)
 - s24.0980-1007 (4.4 GB)
 - s25.1008-1035 (4.4 GB)
 - s26.1036-1063 (4.5 GB)
 - s27.1064-1091 (4.5 GB)
 - s28.1092-1114 (3.7 GB)
  
  
 | - param: see file names below
 - 29.4 hr.
 - Group Alphabetically
 - 5-gram (unique): 2,918,497,553
 - Files:
	
	- g01.NO-9 (5.0GB|148MB) 
 - g02.9-E (3.7GB|101MB)
	
 - g03.E-N (3.8GB|104MB) 
 - g04.N-T (3.1GB|81MB)
	
 - g05.T-a (2.7GB|74MB) 
 - g06.a-am (4.1GB|111MB)
	
 - g07.am-ann (4.5GB|127MB) 
 - g08.ann-bo (4.6GB|123MB)
	
 - g09.bo-ch (2.9GB|79MB) 
 - g10.ch-ct (4.2GB|107MB)
	
 - g11.ct-ef (4.4GB|114MB) 
 - g12.ef-fr (5.6GB|149MB)
	
 - g13.fr-i (4.4GB|117MB) 
 - g14.i-inc (2.9GB|81MB)
	
 - g15.inc-io (2.7GB|67MB) 
 - g16.io-m (3.7GB|85MB)
	
 - g17.m-o (5.6GB|148MB) 
 - g18.o-off (4.8GB|139MB)
	
 - g19.off-pl (4.4GB|118MB) 
 - g20.pl-re (4.2GB|107MB)
	
 - g21.re-s (3.2GB|84MB) 
 - g22.s-st (3.5GB|93MB)
	
 - g23.st-the (4.3GB|114MB) 
 - g24.the-then (4.4GB|124MB)
	
 - g25.then-un (4.6GB|127MB) 
 - g26.un-w (2.1GB|56MB)
	
 - g27.w-NO (4.4GB|125MB)
	
  
  
 | - param: 12, 5, 30
 - 1.2 hr. 
	
 - 5-gram (WC >= 30): 3,133,905
	
 - File: 
- 5-gram.${YEAR}.30 (111 MB) 
  
	  
 | - param: 13, 5, 30
 - 2 min.
	
 - 5-gram (sorted): 3,133,905
	
 - File: 
- 5-gram.${YEAR}.30.dwt (111 MB)
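
For every n, the table records the same sequence: split files are grouped into unique n-grams, entries with WC >= 30 are kept, and the result is sorted. A minimal Python sketch of that idea follows. It is not the project's implementation: the function names `merge_counts` and `filter_and_sort`, the in-memory `(ngram, count)` representation, and the sort order (descending count, then alphabetical) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def merge_counts(chunks: Iterable[Iterable[tuple[str, int]]]) -> Counter:
    """Sum counts for n-grams that appear in more than one split (hypothetical helper)."""
    total: Counter = Counter()
    for chunk in chunks:
        for ngram, count in chunk:
            total[ngram] += count
    return total


def filter_and_sort(counts: Counter, min_count: int = 30) -> list[tuple[str, int]]:
    """Keep n-grams with count >= min_count, then sort them.

    The sort key (descending count, ties broken alphabetically) is an assumption;
    the table only states that the output is sorted.
    """
    kept = [(ngram, count) for ngram, count in counts.items() if count >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


if __name__ == "__main__":
    # Two toy "split" outputs; the same n-gram may occur in both, which is why
    # the not-unique totals in the table exceed the unique totals.
    split_a = [("of the", 20), ("in the", 5)]
    split_b = [("of the", 15), ("to be", 40)]
    merged = merge_counts([split_a, split_b])
    for ngram, count in filter_and_sort(merged, min_count=30):
        print(f"{ngram}\t{count}")
```

At the sizes reported in the table (billions of entries for n >= 3), an in-memory counter would not fit, which is presumably why the real pipeline splits and groups the files alphabetically before merging.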
  
	  