Validate and Fix LEXICON
shell> cp -p LEXICON ${LEXICON}/data/${YEAR}/data/LEXICON.mmddyy
	shell> cd ${LEXICON}/data/${YEAR}/data
	shell> ln -sf ./LEXICON.mmddyy LEXICON.freeze
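The snapshot-and-link commands above can be wrapped in a small function. This is only a sketch: `freeze_lexicon` is a hypothetical name, and it assumes that `mmddyy` stands for today's date, which the procedure does not state explicitly.

```shell
# Hypothetical helper: copy a lexicon file to <dir>/LEXICON.<mmddyy> and
# point <dir>/LEXICON.freeze at the copy.
freeze_lexicon() {
    src=$1
    dir=$2
    stamp=$(date +%m%d%y)    # assumption: mmddyy is today's date
    # -p preserves timestamps and permissions, as in the manual step.
    cp -p "$src" "$dir/LEXICON.$stamp" || return 1
    # Use a relative link target, as the manual commands do.
    ( cd "$dir" && ln -sf "./LEXICON.$stamp" LEXICON.freeze )
}
```

A call such as `freeze_lexicon LEXICON "${LEXICON}/data/${YEAR}/data"` would then correspond to the three manual commands.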
	
	
shell> fgrep "  " LEXICON.freeze | wc -l
	
=> The count should be 0; LexBuild takes care of all extra spaces automatically.

If it is not 0, the data in LexBuild needs to be fixed as well.
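The extra-space check can also be scripted so that a non-zero count stops the run. The sketch below is illustrative only; the function names are hypothetical and not part of the Lexicon tools.

```shell
# Hypothetical helper: count the lines that contain two consecutive spaces.
count_double_space_lines() {
    # -c prints the number of matching lines, -F treats the pattern literally.
    # grep exits 1 when nothing matches, so "|| true" keeps the status clean.
    grep -cF "  " "$1" || true
}

# Hypothetical helper: succeed only when the file has no such lines.
check_no_double_space() {
    n=$(count_double_space_lines "$1")
    if [ "$n" -ne 0 ]; then
        echo "ERROR: $n line(s) with extra spaces in $1; fix the data in LexBuild" >&2
        return 1
    fi
    echo "OK: no extra spaces in $1"
}
```

For example, `check_no_double_space LEXICON.freeze || exit 1` would halt a wrapper script before the finalize step.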
	
	
shell> ${LEXICON}/bin/1.FinalizeLexicon <year>
		
shell> mv ./LEXICON.release LEXICON.release.log.1.noAnno
shell> ln -sf ./LEXICON.release.log.1.noAnno LEXICON.release
shell> ${LEXICON}/bin/2.ValidateLexicon <year> > log.2
shell> fgrep "entry=" LEXICON.release > Euis
		
shell> cd ${LEX_CHECK_PROC}/data/GetFiles
		shell> cp -p ${LEXICON}/data/${YEAR}/data/LEXICON.release.log.1.noAnno LEXICON.release.log.1.noAnno.${YEAR}
			
			
shell> cd ${LEX_CHECK_PROC}/bin
			
shell> GetFilesFromLexicon
			
2 (prepositions)
			
3 (particles)
			
12
			
13
			
			
		=> Use ./LEXICON.release.2.fixContent for the next steps (if it is different from the input)
shell> ln -sf ./LEXICON.release.log.2.3.contentFix LEXICON.release
		
| Year | DupRec | N | C | Notes | 
|---|---|---|---|---|
| 2014 | 137 | 69 | 68 | Only multiwords (137/1184) were tagged because of limited resources and the due date. The rest (abbreviations or acronyms) are updated in the next release. | 
| 2015 | 1183 | 1042 | 141 | Changes are updated in LB and fixed for next release | 
| 2016 | 67 | 62 | 5 | Changes are updated in LB and fixed for next release | 
| 2017 | 69 | 63 | 6 | Changes are updated in LB and fixed for next release | 
| 2018 | 55 | 48 | 7 | Changes are updated in LB and fixed for next release | 
| 2019 | 11 | 6 | 5 | Changes are updated in LB and fixed for next release | 
| 2020 | 3 | 0 | 3 | Changes are updated in LB and this release | 
| 2021 | 3 | 0 | 3 | Changes are updated in LB and this release | 
| 2022 | 12 | 3 | 9 | Changes are updated in LB and this release | 
| 2023 | 3 | 2 | 1 | Changes are updated in LB and this release | 
| 2024 | 2 | 2 | 0 | Changes are updated in LB and this release | 
| 2025 | 1 | 0 | 1 | Changes are updated in LB and this release | 
| 2026 | 1 | 1 | 0 | Changes are updated in LB and this release | 
shell> fgrep " no EUI (" log.2 > 2.4.03.noEui

| Year | no EUI No. | notBaseForm No. | 
|---|---|---|
| 2017 | 22 | 4 | 
| 2018 | 4 | 2 | 
| 2019 | 63 | 0 | 
| 2020 | 61 | 0 | 
| 2021 | 34 | 0 | 
| 2022 | 18 | 0 | 
| 2023 | 0 | 0 | 
| 2024 | 0 | 0 | 
| 2025 | 0 | 0 | 
| 2026 | 1 | 0 | 
shell> fgrep " wrong citation (spVar) (" log.2 | fgrep -v " wrong citation (spVar), duplicates (" > 2.4.04.wrongCitSpVar

| Year | wrong citation (spVar) No. | 
|---|---|
| 2017 | 71 | 
| 2018 | 0 | 
| 2019 | 59 | 
| 2020 | 0 | 
| 2021 | 1 | 
| 2022 | 0 | 
| 2023 | 0 | 
| 2024 | 0 | 
| 2025 | 0 | 
| 2026 | 0 | 
shell> fgrep " wrong citation (spVar), duplicates (" log.2 > 2.5.wrongCitSpVarDup

| Year | wrong citation (spVar), duplicates No. | 
|---|---|
| 2017 | 12 | 
| 2018 | 0 | 
| 2019 | 2 | 
| 2020 | 1 | 
| 2021 | 6 | 
| 2022 | 2 | 
| 2023 | 9 | 
| 2024 | 20 | 
| 2025 | 11 | 
| 2026 | 0 | 
Steps 3, 4, and 5 are auto-fixed at the same time when the validation program runs. So, use LEXICON.release.3.fixCrossCheck as LEXICON.release (via a link) and rerun:
		shell> cp -p ./LEXICON.release.3.fixCrossCheck Lexicon.release.3.fixCrossCheck.2.5.cit
		shell> ln -sf ./LEXICON.release.log.${No}.fixCrossRed Lexicon.release
		
shell> ${LEXICON}/bin/2.ValidateLexicon ${YEAR} > log.2
		
Check everything after each run, because the auto-fix in one step might cause new issues in another (for example, adding an EUI can introduce duplicates). Rerun this until no errors are found.
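To decide whether another rerun is needed, the remaining issues in log.2 can be tallied in one pass. The sketch below is an illustration rather than part of the Lexicon tools: `summarize_log` is a hypothetical name, and the pattern list is an assumption taken from the fgrep patterns used in this procedure, so it may not cover every message the validation program writes.

```shell
# Hypothetical helper: print "count<TAB>pattern" for each issue pattern found
# in a validation log, then a TOTAL line. Returns 0 only when the total is 0.
summarize_log() {
    log=$1
    total=0
    # One literal pattern per line; leading spaces are significant.
    while IFS= read -r pat; do
        [ -n "$pat" ] || continue
        # grep -c prints 0 and exits 1 when nothing matches, hence "|| true".
        n=$(grep -cF -- "$pat" "$log" || true)
        printf '%s\t%s\n' "$n" "$pat"
        total=$((total + n))
    done <<'EOF'
 no EUI (
 wrong citation (spVar) (
 wrong citation (spVar), duplicates (
missing EUI (
wrong EUI
EOF
    printf '%s\tTOTAL\n' "$total"
    [ "$total" -eq 0 ]
}
```

Running `summarize_log log.2` after each validation pass gives a quick view of which categories still have entries; a zero exit status suggests that these particular categories are clean.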
			
shell> fgrep "missing EUI (" log.2 > 2.6.missingEui
				
				
=> use LEXICON.release.3.fixCrossCheck and rerun
				shell> cp -r LEXICON.release.3.fixCrossCheck Lexicon.release.log.${no}.missEuiFix 
				shell> ln -sf ./LEXICON.release.log.${no}.missEuiFix Lexicon.release
				
Save LEXICON.release.3.fixCrossCheck as LEXICON.release.log.${No}.missEuiFix (link to Lexicon.release) and rerun this step
			
shell> fgrep "wrong EUI" log.2 > 2.4.7.wrongEui.nom
				shell> cp -p LEXICON.release.3.fixCrossCheck Lexicon.release.log.${No}.wrongEuiFix
				shell> ln -sf ./LEXICON.release.log.${No}.wrongEuiFix Lexicon.release
				nominalization and nominalization_of.
				shell> fgrep " symmetric none @ [" log.2 > 2.12.symNone
				
shell> fgrep " new EUI (" log.2 > 2.4.13.fixCrossRef-newEui
				shell> fgrep "nominalizations - new EUI (" log.2 > 2.13.newEui.nom
					
shell> fgrep "acronyms - new EUI (" log.2 > 2.13.newEui.acr
				shell> fgrep "abbreviations - new EUI (" log.2 > 2.13.newEui.abb
					Post-Procedures:
(This is the post-processing that needs to be done for the current release, before the next release.)
Ideally, LEXICON.release should be identical to LEXICON.release.3.fixCrossCheck
				
> non-ascii char|U+value|EUI1|tag
				
action: check to replace non-ASCII with ASCII char
				
tag
				
					
| Name | Letter 1 | Letter 2 (Illegal non-ASCII) | Notes | 
|---|---|---|---|
| apostrophe | [']-(APOSTROPHE, U+0027) | [‘]-(LEFT SINGLE QUOTATION MARK, U+2018) | Replace illegal non-ASCII | 
| | | [’]-(RIGHT SINGLE QUOTATION MARK, U+2019) => accepted after 2021+ release | | 
| | | [ʼ]-(MODIFIER LETTER APOSTROPHE, U+02BC) | | 
| hyphen | [-]-(HYPHEN-MINUS, U+002D) | [‑]-(NON-BREAKING HYPHEN, U+2011) | Replace illegal non-ASCII => accepted after 2021+ release | 
| | | [–]-(EN DASH, U+2013) | | 
| beta | [β]-(GREEK SMALL LETTER BETA, U+03B2) | [ß]-(LATIN SMALL LETTER SHARP S, U+00DF) | Replace illegal non-ASCII | 
| mu/micro | [μ]-(GREEK SMALL LETTER MU, U+03BC) | [µ]-(MICRO SIGN, U+00B5) | Both could be legal. Check the records to make sure the right chars are used. | 
| Y/upsilon | [Y]-(LATIN CAPITAL LETTER Y, U+0059) | [Υ]-(GREEK CAPITAL LETTER UPSILON, U+03A5) | Both could be legal. Check the records to make sure the right chars are used. | 
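
Records that contain one of the look-alike characters from the table can be located with a literal search before deciding on a replacement. This is a minimal sketch: `find_illegal_chars` is a hypothetical name, and the selection of characters (U+2018, U+02BC, U+2013, U+00DF) is an assumption based on one reading of the table. U+2019 and U+2011 are left out because the notes describe them as accepted in later releases, and the mu/micro and Y/upsilon pairs are left out because both forms can be legal.

```shell
# Hypothetical helper: list lines (with line numbers) that contain a character
# the table marks for replacement.
find_illegal_chars() {
    # -F: literal match; -n: print line numbers; -a: treat the file as text.
    # Patterns are U+2018, U+02BC, U+2013, and U+00DF, in that order.
    grep -anF -e '‘' -e 'ʼ' -e '–' -e 'ß' "$1"
}
```

For example, `find_illegal_chars LEXICON.release` prints each matching line prefixed with its line number, and returns a non-zero status when no such character is found.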
	
shell> ${LEXICON}/bin/2.ValidateLexicon <year> > log.2
	
	
	shell> ${LEXICON}/bin/2.ValidateLexicon <year> > log.2
	
	
Completed: Clean up files and logs: move all logs and files to ./${year}