CSpell

Configuration Setup

CSpell Java lets users choose among setup options through a configuration file. The default configuration file is ${CSPELL_DIR}/data/Config/cSpell.properties. The default values of the configuration variables are the empirically best values and are listed in the following tables. "Relative path" refers to a path relative to the CSpell top directory, ${CSPELL_DIR}.

I. Configuration Variables

Directories and Files (13)
Variable Name | Description | Variable Values (Default)
CS_DIR | The absolute path of the CSpell directory
  • CS_AUTO_MODE (use the current directory, must invoke CSpell at ${CSPELL_DIR})
  • /Projects/cSpell2018
  • d:/Projects/cSpell2018/
CS_INFORMAL_EXP_FILE | The relative path of the informal expression file
  • data/Misc/informalExpression.data
CS_CHECK_DIC_FILES | The relative path of the check dictionary file
  • data/Dictionary/check.dic
CS_SUGGEST_DIC_FILES | The relative path of the suggestion dictionary file
  • data/Dictionary/sugg.dic
  • data/Dictionary/check.dic
CS_SPLIT_WORD_DIC_FILES | The relative path of the split word dictionary file
  • data/Dictionary/split.dic
CS_MW_DIC_FILE | The relative path of the multiword dictionary file
  • data/Dictionary/lexicon.mw.dic
CS_UNIT_DIC_FILE | The relative path of the units file
  • data/Dictionary/unit.data
CS_SV_DIC_FILE | The relative path of the spelling variants dictionary file
  • data/Dictionary/sv.dic
CS_AA_DIC_FILE | The relative path of the abbreviation/acronym dictionary file
  • data/Dictionary/lexicon.aa.dic
CS_PN_DIC_FILE | The relative path of the proper noun dictionary file
  • data/Dictionary/lexicon.pn.dic
CS_FREQUENCY_FILE | The relative path of the word frequency file
  • data/Frequency/wcWord.data
CS_W2V_IM_FILE | The relative path of the word2Vec CBOW input matrix file
  • data/Context/syn0.data
CS_W2V_OM_FILE | The relative path of the word2Vec CBOW output matrix file
  • data/Context/syn1n.data
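
Because every file variable above holds a path relative to the CSpell top directory, a tool that reads the configuration has to join that path onto CS_DIR. The following is a minimal Python sketch of that step, not CSpell's own code; the function name resolve_path is hypothetical, and the sample values are taken from the table above.

```python
from pathlib import PurePosixPath


def resolve_path(cs_dir: str, relative_path: str) -> str:
    """Join a relative data-file path onto the CSpell top directory.

    PurePosixPath is used so that forward-slash paths such as
    'd:/Projects/cSpell2018/' are handled the same way on any platform.
    """
    return str(PurePosixPath(cs_dir) / relative_path)


# Sample values copied from the Directories and Files table.
cs_dir = "/Projects/cSpell2018"
check_dic = resolve_path(cs_dir, "data/Dictionary/check.dic")
print(check_dic)
```

A trailing slash on CS_DIR (as in d:/Projects/cSpell2018/) is normalized away by the join, so both forms of the directory value give the same result.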

Modes Setup (2)
Variable Name | Description | Variable Values (Default)
CS_FUNC_MODE | Functional mode
CS_RANK_MODE | Ranking mode for non-word, 1-to-1 and Split

Detector Variables (5)
Variable Name | Description | Variable Values (Default)
CS_MAX_LEGIT_TOKEN_LENGTH | The maximum length of a legit token for spelling detection and correction.
  • 30
CS_DETECTOR_RW_SPLIT_WORD_MIN_LENGTH | The minimum length for real-word split detection.
  • 4
CS_DETECTOR_RW_SPLIT_WORD_MIN_WC | The minimum word count (frequency) for real-word split detection.
  • 200
CS_DETECTOR_RW_1TO1_WORD_MIN_LENGTH | The minimum length for real-word 1-to-1 detection.
  • 2
CS_DETECTOR_RW_1TO1_WORD_MIN_WC | The minimum word count for real-word 1-to-1 detection.
  • 65
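
To make the thresholds above concrete, here is a small Python sketch of how length and word-count limits could gate a token for real-word 1-to-1 detection. It is only an illustration under stated assumptions: the class DetectorConfig and the function qualifies_for_rw_1to1 are hypothetical names, and the rule that all bounds are inclusive and must hold together is an assumption, not taken from the CSpell source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorConfig:
    # Defaults copied from the Detector Variables table.
    max_legit_token_length: int = 30    # CS_MAX_LEGIT_TOKEN_LENGTH
    rw_1to1_word_min_length: int = 2    # CS_DETECTOR_RW_1TO1_WORD_MIN_LENGTH
    rw_1to1_word_min_wc: int = 65       # CS_DETECTOR_RW_1TO1_WORD_MIN_WC


def qualifies_for_rw_1to1(token: str, word_count: int,
                          cfg: DetectorConfig = DetectorConfig()) -> bool:
    """Assumed rule: inclusive bounds, and all three checks must pass."""
    return (
        cfg.rw_1to1_word_min_length <= len(token) <= cfg.max_legit_token_length
        and word_count >= cfg.rw_1to1_word_min_wc
    )


# 'word_count' stands for the token's frequency in the word frequency file.
print(qualifies_for_rw_1to1("their", 5000))
print(qualifies_for_rw_1to1("a", 5000))
```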

Candidate Generator Variables (17)
Variable Name | Description | Variable Values (Default)
CS_CAN_MAX_CANDIDATE_NO | The maximum number of candidates.
  • 30
CS_CAN_ND_MAX_SPLIT_NO | The maximum number of non-dictionary splits.
  • 5
CS_CAN_NW_1TO1_WORD_MAX_LENGTH | The maximum length of a word for non-word 1-to-1 correction.
  • 25
CS_CAN_NW_MAX_SPLIT_NO | The maximum number of splits for non-words.
  • 5
CS_CAN_NW_MAX_MERGE_NO | The maximum number of words to merge for non-words.
  • 2
CS_CAN_NW_MERGE_WITH_HYPHEN | Boolean flag for merging with hyphen for non-words.
  • true
CS_CAN_RW_1TO1_WORD_MAX_LENGTH | The maximum length of a word for real-word 1-to-1 correction.
  • 10
CS_CAN_RW_MAX_SPLIT_NO | The maximum number of splits for real-words.
  • 2
CS_CAN_RW_MAX_MERGE_NO | The maximum number of words to merge for real-words.
  • 2
CS_CAN_RW_MERGE_WITH_HYPHEN | Boolean flag for merging with hyphen for real-words.
  • false
CS_CAN_RW_SHORT_SPLIT_WORD_LENGTH | The length of a short split word for real-word split.
  • 3
CS_CAN_RW_MAX_SHORT_SPLIT_WORD_NO | The maximum number of short split words for real-words.
  • 2
CS_CAN_RW_MERGE_CAND_MIN_WC | The minimum word count for real-word merge candidates.
  • 15
CS_CAN_RW_SPLIT_CAND_MIN_WC | The minimum word count for real-word split candidates.
  • 200
CS_CAN_RW_1TO1_CAND_MIN_WC | The minimum word count for real-word 1-to-1 candidates.
  • 1
CS_CAN_RW_1TO1_CAND_MIN_LENGTH | The minimum length of real-word 1-to-1 candidates.
  • 2
CS_CAN_RW_1TO1_CAND_MAX_KEY_SIZE | The maximum number of keys in the in-memory HashMap for real-word 1-to-1 candidates.
  • 1,000,000,000 (default)
  • Maximum theoretical value: 2**31-1 = 2,147,483,647
  • Empirical value: < 1,500,000,000
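
The theoretical maximum quoted for CS_CAN_RW_1TO1_CAND_MAX_KEY_SIZE is the largest value of a Java int. The Python sketch below reproduces that arithmetic and shows one way a key-size value could be validated before use. The function name parse_key_size is hypothetical, and accepting thousands separators is an assumption made only so the values can be copied from the table as displayed.

```python
JAVA_INT_MAX = 2**31 - 1  # 2,147,483,647, the largest Java int


def parse_key_size(text: str) -> int:
    """Parse a key-size value such as '1,000,000,000' and check the int bound."""
    value = int(text.replace(",", ""))
    if not 0 < value <= JAVA_INT_MAX:
        raise ValueError(
            f"key size {value} is outside the range 1..{JAVA_INT_MAX}"
        )
    return value


default_key_size = parse_key_size("1,000,000,000")
print(default_key_size, JAVA_INT_MAX)
```

A value that passes this check may still be too large in practice, since the table gives an empirical limit below 1,500,000,000.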

Ranker Variables (12)
Variable Name | Description | Variable Values (Default)
CS_RANKER_NW_S1_RANK_RANGE_FAC | The range factor of the top orthographic score for qualifying stage-2 ranking for non-word split/1-to-1.
  • 0.08
CS_RANKER_NW_S1_MIN_OSCORE | The minimum orthographic score for 1 candidate in stage-2 ranking for non-word split/1-to-1.
  • 2.70
CS_RANKER_RW_MERGE_C_FAC | The confidence factor of context score for real-word merge.
  • 0.60
CS_RANKER_RW_SPLIT_C_FAC | The confidence factor of context score for real-word split.
  • 0.01
CS_RANKER_RW_1TO1_C_FAC | The confidence factor of context score for real-word 1-to-1.
  • 0.00
CS_RANKER_RW_1TO1_CAND_MIN_CS | The minimum context score of the top candidate for real-word 1-to-1.
  • 0.00
CS_RANKER_RW_1TO1_CAND_CS_DIST | The minimum distance of context score between the top candidate and the original token for real-word 1-to-1.
  • 0.085
CS_RANKER_RW_1TO1_CAND_CS_FAC | The factor of context score between the top candidate and the original token for real-word 1-to-1.
  • 0.10
CS_RANKER_RW_1TO1_WORD_MIN_CS | The minimum context score of the original token for real-word 1-to-1.
  • -0.085
CS_RANKER_RW_1TO1_CAND_MIN_FS | The minimum frequency score of the original token for real-word 1-to-1.
  • 0.0006
CS_RANKER_RW_1TO1_CAND_FS_DIST | The minimum distance of frequency score between the top candidate and the original token for real-word 1-to-1.
  • 0.02
CS_RANKER_RW_1TO1_CAND_FS_FAC | The factor of frequency score between the top candidate and the original token for real-word 1-to-1.
  • 0.035

Score Variables (3)
Variable Name | Description | Variable Values (Default)
CS_ORTHO_SCORE_ED_DIST_FAC | Weighting factor of edit distance for orthographic score.
  • 1.00
CS_ORTHO_SCORE_PHONETIC_FAC | Weighting factor of phonetic for orthographic score.
  • 0.70
CS_ORTHO_SCORE_OVERLAP_FAC | Weighting factor of overlap for orthographic score.
  • 0.80
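
The three factors above weight the components of the orthographic score. This page does not give the formula, so the Python sketch below assumes the simplest reading, a weighted sum of three component scores. The function name orthographic_score and the component inputs are hypothetical; how CSpell computes each component is not covered here.

```python
def orthographic_score(edit_dist_score: float,
                       phonetic_score: float,
                       overlap_score: float,
                       ed_fac: float = 1.00,        # CS_ORTHO_SCORE_ED_DIST_FAC
                       phonetic_fac: float = 0.70,  # CS_ORTHO_SCORE_PHONETIC_FAC
                       overlap_fac: float = 0.80    # CS_ORTHO_SCORE_OVERLAP_FAC
                       ) -> float:
    """Assumed form: a plain weighted sum of the three component scores."""
    return (ed_fac * edit_dist_score
            + phonetic_fac * phonetic_score
            + overlap_fac * overlap_score)


# Hypothetical component scores for one candidate.
score = orthographic_score(0.9, 0.5, 0.75)
print(score)
```

Under this reading, raising one factor relative to the others makes the corresponding component count for more when candidates are compared.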

Context Setup Variables (7)
Variable Name | Description | Variable Values (Default)
CS_W2V_SKIP_WORD | Boolean flag for skipping context words that have no word2Vec score.
  • true
CS_NW_1TO1_CONTEXT_RADIUS | Context radius for non-word 1-to-1.
  • 2
CS_NW_SPLIT_CONTEXT_RADIUS | Context radius for non-word split.
  • 2
  • Not used (CSpell combined non-word split and 1-to-1 in one model)
CS_NW_MERGE_CONTEXT_RADIUS | Context radius for non-word merge.
  • 2
CS_RW_1TO1_CONTEXT_RADIUS | Context radius for real-word 1-to-1.
  • 2
CS_RW_SPLIT_CONTEXT_RADIUS | Context radius for real-word split.
  • 2
CS_RW_MERGE_CONTEXT_RADIUS | Context radius for real-word merge.
  • 2
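
A context radius of 2 is read here as "up to two tokens on each side of the target token". The Python sketch below illustrates that reading only; the function name context_window is hypothetical, and the effect of CS_W2V_SKIP_WORD on the window is not modeled.

```python
from typing import List


def context_window(tokens: List[str], index: int, radius: int = 2) -> List[str]:
    """Return up to `radius` tokens on each side of tokens[index], excluding it."""
    left = tokens[max(0, index - radius):index]
    right = tokens[index + 1:index + 1 + radius]
    return left + right


tokens = "he was dianosed early onset dementia".split()
# Context of the misspelled token at position 2.
print(context_window(tokens, 2))
```

Near the start or end of the text the window is simply shorter, because fewer tokens are available on that side.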

II. Syntax

  • # -- comment lines begin with "#".
  • variable=value: set variable to value
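
The two syntax rules above are enough to read a simple configuration file. The following Python sketch parses text that follows only those two rules; it is not a full Java properties parser (escapes, line continuations and other separators are not handled), and the function name parse_properties is hypothetical. The sample variables and values come from the tables in section I.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse 'variable=value' lines, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a variable=value line: {raw_line!r}")
        settings[name.strip()] = value.strip()
    return settings


sample = """\
# Detector variables
CS_MAX_LEGIT_TOKEN_LENGTH=30
# Candidate generator variables
CS_CAN_NW_MERGE_WITH_HYPHEN=true
"""
settings = parse_properties(sample)
print(settings)
```

All values are returned as strings; converting them to numbers or Booleans is left to the caller.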

III. File Location

  • default: ${CSPELL_DIR}/data/Config/cSpell.properties
  • may be specified by option -x:config_file_absolute_path
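
The choice between the default location and a user-specified file can be sketched in Python as follows. The function names config_file and config_option are hypothetical; only the default relative path and the -x: option form are taken from the list above, and the rest of the CSpell command line is deliberately left out.

```python
from typing import Optional

DEFAULT_RELATIVE_CONFIG = "data/Config/cSpell.properties"


def config_file(cspell_dir: str, override: Optional[str] = None) -> str:
    """Return the override path if given, else the default under cspell_dir."""
    if override is not None:
        return override
    return cspell_dir.rstrip("/") + "/" + DEFAULT_RELATIVE_CONFIG


def config_option(absolute_path: str) -> str:
    """Format the -x option in the form shown above."""
    return "-x:" + absolute_path


print(config_file("/Projects/cSpell2018"))
print(config_option("/tmp/my.properties"))
```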

Note: The CSpell installation program generates ${CSPELL_DIR}/data/Config/cSpell.properties automatically (from ${CSPELL_DIR}/data/Config/cSpell.properties.TEMPLATE) according to the options the user chose during installation.