CSpell

Configuration Setup

CSpell (Java) lets users choose among different setup options through a configuration file. The default configuration file is ${CSPELL_DIR}/data/Config/cSpell.properties. The default values of the configuration variables are the empirically best values; they are listed in the tables below. "Relative path" refers to the path relative to the CSpell top directory, ${CSPELL_DIR}.

I. Configuration Variables

Directories and Files (13)
Variable Name: Description (default values listed below each entry)
CS_DIR: The absolute path of the CSpell top directory
  • CS_AUTO_MODE (use the current directory; CSpell must be invoked from ${CSPELL_DIR})
  • /Projects/cSpell2018
  • d:/Projects/cSpell2018/
CS_INFORMAL_EXP_FILE: The relative path of the informal expression file
  • data/Misc/informalExpression.data
CS_CHECK_DIC_FILES: The relative path of the check dictionary file
  • data/Dictionary/check.dic
CS_SUGGEST_DIC_FILES: The relative paths of the suggestion dictionary files
  • data/Dictionary/sugg.dic
  • data/Dictionary/check.dic
CS_SPLIT_WORD_DIC_FILES: The relative path of the split word dictionary file
  • data/Dictionary/split.dic
CS_MW_DIC_FILE: The relative path of the multiword dictionary file
  • data/Dictionary/lexicon.mw.dic
CS_UNIT_DIC_FILE: The relative path of the units file
  • data/Dictionary/unit.data
CS_SV_DIC_FILE: The relative path of the spelling variants dictionary file
  • data/Dictionary/sv.dic
CS_AA_DIC_FILE: The relative path of the abbreviation/acronym dictionary file
  • data/Dictionary/lexicon.aa.dic
CS_PN_DIC_FILE: The relative path of the proper noun dictionary file
  • data/Dictionary/lexicon.pn.dic
CS_FREQUENCY_FILE: The relative path of the word frequency file
  • data/Frequency/wcWord.data
CS_W2V_IM_FILE: The relative path of the word2Vec CBOW input matrix file
  • data/Context/syn0.data
CS_W2V_OM_FILE: The relative path of the word2Vec CBOW output matrix file
  • data/Context/syn1n.data
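The relative-path convention can be illustrated with a short Java sketch. It is not CSpell's own code: the class `CsPathResolver` and its methods are hypothetical, and it assumes that `CS_AUTO_MODE` means "use the JVM's current working directory" (the `user.dir` system property), which is consistent with the requirement above that CSpell be invoked from ${CSPELL_DIR}.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of CSpell: resolves the relative file
// variables in cSpell.properties against the CS_DIR top directory.
class CsPathResolver {
    // Assumption: CS_AUTO_MODE means "use the JVM's current working directory".
    static Path resolveCsDir(String csDirValue) {
        if ("CS_AUTO_MODE".equals(csDirValue)) {
            return Paths.get(System.getProperty("user.dir"));
        }
        return Paths.get(csDirValue);
    }

    // Joins a relative path such as "data/Dictionary/check.dic" to CS_DIR.
    static Path resolveFile(String csDirValue, String relativePath) {
        return resolveCsDir(csDirValue).resolve(relativePath).normalize();
    }

    public static void main(String[] args) {
        System.out.println(resolveFile("/Projects/cSpell2018", "data/Dictionary/check.dic"));
        System.out.println(resolveFile("CS_AUTO_MODE", "data/Frequency/wcWord.data"));
    }
}
```

With CS_DIR=/Projects/cSpell2018, the check dictionary resolves to /Projects/cSpell2018/data/Dictionary/check.dic on a Unix-like system.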

Modes Setup (2)
Variable Name: Description (default values listed below each entry)
CS_FUNC_MODE: Functional mode
CS_RANK_MODE: Ranking mode for non-word 1-to-1 and split correction

Detector Variables (5)
Variable Name: Description (default values listed below each entry)
CS_MAX_LEGIT_TOKEN_LENGTH: The maximum length of a legitimate token for spelling detection and correction.
  • 30
CS_DETECTOR_RW_SPLIT_WORD_MIN_LENGTH: The minimum word length for real-word split detection.
  • 4
CS_DETECTOR_RW_SPLIT_WORD_MIN_WC: The minimum word count (frequency) for real-word split detection.
  • 200
CS_DETECTOR_RW_1TO1_WORD_MIN_LENGTH: The minimum word length for real-word 1-to-1 detection.
  • 2
CS_DETECTOR_RW_1TO1_WORD_MIN_WC: The minimum word count (frequency) for real-word 1-to-1 detection.
  • 65
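A minimal sketch of how the detector thresholds might gate a token, assuming each limit is a simple inclusive comparison against the token's length and its word count from the frequency file, and that all tests must pass. The class and method names are hypothetical; CSpell's actual detector may combine these checks differently.

```java
// Hypothetical sketch, not CSpell's implementation, of the detector thresholds.
class DetectorGate {
    static final int MAX_LEGIT_TOKEN_LENGTH = 30;   // CS_MAX_LEGIT_TOKEN_LENGTH
    static final int RW_SPLIT_WORD_MIN_LENGTH = 4;  // CS_DETECTOR_RW_SPLIT_WORD_MIN_LENGTH
    static final int RW_SPLIT_WORD_MIN_WC = 200;    // CS_DETECTOR_RW_SPLIT_WORD_MIN_WC
    static final int RW_1TO1_WORD_MIN_LENGTH = 2;   // CS_DETECTOR_RW_1TO1_WORD_MIN_LENGTH
    static final int RW_1TO1_WORD_MIN_WC = 65;      // CS_DETECTOR_RW_1TO1_WORD_MIN_WC

    // Tokens longer than the maximum legitimate length are not processed.
    static boolean isLegitLength(String token) {
        return token.length() <= MAX_LEGIT_TOKEN_LENGTH;
    }

    // Assumption: a real word is checked for a split error only if it is
    // both long enough and frequent enough.
    static boolean qualifiesForRwSplit(String token, long wordCount) {
        return isLegitLength(token)
            && token.length() >= RW_SPLIT_WORD_MIN_LENGTH
            && wordCount >= RW_SPLIT_WORD_MIN_WC;
    }

    // Same assumption for real-word 1-to-1 detection.
    static boolean qualifiesForRw1To1(String token, long wordCount) {
        return isLegitLength(token)
            && token.length() >= RW_1TO1_WORD_MIN_LENGTH
            && wordCount >= RW_1TO1_WORD_MIN_WC;
    }

    public static void main(String[] args) {
        System.out.println(qualifiesForRwSplit("another", 5000)); // true
        System.out.println(qualifiesForRwSplit("the", 5000));     // false: shorter than 4
    }
}
```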

Candidate Generator Variables (17)
Variable Name: Description (default values listed below each entry)
CS_CAN_MAX_CANDIDATE_NO: The maximum number of candidates.
  • 30
CS_CAN_ND_MAX_SPLIT_NO: The maximum number of non-dictionary splits.
  • 5
CS_CAN_NW_1TO1_WORD_MAX_LENGTH: The maximum word length for non-word 1-to-1 correction.
  • 25
CS_CAN_NW_MAX_SPLIT_NO: The maximum number of splits for non-words.
  • 5
CS_CAN_NW_MAX_MERGE_NO: The maximum number of words to merge for non-words.
  • 2
CS_CAN_NW_MERGE_WITH_HYPHEN: Boolean flag for merging with a hyphen for non-words.
  • true
CS_CAN_RW_1TO1_WORD_MAX_LENGTH: The maximum word length for real-word 1-to-1 correction.
  • 10
CS_CAN_RW_MAX_SPLIT_NO: The maximum number of splits for real words.
  • 2
CS_CAN_RW_MAX_MERGE_NO: The maximum number of words to merge for real words.
  • 2
CS_CAN_RW_MERGE_WITH_HYPHEN: Boolean flag for merging with a hyphen for real words.
  • false
CS_CAN_RW_SHORT_SPLIT_WORD_LENGTH: The length of a short split word for real-word split.
  • 3
CS_CAN_RW_MAX_SHORT_SPLIT_WORD_NO: The maximum number of short split words for real-word split.
  • 2
CS_CAN_RW_MERGE_CAND_MIN_WC: The minimum word count for real-word merge candidates.
  • 15
CS_CAN_RW_SPLIT_CAND_MIN_WC: The minimum word count for real-word split candidates.
  • 200
CS_CAN_RW_1TO1_CAND_MIN_WC: The minimum word count for real-word 1-to-1 candidates.
  • 1
CS_CAN_RW_1TO1_CAND_MIN_LENGTH: The minimum length of real-word 1-to-1 candidates.
  • 2
CS_CAN_RW_1TO1_CAND_MAX_KEY_SIZE: The maximum key size of the in-memory HashMap for real-word 1-to-1 candidates.
  • 1,000,000,000 (default)
  • Maximum theoretical value: 2^31 - 1 = 2,147,483,647
  • Empirical value: < 1,500,000,000
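The theoretical maximum of 2^31 - 1 is Java's Integer.MAX_VALUE, the largest value a Java int can hold. A short sketch of validating a configured key size against both limits is shown below; the class `KeySizeCheck`, its warning message, and the choice to warn rather than reject above the empirical limit are hypothetical.

```java
// Hypothetical validation helper, not part of CSpell, for
// CS_CAN_RW_1TO1_CAND_MAX_KEY_SIZE.
class KeySizeCheck {
    // Theoretical ceiling: 2^31 - 1, the largest Java int.
    static final long THEORETICAL_MAX = Integer.MAX_VALUE;
    // Empirical ceiling quoted in the table above.
    static final long EMPIRICAL_MAX = 1_500_000_000L;

    // Parses the raw property value and returns it as an int,
    // rejecting values outside 1 .. 2^31 - 1.
    static int parseKeySize(String raw) {
        long value = Long.parseLong(raw.trim());
        if (value <= 0 || value > THEORETICAL_MAX) {
            throw new IllegalArgumentException("key size out of int range: " + value);
        }
        if (value >= EMPIRICAL_MAX) {
            System.err.println("warning: " + value + " is not below the empirical limit");
        }
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(parseKeySize("1000000000"));          // 1000000000
        System.out.println(Integer.MAX_VALUE == (1L << 31) - 1); // true
    }
}
```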

Ranker Variables (12)
Variable Name: Description (default values listed below each entry)
CS_RANKER_NW_S1_RANK_RANGE_FAC: The range factor of the top orthographic score that qualifies candidates for stage-2 ranking in non-word split/1-to-1 correction.
  • 0.08
CS_RANKER_NW_S1_MIN_OSCORE: The minimum orthographic score for 1 candidate in stage-2 ranking for non-word split/1-to-1.
  • 2.70
CS_RANKER_RW_MERGE_C_FAC: The confidence factor of context score for real-word merge.
  • 0.60
CS_RANKER_RW_SPLIT_C_FAC: The confidence factor of context score for real-word split.
  • 0.01
CS_RANKER_RW_1TO1_C_FAC: The confidence factor of context score for real-word 1-to-1.
  • 0.00
CS_RANKER_RW_1TO1_CAND_MIN_CS: The minimum context score of the top candidate for real-word 1-to-1.
  • 0.00
CS_RANKER_RW_1TO1_CAND_CS_DIST: The minimum context-score distance between the top candidate and the original token for real-word 1-to-1.
  • 0.085
CS_RANKER_RW_1TO1_CAND_CS_FAC: The context-score factor between the top candidate and the original token for real-word 1-to-1.
  • 0.10
CS_RANKER_RW_1TO1_WORD_MIN_CS: The minimum context score of the original token for real-word 1-to-1.
  • -0.085
CS_RANKER_RW_1TO1_CAND_MIN_FS: The minimum frequency score of the top candidate for real-word 1-to-1.
  • 0.0006
CS_RANKER_RW_1TO1_CAND_FS_DIST: The minimum frequency-score distance between the top candidate and the original token for real-word 1-to-1.
  • 0.02
CS_RANKER_RW_1TO1_CAND_FS_FAC: The frequency-score factor between the top candidate and the original token for real-word 1-to-1.
  • 0.035

Score Variables (3)
Variable Name: Description (default values listed below each entry)
CS_ORTHO_SCORE_ED_DIST_FAC: Weighting factor of the edit-distance score in the orthographic score.
  • 1.00
CS_ORTHO_SCORE_PHONETIC_FAC: Weighting factor of the phonetic score in the orthographic score.
  • 0.70
CS_ORTHO_SCORE_OVERLAP_FAC: Weighting factor of the overlap score in the orthographic score.
  • 0.80
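Assuming the orthographic score is a weighted sum of an edit-distance score, a phonetic score, and an overlap score, the three factors combine as in the sketch below. The weighted-sum form is an assumption read from the variable descriptions; how CSpell computes each sub-score is not described on this page, and the class name is hypothetical.

```java
// Hypothetical sketch: orthographic score as a weighted sum of three sub-scores.
class OrthoScore {
    static final double ED_DIST_FAC = 1.00;   // CS_ORTHO_SCORE_ED_DIST_FAC
    static final double PHONETIC_FAC = 0.70;  // CS_ORTHO_SCORE_PHONETIC_FAC
    static final double OVERLAP_FAC = 0.80;   // CS_ORTHO_SCORE_OVERLAP_FAC

    // Combines the sub-scores of one candidate using the configured weights.
    static double score(double edDistScore, double phoneticScore, double overlapScore) {
        return ED_DIST_FAC * edDistScore
             + PHONETIC_FAC * phoneticScore
             + OVERLAP_FAC * overlapScore;
    }

    public static void main(String[] args) {
        // Weighted sum of three sample sub-scores.
        System.out.println(score(1.0, 0.5, 0.25));
    }
}
```

Under this reading, lowering CS_ORTHO_SCORE_PHONETIC_FAC makes sound-alike candidates count for less relative to candidates with a small edit distance.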

Context Setup Variables (7)
Variable Name: Description (default values listed below each entry)
CS_W2V_SKIP_WORD: Boolean flag for skipping context words that have no word2Vec score.
  • true
CS_NW_1TO1_CONTEXT_RADIUS: Context radius for non-word 1-to-1.
  • 2
CS_NW_SPLIT_CONTEXT_RADIUS: Context radius for non-word split.
  • 2
  • Not used (CSpell combines non-word split and 1-to-1 in one model)
CS_NW_MERGE_CONTEXT_RADIUS: Context radius for non-word merge.
  • 2
CS_RW_1TO1_CONTEXT_RADIUS: Context radius for real-word 1-to-1.
  • 2
CS_RW_SPLIT_CONTEXT_RADIUS: Context radius for real-word split.
  • 2
CS_RW_MERGE_CONTEXT_RADIUS: Context radius for real-word merge.
  • 2
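A context radius of 2 is read here as "up to two tokens on each side of the target token, clipped at the text boundaries"; this interpretation and the helper below are assumptions for illustration. The sketch also shows one reading of CS_W2V_SKIP_WORD: when it is true, context words that have no word2Vec vector (modeled here as a plain set of known words, a hypothetical stand-in for the syn0 matrix vocabulary) are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not CSpell's code, of collecting context words.
class ContextWindow {
    // Returns up to `radius` tokens on each side of tokens[index].
    // Assumption: when skipWord is true, tokens missing from the word2Vec
    // vocabulary are left out instead of being used as context.
    static List<String> context(String[] tokens, int index, int radius,
                                boolean skipWord, Set<String> w2vVocab) {
        List<String> result = new ArrayList<>();
        int from = Math.max(0, index - radius);
        int to = Math.min(tokens.length - 1, index + radius);
        for (int i = from; i <= to; i++) {
            if (i == index) continue; // the target token is not its own context
            if (skipWord && !w2vVocab.contains(tokens[i])) continue;
            result.add(tokens[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"the", "patient", "had", "sever", "pain", "today"};
        Set<String> vocab = Set.of("the", "patient", "had", "pain", "today");
        // Target "sever" at index 3 with radius 2.
        System.out.println(context(tokens, 3, 2, true, vocab)); // [patient, had, pain, today]
    }
}
```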

II. Syntax

  • # -- comment lines begin with "#".
  • variable=value: set variable to value
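Both constructs are part of the standard Java .properties format, so text in this syntax can be read with java.util.Properties; whether CSpell itself loads the file this way is not stated here. The class name is hypothetical, and the sample lines reuse defaults from the tables above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: reading cSpell.properties-style text with java.util.Properties.
class ConfigSyntaxDemo {
    // Parses "variable=value" lines; lines beginning with "#" are comments.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "# -- Detector and score variables",
            "CS_MAX_LEGIT_TOKEN_LENGTH=30",
            "CS_CAN_NW_MERGE_WITH_HYPHEN=true",
            "CS_ORTHO_SCORE_PHONETIC_FAC=0.70");
        Properties props = parse(text);
        // Values are stored as strings and converted to the needed type on read.
        int maxLen = Integer.parseInt(props.getProperty("CS_MAX_LEGIT_TOKEN_LENGTH"));
        boolean hyphen = Boolean.parseBoolean(props.getProperty("CS_CAN_NW_MERGE_WITH_HYPHEN"));
        double phonetic = Double.parseDouble(props.getProperty("CS_ORTHO_SCORE_PHONETIC_FAC"));
        System.out.println(maxLen + " " + hyphen + " " + phonetic); // 30 true 0.7
    }
}
```

If the file is read this way, note that the .properties format treats a backslash as an escape character, so Windows paths are safest written with forward slashes, as in the d:/Projects/cSpell2018/ example for CS_DIR.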

III. File Location

  • default: ${CSPELL_DIR}/data/Config/cSpell.properties
  • alternative: specified with the option -x:config_file_absolute_path
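A minimal sketch of this lookup order: use the path that follows -x: when the option is given, otherwise fall back to the default file under ${CSPELL_DIR}. Only the -x: prefix and the default location come from this page; the class, method, and argument handling are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not CSpell's code, of choosing the configuration file.
class ConfigLocator {
    static final String OPTION_PREFIX = "-x:";

    static Path configPath(String[] args, String cspellDir) {
        for (String arg : args) {
            if (arg.startsWith(OPTION_PREFIX)) {
                // Everything after "-x:" is the absolute path of the config file.
                return Paths.get(arg.substring(OPTION_PREFIX.length()));
            }
        }
        // Default location under the CSpell top directory.
        return Paths.get(cspellDir, "data", "Config", "cSpell.properties");
    }

    public static void main(String[] args) {
        System.out.println(configPath(new String[] {"-x:/tmp/my.properties"}, "/Projects/cSpell2018"));
        System.out.println(configPath(new String[0], "/Projects/cSpell2018"));
    }
}
```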

Note: The CSpell installation program automatically generates ${CSPELL_DIR}/data/Config/cSpell.properties from ${CSPELL_DIR}/data/Config/cSpell.properties.TEMPLATE, according to the options the user selects during installation.