Spelling Variant Patterns - MCES (Metaphone, Caverphone, Edit Distance, Sorted Distance)
I. Introduction
This enhanced MCES model is based on Double Metaphone (with maxCodeLength = 60), Caverphone 2.0, edit distance, and minimum sorted distance. These are described below.
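Caverphone 2.0 encodes a term by applying a fixed, ordered list of regular-expression rewrites and then padding the result with `1` to ten characters. The following Java sketch re-implements that published rule list so the Caverphone codes in the examples table can be reproduced without a library. It is an illustration only, not the MCES source code, and the class name `Caverphone2Sketch` is hypothetical. Double Metaphone is omitted because its rule set is far longer; Apache Commons Codec provides both encoders (`Caverphone2`, `DoubleMetaphone`) if a library is preferred.

```java
import java.util.Locale;

// Hypothetical helper: re-implements the published Caverphone 2.0 rules.
class Caverphone2Sketch {
    // Ordered {regex, replacement} rewrites; order matters.
    private static final String[][] RULES = {
        {"e$", ""},
        {"^cough", "cou2f"}, {"^rough", "rou2f"}, {"^tough", "tou2f"},
        {"^enough", "enou2f"}, {"^trough", "trou2f"},
        {"^gn", "2n"}, {"mb$", "m2"},
        {"cq", "2q"}, {"ci", "si"}, {"ce", "se"}, {"cy", "sy"},
        {"tch", "2ch"}, {"c", "k"}, {"q", "k"}, {"x", "k"}, {"v", "f"},
        {"dg", "2g"}, {"tio", "sio"}, {"tia", "sia"}, {"d", "t"},
        {"ph", "fh"}, {"b", "p"}, {"sh", "s2"}, {"z", "s"},
        {"^[aeiou]", "A"}, {"[aeiou]", "3"},          // leading vowel -> A, other vowels -> 3
        {"j", "y"}, {"^y3", "Y3"}, {"^y", "A"}, {"y", "3"},
        {"3gh3", "3kh3"}, {"gh", "22"}, {"g", "k"},
        {"s+", "S"}, {"t+", "T"}, {"p+", "P"}, {"k+", "K"}, // collapse letter runs
        {"f+", "F"}, {"m+", "M"}, {"n+", "N"},
        {"w3", "W3"}, {"wh3", "Wh3"}, {"w$", "3"}, {"w", "2"},
        {"^h", "A"}, {"h", "2"},
        {"r3", "R3"}, {"r$", "3"}, {"r", "2"},
        {"l3", "L3"}, {"l$", "3"}, {"l", "2"},
        {"2", ""}, {"3$", "A"}, {"3", ""}             // drop placeholder digits
    };

    static String encode(String term) {
        // Lowercase and keep only a-z (spaces, apostrophes, digits are dropped).
        String txt = term.toLowerCase(Locale.ENGLISH).replaceAll("[^a-z]", "");
        for (String[] rule : RULES) {
            txt = txt.replaceAll(rule[0], rule[1]);
        }
        // Pad with '1' and truncate to exactly 10 characters.
        return (txt + "1111111111").substring(0, 10);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"zoril", "zorilla", "zorille", "zorillo"}) {
            System.out.println(t + " -> " + encode(t));
        }
    }
}
```

With these rules `zoril` and `zorille` both encode to `SRA1111111`, while `zorilla` and `zorillo` encode to `SRLA111111`: a word-final `l` is rewritten to the vowel placeholder and then to `A`, whereas a medial `l` before a vowel is kept.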
II. Algorithm
This algorithm must be applied to a spVar-grouped file (such as one produced by SpVarNorm). For terms that are not identified in any spelling variant group, the MCES algorithm adds more spVars to the existing groups by checking the following properties of each term pair: Metaphone, Caverphone, GrecoLatin plural, edit distance, and sorted distance.
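Edit distance is one of the properties MCES compares between two terms. The source does not name the exact variant, so the sketch below assumes the standard Levenshtein distance (unit-cost insertion, deletion, and substitution), computed with two rolling rows. `EditDistanceSketch` is a hypothetical name.

```java
// Hypothetical helper: standard Levenshtein distance (assumed, not confirmed for MCES).
class EditDistanceSketch {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        curr[j - 1] + 1,     // insertion
                        prev[j] + 1),        // deletion
                        prev[j - 1] + cost); // substitution or match
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two arrays
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("anemia", "anaemia"));
        System.out.println(levenshtein("litchi nut", "lychee nut"));
    }
}
```

Under this definition `anemia`/`anaemia` gives 1 and `litchi nut`/`lychee nut` gives 4, matching rows 1.1 and 4.4 of the examples table.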
III. Algorithm Details
HashMap<String, HashSet<String>>
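The `HashMap<String, HashSet<String>>` above is the only data structure this section names, and the source does not say what its keys and values hold. A minimal sketch, assuming the key is a term's Metaphone and Caverphone codes joined together and the value is the set of spVar group EUIs already seen with that key, shows how an ungrouped term could be matched to candidate groups in one lookup. `SpVarIndexSketch`, `add`, `candidates`, and the `|` key format are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Set;

// Hypothetical index: "Metaphone|Caverphone" key -> EUIs of existing spVar groups.
class SpVarIndexSketch {
    private final HashMap<String, HashSet<String>> keyToEuis = new HashMap<>();

    // Assumed key format; the source does not specify one.
    private static String key(String metaphone, String caverphone) {
        return metaphone + "|" + caverphone;
    }

    // Record that a grouped term with these codes belongs to group `eui`.
    void add(String metaphone, String caverphone, String eui) {
        keyToEuis.computeIfAbsent(key(metaphone, caverphone), k -> new HashSet<>()).add(eui);
    }

    // Candidate groups for an ungrouped term; empty if no group shares both codes.
    Set<String> candidates(String metaphone, String caverphone) {
        HashSet<String> euis = keyToEuis.get(key(metaphone, caverphone));
        return euis == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(euis);
    }

    public static void main(String[] args) {
        SpVarIndexSketch index = new SpVarIndexSketch();
        index.add("ANM", "ANMA111111", "E0008920");  // anemia (row 1.1)
        index.add("ANMK", "ANMK111111", "E0528325"); // anemic (row 1.2)
        System.out.println(index.candidates("ANM", "ANMA111111"));
        System.out.println(index.candidates("SRL", "SRA1111111"));
    }
}
```

Keying on the phonetic codes keeps the candidate search to a single hash lookup; the remaining checks (GrecoLatin plural, edit distance, sorted distance) would then only run against the few groups returned.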
IV. Algorithm Examples
Test Case | Terms | Norm | Metaphone | Caverphone | GrecoLatin Plural | Edit Distance | Sorted Distance | SpVar EUI |
---|---|---|---|---|---|---|---|---|
Case 1: Same Metaphone and Caverphone, not GL plural, edit distance = 1 | | | | | | | | |
1.1 | anemia; anaemia | anemia anaemia | ANM ANM | ANMA111111 ANMA111111 | false | 1 | | E0008920 |
1.2 | anemic; anaemic | anemic anaemic | ANMK ANMK | ANMK111111 ANMK111111 | false | 1 | | E0528325 |
1.3 | abortigenic; abortogenic | abortigenic abortogenic | APRTJNK APRTJNK | APTKNK1111 APTKNK1111 | false | 1 | | E0583447 |
1.4 | lamictal; lamiktal | lamictal lamiktal | LMKTL LMKTL | LMKTA11111 LMKTA11111 | false | 1 | | E0413046 |
1.5 | aestheticise; aestheticize | aestheticise aestheticize | AS0TSS AS0TSS | ASTTSS1111 ASTTSS1111 | false | 1 | | E0547192 |
Case 2: Same Metaphone and Caverphone, not GL plural, edit distance = 2 | | | | | | | | |
2.1 | yuppie; yuppy | yuppie yuppy | AP AP | YPA1111111 YPA1111111 | false | 2 | | E0520693 |
2.2 | yuppie flu; yuppy flu | yuppieflu yuppyflu | APFL APFL | YPFLA11111 YPFLA11111 | false | 2 | | E0520692 |
2.3 | lamellose; lamellous | lamellose lamellous | LMLS LMLS | LMLS111111 LMLS111111 | false | 2 | | E0587907 |
2.4 | zoril; zorilla | zoril zorilla | SRL SRL | SRA1111111 SRLA111111 | false | 2 | | E0341649 |
2.5 | zorilla; zorille | zorilla zorille | SRL SRL | SRLA111111 SRA1111111 | false | 1 | | E0341649 |
2.6 | zorille; zorillo | zorille zorillo | SRL SRL | SRA1111111 SRLA111111 | false | 1 | | E0341649 |
2.7 | zorillo; zoril | zorillo zoril | SRL SRL | SRLA111111 SRA1111111 | false | 2 | | E0341649 |
Case 3: Same Metaphone and Caverphone, not GL plural, edit distance = 3 | | | | | | | | |
3.1 | Adson's maneuver; Adson's manoeuvre | adsonmaneuver adsonmanoeuvre | ATSNSMNFR ATSNSMNFR | ATSNSMNFA1 ATSNSMNFA1 | false | 3 | | E0213214 |
3.2 | amylcinnamal; amyl cinnamoyl | amylcinnamal amylcinnamoyl | AMLSNML AMLSNML | AMSNMA1111 AMSNMA1111 | false | 3 | | E0557025 |
3.3 | directress; directrice | directress directrice | TRKTRS TRKTRS | TRKTRS1111 TRKTRK1111 | false | 3 | | E0207379 |
3.4 | tizoprolic; tizoprolique | tizoprolic tizoprolique | TSPRLK TSPRLK | TSPRLK1111 TSPRLKA111 | false | 3 | | E0566262 |
3.5 | type 3 deiodinase; type III deiodinase | typethreedeiodinase typeiiideiodinase | TPTTNS TPTTNS | TPTTNS1111 TPTTNS1111 | false | 3 | | E0681935 |
Case 4: Same Metaphone and Caverphone, not GL plural, edit distance = 4 | | | | | | | | |
4.1 | Telugu; Teloogoo | telugu teloogoo | TLK TLK | TLKA111111 TLKA111111 | false | 4 | | E0205161 |
4.2 | bromofenofos; bromophenophos | bromofenofos bromophenophos | PRMFNFS PRMFNFS | PRMFNFS111 PRMFNFS111 | false | 4 | | E0303924 |
4.3 | comradery; camaraderie | comradery camaraderie | KMRTR KMRTR | KMRTRA1111 KMRTRA1111 | false | 4 | | E0333034 |
4.4 | litchi nut; lychee nut | litchinut lycheenut | LXNT LXNT | LKNT111111 LKNT111111 | false | 4 | | E0456918 |
4.5 | fosfomycin; phosphomycin | fosfomycin phosphomycin | FSFMSN FSFMSN | FSFMSN1111 FSFMSN1111 | false | 4 | | E0028649 |
Case 5: Same Metaphone, different Caverphone, not GL plural, edit distance = 2 | | | | | | | | |
5.1 | aesthetical; aesthetically | aesthetical aesthetically | AS0TKL AS0TKL | ASTTKA1111 ASTTKLA111 | false | 2 | | false |
5.2 | zymographical; zymographically | zymographical zymographically | SMKRFKL SMKRFKL | SMKRFKA111 SMKRFKLA11 | false | 2 | | false |
Case 6: Same Metaphone and Caverphone, GL plural, edit distance <= 2 | | | | | | | | |
6.1 | acroscleroses; acrosclerosis | acroscleroses acrosclerosis | AKRSKRSS AKRSKRSS | AKRSKLRSS1 AKRSKLRSS1 | true | 1 | | false |
6.2 | zygomycoses; zygomycosis | zygomycoses zygomycosis | SKMKSS SKMKSS | SKMKSS1111 SKMKSS1111 | true | 1 | | false |
6.3 | ammon's horn scleroses; ammon's horn sclerosis | ammonhornscleroses ammonhornsclerosis | AMNSRNSKRSS AMNSRNSKRSS | AMNSNSKLRS AMNSNSKLRS | true | 1 | | false |
6.4 | fimbria; fimbriae | fimbria fimbriae | FMPR FMPR | FMPRA11111 FMPRA11111 | true | 1 | | false |
6.5 | infraorbital foramen; infraorbital foramina | infraorbitalforamen infraorbitalforamina | ANFRRPTLFRMN ANFRRPTLFRMN | ANFRPTFRMN ANFRPTFRMN | true | 2 | | false |
6.6 | bacterial culture media; bacterial culture medium | bacterialculturemedia bacterialculturemedium | PKTRLKLTRMT PKTRLKLTRMTM | PKTRKTRMTA PKTRKTRMTM | true | 2 | | false |
Case 7: Same Metaphone and Caverphone, not GL plural, edit distance <= 2 (false positive) | | | | | | | | |
7.1 | zixoryn; zixorin | zixoryn zixorin | SKSRN SKSRN | SKRN111111 SKRN111111 | false | 1 | | false |
7.2 | zygomycetes; zygomycetous | zygomycetes zygomycetous | SKMSTS SKMSTS | SKMSTS1111 SKMSTS1111 | false | 2 | | false |
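In Case 6 the term pairs are GrecoLatin plurals, and none of them is assigned an SpVar EUI. The sketch below assumes a simple suffix-pair test built only from the suffixes visible in Case 6 (`-is`/`-es`, `-a`/`-ae`, `-en`/`-ina`, `-um`/`-a`); the actual GrecoLatin rule set used by MCES is not given here. `GrecoLatinPluralSketch` and `isGlPluralPair` are hypothetical names.

```java
// Hypothetical check: do two terms differ only by a singular/plural suffix pair?
class GrecoLatinPluralSketch {
    // Assumed {singular suffix, plural suffix} pairs, taken only from Case 6.
    private static final String[][] SUFFIX_PAIRS = {
        {"is", "es"},  // sclerosis -> scleroses
        {"a", "ae"},   // fimbria -> fimbriae
        {"en", "ina"}, // foramen -> foramina
        {"um", "a"}    // medium -> media
    };

    // The pair may be given in either order.
    static boolean isGlPluralPair(String a, String b) {
        return matches(a, b) || matches(b, a);
    }

    // True if some suffix pair turns `singular` into `plural` over a shared stem.
    private static boolean matches(String singular, String plural) {
        for (String[] pair : SUFFIX_PAIRS) {
            String s = pair[0];
            String p = pair[1];
            if (singular.endsWith(s) && plural.endsWith(p)) {
                String stem1 = singular.substring(0, singular.length() - s.length());
                String stem2 = plural.substring(0, plural.length() - p.length());
                if (stem1.equals(stem2)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isGlPluralPair("acroscleroses", "acrosclerosis"));
        System.out.println(isGlPluralPair("zygomycetes", "zygomycetous"));
    }
}
```

Requiring an identical stem keeps pairs such as `zygomycetes`/`zygomycetous` (row 7.2) from being misread as plurals even though both end in a sibilant.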