Fuzziness (Edit/Levenshtein Distance) is a matching technique that allows for small variations in spelling between a search term and the entities returned in the search results.
Fuzziness allows one typo per word in the search term; the fuzziness percentage determines how long a word must be before that typo is tolerated. Choosing the setting depends entirely on your risk-based approach and on how confident you are that the names you search for are correct (e.g. whether you take the information directly from customers' IDs, or customers enter it themselves, which is more prone to error).
For example, Leederheimer and Lexderheimer are far more likely to be misspellings of each other than Lee and Lex, because a one-character difference is a much smaller share of a long word than of a short one.
Fuzziness - Full Word Matching
Fuzziness (Edit/Levenshtein Distance) is a name-matching technique used to reduce the impact of misspellings and small variations in name spelling.
To reduce false positives we have capped the maximum edit distance change at one character.
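To make the one-character cap concrete, here is a minimal sketch of a check for whether two words are within an edit distance of one (a single inserted, omitted, or replaced character). The function name `within_one_edit` is hypothetical and this is only an illustration of the principle, not the production matching algorithm.

```python
def within_one_edit(a: str, b: str) -> bool:
    """Return True if a and b differ by at most one inserted,
    omitted, or replaced character (edit distance <= 1)."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False  # would need at least two insertions/omissions

    # Ensure a is the shorter (or equal-length) word.
    if len(a) > len(b):
        a, b = b, a

    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1

    if len(a) == len(b):
        # Same length: one replaced character, the rest must be identical.
        return a[i + 1:] == b[i + 1:]
    # Lengths differ by one: skip the extra character in the longer word.
    return a[i:] == b[i + 1:]


print(within_one_edit("leederheimer", "lexderheimer"))  # one replaced character
print(within_one_edit("smith", "smythe"))               # two edits, over the cap
```

Note that this check alone would also accept Lee and Lex; it is the minimum word length for the chosen fuzziness setting that decides whether such short words are eligible for fuzziness at all.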
Exact Match will disable all pre-processing, algorithmic levers and custom configurations (e.g. equivalent names, phonetic matching) apart from word order and AKA matching, and will add a length filter (i.e. "John Smith" won't match "John Williams Smith"). It will also disable YOB fuzziness.
Fuzziness Setting | Minimum word length to allow fuzziness |
0% | None (no fuzziness allowed) |
10% | 25 |
20% | 13 |
30% | 9 |
40% | 7 |
50% | 5 |
60% | 5 |
70% | 4 |
80% | 4 |
90% | 3 |
100% | 3 |
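The table can be read as a simple lookup. The sketch below encodes it as a dictionary and checks whether a word is long enough for fuzziness at a given setting. The names `MIN_WORD_LENGTH` and `fuzziness_allowed` are hypothetical, and the sketch assumes the length is measured on the individual word from the search term and that only the settings listed in the table are valid.

```python
# Minimum word length at which fuzziness is allowed, per fuzziness setting (%).
# Values copied from the table above; None means fuzziness is never allowed.
MIN_WORD_LENGTH = {
    0: None,
    10: 25,
    20: 13,
    30: 9,
    40: 7,
    50: 5,
    60: 5,
    70: 4,
    80: 4,
    90: 3,
    100: 3,
}


def fuzziness_allowed(word: str, fuzziness_percent: int) -> bool:
    """Return True if `word` is long enough for fuzziness at this setting."""
    if fuzziness_percent not in MIN_WORD_LENGTH:
        raise ValueError(f"Unsupported fuzziness setting: {fuzziness_percent}%")
    min_length = MIN_WORD_LENGTH[fuzziness_percent]
    return min_length is not None and len(word) >= min_length


# "Leederheimer" has 12 characters: too short at 20%, long enough at 30%.
print(fuzziness_allowed("Leederheimer", 20))
print(fuzziness_allowed("Leederheimer", 30))
# "Lee" has 3 characters: only eligible at 90% and 100%.
print(fuzziness_allowed("Lee", 80))
print(fuzziness_allowed("Lee", 90))
```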
Difference between 0% fuzziness and exact match
- Exact match does not allow extra words to be added, i.e. Robert Mugabe will not match Robert Gabriel Mugabe.
- We allow a +/- 1 year difference in Year of Birth when fuzziness is between 10% and 100%. For exact match and 0% fuzziness, the Year of Birth has to match exactly.
- Exact match does not apply any pre-processing; for example, we do not strip out honorifics or suffixes such as Mr./Ms./Dr./PhD.
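The Year of Birth rule in the list above can be sketched as follows. The function name `year_of_birth_matches` and its parameters are hypothetical; the sketch only restates the rule (a tolerance of one year at 10% to 100% fuzziness, no tolerance at 0% or with exact match) and is not the actual implementation.

```python
def year_of_birth_matches(
    search_yob: int,
    entity_yob: int,
    fuzziness_percent: int,
    exact_match: bool = False,
) -> bool:
    """Apply the Year of Birth rule: +/- 1 year is tolerated only when
    fuzziness is between 10% and 100% and exact match is off."""
    tolerance = 0 if exact_match or fuzziness_percent == 0 else 1
    return abs(search_yob - entity_yob) <= tolerance


print(year_of_birth_matches(1980, 1981, fuzziness_percent=50))  # within one year
print(year_of_birth_matches(1980, 1981, fuzziness_percent=0))   # must match exactly
print(year_of_birth_matches(1980, 1981, fuzziness_percent=50, exact_match=True))
```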
Why is fuzziness useful?
It allows for variations in the spelling of the search term. If you misspell a search term or are unsure of its spelling, the search will still return entities whose spelling differs from the search term by one inserted, omitted, or replaced character.
This principle is also useful when searching with non-Latin characters. Fuzziness is not performed on the non-Latin characters themselves; instead, the search term is converted from the native non-Latin text into Latin, and fuzziness is applied to that Latin transliteration. Transliteration can introduce variation between the spelling of the search term and the original non-Latin text, so a higher fuzziness setting for non-Latin names helps prevent false negatives.
Impact on false positives
The ComplyAdvantage Edit/Levenshtein distance algorithm has been tested extensively (both internally and by independent third-party consultants) across different names and name variations in our database. To reduce false positives we have capped the maximum edit distance change at one character. This allows for spelling errors/variations without returning large numbers of unnecessary false positives.