## how alike are 2 strings?

April 13th, 2011

Showing and ordering how alike strings are is something I have a use for nearly every week, from “name rationalisation” jobs in address books, through admin helper utilities, to “did you mean?” on web sites. I did not realise the best way has a proper name: it's called Levenshtein distance, and is defined as:

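“The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other”

An edit here is the insertion, deletion or substitution of a single character, so “kitten” to “sitting” is a distance of 3 (swap k for s, swap e for i, add a g on the end).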

It is already built into Apache Commons Lang's StringUtils as “getLevenshteinDistance”, but here is a near complete list of implementations for different languages. I include the Java one from that page below (in case the source page goes down):

    public class LevenshteinDistance {

        private static int minimum(int a, int b, int c) {
            return Math.min(Math.min(a, b), c);
        }

        public static int computeLevenshteinDistance(CharSequence str1,
                CharSequence str2) {
            // distance[i][j] = edits needed to turn the first i chars of str1
            // into the first j chars of str2
            int[][] distance = new int[str1.length() + 1][str2.length() + 1];

            // turning a prefix into the empty string (or back) costs its length
            for (int i = 0; i <= str1.length(); i++)
                distance[i][0] = i;
            for (int j = 0; j <= str2.length(); j++)
                distance[0][j] = j;

            for (int i = 1; i <= str1.length(); i++)
                for (int j = 1; j <= str2.length(); j++)
                    distance[i][j] = minimum(
                            distance[i - 1][j] + 1,      // deletion
                            distance[i][j - 1] + 1,      // insertion
                            distance[i - 1][j - 1]       // substitution, free on a match
                                    + ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));

            return distance[str1.length()][str2.length()];
        }
    }
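
For the “did you mean?” case, here is a rough sketch of how the distance might be used to order candidates. The class name `DidYouMean`, the `rank` helper and the sample names are all made up for illustration; `distance` is just the same table-filling routine as the listing above, repeated so the file compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DidYouMean {

    // Same dynamic-programming table as the listing above.
    public static int distance(CharSequence a, CharSequence b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // Hypothetical helper: returns a copy of the candidates,
    // ordered from most alike to least alike compared with the input.
    public static List<String> rank(String input, List<String> candidates) {
        List<String> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((String c) -> distance(input, c)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3

        List<String> names = Arrays.asList("Jane Smythe", "John Smith", "Joan Smits");
        // prints [John Smith, Joan Smits, Jane Smythe]
        System.out.println(rank("Jon Smith", names));
    }
}
```

The comparison is case-sensitive and recomputes the distance on every comparison during the sort, which is fine for a handful of names but would probably want lower-casing and caching for anything bigger.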

Yes, it's sad, so just go away and leave me to my string comparisons ;-)