This function returns the Levenshtein-Distance between
the two argument strings or -1, if one of the argument
strings is longer than the limit of 255 characters (255
should be more than enough for name or dictionary comparison,
and nobody serious would be doing genetic analysis with
PHP).
The Levenshtein distance is defined as the minimal
number of characters you have to replace, insert or delete to
transform str1 into str2. The complexity of the
algorithm is O(m*n), where n and m are the
length of str1 and str2 (rather good when compared
to
similar_text(), which is O(max(n,m)**3), but still
expensive).
In its simplest form the function will take only the two
strings as parameter and will calculate just the number of
insert, replace and delete operations needed to transform str1 into str2.
A second variant will take three additional parameters
that define the cost of insert, replace and delete
operations. This is more general and adaptive than variant
one, but not as efficient.
The third variant (which is not implemented yet) will be
the most general and adaptive, but also the slowest
alternative. It will call a user-supplied function that will
determine the cost for every possible operation.
The user-supplied function will be called with the
following arguments:
operation to apply: 'I', 'R' or 'D'
The user-supplied function approach offers the
possibility to take into account the relevance of and/or
difference between certain symbols (characters) or even the
context those symbols appear in to determine the cost of
insert, replace and delete operations, but at the cost of
losing all optimizations done regarding cpu register
utilization and cache misses that have been worked into the
other two variants.
See also soundex(),
similar_text(), and
metaphone().