strdist(X;Y;M;N)
Returns the edit distance between two given strings using a specified method.
Syntax
strdist(X;Y;M;N)
bstrdist(X;Y;M;N)
Input
Argument | Type | Description |
---|---|---|
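X | text | The first of the two strings between which to determine the edit distance. A scalar value or the name of a column. |
Y | text | The second of the two strings between which to determine the edit distance. A scalar value or the name of a column. |
M | text | Specifies the method used to calculate the edit distance between the two given strings: 'id' (insert-delete), 'lev' (Levenshtein), or 'dl' (Damerau-Levenshtein). |
N | integer | Specifies the type of measure: 0 for the raw number of edits, 1 for a normalized value, or -1 (with M='id') for the length of the longest common subsequence. |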
Return Value
Returns the decimal number corresponding to a measure of the edit distance between X and Y (i.e., the minimum number of edits required to transform one string into the other).
An "edit" is defined as an insertion or deletion for insert-delete distance; as either of those or a single-character substitution for Levenshtein distance; and as any of those or a transposition of adjacent characters for Damerau-Levenshtein distance.
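The three definitions of an "edit" can be illustrated with a small dynamic-programming sketch in Python. This is not the implementation behind strdist; the function name edit_distance is hypothetical, and the 'dl' branch uses the restricted ("optimal string alignment") form of Damerau-Levenshtein, which is an assumption since the exact variant is not stated here.

```python
def edit_distance(x: str, y: str, method: str = "lev") -> int:
    """Count the minimum edits between x and y (illustrative only).

    method: "id"  = insertions and deletions only
            "lev" = also single-character substitutions
            "dl"  = also transpositions of adjacent characters
                    (restricted variant; an assumption of this sketch)
    """
    if method not in ("id", "lev", "dl"):
        raise ValueError(f"unknown method: {method!r}")

    rows, cols = len(x) + 1, len(y) + 1
    # d[i][j] holds the distance between x[:i] and y[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of x[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of y[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            # Deletion or insertion: allowed by every method.
            best = min(d[i - 1][j], d[i][j - 1]) + 1
            if x[i - 1] == y[j - 1]:
                # Matching characters cost nothing.
                best = min(best, d[i - 1][j - 1])
            elif method != "id":
                # Substitution: Levenshtein and Damerau-Levenshtein only.
                best = min(best, d[i - 1][j - 1] + 1)
            if (method == "dl" and i > 1 and j > 1
                    and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]):
                # Transposition of two adjacent characters.
                best = min(best, d[i - 2][j - 2] + 1)
            d[i][j] = best
    return d[-1][-1]


print(edit_distance("formal", "fromage", "lev"))  # 4
print(edit_distance("formal", "fromage", "dl"))   # 3
print(edit_distance("formal", "fromage", "id"))   # 5
```

For 'formal' and 'fromage', swapping "or" to "ro" costs two substitutions under Levenshtein but a single transposition under Damerau-Levenshtein, which accounts for the difference between 4 and 3.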
If N=0, the result is the minimum number of edits required.
If N=1, the result is normalized:
- If M='lev' or 'dl', it is divided by the length of the longer of X and Y.
- If M='id', it is divided by the sum of the lengths of X and Y.
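The normalization rule for N=1 can be sketched in Python as follows. The helper name normalize is hypothetical, and it takes an already-computed raw distance as input. The zero-length case is not covered by the description above, so returning 0.0 there is purely an assumption of this sketch.

```python
def normalize(distance: int, x: str, y: str, method: str) -> float:
    """Scale a raw edit distance the way the N=1 rule describes."""
    if method in ("lev", "dl"):
        # Divide by the length of the longer string.
        denominator = max(len(x), len(y))
    elif method == "id":
        # Divide by the sum of both lengths.
        denominator = len(x) + len(y)
    else:
        raise ValueError(f"unknown method: {method!r}")
    if denominator == 0:
        # Both strings empty: not specified by the text; assumed 0.0 here.
        return 0.0
    return distance / denominator


print(normalize(1, "cat", "rat", "lev"))        # 1/3
print(normalize(3, "formal", "fromage", "dl"))  # 3/7
print(normalize(5, "formal", "fromage", "id"))  # 5/13
```

Both denominators are the largest value the corresponding raw distance can take, so the normalized result always falls between 0 and 1.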
If N=-1 and M='id', the result is the length of the longest (not necessarily contiguous) subsequence common to the two strings, which is related to the insert-delete edit distance as follows:
edit_distance_between_X_and_Y = length_of_X + length_of_Y - 2*length_of_longest_common_subsequence
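As a rough check of this relationship, the following Python sketch computes the longest common subsequence length with a standard dynamic-programming recurrence and plugs it into the formula. The function name lcs_length is hypothetical and is not part of strdist.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest (not necessarily contiguous) common subsequence."""
    # prev[j] is the LCS length of the processed prefix of x and y[:j].
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a character
        prev = curr
    return prev[-1]


x, y = "formal", "fromage"
lcs = lcs_length(x, y)
print(lcs)                        # 4 (for example "foma" or "frma")
print(len(x) + len(y) - 2 * lcs)  # 5, the insert-delete distance
```

Every character outside the common subsequence must be deleted from one string or inserted from the other, which is why each string contributes its length minus the subsequence length.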
If X or Y is N/A:
- If N=0, the result is the length of the non-N/A argument.
- If N=1, the result is 1.
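The N/A rule can be sketched in Python as a small helper. The names is_na and na_rule are hypothetical. Treating both None and the empty string as N/A is an assumption of this sketch. Cases the rule does not mention (both arguments N/A, neither N/A, or another value of N) simply return None rather than guess.

```python
from typing import Optional


def is_na(s: Optional[str]) -> bool:
    # Assumption: N/A is modeled here as None or the empty string.
    return s is None or s == ""


def na_rule(x: Optional[str], y: Optional[str], n: int) -> Optional[int]:
    """Apply the N/A rule when exactly one argument is N/A, else return None."""
    if is_na(x) == is_na(y):
        return None  # neither or both N/A: not covered by the stated rule
    present = y if is_na(x) else x
    if n == 0:
        return len(present)  # length of the non-N/A argument
    if n == 1:
        return 1
    return None  # other values of N are not covered by the stated rule


print(na_rule("", "fromage", 0))  # 7
print(na_rule("", "fromage", 1))  # 1
```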
Sample Usage
string1 | string2 | method | normalized | strdist(string1;string2;method;normalized) |
---|---|---|---|---|
'cat' | 'rat' | 'lev' | 0 | 1 |
'cat' | 'rat' | 'lev' | 1 | 0.333333333333333 |
'apples' | 'oranges' | 'lev' | 0 | 5 |
'apples' | 'oranges' | 'lev' | 1 | 0.714285714285714 |
'apples' | 'apples' | 'lev' | 0 | 0 |
'apples' | 'apples' | 'lev' | 1 | 0 |
'formal' | 'fromage' | 'lev' | 0 | 4 |
'formal' | 'fromage' | 'dl' | 0 | 3 |
'formal' | 'fromage' | 'dl' | 1 | 0.4285714285714285 |
'formal' | 'fromage' | 'id' | 0 | 5 |
'formal' | 'fromage' | 'id' | 1 | 0.3846153846153846 |
'formal' | 'fromage' | 'id' | -1 | 4 |
'' | 'fromage' | 'id' | 0 | 7 |
'' | 'fromage' | 'id' | 1 | 1 |
Additional Information
- strdist is Unicode (UTF-8) compliant and will work with Unicode or plain ASCII text fields.
- If passed a string argument that is not legal Unicode, it will by default signal an error (configurable as a user preference).
- A corresponding function, bstrdist, can be used with non-Unicode strings (e.g., binary or legacy encodings).