strdist(X;Y;M;N)
Returns the edit distance between two given strings using a specified method.
Syntax
strdist(X;Y;M;N)
bstrdist(X;Y;M;N)
Input
| Argument | Type | Description |
|---|---|---|
| X | text | The first of the two strings between which to determine the edit distance. A scalar value or the name of a column. |
| Y | text | The second of the two strings between which to determine the edit distance. A scalar value or the name of a column. |
| M | text | Specifies the method used to calculate the edit distance between the two given strings: 'lev' (Levenshtein), 'dl' (Damerau-Levenshtein), or 'id' (insert-delete). |
| N | integer | Specifies the type of measure: 0, 1, or -1 (see Return Value). |
Return Value
Returns the decimal number corresponding to a measure of the edit distance between
X and Y (i.e., the minimum number of edits required to
transform one string into the other).
An "edit" is defined as an insertion or deletion for insert-delete distance; as either of those or a single-character substitution for Levenshtein distance; and as any of those or a transposition of adjacent characters for Damerau-Levenshtein distance.
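The definitions of an "edit" above can be illustrated with a minimal Python sketch. It is not the implementation behind `strdist`; the function names are hypothetical, and the Damerau-Levenshtein version uses the restricted ("optimal string alignment") variant, which is an assumption, since the text does not say which variant is used.

```python
def levenshtein(x: str, y: str) -> int:
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(y) + 1))  # distances from "" to each prefix of y
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to ""
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1,          # delete cx
                            curr[j - 1] + 1,      # insert cy
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def damerau_levenshtein_osa(x: str, y: str) -> int:
    """Levenshtein edits plus transposition of adjacent characters.

    Assumption: restricted (optimal string alignment) variant.
    """
    rows, cols = len(x) + 1, len(y) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Adjacent transposition: "or" <-> "ro" counts as one edit.
            if (i > 1 and j > 1
                    and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(levenshtein("cat", "rat"))                     # 1
print(levenshtein("formal", "fromage"))              # 4
print(damerau_levenshtein_osa("formal", "fromage"))  # 3
```

The 'formal'/'fromage' pair shows the difference between the two methods: swapping "or" to "ro" costs two substitutions under Levenshtein distance but a single transposition under Damerau-Levenshtein distance.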
If N=0, the result is the minimum number of edits required.
If N=1, the result is normalized:
- If M='lev' or 'dl', it is divided by the length of the longer of X and Y.
- If M='id', it is divided by the sum of the lengths of X and Y.
If N=-1 and M='id', the result is the length of the longest (not necessarily contiguous) subsequence common to the two strings, which is related to the insert-delete edit distance as follows:
edit_distance_between_X_and_Y = length_of_X + length_of_Y - 2*length_of_longest_common_subsequence
If X or Y is N/A:
- If N=0, the result is the length of the non-N/A argument.
- If N=1, the result is 1.
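The rules for the 'id' method can be sketched in Python as follows. This is an illustrative sketch only, not the function's actual implementation: `lcs_length` and `id_measure` are hypothetical names, N/A is modeled as `None`, and the handling of cases the text does not cover (both arguments N/A, or N=-1 with an N/A argument) is an assumption.

```python
from typing import Optional


def lcs_length(x: str, y: str) -> int:
    """Length of the longest (not necessarily contiguous) common subsequence."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        curr = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def id_measure(x: Optional[str], y: Optional[str], n: int = 0) -> float:
    """Insert-delete measure for N = 0, 1, or -1 (N/A modeled as None)."""
    if x is None or y is None:
        other = y if x is None else x
        if n == 0:
            # Length of the non-N/A argument (0 if both are N/A: assumption).
            return len(other) if other is not None else 0
        if n == 1:
            return 1
        # Assumption: the text does not say what N=-1 returns for N/A input.
        raise ValueError("N=-1 with an N/A argument is not covered here")
    lcs = lcs_length(x, y)
    if n == -1:
        return lcs
    distance = len(x) + len(y) - 2 * lcs  # relation stated above
    if n == 0:
        return distance
    if n == 1:
        total = len(x) + len(y)
        # Assumption: two empty strings normalize to 0.
        return distance / total if total else 0.0
    raise ValueError("N must be 0, 1, or -1")


print(id_measure("formal", "fromage", 0))   # 5
print(id_measure("formal", "fromage", -1))  # 4
print(id_measure(None, "fromage", 0))       # 7
```

With N=1 the same pair gives 5/13, since the distance 5 is divided by the summed lengths 6 + 7.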
Sample Usage
| string1 | string2 | method | normalized | strdist(string1;string2;method;normalized) |
|---|---|---|---|---|
| 'cat' | 'rat' | 'lev' | 0 | 1 |
| 'cat' | 'rat' | 'lev' | 1 | 0.333333333333333 |
| 'apples' | 'oranges' | 'lev' | 0 | 5 |
| 'apples' | 'oranges' | 'lev' | 1 | 0.714285714285714 |
| 'apples' | 'apples' | 'lev' | 0 | 0 |
| 'apples' | 'apples' | 'lev' | 1 | 0 |
| 'formal' | 'fromage' | 'lev' | 0 | 4 |
| 'formal' | 'fromage' | 'dl' | 0 | 3 |
| 'formal' | 'fromage' | 'dl' | 1 | 0.4285714285714285 |
| 'formal' | 'fromage' | 'id' | 0 | 5 |
| 'formal' | 'fromage' | 'id' | 1 | 0.3846153846153846 |
| 'formal' | 'fromage' | 'id' | -1 | 4 |
| '' | 'fromage' | 'id' | 0 | 7 |
| '' | 'fromage' | 'id' | 1 | 1 |
Additional Information
- strdist is Unicode (UTF-8) compliant and will work with Unicode or plain ASCII text fields.
- If passed a string argument that is not legal Unicode, it will by default signal an error (configurable as a user preference).
- A corresponding function bstrdist can be used with non-Unicode strings (e.g., binary or legacy encodings).
