transcode(X;F;T;S;O)
Converts a string from one encoding into a different encoding. (Available as of version 12.24)
Syntax
transcode(X;F;T;S;O)
Input
Argument | Type | Description |
---|---|---|
X | text | The string to which the function is applied. This argument is required. A scalar value or the name of a column. |
F | text | The current encoding of X. If this argument is omitted, the default encoding is UTF-8. A scalar value or the name of a column. |
T | text | The target encoding of X. If this argument is omitted, the default encoding is UTF-8. A scalar value or the name of a column. |
S | text | The string that is returned in place of a result if transcode cannot translate the string X. Special characters can be used in S; for example, '$ (%)' returns the original string followed by the error message in parentheses (see Sample Usage). If S is omitted and transcode cannot translate the string X, the function displays a system error. A scalar value or the name of a column. |
O | text | A list of one or more special options, such as 'discard' and 'translit' (see Sample Usage). |
Return Value
Returns a text value corresponding to the string X translated from encoding F into encoding T.
transcode throws an error if the string is not actually in the encoding F, contains characters illegal in encoding F, or contains characters that cannot be expressed in encoding T.
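The failure conditions can be illustrated outside the platform. The following is a minimal Python sketch, not the implementation of transcode: `transcode_strict` is a hypothetical helper that works on raw bytes and relies on Python's standard codecs, which raise an exception in comparable situations.

```python
def transcode_strict(raw: bytes, src: str = "utf-8", dst: str = "utf-8") -> bytes:
    """Hypothetical helper: re-encode raw bytes from src to dst, failing on any problem."""
    text = raw.decode(src)   # raises UnicodeDecodeError if raw is not valid in src
    return text.encode(dst)  # raises UnicodeEncodeError if a character has no form in dst


# Valid UTF-8 ("cafe" with an acute accent) that Latin-1 can express: succeeds.
ok = transcode_strict("caf\u00e9".encode("utf-8"), "utf-8", "iso-8859-1")

# Not valid UTF-8: byte 0xE9 starts a multi-byte sequence that is never completed.
try:
    transcode_strict(b"caf\xe9", "utf-8", "iso-8859-1")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Valid UTF-8, but the euro sign (U+20AC) does not exist in Latin-1.
try:
    transcode_strict("\u20ac".encode("utf-8"), "utf-8", "iso-8859-1")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

In this sketch the first two conditions listed above (input not in encoding F, or containing sequences illegal in F) both surface as a decode error, while the third surfaces as an encode error.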
Sample Usage
In the following function call:
transcode(X;'UTF-8';'ISO-8859-1';'$ (%)';)
transcode converts UTF-8 text X into Latin-1, but leaves any strings that aren't valid UTF-8 or cannot be expressed in Latin-1 unchanged, with an error message added in parentheses.
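The fallback behavior of this call can be sketched in Python. This is an illustration under assumptions, not the platform's code: `transcode_or_fallback` is a hypothetical helper, and the reading of `$` as the original string and `%` as the error message is inferred from the description of the sample call. The template is assumed to be plain ASCII.

```python
def transcode_or_fallback(raw: bytes, src: str, dst: str, template: str) -> bytes:
    """Hypothetical helper: convert raw from src to dst; on failure, fill in template.

    Assumption inferred from the sample call: '$' stands for the original
    string and '%' for the error message.
    """
    try:
        return raw.decode(src).encode(dst)
    except (UnicodeDecodeError, UnicodeEncodeError) as err:
        message = err.reason.encode("ascii", "replace")
        pieces = []
        # Walk the template once so that '$' or '%' inside the inserted
        # values are never substituted a second time.
        for ch in template:
            if ch == "$":
                pieces.append(raw)
            elif ch == "%":
                pieces.append(message)
            else:
                pieces.append(ch.encode("ascii"))
        return b"".join(pieces)


# Convertible input is translated to Latin-1.
good = transcode_or_fallback("caf\u00e9".encode("utf-8"), "utf-8", "iso-8859-1", "$ (%)")

# The euro sign cannot be expressed in Latin-1, so the original bytes are
# kept and the codec's error message is appended in parentheses.
original = "\u20ac5".encode("utf-8")
bad = transcode_or_fallback(original, "utf-8", "iso-8859-1", "$ (%)")
```

The wording of the error message comes from Python's codec in this sketch and will differ from whatever transcode reports.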
In the following function call:
transcode(X;'UTF-8';'ISO-8859-1';;'discard' 'translit')
transcode tries to convert UTF-8 to Latin-1 as tolerantly as possible, discarding illegal sequences in the UTF-8 input and transliterating characters that can't be represented in Latin-1.
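A rough Python analogue of this tolerant conversion is sketched below. `transcode_tolerant` is a hypothetical helper, and the transliteration step is only a crude approximation based on Unicode decomposition; the rules transcode actually applies for 'translit' are not described here and may differ. The sketch also assumes a stateless single-byte target encoding such as Latin-1.

```python
import unicodedata


def transcode_tolerant(raw: bytes, src: str = "utf-8", dst: str = "iso-8859-1") -> bytes:
    """Hypothetical helper approximating the 'discard' and 'translit' options."""
    # 'discard' analogue: silently drop byte sequences that are illegal in src.
    text = raw.decode(src, errors="ignore")
    out = []
    for ch in text:
        try:
            out.append(ch.encode(dst))
        except UnicodeEncodeError:
            # Crude 'translit' analogue: decompose the character and keep the
            # parts the target encoding can express; fall back to '?'.
            parts = unicodedata.normalize("NFKD", ch).encode(dst, errors="ignore")
            out.append(parts or b"?")
    return b"".join(out)


# Input: "cafe" with an acute accent, a space, an illegal 0xFF byte, a space,
# and U+0151 (o with double acute), which Latin-1 cannot express.
converted = transcode_tolerant(b"caf\xc3\xa9 \xff \xc5\x91")
```

Here the illegal byte disappears and the unrepresentable letter is reduced to its base letter, so `converted` holds the Latin-1 bytes `b"caf\xe9  o"`.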
F (current encoding) and T (target encoding) can be the same value if you use the 'discard' option. The following function call discards illegal non-UTF-8 sequences from a string:
transcode(string;;;;'discard')
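The same cleanup idea can be sketched in Python, where decoding with `errors="ignore"` and re-encoding in the same encoding drops the invalid sequences. `discard_invalid_utf8` is a hypothetical name used only for this illustration.

```python
def discard_invalid_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: drop byte sequences that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore").encode("utf-8")


# 0xFF and 0xFE can never appear in UTF-8, so both bytes are removed.
cleaned = discard_invalid_utf8(b"abc\xff\xfedef")

# A multi-byte sequence that is cut off at the end of the input is dropped too.
truncated = discard_invalid_utf8(b"ok\xe2\x82")
```

Valid input passes through unchanged, so the operation is safe to apply to data that is already clean.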
Additional Information
- The following are valid encoding types:
- UTF8
- UTF-8
- ASCII
- US
- US-ASCII
- LATIN1
- EBCDIC
- ISO_8859_1
- ISO-8859-2
- ISO8859-2
- ISO_8859-2
- ISO_8859_2
- ISO-8859-3
- ISO8859-3
- ISO_8859-3
- ISO_8859_3
- ISO-8859-4
- ISO8859-4
- ISO_8859-4
- ISO_8859_4
- ISO-8859-5
- ISO8859-5
- ISO_8859-5
- ISO_8859_5
- ISO-8859-6
- ISO8859-6
- ISO_8859-6
- ISO_8859_6
- ISO-8859-7
- ISO8859-7
- ISO_8859-7
- ISO_8859_7
- ISO-8859-8
- ISO8859-8
- ISO_8859-8
- ISO_8859_8
- ISO-8859-9
- ISO8859-9
- ISO_8859-9
- ISO_8859_9
- ISO-8859-13
- ISO8859-13
- ISO_8859-13
- ISO_8859_13
- ISO-8859-15
- ISO8859-15
- ISO_8859-15
- ISO_8859_15
- CP500
- CP65001
- CP1200
- UTF16LE
- UTF-16LE
- UCS2LE
- UCS-2LE
- UCS-2-INTERNAL
- CP1201
- UTF16BE
- UTF-16BE
- UCS2BE
- UCS-2BE
- UNICODEFFFE
- CP12000
- UTF32LE
- UTF-32LE
- UCS4LE
- UCS-4LE
- CP12001
- UTF32BE
- UTF-32BE
- UCS4BE
- UCS-4BE
- UTF16
- UTF-16
- UCS2
- UCS-2
- UTF32
- UTF-32
- UCS-4
- UCS4
- ANSI_X3.4-1968
- ANSI_X3.4-1986
- CP367
- IBM367
- ISO-IR-6
- ISO646-US
- ISO_646.IRV:1991
- CSASCII
- CP819
- IBM819
- ISO-8859-1
- ISO-IR-100
- ISO8859-1
- ISO_8859-1
- ISO_8859-1:1987
- L1
- CSISOLATIN1
- CP1250
- MS-EE
- WINDOWS-1250
- CP1251
- MS-CYRL
- WINDOWS-1251
- CP1252
- MS-ANSI
- WINDOWS-1252
- CP1253
- MS-GREEK
- WINDOWS-1253
- CP1254
- MS-TURK
- WINDOWS-1254
- CP1255
- MS-HEBR
- WINDOWS-1255
- CP1256
- MS-ARAB
- WINDOWS-1256
- CP1257
- WINBALTRIM
- WINDOWS-1257
- CP1258
- WINDOWS-1258
- 850
- CP850
- IBM850
- CSPC850MULTILINGUAL
- 862
- CP862
- IBM862
- CSPC862LATINHEBREW
- 866
- CP866
- IBM866
- CSIBM866
- CP154
- CYRILLIC-ASIAN
- PT154
- PTCP154
- CSPTCP154
- CP1133
- IBM-CP1133
- CP874
- WINDOWS-874
- CP51932
- MS51932
- WINDOWS-51932
- EUC-JP
- CP932
- MS932
- SHIFFT_JIS
- SHIFFT_JIS-MS
- SJIS
- SJIS-MS
- SJIS-OPEN
- SJIS-WIN
- WINDOWS-31J
- WINDOWS-932
- CSWINDOWS31J
- CP50221
- ISO-2022-JP
- ISO-2022-JP-MS
- ISO2022-JP
- ISO2022-JP-MS
- MS50221
- WINDOWS-50221
- CP936
- GBK
- MS936
- WINDOWS-936
- CP950
- BIG5
- BIG5HKSCS
- BIG5-HKSCS
- CP949
- UHC
- EUC-KR
- CP1361
- JOHAB
- 437
- CP437
- IBM437
- CSPC8CODEPAGE437
- CP737
- CP775
- IBM775
- CSPC775BALTIC
- 852
- CP852
- IBM852
- CSPCP852
- CP853
- 855
- CP855
- IBM855
- CSIBM855
- 857
- CP857
- IBM857
- CSIBM857
- CP858
- 860
- CP860
- IBM860
- CSIBM860
- 861
- CP-IS
- CP861
- IBM861
- CSIBM861
- 863
- CP863
- IBM863
- CSIBM863
- CP864
- IBM864
- CSIBM864
- 865
- CP865
- IBM865
- CSIBM865
- 869
- CP-GR
- CP869
- IBM869
- CSIBM869
- CP1125
- IBM037
- IBM500
- ASMO-708
- DOS-720
- IBM737
- IBM00858
- DOS-862
- IBM870
- CP875
- SHIFT_JIS
- SHIFT-JIS
- GB2312
- KS_C_5601-1987
- IBM1026
- IBM01047
- IBM01140
- IBM01141
- IBM01142
- IBM01143
- IBM01144
- IBM01145
- IBM01146
- IBM01147
- IBM01148
- IBM01149
- MACINTOSH
- X-MAC-JAPANESE
- X-MAC-CHINESETRAD
- X-MAC-KOREAN
- X-MAC-ARABIC
- X-MAC-HEBREW
- X-MAC-GREEK
- X-MAC-CYRILLIC
- X-MAC-CHINESESIMP
- X-MAC-ROMANIAN
- X-MAC-UKRAINIAN
- X-MAC-THAI
- X-MAC-CE
- X-MAC-ICELANDIC
- X-MAC-TURKISH
- X-MAC-CROATIAN
- X-CHINESE_CNS
- X-CP20001
- X_CHINESE-ETEN
- X-CP20003
- X-CP20004
- X-CP20005
- X-IA5
- X-IA5-GERMAN
- X-IA5-SWEDISH
- X-IA5-NORWEGIAN
- X-CP20261
- X-CP20269
- IBM273
- IBM277
- IBM278
- IBM280
- IBM284
- IBM285
- IBM290
- IBM297
- IBM420
- IBM423
- IBM424
- X-EBCDIC-KOREANEXTENDED
- IBM-THAI
- KOI8-R
- IBM871
- IBM880
- IBM905
- IBM00924
- X-CP20936
- X-CP20949
- CP1025
- KOI8-U
- X-EUROPA
- ISO-8859-8-I
- ISO8859-8-I
- ISO_8859-8-I
- ISO_8859_8-I
- CSISO2022JP
- ISO-2022-KR
- ISO2022-KR
- X-CP50227
- EUC-CN
- HZ-GB-2312
- GB18030
- X-ISCII-DE
- X-ISCII-BE
- X-ISCII-TA
- X-ISCII-TE
- X-ISCII-AS
- X-ISCII-OR
- X-ISCII-KA
- X-ISCII-MA
- X-ISCII-GU
- X-ISCII-PA
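
Many entries in this list appear to be alternative spellings of one encoding (for example UTF8 and UTF-8, or LATIN1, ISO-8859-1 and L1). As an illustration only, the Python sketch below uses Python's own codec registry, which is an assumption standing in for the platform's table, to show how several labels resolve to a single codec. `canonical_name` is a hypothetical helper.

```python
import codecs


def canonical_name(label: str) -> str:
    """Return the name Python's codec registry uses for an encoding label."""
    return codecs.lookup(label).name


# Several spellings from the list above that Python treats as Latin-1.
latin1_labels = ["LATIN1", "ISO-8859-1", "ISO8859-1", "ISO_8859-1", "L1", "CP819", "IBM819"]
latin1_names = {canonical_name(label) for label in latin1_labels}

# Two spellings of UTF-8.
utf8_names = {canonical_name(label) for label in ["UTF8", "UTF-8"]}

# Each set collapses to a single canonical codec name.
same_latin1 = len(latin1_names) == 1
same_utf8 = len(utf8_names) == 1
```

Python does not recognize every label in the list, and `codecs.lookup` raises `LookupError` for those it does not know, so this check is not a way to validate arguments for transcode.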