transcode(X;F;T;S;O)

Converts a string from one encoding into a different encoding. (Available as of version 12.24)

Syntax

transcode(X;F;T;S;O)

Input

Argument Type Description
X text The string on which to apply the function. This argument is required.

A scalar value or the name of a column

F text F is the current encoding of X. If this argument is omitted, the default encoding is UTF-8.

A scalar value or the name of a column

T text T is the target encoding of X. If this argument is omitted, the default encoding is UTF-8.

A scalar value or the name of a column

S text S is the string that is returned in place of a result if transcode cannot translate the string X.
The following special characters can be used in S:
  • % is replaced by the error message returned by the conversion (in ASCII/UTF-8).
  • $ is replaced by the original input string in its original encoding.
If S is omitted and transcode cannot translate the string X, the function displays a system error.

A scalar value or the name of a column

O text A list of one or more special options:
  • 'discard' allows transcode to discard illegal sequences of bytes in the input string without causing an error or inserting a placeholder character. Conversion then proceeds from the next valid sequence, where possible.
  • 'translit' allows transcode to attempt to substitute similar characters from the target encoding T for characters that cannot be represented in it.
  • 'cstring' causes transcode to represent the output as a "ragged" string, regardless of how the input is represented.
  • 'symbol' causes transcode to display the output as a symbol, regardless of how the input is represented.
  • 'encode64' further encodes the output string of transcode as Base-64, an ASCII-based encoding for binary data that is able to represent strings with zero bytes in them safely.
  • 'decode64' causes transcode to decode a Base-64 input string before performing the encoding conversion.
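
The reason 'encode64' is useful for strings containing zero bytes can be illustrated outside of this function. The following Python sketch is only an analogy built on the standard library, not the transcode implementation, and the helper names are hypothetical. It shows that UTF-16LE output contains zero bytes for ASCII characters, and that Base-64 turns those bytes into plain ASCII text that can be reversed later.

```python
import base64


def to_utf16le_base64(text: str) -> str:
    """Encode text as UTF-16LE, then wrap the resulting bytes in Base-64."""
    raw = text.encode("utf-16-le")  # "A" becomes b"A\x00", which contains a zero byte
    return base64.b64encode(raw).decode("ascii")


def from_utf16le_base64(encoded: str) -> str:
    """Reverse the step above: decode Base-64 first, then decode UTF-16LE."""
    raw = base64.b64decode(encoded)
    return raw.decode("utf-16-le")


print(to_utf16le_base64("A"))  # QQA=
```

Decoding before conversion, as 'decode64' is described to do, corresponds to the second helper.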

Return Value

Returns a text value corresponding to the string X translated from encoding F into encoding T.

transcode throws an error if the string is not actually in the encoding F, contains characters illegal in encoding F, or contains characters that cannot be expressed in encoding T.
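
As a rough mental model, the conversion can be thought of as a decode step from F followed by an encode step into T, with either step able to fail. The Python sketch below is an analogy under that assumption and uses Python's codec names. The function name transcode_sketch is hypothetical and this is not how transcode itself is implemented.

```python
def transcode_sketch(data: bytes, src: str = "utf-8", dst: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding, then encode into the target encoding."""
    text = data.decode(src)   # fails if the bytes are not valid in the source encoding
    return text.encode(dst)   # fails if a character has no representation in the target


# A string that converts cleanly: "é" exists in Latin-1.
print(transcode_sketch("café".encode("utf-8"), "utf-8", "iso-8859-1"))  # b'caf\xe9'

# Two failing inputs: an invalid UTF-8 byte, and a character missing from Latin-1.
for bad in (b"\xff", "€".encode("utf-8")):
    try:
        transcode_sketch(bad, "utf-8", "iso-8859-1")
    except UnicodeError as exc:
        print("conversion failed:", type(exc).__name__)
```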

Sample Usage

In the following function call:

transcode(X;'UTF-8';'ISO-8859-1';'$ (%)';)

transcode converts UTF-8 text X into Latin-1, but leaves any strings that aren't valid UTF-8 or cannot be expressed in Latin-1 unchanged, with an error message added in parentheses.
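
The effect of a fallback string such as '$ (%)' can be approximated in Python as follows. This is a hedged sketch of the documented behavior, not the actual implementation: the function name is hypothetical, and the error text comes from Python's codecs, so it will differ from the message transcode produces.

```python
def transcode_with_fallback(data: bytes, src: str, dst: str, fallback: str) -> bytes:
    """Convert data from src to dst; on failure, expand the fallback template."""
    try:
        return data.decode(src).encode(dst)
    except UnicodeError as exc:
        message = str(exc).encode("ascii", errors="replace")
        out = bytearray()
        for ch in fallback:
            if ch == "%":
                out += message            # error text from the failed conversion
            elif ch == "$":
                out += data               # original input, still in its original encoding
            else:
                out += ch.encode("utf-8")  # literal template character
        return bytes(out)


# "€" has no Latin-1 representation, so the template is returned instead.
print(transcode_with_fallback("€5".encode("utf-8"), "utf-8", "iso-8859-1", "$ (%)"))
```

Inputs that convert cleanly are returned in the target encoding and the template is never consulted.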

In the following function call:

transcode(X;'UTF-8';'ISO-8859-1';;'discard' 'translit')

transcode tries to convert UTF-8 to Latin-1 as tolerantly as possible, discarding illegal sequences in the UTF-8 input and transliterating characters that can't be represented in Latin-1.
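
To make the combined effect of the two options more concrete, here is a Python approximation. It is a sketch under loud assumptions: Python's errors="ignore" stands in for 'discard', and stripping accents through Unicode NFKD decomposition stands in for 'translit'. The substitution rules that transcode actually applies are not documented here and are likely to differ.

```python
import unicodedata


def tolerant_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 bytes to Latin-1, dropping bad bytes and approximating the rest."""
    text = data.decode("utf-8", errors="ignore")  # analogue of 'discard'
    out = []
    for ch in text:
        try:
            out.append(ch.encode("iso-8859-1"))
        except UnicodeEncodeError:
            # Crude analogue of 'translit': decompose the character and keep
            # only the parts that Latin-1 can represent (for example, ś -> s).
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append(decomposed.encode("iso-8859-1", errors="ignore"))
    return b"".join(out)


sample = b"caf\xc3\xa9 \xff" + "ś".encode("utf-8")
print(tolerant_to_latin1(sample))  # b'caf\xe9 s'
```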

You can have F (current encoding) and T (target encoding) be the same value if you use the 'discard' option. The following function call discards illegal non-UTF-8 sequences from a string:

transcode(string;;;;'discard')
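
For comparison, the same clean-up idea in Python is a round trip through a single encoding with invalid bytes ignored. This is an analogy only, and the helper name is hypothetical.

```python
def drop_invalid_utf8(data: bytes) -> bytes:
    """Round-trip through UTF-8, silently dropping bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


print(drop_invalid_utf8(b"ab\xffcd"))  # b'abcd'
```

Valid input passes through unchanged, so a call like this is safe to apply to data that is already clean.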

Additional Information

  • The following are valid encoding types:
    • UTF8
    • UTF-8
    • ASCII
    • US
    • US-ASCII
    • LATIN1
    • EBCDIC
    • ISO_8859_1
    • ISO-8859-2
    • ISO8859-2
    • ISO_8859-2
    • ISO_8859_2
    • ISO-8859-3
    • ISO8859-3
    • ISO_8859-3
    • ISO_8859_3
    • ISO-8859-4
    • ISO8859-4
    • ISO_8859-4
    • ISO_8859_4
    • ISO-8859-5
    • ISO8859-5
    • ISO_8859-5
    • ISO_8859_5
    • ISO-8859-6
    • ISO8859-6
    • ISO_8859-6
    • ISO_8859_6
    • ISO-8859-7
    • ISO8859-7
    • ISO_8859-7
    • ISO_8859_7
    • ISO-8859-8
    • ISO8859-8
    • ISO_8859-8
    • ISO_8859_8
    • ISO-8859-9
    • ISO8859-9
    • ISO_8859-9
    • ISO_8859_9
    • ISO-8859-13
    • ISO8859-13
    • ISO_8859-13
    • ISO_8859_13
    • ISO-8859-15
    • ISO8859-15
    • ISO_8859-15
    • ISO_8859_15
    • CP500
    • CP65001
    • CP1200
    • UTF16LE
    • UTF-16LE
    • UCS2LE
    • UCS-2LE
    • UCS-2-INTERNAL
    • CP1201
    • UTF16BE
    • UTF-16BE
    • UCS2BE
    • UCS-2BE
    • UNICODEFFFE
    • CP12000
    • UTF32LE
    • UTF-32LE
    • UCS4LE
    • UCS-4LE
    • CP12001
    • UTF32BE
    • UTF-32BE
    • UCS4BE
    • UCS-4BE
    • UTF16
    • UTF-16
    • UCS2
    • UCS-2
    • UTF32
    • UTF-32
    • UCS-4
    • UCS4
    • ANSI_X3.4-1968
    • ANSI_X3.4-1986
    • CP367
    • IBM367
    • ISO-IR-6
    • ISO646-US
    • ISO_646.IRV:1991
    • CSASCII
    • CP819
    • IBM819
    • ISO-8859-1
    • ISO-IR-100
    • ISO8859-1
    • ISO_8859-1
    • ISO_8859-1:1987
    • L1
    • CSISOLATIN1
    • CP1250
    • MS-EE
    • WINDOWS-1250
    • CP1251
    • MS-CYRL
    • WINDOWS-1251
    • CP1252
    • MS-ANSI
    • WINDOWS-1252
    • CP1253
    • MS-GREEK
    • WINDOWS-1253
    • CP1254
    • MS-TURK
    • WINDOWS-1254
    • CP1255
    • MS-HEBR
    • WINDOWS-1255
    • CP1256
    • MS-ARAB
    • WINDOWS-1256
    • CP1257
    • WINBALTRIM
    • WINDOWS-1257
    • CP1258
    • WINDOWS-1258
    • 850
    • CP850
    • IBM850
    • CSPC850MULTILINGUAL
    • 862
    • CP862
    • IBM862
    • CSPC862LATINHEBREW
    • 866
    • CP866
    • IBM866
    • CSIBM866
    • CP154
    • CYRILLIC-ASIAN
    • PT154
    • PTCP154
    • CSPTCP154
    • CP1133
    • IBM-CP1133
    • CP874
    • WINDOWS-874
    • CP51932
    • MS51932
    • WINDOWS-51932
    • EUC-JP
    • CP932
    • MS932
    • SHIFFT_JIS
    • SHIFFT_JIS-MS
    • SJIS
    • SJIS-MS
    • SJIS-OPEN
    • SJIS-WIN
    • WINDOWS-31J
    • WINDOWS-932
    • CSWINDOWS31J
    • CP50221
    • ISO-2022-JP
    • ISO-2022-JP-MS
    • ISO2022-JP
    • ISO2022-JP-MS
    • MS50221
    • WINDOWS-50221
    • CP936
    • GBK
    • MS936
    • WINDOWS-936
    • CP950
    • BIG5
    • BIG5HKSCS
    • BIG5-HKSCS
    • CP949
    • UHC
    • EUC-KR
    • CP1361
    • JOHAB
    • 437
    • CP437
    • IBM437
    • CSPC8CODEPAGE437
    • CP737
    • CP775
    • IBM775
    • CSPC775BALTIC
    • 852
    • CP852
    • IBM852
    • CSPCP852
    • CP853
    • 855
    • CP855
    • IBM855
    • CSIBM855
    • 857
    • CP857
    • IBM857
    • CSIBM857
    • CP858
    • 860
    • CP860
    • IBM860
    • CSIBM860
    • 861
    • CP-IS
    • CP861
    • IBM861
    • CSIBM861
    • 863
    • CP863
    • IBM863
    • CSIBM863
    • CP864
    • IBM864
    • CSIBM864
    • 865
    • CP865
    • IBM865
    • CSIBM865
    • 869
    • CP-GR
    • CP869
    • IBM869
    • CSIBM869
    • CP1125
    • IBM037
    • IBM500
    • ASMO-708
    • DOS-720
    • IBM737
    • IBM00858
    • DOS-862
    • IBM870
    • CP875
    • SHIFT_JIS
    • SHIFT-JIS
    • GB2312
    • KS_C_5601-1987
    • IBM1026
    • IBM01047
    • IBM01140
    • IBM01141
    • IBM01142
    • IBM01143
    • IBM01144
    • IBM01145
    • IBM01146
    • IBM01147
    • IBM01148
    • IBM01149
    • MACINTOSH
    • X-MAC-JAPANESE
    • X-MAC-CHINESETRAD
    • X-MAC-KOREAN
    • X-MAC-ARABIC
    • X-MAC-HEBREW
    • X-MAC-GREEK
    • X-MAC-CYRILLIC
    • X-MAC-CHINESESIMP
    • X-MAC-ROMANIAN
    • X-MAC-UKRAINIAN
    • X-MAC-THAI
    • X-MAC-CE
    • X-MAC-ICELANDIC
    • X-MAC-TURKISH
    • X-MAC-CROATIAN
    • X-CHINESE_CNS
    • X-CP20001
    • X_CHINESE-ETEN
    • X-CP20003
    • X-CP20004
    • X-CP20005
    • X-IA5
    • X-IA5-GERMAN
    • X-IA5-SWEDISH
    • X-IA5-NORWEGIAN
    • X-CP20261
    • X-CP20269
    • IBM273
    • IBM277
    • IBM278
    • IBM280
    • IBM284
    • IBM285
    • IBM290
    • IBM297
    • IBM420
    • IBM423
    • IBM424
    • X-EBCDIC-KOREANEXTENDED
    • IBM-THAI
    • KOI8-R
    • IBM871
    • IBM880
    • IBM905
    • IBM00924
    • X-CP20936
    • X-CP20949
    • CP1025
    • KOI8-U
    • X-EUROPA
    • ISO-8859-8-I
    • ISO8859-8-I
    • ISO_8859-8-I
    • ISO_8859_8-I
    • CSISO2022JP
    • ISO-2022-KR
    • ISO2022-KR
    • X-CP50227
    • EUC-CN
    • HZ-GB-2312
    • GB18030
    • X-ISCII-DE
    • X-ISCII-BE
    • X-ISCII-TA
    • X-ISCII-TE
    • X-ISCII-AS
    • X-ISCII-OR
    • X-ISCII-KA
    • X-ISCII-MA
    • X-ISCII-GU
    • X-ISCII-PA
Note: This list of valid encodings is subject to change. Not all strings can be converted between two encodings.
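
A concrete case of a string that cannot be converted between two encodings is the euro sign, which is absent from ISO-8859-1 but present in ISO-8859-15 and CP1252. The Python sketch below checks this with Python's own codecs. The helper name can_encode is hypothetical, and the results describe Python's codec tables rather than transcode itself.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name in ("iso-8859-1", "iso-8859-15", "cp1252", "utf-8"):
    print(name, can_encode("€", name))
```

When a conversion like this is expected to fail for some rows, supplying S or the 'translit' option avoids the error.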