addtab
(Load a large table)
The addtab
transaction loads a table into the 1010data
system.
Table loads can be streaming or file-based. File-based loads require the user to have a
1010data FTP account. The source data files must be fully uploaded to the user’s FTP
account, AWS S3 bucket, or Azure blob storage before executing the addtab
transaction. Additionally, for source files in AWS S3 or Azure blob, the user account must
be configured with the appropriate keys to access the bucket. See Storage account
management for instructions on how to add keys to your user account.
As input, addtab
accepts an XML version of the 1010 PowerLoader V1 spec
file. If you still wish to use the legacy format, use the convert
transaction to convert your PowerLoader V1 spec file into an XML spec.
This transaction can be invoked synchronously or asynchronously with the
<sync>
tag. In sync mode, the transaction returns when the table is
finished loading. In async mode, when the transaction returns a response, it does not mean
the table is created, it means table creation has successfully been initiated. With async
mode, you use the status
transaction to check on the status of the table
creation. Streaming table loads must be invoked synchronously. See the SYNC load
example and ASYNC load example sections at the end of this topic.
addtab
and
addtable
are interchangeable (addtable
simply calls
addtab
).Query string
The query string for addtab
contains the following parameters:
api=addtab
- The name of the API transaction:
addtab
. commit
- A boolean field.
commit=1
should be submitted as the final transaction for synchronous streaming loads to signal all the data has been submitted and the table can now be finalized. path
- The full path (
directory1.directory2...directoryn.table
) of the new table to be created. Required for streaming loads. apiversion
- The version of the API that should handle the requested transaction.
For the most up-to-date version, use apiversion=3.
uid
- A valid 1010data user name.
pswd
- A valid password for the 1010data user name provided to the
uid
parameter.Note: Thepswd
value should be the encrypted password returned from thelogin
transaction. sid
- The session ID for the current API session.
This value is returned by the
login
transaction and must be provided to every transaction (exceptlogin
). kill
(optional)- Valid values for
kill
are as follows:yes
- Terminate the existing session and start a new one.
no
- Do not terminate the existing session.
If no session exists, a new one will be created. If a session does exist, an error is returned.
Note: Ifkill=no
and a session is active, the system returns an error. possess
- Log in and possess the existing session.
auth
- Authenticate with the system but do not kill, possess, or start any session.
The default is
yes
.
Query String Example
https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID
XML Input to Server
The specifications contain the following elements:
XML Element To Send | Description of Sent Element Content | Required/ Optional | Example |
---|---|---|---|
<sync> |
Specifies if the transaction should wait on the load or continue
asynchronously. You can then check the status with the api="status"
transaction.Possible values are:
|
Optional | <sync>1</sync> |
<spec> |
Outer specification element. Contains spec of source files and uploaded table. | Required | <spec>...</spec> |
<table> |
Table specification element. Contains spec of uploaded table. See Table Tree for XML schema. Some
|
Optional | <table>...</table> |
List of elements in <spec>
XML Element To Send | Description of Sent Element Content | Required/ Optional | Example |
---|---|---|---|
<name> |
File path of table in 1010data.
(directory1.directory2 .....directoryn.table) If you reference an existing table, PowerLoader will replace it. |
Required | <name>mytable</name> |
<title> |
Title as it appears in the user interface. | Optional | <title>my_table_title</title> |
<owner> |
User ID of owner. If omitted, the owner is the user who loads this table. | Optional | <owner>owner_uid</owner> |
<sdesc> |
Short description of the table. | Optional | <sdesc>short_description</sdesc> |
<ldesc> |
Long description of the table. | Optional | <ldesc>long_description</ldesc> |
<replace> |
Replace existing table. 1 for true (default), 0 for false. |
Optional | <replace></replace> |
<link> |
Link header. | Optional | <link></link> |
<ts> |
Time-series flag. Requires that 1 for true, 0 for false (default). |
Optional | <ts>1</ts> |
<temp> |
Specifies whether the table should be created as a worksheet and exist only in
the current session. 1 for true, 0 for false (default). |
Optional | <temp></temp> |
<ignore_null> |
Replace all nulls in the input with spaces. 1 for true, 0 for false (default). |
Optional | <ignore_null>1</ignore_null> |
filter |
Attempt to correct malformed data. Possible values
are: 0 - don't attempt to correct the input (default) 1 - run with correction of input on 2 - run without correction. If an error occurs, rerun with correction on |
Optional |
|
include_diag_cols |
Include diagnostic columns. 1 for true, 0 for false (default). |
Optional |
|
<users> |
Specifies users provisioned for table access. If the
See List of elements in
|
Optional |
<users type="private"> <user>my_uid</user> </users> |
<rectype> |
Record type.
separated - delimited. |
Required | <rectype>fixed</rectype> |
<sep> |
Single character delimiter.
Use binary format instead of string for non-printable characters. |
Required if |
<sep>tab</sep> |
<eor> |
End of record delimiter. none → ""
|
Optional | <eor>crlf</eor> |
<arch> |
Byte order of data.
big - big-endian order |
Optional | <arch></arch> |
<begbytes> |
Number of bytes to skip at the beginning of file. | Optional | <begbytes></begbytes> |
<endbytes> |
Number of bytes to skip at the end of file. | Optional | <endbytes></endbytes> |
<begrecs> |
Number of records to skip in the beginning of file. | Optional | <begrecs>2</begrecs> |
<numrecs> |
Number of records to load in file. | Optional | <numrecs></numrecs> |
<rowsperseg> |
Number of rows per segment. | Optional | <rowsperseg></rowsperseg> |
<maskw> |
The column width you will set for the masking character. The loader will know to continue to read to find the closing `mchr` based on this value. | Optional | <maskw>1000000</maskw> Example: |"ITEM_DESC1 |||| ITEMDESC_2, ......, ITEMDESC_N ||||"|. Here you will need to probably increase `maskw` so that it knows to continue to read until it finds " again. |
<mchr> |
Masking character. If omitted, set "\0" . |
Optional |
|
<thou> |
Thousands separator (character). | Optional | <thou></thou> |
<intna> |
Integer NA value (integer). | Optional | <intna></intna> |
<fltna> |
Float NA value (float). | Optional | <fltna></fltna> |
<inf2na> |
Coerce infinities to NAs. 1 for true, 0 for false (default). |
Optional | <inf2na>1</inf2na> |
<tcrs> |
Treat carriage return special. 1 for true, 0 for false (default). |
Optional | <tcrs>1</tcrs> |
<source> |
Outer source element. See List of elements in
|
Required | <source>...</source> |
<cols> |
List of <col> elements, one for each column in the
file.Note: Do not specify <cols> for parquet files, since the
column data is already in the file.See List of elements in
|
Required | <cols>...</cols> |
List of elements in <source>
XML Element To Send | Description | Required/ Optional | Example |
---|---|---|---|
<s3> |
s3 source information, which
includes:
format is available in version 17.22
and later.The user must have the storage account key pre-configured in their account. See Storage account management for instructions on how to add keys to your user account. Note: When loading parquet files, you should not specify
<cols> section, since all column data is already in the
parquet file. Additionally, when loading parquet files,
<exp> (expression) columns cannot be generated for parquet
files. |
Required |
|
<file> |
File source information, which includes:
format
can be format is available in version 17.22
and later. |
Required | <file>my_file.csv</file> |
<abs> |
abs source information, which
includes:
format is available in version 17.22
and later.The user must have the storage account key pre-configured in their account. See Storage account management for instructions on how to add keys to your user account. Note: When loading parquet files, you should not specify
<cols> section, since all column data is already in the
parquet file. Additionally, when loading parquet files,
<exp> (expression) columns cannot be generated for parquet
files. |
Required | <abs api_key="my_azure_key" storage_account_name="xxxxxx"
container="yyyyyy">load.this.file.gz</abs> |
<gcs> |
gcs source information which
includes:
format
can be The user must have the storage account key pre-configured in their account. See Storage account management for instructions on how to add keys to your user account. Note: When loading parquet files, you should not specify
<cols> section, since all column data is already in the
parquet file. Additionally, when loading parquet files,
<exp> (expression) columns cannot be generated for parquet
files. |
Required | <gcs api_key="my_gcs_key"
bucket="my_bucket">path/to/file</gcs> |
Available as of version 20:
api=addtab
. We have introduced a
new <source>
type called <local>
which includes a
“format” attribute as well as any other attributes that format supports. Here is example
syntax:<in> <spec> <name>demos.books</name> <source> <local format="parquet" column_name_prefix="x1010data" /> </source> </spec> </in>
After this transaction, subsequent api=addtab
“feed” transactions, similar
to how CSVs are currently handled, can be used to upload the actual data, followed by an
api=addtab
“commit” transaction. At the moment, only parquet files are
supported using this mechanism.
List of elements in <users>
XML Element To Send | Description of Sent Element Content | Required/ Optional | Example |
---|---|---|---|
<user> |
Specifies each user that may access the table. Used in conjunction with the <users> element when type is list (or if type is omitted). Each <user> element is nested inside the <users> top-level wrapper. |
Required | <user>my_uid</user> |
List of elements in <col>
XML Element To Send | Description of Sent Element Content | Required/ Optional | Example |
---|---|---|---|
<name> |
Column name (string). | Required | <name>col_name</name> |
<head> |
Column heading (string). | Optional | <head>col_title</head> |
<width> |
Column width in the raw file. Required if |
Required/Optional | <width>10</width> |
<help> |
Column help (text inside ? marker above column). If you want a newline, use <br>. | Optional | <help></help> |
<skip> |
Skip this column (boolean). 1 for true, 0 for false (default). |
Optional | <skip></skip> |
<type> |
Column type (see Column data types). | Required | <type></type> |
<scale> |
Scale column by this number (integer). | Optional | <scale></scale> |
<case> |
Force case. Possible values are |
Optional | <case>lower</case> |
repna |
Replace special values with natural NAs (boolean). 1 for true, 0 for false (default). The special values are specified in the |
Optional |
|
na |
A semicolon-separated list of special values. | Optional |
|
<nowrite> |
Read column but do not write (boolean). 1 for true, 0 for false (default). |
Optional | <nowrite></nowrite> |
<order> |
Column placement relative to other columns (integer). | Optional | <order></order> |
<exp> |
Expression to use if the data type is exp (see Column data types).Note: <exp> columns
cannot be generated for parquet files. |
Optional | <exp></exp> |
<bord> |
Time series break order (integer). Required for at least one column if
Time-series flag is set to true (<ts>1</ts> ). The number
provided here is the relative segmentation order of this column relative to other
columns with the <bord> tag. The column that denotes time should
have the largest number in the sequence. |
Optional | <bord></bord> |
<fix> |
Fix column towards left-hand side (boolean). 1 for true, 0 for false (default). |
Optional | <fix></fix> |
<format> |
Column format (see Column display format types). | Required |
|
<cmp> |
Optional | <cmp></cmp> |
List of elements in each <col> <format>
XML Element To Send | Description of Sent Element Content | Required/ Optional | Example |
---|---|---|---|
<type> |
The column's display format (see Column display format types). | Required | <type></type> |
<width> |
The width of the column in the GUI. | Required | <width></width> |
<dec> |
The number of decimal places to show | Required | <dec></dec> |
XML input example
<in> <spec> <source> <file>mytable20060101.txt</file> <file>mytable20060201.txt</file> </source> <name>mycompany.mytable</name> <title>Mytable 2006</title> <sdesc>short description of the table</sdesc> <ldesc>long description of table</ldesc> <link>FOO</link> <users> <user>user1</user> <user>user2</user> </users> <rectype>fixed</rectype> <eor>crlf</eor> <cols> <col> <name>deal</name> <width>12</width> <head>Deal ID</head> <type>int</type> <format> <type>nocommas</type> <width>3</width> <dec>0</dec> </format> </col> <col> <name>date</name> <width>55</width> <head>Distribution Date</head> <type>yyyymmdd</type> <format> <type>date</type> <width>8</width> </format> </col> <col> <name>loan</name> <width>21</width> <head>Loan`Number</head> <type>int</type> <exp>loan+2</exp> </col> </cols> </spec> </in>
XML response from server example
A successful synchronous addtab
produces the following result:
<rc>0</rc> <msg>addtable finished</msg> <name>mycompany.mytable</name>
A successful asynchronous addtab
produces the following result:
<rc>0</rc> <msg>addtable started</msg> <name>mycompany.mytable</name>
ASYNC load example
The following example shows the full async load process for large tables.
submit: https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID <in> <spec>...</spec> </in> response: <out> <rc>0</rc> <msg>addtable started</msg> <name>mycompany.mytable</name> </out> submit: https://www2.1010data.com/gw.k?api=status&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID response: <out> <rc>3</rc> <msg>loading</msg> <path>mycompany.mytable</path> <transaction>load</transaction> <state>started</state> <numrecs>0</numrecs> <ptr>0</ptr> <totbytes>50291871</totbytes> </out> submit: https://www2.1010data.com/gw.k?api=status&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID response: <out> <rc>4</rc> <msg>load succeeded</msg> <path>mycompany.mytable</path> <transaction>load</transaction> <state>finished</state> <totrecs>466869</totrecs> </out>
SYNC load example
submit: https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID <in> <spec>...</spec> <sync>1</sync> </in> response: <out> <rc>0</rc> <msg>addtable finished</msg> <name>mycompany.mytable</name> </out>
The following example shows the full sync streaming load process, with a final
commit=1
to signal that all the data has been submitted and the table can
be finalized.
submit: https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID&path=mycompany.mytable <in> <spec>...</spec> <sync>1</sync> </in> response: <out> <rc>0</rc> <msg>addtable started</msg> <name>mycompany.mytable</name> </out> submit: https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID&path=mycompany.mytable&sync=1 <in>[data]</in> response: <out> <rc>0</rc> <msg>addtable feed successful</msg> <path>mycompany.mytable</path> <diag></diag> <totbytes>1371799</totbytes> <bytes>1371799</bytes> </out> submit: https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID&path=mycompany.mytable&commit=1 response: <out> <rc>0</rc> <msg>addtable commit finished</msg> <commit>1</commit> <path>mycompany.mytable</path> </out>