addtab (Load a large table)

The addtab transaction loads a table into the 1010data system.

Table loads can be streaming or file-based. File-based loads require the user to have a 1010data FTP account. The source data files must be fully uploaded to the user’s FTP account, AWS S3 bucket, or Azure blob storage before executing the addtab transaction. Additionally, for source files in AWS S3 or Azure blob, the user account must be configured with the appropriate keys to access the bucket. See Storage account management for instructions on how to add keys to your user account.

As input, addtab accepts an XML version of the 1010 PowerLoader V1 spec file. If you still wish to use the legacy format, use the convert transaction to convert your PowerLoader V1 spec file into an XML spec.

This transaction can be invoked synchronously or asynchronously with the <sync> tag. In sync mode, the transaction returns when the table is finished loading. In async mode, when the transaction returns a response, it does not mean the table is created, it means table creation has successfully been initiated. With async mode, you use the status transaction to check on the status of the table creation. Streaming table loads must be invoked synchronously. See the SYNC load example and ASYNC load example sections at the end of this topic.

Sync mode is recommended for small tables (<1 million records) and async mode is recommended for large tables (> 1 million records).
Note: addtab and addtable are interchangeable (addtable simply calls addtab).

Query string

The query string for addtab contains the following parameters:

api=addtab
The name of the API transaction: addtab.
commit
A boolean field. commit=1 should be submitted as the final transaction for synchronous streaming loads to signal all the data has been submitted and the table can now be finalized.
path
The full path (directory1.directory2...directoryn.table) of the new table to be created. Required for streaming loads.
apiversion
The version of the API that should handle the requested transaction.

For the most up-to-date version, use apiversion=3.

uid
A valid 1010data user name.
pswd
A valid password for the 1010data user name provided to the uid parameter.
Note: The pswd value should be the encrypted password returned from the login transaction.
sid
The session ID for the current API session.

This value is returned by the login transaction and must be provided to every transaction (except login).

kill (optional)
Valid values for kill are as follows:
yes
Terminate the existing session and start a new one.
no
Do not terminate the existing session.

If no session exists, a new one will be created. If a session does exist, an error is returned.

Note: If kill=no and a session is active, the system returns an error.
possess
Log in and possess the existing session.
auth
Authenticate with the system but do not kill, possess, or start any session.

The default is yes.

Query String Example

https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID

XML Input to Server

The specifications contain the following elements:

XML Element To Send Description of Sent Element Content Required/ Optional Example
<sync> Specifies if the transaction should wait on the load or continue asynchronously. You can then check the status with the api="status" transaction.
Possible values are:
  • 1 if you want to load synchronously
  • 0 if you want the table to load asynchronously (default)
Optional <sync>1</sync>
<spec> Outer specification element. Contains spec of source files and uploaded table. Required <spec>...</spec>
<table> Table specification element. Contains spec of uploaded table. See Table Tree for XML schema.

Some <spec> fields may be duplicated by <table> fields, in which case, the <table> field will take precedence.

Optional <table>...</table>

List of elements in <spec>

XML Element To Send Description of Sent Element Content Required/ Optional Example
<name> File path of table in 1010data. (
directory1.directory2
    .....directoryn.table
)

If you reference an existing table, PowerLoader will replace it.

Required <name>mytable</name>
<title> Title as it appears in the user interface. Optional <title>my_table_title</title>
<owner> User ID of owner. If omitted, the owner is the user who loads this table. Optional <owner>owner_uid</owner>
<sdesc> Short description of the table. Optional <sdesc>short_description</sdesc>
<ldesc> Long description of the table. Optional <ldesc>long_description</ldesc>
<replace> Replace existing table.

1 for true (default), 0 for false.

Optional <replace></replace>
<link> Link header. Optional <link></link>
<ts> Time-series flag.

Requires that <bord> tags are provided in at least one column.

1 for true, 0 for false (default).

Optional <ts>1</ts>
<temp> Specifies whether the table should be created as a worksheet and exist only in the current session.

1 for true, 0 for false (default).

Optional <temp></temp>
<ignore_null> Replace all nulls in the input with spaces.

1 for true, 0 for false (default).

Optional <ignore_null>1</ignore_null>
filter Attempt to correct malformed data. Possible values are:

0 - don't attempt to correct the input (default)

1 - run with correction of input on

2 - run without correction. If an error occurs, rerun with correction on

Optional
<spec [filter=integer][include_diag_cols=1]>
...
</spec>
include_diag_cols Include diagnostic columns.

1 for true, 0 for false (default).

Optional
<spec [filter=integer][include_diag_cols=1]>
...
</spec>
<users> Specifies users provisioned for table access.

If the <users> element is excluded, only the table owner may access the table. See Users Tree for XML schema and <users> (Top-level wrapper for users tree) for usage details.

private - No user, other than the table's owner, may access the table (default).

list - Only the table's owner and users listed in the users tree may access the table.

inherit - All users that have access to the table's parent directory may access the table.

See List of elements in <users> that follows for more information.

Optional
<users type="private">
    <user>my_uid</user>
</users>
<rectype> Record type.

fixed - fixed length (default)

separated - delimited.
Required <rectype>fixed</rectype>
<sep> Single character delimiter.

tab → "\t"

spc → space

Use binary format instead of string for non-printable characters.

Required if rectype is separated.

<sep>tab</sep>
<eor> End of record delimiter.

none → ""

lf → "\n"

nl → "\n"

crlf → "\r\n"

crnl → "\r\n"

cr → "\r"

Optional <eor>crlf</eor>
<arch> Byte order of data.

little - little-endian order

big - big-endian order
Optional <arch></arch>
<begbytes> Number of bytes to skip at the beginning of file. Optional <begbytes></begbytes>
<endbytes> Number of bytes to skip at the end of file. Optional <endbytes></endbytes>
<begrecs> Number of records to skip in the beginning of file. Optional <begrecs>2</begrecs>
<numrecs> Number of records to load in file. Optional <numrecs></numrecs>
<rowsperseg> Number of rows per segment. Optional <rowsperseg></rowsperseg>
<maskw> The column width you will set for the masking character. The loader will know to continue to read to find the closing `mchr` based on this value. Optional <maskw>1000000</maskw>

Example: |"ITEM_DESC1 |||| ITEMDESC_2, ......, ITEMDESC_N ||||"|.

Here you will need to probably increase `maskw` so that it knows to continue to read until it finds " again.

<mchr> Masking character. If omitted, set "\0". Optional

<mchr>"</mchr>

<thou> Thousands separator (character). Optional <thou></thou>
<intna> Integer NA value (integer). Optional <intna></intna>
<fltna> Float NA value (float). Optional <fltna></fltna>
<inf2na> Coerce infinities to NAs.

1 for true, 0 for false (default).

Optional <inf2na>1</inf2na>
<tcrs> Treat carriage return special.

1 for true, 0 for false (default).

Optional <tcrs>1</tcrs>
<source> Outer source element.

See List of elements in <source> that follows for more information.

Required <source>...</source>
<cols> List of <col> elements, one for each column in the file.
Note: Do not specify <cols> for parquet files, since the column data is already in the file.

See List of elements in <cols> that follows for more information.

Required <cols>...</cols>

List of elements in <source>

XML Element To Send Description Required/ Optional Example
<s3> s3 source information, which includes:

api_key

bucket

format

format can be "csv", "xlsx", or "parquet". The default value is "csv".

format is available in version 17.22 and later.

The user must have the storage account key pre-configured in their account. See Storage account management for instructions on how to add keys to your user account.

Note: When loading parquet files, you should not specify <cols> section, since all column data is already in the parquet file. Additionally, when loading parquet files, <exp> (expression) columns cannot be generated for parquet files.
Required
<s3 api_key="sa_s3_bmathew_master" 
bucket="1010-client-s3" 
format="parquet">
bigTable1MillJ.parquet</s3>
<file> File source information, which includes:

format

format can be "csv", "xlsx", or "parquet". The default value is "csv".

format is available in version 17.22 and later.
Required <file>my_file.csv</file>
<abs> abs source information, which includes:

api_key

storage_account_name

container

format can be "csv", "xlsx", or "parquet". The default value is "csv".

format is available in version 17.22 and later.

The user must have the storage account key pre-configured in their account. See Storage account management for instructions on how to add keys to your user account.

Note: When loading parquet files, you should not specify <cols> section, since all column data is already in the parquet file. Additionally, when loading parquet files, <exp> (expression) columns cannot be generated for parquet files.
Required <abs api_key="my_azure_key" storage_account_name="xxxxxx" container="yyyyyy">load.this.file.gz</abs>
<gcs> gcs source information which includes:

api_key

bucket

format can be "csv", "xlsx" or "parquet". The default value is "csv".

The user must have the storage account key pre-configured in their account. See Storage account management for instructions on how to add keys to your user account.

Note: When loading parquet files, you should not specify <cols> section, since all column data is already in the parquet file. Additionally, when loading parquet files, <exp> (expression) columns cannot be generated for parquet files.
Required <gcs api_key="my_gcs_key" bucket="my_bucket">path/to/file</gcs>

Available as of version 20:

There is now a facility to load parquet files directly from the client using a mechanism similar to how csv's are loaded now using api=addtab. We have introduced a new <source> type called <local> which includes a “format” attribute as well as any other attributes that format supports. Here is example syntax:
<in>
  <spec>
    <name>demos.books</name>
    <source>
      <local format="parquet" column_name_prefix="x1010data" />
    </source>
  </spec>
</in>

After this transaction, subsequent api=addtab “feed” transactions, similar to how CSVs are currently handled, can be used to upload the actual data, followed by an api=addtab “commit” transaction. At the moment, only parquet files are supported using this mechanism.

List of elements in <users>

XML Element To Send Description of Sent Element Content Required/ Optional Example
<user> Specifies each user that may access the table.

Used in conjunction with the <users> element when type is list (or if type is omitted).

Each <user> element is nested inside the <users> top-level wrapper.

Required <user>my_uid</user>

List of elements in <col>

XML Element To Send Description of Sent Element Content Required/ Optional Example
<name> Column name (string). Required <name>col_name</name>
<head> Column heading (string). Optional <head>col_title</head>
<width> Column width in the raw file.

Required if rectype is separated, optional if rectype is fixed.

Required/Optional <width>10</width>
<help> Column help (text inside ? marker above column). If you want a newline, use <br>. Optional <help></help>
<skip> Skip this column (boolean).

1 for true, 0 for false (default).

Optional <skip></skip>
<type> Column type (see Column data types). Required <type></type>
<scale> Scale column by this number (integer). Optional <scale></scale>
<case> Force case.

Possible values are lower, upper.

Optional <case>lower</case>
repna Replace special values with natural NAs (boolean).

1 for true, 0 for false (default).

The special values are specified in the na field.

Optional
<col [repna=1] [na=list;of;values]>
...
</col>
na A semicolon-separated list of special values. Optional
<col [repna=1] [na=list;of;values]>
...
</col>
<nowrite> Read column but do not write (boolean).

1 for true, 0 for false (default).

Optional <nowrite></nowrite>
<order> Column placement relative to other columns (integer). Optional <order></order>
<exp> Expression to use if the data type is exp (see Column data types).
Note: <exp> columns cannot be generated for parquet files.
Optional <exp></exp>
<bord> Time series break order (integer). Required for at least one column if Time-series flag is set to true (<ts>1</ts>). The number provided here is the relative segmentation order of this column relative to other columns with the <bord> tag. The column that denotes time should have the largest number in the sequence. Optional <bord></bord>
<fix> Fix column towards left-hand side (boolean).

1 for true, 0 for false (default).

Optional <fix></fix>
<format> Column format (see Column display format types). Required
<format>
<width></width>
</format>
<cmp> Optional <cmp></cmp>

List of elements in each <col> <format>

XML Element To Send Description of Sent Element Content Required/ Optional Example
<type> The column's display format (see Column display format types). Required <type></type>
<width> The width of the column in the GUI. Required <width></width>
<dec> The number of decimal places to show Required <dec></dec>

XML input example

<in>
<spec>
   <source>
    <file>mytable20060101.txt</file>
    <file>mytable20060201.txt</file>
  </source>
  <name>mycompany.mytable</name>
  <title>Mytable 2006</title>
  <sdesc>short description of the table</sdesc>
  <ldesc>long description of table</ldesc>
  <link>FOO</link>
  <users>
    <user>user1</user>
    <user>user2</user>
  </users>
  <rectype>fixed</rectype>
  <eor>crlf</eor>
  <cols>
  <col>
      <name>deal</name>
      <width>12</width>
      <head>Deal ID</head>
      <type>int</type>
      <format>
          <type>nocommas</type>
          <width>3</width>
          <dec>0</dec>
      </format>
  </col>
  <col>
    <name>date</name>
    <width>55</width>
    <head>Distribution Date</head>
    <type>yyyymmdd</type>
    <format>
      <type>date</type>
      <width>8</width>
    </format>
  </col>
  <col>
    <name>loan</name>
    <width>21</width>
    <head>Loan`Number</head>
    <type>int</type>
    <exp>loan+2</exp>
  </col>
  </cols>
</spec>
</in>

XML response from server example

A successful synchronous addtab produces the following result:

<rc>0</rc>
<msg>addtable finished</msg>
<name>mycompany.mytable</name>

A successful asynchronous addtab produces the following result:

<rc>0</rc>
<msg>addtable started</msg>
<name>mycompany.mytable</name>

ASYNC load example

The following example shows the full async load process for large tables.

submit: 
https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID
<in>
<spec>...</spec>
</in>

response: 
<out>
<rc>0</rc>
<msg>addtable started</msg>
<name>mycompany.mytable</name>
</out>

submit: 
https://www2.1010data.com/gw.k?api=status&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID

response: 
<out>
<rc>3</rc>
<msg>loading</msg>
<path>mycompany.mytable</path>
<transaction>load</transaction>
<state>started</state>
<numrecs>0</numrecs>
<ptr>0</ptr>
<totbytes>50291871</totbytes>
</out>

submit: 
https://www2.1010data.com/gw.k?api=status&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID

response: 
<out>
<rc>4</rc>
<msg>load succeeded</msg>
<path>mycompany.mytable</path>
<transaction>load</transaction>
<state>finished</state>
<totrecs>466869</totrecs>
</out>

SYNC load example

The following example shows the full sync load process for small tables.
submit: 
https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID
<in>
<spec>...</spec>
<sync>1</sync>
</in>

response: 
<out>
<rc>0</rc>
<msg>addtable finished</msg>
<name>mycompany.mytable</name>
</out>

The following example shows the full sync streaming load process, with a final commit=1 to signal that all the data has been submitted and the table can be finalized.

submit: 
https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID&path=mycompany.mytable
<in>
<spec>...</spec>
<sync>1</sync>
</in>

response: 
<out>
<rc>0</rc>
<msg>addtable started</msg>
<name>mycompany.mytable</name>
</out>

submit: 
https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID&path=mycompany.mytable&sync=1
<in>[data]</in>

response: 
<out>
<rc>0</rc>
<msg>addtable feed successful</msg>
<path>mycompany.mytable</path>
<diag></diag>
<totbytes>1371799</totbytes>
<bytes>1371799</bytes>
</out>

submit: 
https://www2.1010data.com/gw.k?api=addtab&apiversion=3&uid=$UID&pswd=$PSWD&sid=$SID&path=mycompany.mytable&commit=1

response: 
<out>
<rc>0</rc>
<msg>addtable commit finished</msg>
<commit>1</commit>
<path>mycompany.mytable</path>
</out>