`<materialize>`

Creates a new physical table.

Description

The Macro Language allows you to create complex queries, but those queries apply only to the worksheet in your session. To make any value changes permanent, or to create a new physical table, you must materialize the data to a 1010data Insights Platform table.

Important!: Do not run <materialize> with the variable {@now_} in the query. {@now_} will break your cache and can result in submitting a query more than once.

Syntax

<materialize 
             adv_machs="[MACH_GROUP_1;MACH_GROUP_2;...MACH_GROUP_N]"
             adv_rotate="[ROTATE_AMT|random]"
             adv_stripe="[STRIPE_1,STRIPE_2,...STRIPE_N]"
             adv_type="group|list"
             append="[0|1]" 
             cols="[COLUMN_NAME_1,COLUMN_NAME_2,...COLUMN_NAME_N]"
             compression="[none|fast|faster|fastest|small|smaller|smallest]"
             dedup="[DEDUP_COL_1,DEDUP_COL_2,...DEDUP_COL_N]"
             deduporder="[DEDUP_ORDER_COL]"
             end_append="[0|1]"
             exclude="[EXCLUDE_COL_1,EXCLUDE_COL_2,...EXCLUDE_COL_N]"
             exclusive="[0|1]"       
             indexcols="[INDEX_COL_1,INDEX_COL_2,...INDEX_COL_N]"
             inherit_users"[0|1]" 
             job_responsibility="[0|1]"
             keeporder="[0|1]" 
             keys="[KEYS_LIST_1;KEYS_LIST_2;...KEYS_LIST_N]"
             ldesc="[LONG_DESCRIPTION]"
             ldesc_query="[0|1]" 
             linkheader="[MERGED_TABLE_HEADER]"
             machs="[MACHINE_NAME_1,MACHINE_NAME_2,...MACHINE_NAME_N]"
             makets="[0|1]"
             maxdown="[CELLS]" 
             method="[bundle|columns]"
             nomaxdown="[0|1]"
             owner="[USER_ID]|[GROUP_ID]"
             path="[PATH_TO_FOLDER]" 
             replace="[0|1]"
             restartable="[0|1]" 
             sdesc="[SHORT_DESCRIPTION]" 
             segby="[SEGBY_COL_1,SEGBY_COL_2,...SEGBY_COL_N]"
             segby_advise="[SEGBY_LIST_1;SEGBY_LIST_2;...SEGBY_LIST_N]"
             segmentation="[SEGMENT_SIZE]|[VALUE_EXPRESSION]"
             segopt="[nf|ff|bf|wf|nfd|ffd|bfd|wfd][+|-|0]"
             select="[SELECTION_EXP]"
             shared_target="[0|1]"
             sort="[SORT_COL_1,SORT_COL_2,...SORT_COL_N]"
             sortseg="[SORTSEG_COL_1,SORTSEG_COL_2,...SORTSEG_COL_N]"
             sortseg_advise="[SORTSEG_LIST_1;SORTSEG_LIST_2...SORTSEG_LIST_N]"
             splitseg="[SPLIT_THRESHOLD]"
             store_type="[nvm|ssd]"
             stripe="[NUMBER_OF_TARGET_SERVERS]"
             stripe_factor="[STRIPE_FACTOR]"
             stripemethod="[STRIPE_METHOD_NAME]" 
             targetdir="[PATH_TO_TARGET_FOLDER]"
             temp="[0|1]" 
             title="[TABLE_TITLE]" 
             users="[USER_ID_1,USER_ID_2,...USER_ID_N]"/>

Attributes

adv_machs

A semicolon-separated list of groups of machines used to specify advanced segment assignment. Each group is a comma-separated list of machines.

Note: This attribute overrides the machs and stripe attributes.

The value of the adv_type attribute determines the manner in which the segments will be distributed over the machines.

Note: All advanced attributes (beginning with adv_) require the user to have special privileges.

(Available as of prod-9)

adv_rotate

Allows the list of target machs to be rotated by the specified amount. If you specify a value for adv_rotate, the machs attribute must be non-empty. The value can be any integer. In addition, you can set adv_rotate="random" to pick an integer at random.

For example, if machs="L,M,N" and adv_rotate="1", X lands on M,N,L. If machs="L,M,N" adv_rotate="1" stripe="2", X lands on M,N.

Note: All advanced attributes (beginning with adv_) require the user to have special privileges.

adv_stripe

A comma-separated list of stripe values for each group specified by the adv_machs attribute.

When adv_type="group", this list consists of the number of machines in each group to which the segment should be written.

The number of items in adv_stripe must match the number of groups in adv_machs.

Note: All advanced attributes (beginning with adv_) require the user to have special privileges.

(Available as of version 10.07)

adv_type

Specifies the manner in which the segments are distributed over the machines specified by adv_machs.

Valid values are:

group: Each segment is written to a specified number of machines in each group in the adv_machs list. The number of machines in each group is specified by the corresponding value in the adv_stripe attribute.
list: Each segment is consecutively written to the groups of machines in the adv_machs list, cycling through the groups until complete.
For example, the first segment is written to the first group of machines, the second segment is written to the second group, etc. When the end of the list is reached, the next segment is written to the first group in the list, and so on until all segments are written.

The default value is list.

Note: All advanced attributes (beginning with adv_) require the user to have special privileges.

(Available as of version 10.07)

append

Specifies whether the materialize results should be appended to the target table. append="1" will write to changed segments only. If you wish to write to the last segment only, use end_append.

Note: Exclusive appending (i.e.,

<materialize append="1"
                  exclusive="1">

) leads to fragmentation over time, which will result in gradually declining query performance. The remedy for this is to periodically resegment the whole table to remove any fragmentation that occurs when performing this type of append. For instance, if a user is appending to a table on a daily basis using exclusive append, it is recommended that the table is resegmented once a month or, at the very minimum, once a quarter.

The default value is 0 (i.e., the materialize results are not appended to the target table).

Note: segmentation="0" is ignored when append="1".

cols

A comma-separated list of the columns to be written to the new table.

If omitted, then all columns will be written by default.

compression

The type of compression to use.

Possible values are:

none
fast
faster
fastest
small
smaller
smallest

The default is faster.

Note: This is advanced functionality that requires special privileges.

dedup

A comma-separated list of columns to use as a key for de-duplication.

De-duplication is done at the segment level and, as such, requires that the segmentation strategy be considered and, in most cases, explicitly set.

The default is no columns and no de-duplication.

deduporder

If dedup is set, this specifies a column to use as a sort order when selecting the winning row to keep.

The lowest/earliest value is what determines a record be kept.

Use the backtick character (`) in front of the column name to reverse the sort order.

end_append

Materialize the results of the append to the last segment only.

The default value is end_append="0".

exclude

A comma-separated list of the columns to be excluded from the new table.

If omitted, then no columns will be excluded by default.

exclusive

Indicates whether you have exclusive access and control of this table when using append="1".

Note: This is advanced functionality that requires special privileges.

The default value is 0 (i.e., you do not have exclusive access and control of the table).

indexcols

Specifies which columns in the resulting table should also be indexed.

Indexing provides increased selection performance at the cost of storage.

The default value is no columns.

inherit_users

Specifies whether to inherit user permissions from the parent folder.

If set to 1, the table inherits user permissions from its parent folder.

Note: You cannot both set inherit_users=1 and specify users in the users attribute.

The default is 0 (i.e., do not inherit user permissions from the parent folder).

job_responsibility

Set to 1 on an environment configured for High Availability (HA) to prevent synchronization of the table being materialized. This is assuming a job on the other HA sites will also be performing the same materialize, so synchronization will be unnecessary. Does nothing in an environment not configured for HA.

The default is 0.

(Available as of version 9.36)

keeporder

Specifies whether to create the new table in the same order as the source table.

If keeporder="0" is specified, no guarantee is made that the new table will be in same order as the source table.

If keeporder="1" is specified, the new table will be in the same order as the source table.

The default value of keeporderis 0 (no guarantee that the new table will be in the same order).

keys

A list of lists indicating the keys metadata for the resulting table.

The default is empty.

ldesc

Specifies the long description of the new table.

ldesc_query

Indicates whether to use the query in effect as the long description of the resulting table.

The default value is 0 (i.e., do not use the query in effect as the long description).

linkheader

Specifies the link header of the new table.

machs

A comma-separated list of the target servers to write table data to.

Note: This is advanced functionality that requires special privileges.

makets

Specifies whether to set the legacy timeseries metadata.

The default value is 0 (i.e., do not set the legacy timeseries metadata).

maxdown

Sets target table's maximum cell download limit to the value specified.

The default value is no limit.

method

The method for storing intermediate data.

Specify columns to use individual intermediate files.

The default value is bundle, which groups some intermediate data.

nomaxdown

Specifies whether to remove any limit on the maximum cell download for the table when using replace=1.

The default value is 0 (i.e., the maxdown attribute will be observed even when using replace=1).

owner

The owner of the resulting table.

Must be a valid user ID or group ID capable of owning the resulting table.

The default is the current logged in user.

path

Specifies the folder path or full table pathname under which the new table will be created.

Note: You must have upload permissions to the target folder.

If omitted, the default location is a new unambiguous name in the user's My Data folder.

When choosing a full table pathname, you must follow 1010data naming conventions for table names, which include:

Only letters, digits, and underscores (_) are permitted.
The table name must start with a letter.
Avoid ending the table name with _.
Upper case letters are permitted. However, it is 1010data best practice to avoid them. Upper case letters are more commonly used in table titles, not table names.

replace

Specifies whether the new table should be allowed to replace an existing table. replace="1" writes the result to an entirely new set of segments.

Note: You must own an existing table that is to be replaced.

The default is 0 (i.e., the new table should not replace the existing table).

restartable

Indicates whether to use a unique storage location for each instance of materialize.

The default value is 1 (i.e., use a unique storage location for each instance).

Note: This is advanced functionality that requires special privileges.

sdesc

Specifies the short description of the new table.

segby

Specify one or more columns that will be used to segment the new table.

The table will automatically be marked as segby.

segby_advise

Segmentation is more powerful if you have information about how different columns are related. For example, if you have a table of data about grocery stores with the column names store_id, city, and state, you can be certain that every row with a given value for store_id has the same values for city and state (since the store can exist in only one city and state). Therefore, if a table is segby state, it is also segby store_id.

segby_advise contains a list of lists indicating all segby groupings.

The lists of columns are separated by semicolons. The columns are separated by commas.

The default is the segby grouping.

Note: segby_advise only works if the segby attribute is specified.

segmentation

May either be a constant integer or a value expression.

If a positive integer, this specifies the target segment size, which may be used in conjunction with segby or sortseg.

If a value expression, this must evaluate to an integer between 0 and some non-negative number at each row, with no numbers "left out" (but may evaluate to N/A at some rows). In this case, the value at each row determines the target segment of the row. If a value expression is specified, this overrides the segby or sortseg attributes, which may still be specified in an advisory manner to mark the resulting table as segby/sortseg, respectively.

Note: In this case, of course, you are responsible for ensuring that the segmentation expression does, in fact, lead to the specified segby/sortseg property.

Special cases:

segmentation="0" - specifies that the new table will have a single segment
segmentation="seg_" - specifies that the new table will have identical segmentation to the source table

segopt

The packing strategy to use when using segby or sortseg.

Possible values are:

nf
ff
bf
wf
nfd
ffd
bfd
wfd

Any of these can be optionally followed by:

The default is nf.

select

Apply an expression to the results of the materialize operation. The expression can be any expression you would use in the value attribute of <sel>. However, select runs after the segments are created, leveraging the new segmentation of the table defined in <materialize>. You can use a g_function in your selection, since the segments have already been created. <materialize select> performs a selection independently in each segment.

For example, if you use <materialize segby=date> to create a segmentation by date, you can then use select=g_hi(date;...) to select only the latest date in each group.

If you use select with append="1", only the changed segments are affected by select.

shared_target

If shared_target is set to 1, it will keep all subprocs running on the machines of the originating queries and will write the final output table files remotely, if needed. The resulting table will have process_anywhere set, allowing for network reads of the table.

The default value is 0.

Note: shared_target is intended for shared storage type situations where the target is not a compute node.

sort

Specify one or more columns that will be sorted within each segment.

sortseg

Specify one or more columns that will be used to sort and segment the new table.

The table will automatically be marked as sortseg.

sortseg_advise

Segmentation by one column may imply segmentation by another column. For example, if you have grocery store data and segment by the state column, it implies that it is also segmented by store_id (because the store can exist in only one state). If both state and store_id are sortseg, you can then include state and store_id in sortseg_advise.

sortseg_advise contains a list of lists indicating all sortseg groupings.

The lists of columns are separated by semicolons. The columns are separated by commas.

The default is the sortseg grouping.

Note: sortseg_advise only works if the sortseg attribute is specified.

splitseg

An integer threshold used to split up large segments. For use with a segmentation expression.

In any target segment, if the resulting row count exceeds the splitseg threshold, it will be split in half until all the pieces are less than the threshold.

Prior to version 10.12, the segby attribute was required by splitseg and was maintained during the split.

As of version 10.12, splitseg no longer requires segby. If segby is not specified, the segment is split in the middle until the threshold is reached.

The default is no threshold.

store_type

Allows you to materialize an existing table and move it to a different storage tier when replace="1". This is done explicitly by indicating store_type="ssd" for a nvm table, or store_type="nvm" for a ssd table. If store_type is omitted, nvm is the default storage tier.

You cannot change storage tiers with append="1".

stripe

Specifies the degree of redundancy for the table.

Each segment will be written to the specified number of the target servers, in a diagonally-striped fashion.

If stripe is omitted, materialize selects a stripe based on the default value for your ID/Account.

stripe_factor

Stripe factors are for environments that can be scaled up or down. It is a way for a user to not have to worry about a table's stripe in environments that easily scale with the number of machines. <materialize> sets a stripe factor when it is explicitly stated. There is no harm setting stripe_factor in environments that don't scale. If the environment does not scale, the stripe factor is ignored.

For example, a stripe_factor of 0.2 results in a stripe of 2 when there are 10 upload machines, or a stripe of 3 when there are 11 upload machines. If stripe is 2 while stripe_factor is empty, the result is a stripe of 2 for any number of upload machines greater than 1.

An error is reported before any computation is performed in <materialize> if stripe_factor is either greater than 1.0 or less than or equal to 0.0.

(Available as of version 18.05)

stripemethod

The method of shuffling the machines among stripes.

The default value is mirror, which results in traversing the machine list in order, repeating when finished.

targetdir

Specifies the target file system directory to write table data to.

Note: This is advanced functionality that requires special privileges.

temp

Specifies whether the metadata should be committed only to the current session.

The default is 0 (i.e., the metadata persists beyond the current session).

title

Specifies the title of the new table.

users

A comma-separated list of the users who will have access to the new table.

If absent, privileges are inherited from the parent folder. If users="" is specified, the new table is private (i.e., accessible only to the user).

Example: Change the segmentation of a table

In this example, the <materialize> operation will start with the Sales Item Detail table found at pub.doc.retail.salesdetail and save a new version of the table with a different segmentation. In this case, the new table will be segmented by the sku column.

<base table="pub.doc.retail.salesdetail"/>
<materialize path="pub.doc.retail.altseg.sales_detail_sku" segby="sku" 
 sort="date" title="Sales Item Detail Segmented by SKU"/>

Example: Advanced segment assignment

The following example uses the adv_machs, adv_type, and adv_stripe attributes to put each segment on 5 machines: two from the a group, and all three from the b group.

<materialize adv_machs="a0,a1,a2;b0,b1,b2" adv_type="group" adv_stripe="2,3"/>

Example: Segby advise

The following example shows how to use segby_advise to show how two columns are related in a segby. If a table of grocery stores in segby state, you know that it is also segby store_id, because a given store can only exist in one state.

<materialize path="some.path" segby="state" segby_advise="state;store_id"/>