<materialize>
Creates a new physical table.
Description
The Macro Language allows you to create complex queries, but those queries apply only to the worksheet in your session. To make any value changes permanent, or to create a new physical table, you must materialize the data to a 1010data Insights Platform table.
Important!: Do not run <materialize>
with the variable
{@now_}
in the query. {@now_}
will break your cache and
can result in submitting a query more than once.
Syntax
<materialize adv_machs="[MACH_GROUP_1;MACH_GROUP_2;...MACH_GROUP_N]" adv_rotate="[ROTATE_AMT|random]" adv_stripe="[STRIPE_1,STRIPE_2,...STRIPE_N]" adv_type="group|list" append="[0|1]" cols="[COLUMN_NAME_1,COLUMN_NAME_2,...COLUMN_NAME_N]" compression="[none|fast|faster|fastest|small|smaller|smallest]" dedup="[DEDUP_COL_1,DEDUP_COL_2,...DEDUP_COL_N]" deduporder="[DEDUP_ORDER_COL]" end_append="[0|1]" exclude="[EXCLUDE_COL_1,EXCLUDE_COL_2,...EXCLUDE_COL_N]" exclusive="[0|1]" indexcols="[INDEX_COL_1,INDEX_COL_2,...INDEX_COL_N]" inherit_users"[0|1]" job_responsibility="[0|1]" keeporder="[0|1]" keys="[KEYS_LIST_1;KEYS_LIST_2;...KEYS_LIST_N]" ldesc="[LONG_DESCRIPTION]" ldesc_query="[0|1]" linkheader="[MERGED_TABLE_HEADER]" machs="[MACHINE_NAME_1,MACHINE_NAME_2,...MACHINE_NAME_N]" makets="[0|1]" maxdown="[CELLS]" method="[bundle|columns]" nomaxdown="[0|1]" owner="[USER_ID]|[GROUP_ID]" path="[PATH_TO_FOLDER]" replace="[0|1]" restartable="[0|1]" sdesc="[SHORT_DESCRIPTION]" segby="[SEGBY_COL_1,SEGBY_COL_2,...SEGBY_COL_N]" segby_advise="[SEGBY_LIST_1;SEGBY_LIST_2;...SEGBY_LIST_N]" segmentation="[SEGMENT_SIZE]|[VALUE_EXPRESSION]" segopt="[nf|ff|bf|wf|nfd|ffd|bfd|wfd][+|-|0]" select="[SELECTION_EXP]" shared_target="[0|1]" sort="[SORT_COL_1,SORT_COL_2,...SORT_COL_N]" sortseg="[SORTSEG_COL_1,SORTSEG_COL_2,...SORTSEG_COL_N]" sortseg_advise="[SORTSEG_LIST_1;SORTSEG_LIST_2...SORTSEG_LIST_N]" splitseg="[SPLIT_THRESHOLD]" store_type="[nvm|ssd]" stripe="[NUMBER_OF_TARGET_SERVERS]" stripe_factor="[STRIPE_FACTOR]" stripemethod="[STRIPE_METHOD_NAME]" targetdir="[PATH_TO_TARGET_FOLDER]" temp="[0|1]" title="[TABLE_TITLE]" users="[USER_ID_1,USER_ID_2,...USER_ID_N]"/>
Attributes
adv_machs
- A semicolon-separated list of groups of machines used to specify advanced segment
assignment. Each group is a comma-separated list of machines.Note: This attribute overrides the
machs
andstripe
attributes.The value of the
adv_type
attribute determines the manner in which the segments will be distributed over the machines.Note: All advanced attributes (beginning withadv_
) require the user to have special privileges.(Available as of prod-9)
adv_rotate
- Allows the list of target machs to be rotated by the specified amount. If you specify
a value for
adv_rotate
, themachs
attribute must be non-empty. The value can be any integer. In addition, you can setadv_rotate="random"
to pick an integer at random.For example, if
machs="L,M,N"
andadv_rotate="1"
, X lands on M,N,L. Ifmachs="L,M,N" adv_rotate="1" stripe="2"
, X lands on M,N.Note: All advanced attributes (beginning withadv_
) require the user to have special privileges. adv_stripe
- A comma-separated list of stripe values for each group specified by the
adv_machs
attribute.When
adv_type="group"
, this list consists of the number of machines in each group to which the segment should be written.The number of items inadv_stripe
must match the number of groups inadv_machs
.Note: All advanced attributes (beginning withadv_
) require the user to have special privileges.(Available as of version 10.07)
adv_type
- Specifies the manner in which the segments are distributed over the machines specified
by
adv_machs
.Valid values are:
group
- Each segment is written to a specified number of machines in each group in the
adv_machs
list. The number of machines in each group is specified by the corresponding value in theadv_stripe
attribute. list
- Each segment is consecutively written to the groups of machines in the
adv_machs
list, cycling through the groups until complete.For example, the first segment is written to the first group of machines, the second segment is written to the second group, etc. When the end of the list is reached, the next segment is written to the first group in the list, and so on until all segments are written.
The default value islist
.Note: All advanced attributes (beginning withadv_
) require the user to have special privileges.(Available as of version 10.07)
append
- Specifies whether the materialize results should be appended to the target table.
append="1"
will write to changed segments only. If you wish to write to the last segment only, useend_append
.Note: Exclusive appending (i.e.,<materialize append="1" exclusive="1">
) leads to fragmentation over time, which will result in gradually declining query performance. The remedy for this is to periodically resegment the whole table to remove any fragmentation that occurs when performing this type of append. For instance, if a user is appending to a table on a daily basis using exclusive append, it is recommended that the table is resegmented once a month or, at the very minimum, once a quarter.The default value is 0 (i.e., the materialize results are not appended to the target table).
Note:segmentation="0"
is ignored whenappend="1"
. cols
- A comma-separated list of the columns to be written to the new table.
If omitted, then all columns will be written by default.
compression
- The type of compression to use.
Possible values are:
none
fast
faster
fastest
small
smaller
smallest
The default is
faster
.Note: This is advanced functionality that requires special privileges. dedup
- A comma-separated list of columns to use as a key for de-duplication.
De-duplication is done at the segment level and, as such, requires that the segmentation strategy be considered and, in most cases, explicitly set.
The default is no columns and no de-duplication.
deduporder
- If
dedup
is set, this specifies a column to use as a sort order when selecting the winning row to keep.The lowest/earliest value is what determines a record be kept.
Use the backtick character (
`
) in front of the column name to reverse the sort order. end_append
- Materialize the results of the append to the last segment only.
The default value is
end_append="0"
. exclude
- A comma-separated list of the columns to be excluded from the new table.
If omitted, then no columns will be excluded by default.
exclusive
- Indicates whether you have exclusive access and control of this table when using
append="1"
.Note: This is advanced functionality that requires special privileges.The default value is 0 (i.e., you do not have exclusive access and control of the table).
indexcols
- Specifies which columns in the resulting table should also be indexed.
Indexing provides increased selection performance at the cost of storage.
The default value is no columns.
inherit_users
- Specifies whether to inherit user permissions from the parent folder.If set to 1, the table inherits user permissions from its parent folder.Note: You cannot both set
inherit_users=1
and specify users in theusers
attribute.The default is 0 (i.e., do not inherit user permissions from the parent folder).
job_responsibility
- Set to 1 on an environment configured for High Availability (HA) to prevent
synchronization of the table being materialized. This is assuming a job on the other HA
sites will also be performing the same materialize, so synchronization will be
unnecessary. Does nothing in an environment not configured for HA.
The default is 0.
(Available as of version 9.36)
keeporder
- Specifies whether to create the new table in the same order as the source table.
If
keeporder="0"
is specified, no guarantee is made that the new table will be in same order as the source table.If
keeporder="1"
is specified, the new table will be in the same order as the source table.The default value of
keeporder
is 0 (no guarantee that the new table will be in the same order). keys
- A list of lists indicating the
keys
metadata for the resulting table.The default is empty.
ldesc
- Specifies the long description of the new table.
ldesc_query
- Indicates whether to use the query in effect as the long description of the resulting
table.
The default value is 0 (i.e., do not use the query in effect as the long description).
linkheader
- Specifies the link header of the new table.
machs
- A comma-separated list of the target servers to write table data to. Note: This is advanced functionality that requires special privileges.
makets
- Specifies whether to set the legacy timeseries metadata.
The default value is 0 (i.e., do not set the legacy timeseries metadata).
maxdown
- Sets target table's maximum cell download limit to the value specified.
The default value is no limit.
method
- The method for storing intermediate data.
Specify
columns
to use individual intermediate files.The default value is
bundle
, which groups some intermediate data. nomaxdown
- Specifies whether to remove any limit on the maximum cell download for the table when
using
replace=1
.The default value is 0 (i.e., the
maxdown
attribute will be observed even when usingreplace=1
). owner
- The owner of the resulting table.
Must be a valid user ID or group ID capable of owning the resulting table.
The default is the current logged in user.
path
- Specifies the folder path or full table pathname under which the new table will be created.Note: You must have upload permissions to the target folder.
If omitted, the default location is a new unambiguous name in the user's My Data folder.
When choosing a full table pathname, you must follow 1010data naming conventions for table names, which include:- Only letters, digits, and underscores (
_
) are permitted. - The table name must start with a letter.
- Avoid ending the table name with
_
. - Upper case letters are permitted. However, it is 1010data best practice to avoid them. Upper case letters are more commonly used in table titles, not table names.
- Only letters, digits, and underscores (
replace
- Specifies whether the new table should be allowed to replace an existing table.
replace="1"
writes the result to an entirely new set of segments.Note: You must own an existing table that is to be replaced.The default is 0 (i.e., the new table should not replace the existing table).
restartable
- Indicates whether to use a unique storage location for each instance of
materialize.
The default value is 1 (i.e., use a unique storage location for each instance).
Note: This is advanced functionality that requires special privileges. sdesc
- Specifies the short description of the new table.
segby
- Specify one or more columns that will be used to segment the new table.
The table will automatically be marked as segby.
segby_advise
Segmentation is more powerful if you have information about how different columns are related. For example, if you have a table of data about grocery stores with the column names
store_id
,city
, andstate
, you can be certain that every row with a given value forstore_id
has the same values forcity
andstate
(since the store can exist in only one city and state). Therefore, if a table is segbystate
, it is also segbystore_id
.segby_advise
contains a list of lists indicating all segby groupings.The lists of columns are separated by semicolons. The columns are separated by commas.
The default is the
segby
grouping.Note:segby_advise
only works if thesegby
attribute is specified.segmentation
- May either be a constant integer or a value expression.
If a positive integer, this specifies the target segment size, which may be used in conjunction with
segby
orsortseg
.If a value expression, this must evaluate to an integer between 0 and some non-negative number at each row, with no numbers "left out" (but may evaluate to N/A at some rows). In this case, the value at each row determines the target segment of the row. If a value expression is specified, this overrides thesegby
orsortseg
attributes, which may still be specified in an advisory manner to mark the resulting table as segby/sortseg, respectively.Note: In this case, of course, you are responsible for ensuring that the segmentation expression does, in fact, lead to the specified segby/sortseg property.Special cases:segmentation="0"
- specifies that the new table will have a single segmentsegmentation="seg_"
- specifies that the new table will have identical segmentation to the source table
segopt
- The packing strategy to use when using
segby
orsortseg
.Possible values are:
nf
ff
bf
wf
nfd
ffd
bfd
wfd
Any of these can be optionally followed by:+
-
0
The default is
nf
. select
- Apply an expression to the results of the materialize operation. The expression can be
any expression you would use in the
value
attribute of<sel>
. However,select
runs after the segments are created, leveraging the new segmentation of the table defined in<materialize>
. You can use a g_function in your selection, since the segments have already been created.<materialize select>
performs a selection independently in each segment.For example, if you use
<materialize segby=date>
to create a segmentation by date, you can then useselect=g_hi(date;...)
to select only the latest date in each group.If you use
select
withappend="1"
, only the changed segments are affected byselect
. shared_target
-
If
shared_target
is set to 1, it will keep all subprocs running on the machines of the originating queries and will write the final output table files remotely, if needed. The resulting table will haveprocess_anywhere
set, allowing for network reads of the table.The default value is 0.
Note:shared_target
is intended for shared storage type situations where the target is not a compute node. sort
- Specify one or more columns that will be sorted within each segment.
sortseg
- Specify one or more columns that will be used to sort and segment the new table.
The table will automatically be marked as sortseg.
sortseg_advise
Segmentation by one column may imply segmentation by another column. For example, if you have grocery store data and segment by the
state
column, it implies that it is also segmented bystore_id
(because the store can exist in only one state). If bothstate
andstore_id
are sortseg, you can then includestate
andstore_id
insortseg_advise
.sortseg_advise
contains a list of lists indicating all sortseg groupings.The lists of columns are separated by semicolons. The columns are separated by commas.
The default is the
sortseg
grouping.Note:sortseg_advise
only works if thesortseg
attribute is specified.splitseg
- An integer threshold used to split up large segments. For use with a segmentation
expression.
In any target segment, if the resulting row count exceeds the
splitseg
threshold, it will be split in half until all the pieces are less than the threshold.Prior to version 10.12, the
segby
attribute was required bysplitseg
and was maintained during the split.As of version 10.12,
splitseg
no longer requiressegby
. Ifsegby
is not specified, the segment is split in the middle until the threshold is reached.The default is no threshold.
store_type
- Allows you to materialize an existing table and move it to a different storage tier
when
replace="1"
. This is done explicitly by indicatingstore_type="ssd"
for a nvm table, orstore_type="nvm"
for a ssd table. Ifstore_type
is omitted,nvm
is the default storage tier.You cannot change storage tiers with
append="1"
. stripe
- Specifies the degree of redundancy for the table.
Each segment will be written to the specified number of the target servers, in a diagonally-striped fashion.
If
stripe
is omitted, materialize selects a stripe based on the default value for your ID/Account. stripe_factor
- Stripe factors are for environments that can be scaled up or down. It is a way for a
user to not have to worry about a table's stripe in environments that easily scale with
the number of machines.
<materialize>
sets a stripe factor when it is explicitly stated. There is no harm settingstripe_factor
in environments that don't scale. If the environment does not scale, the stripe factor is ignored.For example, a
stripe_factor
of 0.2 results in a stripe of 2 when there are 10 upload machines, or a stripe of 3 when there are 11 upload machines. Ifstripe
is 2 whilestripe_factor
is empty, the result is a stripe of 2 for any number of upload machines greater than 1.An error is reported before any computation is performed in
<materialize>
ifstripe_factor
is either greater than 1.0 or less than or equal to 0.0.(Available as of version 18.05)
stripemethod
- The method of shuffling the machines among stripes.
The default value is
mirror
, which results in traversing the machine list in order, repeating when finished. targetdir
- Specifies the target file system directory to write table data to. Note: This is advanced functionality that requires special privileges.
temp
- Specifies whether the metadata should be committed only to the current session.
The default is 0 (i.e., the metadata persists beyond the current session).
title
- Specifies the title of the new table.
users
- A comma-separated list of the users who will have access to the new table.
If absent, privileges are inherited from the parent folder. If
users=""
is specified, the new table is private (i.e., accessible only to the user).
Example: Change the segmentation of a table
In this example, the <materialize>
operation will start with the
Sales Item Detail table found at
pub.doc.retail.salesdetail
and save a new version of the table with a
different segmentation. In this case, the new table will be segmented by the
sku
column.
<base table="pub.doc.retail.salesdetail"/> <materialize path="pub.doc.retail.altseg.sales_detail_sku" segby="sku" sort="date" title="Sales Item Detail Segmented by SKU"/>
Example: Advanced segment assignment
The following example uses the adv_machs
, adv_type
, and
adv_stripe
attributes to put each segment on 5 machines: two from the
a
group, and all three from the b
group.
<materialize adv_machs="a0,a1,a2;b0,b1,b2" adv_type="group" adv_stripe="2,3"/>
Example: Segby advise
segby_advise
to show how two
columns are related in a segby. If a table of grocery stores in segby
state
, you know that it is also segby store_id
, because a
given store can only exist in one
state.<materialize path="some.path" segby="state" segby_advise="state;store_id"/>