<link>
Link a foreign table to a base table or worksheet. Both tables must share a column on which to link.
Description
<link>
defines a link between the current table and another table (i.e., the
foreign table) such that, by default, all columns of the foreign table are
logically added to the current table.
<link>
tag are applied to the foreign
table prior to effecting the link, so that the current table is linked to the foreign
table as transformed by those operations. The operations are applied to the foreign
table in the order in which they appear.<tabu>
) are applied to the results of the tabulation, not the
original table.When linking tables in the 1010data Insights Platform, the data type of the column in the base table must match the type of the column being linked from the foreign table.
For an overview of linking, see Linking tables and worksheets in the 1010data Insights Platform User's Guide.
Size limitations for <link>
When linking to a table, there are no limitations on the size of the base table, the
foreign table, or the results of the <link>
operation. However, when you
link to a worksheet (a table with operations applied to it), the full worksheet is pulled
into memory, so there is a limit on how large the worksheet can be, which is roughly
2GB.
A simple formula for determining the approximate memory footprint of a worksheet is:
(R*C*8)/1B
R
is the number of rows in the worksheetC
is the number of columns in the worksheet8
is the number of bytes for each value in the worksheet1B
is one billion (the number of bytes in a gigabyte)
In general, the worksheet you are linking to should not exceed ~250MM cells (i.e.,
R*C
< 250MM).
Syntax
<link table2="[FOREIGN_TABLE_NAME]" col="[LINK_COL_1,LINK_COL_2,...LINK_COL_N]" col2="[FOREIGN_COL_1,FOREIGN_COL_2,...FOREIGN_COL_N]" label="[LABEL]" suffix="[SUFFIX]" type="[exact|select|asof|include|exclude|side]" shift="[NUMBER_OF_ROWS]" after="[0|1]" keepcols="[0|1]" cols="[COLUMN_NAME_1,COLUMN_NAME_2,...COLUMN_NAME_N]" expand="[0|1]" nocache="[0|1]" segby="[COLUMN_NAME]" materialize="[0|1]" denormalize="[0|1]" basenames="[0|1]" filename_suffix="[FILENAME_SUFFIX]" machs="[MACHINE_NAME_1,MACHINE_NAME_2,...MACHINE_NAME_N]" compression="[none|fast|faster|fastest|small|smaller|smallest]" job_responsibility="[0|1]"> [1010data_QUERY] </link>
If a [1010data_QUERY] is provided, the operations in the query are
applied to the table specified by the table2
attribute before the link is
performed.
Attributes
table2
- The name or ID of the foreign table. (required)Note: The name of a table is not the same as its title.
To find the name or ID of a table, view the table and choose
on the menu.If
table2="*"
, the state of the foreign table consists of the base table with all the operations up to the<link>
applied to it. col
- A column name or a comma-separated list of the names of the columns in the current
table on which to link. (required)
col
may contain an expression. See Example: Expressions incol
orcol2
for an example. (Available as of version 12.35)col
is not required iftype="side"
. col2
- A column name or a comma-separated list of the names of the columns in the foreign
table on which to link.
col2
must have the same number of entries ascol
and, if omitted, is assumed to be the same ascol
.col2
may contain an expression. See Example: Expressions incol
orcol2
for an example. (Available as of version 12.35) label
- The label to append to the headers of columns in the foreign table.
The default is none.
suffix
- The suffix to append to the names of columns in the foreign table.
The default is none.
type
- The type of link. The default is
exact
.Valid Values for
type
:exact
- A standard link where a row in the current table is matched to the (first) row
in the foreign table that has exactly the same values in the link columns
(
col
andcol2
). select
- Perform a link and select.
asof
- Perform an as-of link on a time/date column.
As-of links allow you to match rows in a less strict way when two tables (or worksheets) are linked together.
For a particular row in the current table, if no exact match can be made in the foreign table based on the values in the as-of link column, the row in the foreign table that has the closest value to the row in the current table is linked in. The
When linking on multiple columns, the as-of link is performed on the last column listed in theafter
attribute determines whether to link in the row in the foreign table with the closest value greater than or less than the value in the current table.col
andcol2
attributes. The other link columns are matched exactly.Note: To perform an as-of link, it is essential that the last column of the foreign table be in ascending order within the remaining link columns. For example, if you link two tables on symbol, date, and time, then all values of time for a given symbol and date must appear in ascending order. You may also sort the entire table in ascending order by time.In the user interface of the 1010data Insights Platform, as-of links are referred to as Link closest match before and Link closest match after. For more information, see Link types in the 1010data Insights Platform User's Guide.
include
- Works like
select
except that no columns from the foreign table are adjoined; only the selection on matching rows is done. exclude
- The inverse of
include
. No columns from the foreign table are adjoined, and only rows that do not match any rows in the foreign table are selected. side
- A special type of link that "lines up" two tables with exactly the same number
of rows and same segmentation; no linking columns are used in this case.Note: You cannot perform a side link on a worksheet.
shift
- Tells the
<link>
operation to shift n rows.Note: This is only used for as-of links.The default is 0.
after
- When
type="asof"
, theafter
attribute specifies whether to find a match from the foreign table whose value is either greater than or less than the value from the current table.When
after="1"
, the as-of link will match a row in the current table with a row in a foreign table whose value is the closest match less than or equal to the value in the current table. See Example: As-of links for an example.When
after="0"
, the as-of link will match a row in the current table with a row in a foreign table whose value is the closest match greater than or equal to the value in the current table. See Example: Reverse as-of links for an example.The default value is 1.
keepcols
- Whether to preserve (
keepcols="1"
) or remove (keepcols="0"
) the linking column(s) specified bycol2
in the foreign table after the link.The default is 0 (i.e., to remove the column(s), except that, for
type="asof"
, the time column is always kept since it need not be identical to the matching column in the current table). cols
- The name(s) of the column(s) in the foreign table to adjoin to the current
table. Accepts a comma-separated list.
The default, if
cols
is omitted, is to adjoin all columns from the foreign table except the linking columns specified bycol2
(which may be preserved withkeepcols
). excols
- Accepts a comma-separated list of columns in the foreign table that should be excluded
from the
<link>
operation.(Available as of prod-9)
expand
- Determines whether to expand this table to accommodate further matches in the foreign
table after the first matching row (i.e.,
expand="1"
produces a partial Cartesian product with all possible matches from the foreign table, according to the linking criteria, appearing in the table).With
expand="1"
each row in the current table will be duplicated if necessary so that it appears as many times as there are matches to that row in the foreign table, with the first match adjoined to the first row, the second to the second, and so on. Withexpand="0"
, only the first match, if any, is adjoined, and rows in the current table are never duplicated.Note: Use this feature with caution. While it can be very useful, performing this operation on very large data sets can be very resource intensive for the system and can cause your query/analysis to run slower. fill_i (fill_f, fill_j, fill_a)
- Specifies the integer, float, bigint, and string values, respectively, to use instead
of
NA
for foreign-table fields of the respective types at unmatched rows. Each attribute, if present, must be a valid literal of the correct type. Without this attribute, the default fill value for unmatched rows is the appropriateNA
for the data type. nocache
- (Advanced option) Specifies whether to cache this link.
Generally, the system handles link caching. In some cases, specifically when linking a very small foreign table into a very large current table and using only one column from the foreign table, it may save memory to avoid the caching with
nocache="1"
. This option will never change the result but may reduce query speed dramatically if used indiscriminately.The default is 0.
segby
- (Advanced option) Specify specially sorted and segmented columns to be used to
speed up the link between two large tables.
Ordinarily, this option is unnecessary as the system will locate an appropriate segby column if the information was specified at load time.
Note: Using this option with an inappropriate column can lead to incorrect results. materialize
- Indicates whether the link should be materialized as a prelink in the table. Note: You must own the table to materialize a prelink in it.
When using
materialize="1"
withoutdenormalize="1"
,<link>
must appear before any ops that change the selection or sort order of the table, and the foreign table and link column(s) must be physical. Cannot be an as-of link.The default is 0 (i.e., do not materialize).
Attributes when materialize="1"
The following attributes are only valid when materialize="1"
:
denormalize
- Indicates whether the columns being linked should be materialized into the current
table (respecting the
cols
attribute).When
denormalize="1"
, the link will not be prelinked.The default is 0.
basenames
- When
denormalize="1"
, this specifies whether to use the base column names as file names for denormalized data or to always use unique file names.The default is 0 (i.e., always use unique file names).
Note: This is advanced functionality that requires special privileges. filename_suffix
- When
denormalize="1"
, this suffix is appended to the base column names to generate file names for denormalized data.This allows you to get the same file names on two different link denormalizations without using
basenames="1"
.Note: This is advanced functionality that requires special privileges. machs
- A comma-separated list of machines to write the prelink to.
Every segment of the base table must exist on one of the machines listed in
machs
. In addition, machines that do not contain segments of the base table may not be included inmachs
.machs
cannot be used withdenormalize="1"
.Note: This is advanced functionality that requires special privileges. compression
- When
denormalize="1"
, this specifies the type of compression to use.Possible values are:
none
fast
faster
fastest
small
smaller
smallest
The default is
faster
.Note: This is advanced functionality that requires special privileges. job_responsibility
- Setting
job_responsibility="1"
prevents synchronization of the table being materialized on an environment configured for High Availability (HA). This is assuming a job on the other HA sites will also be performing the same materialize, so synchronization will be unnecessary. Does nothing in an environment not configured for HA.The default is 0.
(Available as of version 9.36)
Example
In this example, assume the "current" table is pub.demo.retail.item. This table contains transactional sales data and will be linked to the product master table, which contains additional information about every item code in the Sales Item Detail table.
<base table="pub.demo.retail.item"/> <link table2="pub.demo.retail.prod" col="sku" col2="sku" label="Product Master" suffix="_pm"/>
Example: Expressions in col
or
col2
In this example, the location
column in
the base table contains values in mixed case (e.g., Manhattan). The
location
column in the foreign table contains values in all upper case
(e.g., MANHATTAN). Previously, the Insights Platform could not link these two columns
because they would be evaluated as containing different values. Now, the
col
and col2
attributes accept expressions, such as
lowercase(X)
.
The example uses lowercase(location)
for both col
and col2
, which changes the values in both
columns to all lower case and allows the Insights Platform to link these them.
<table cols="store,location">1,Manhattan;4,San Francisco;10,Detroit</table> <link table2="default.lonely" col="lowercase(location)" col2="lowercase(location)" shift="0" expand="1"> <table cols="location,manager">MANHATTAN,JJONES; CHICAGO,SSTRANGE; SAN FRANCISCO,VMARS; DETROIT,MWALKER</table> </link>
Example: As-of links
The following example performs an as-of link on the tables pub.doc.samples.as_of_links.sales and pub.doc.samples.as_of_links.advertisements to answer the question “What was the ad that preceded this purchase?”
type="asof"
in the
<link>
operation.<base table="pub.doc.samples.as_of_links.sales"/> <link table2="pub.doc.samples.as_of_links.advertisements" label="As-of Links" col="product,date" type="asof"/>
The base table, pub.doc.samples.as_of_links.sales, contains transactional sales information about products purchased over a range of dates.
The foreign table, pub.doc.samples.as_of_links.advertisements, contains information related to product ads that were run on certain dates via different mediums.
For those rows in the base table that have a matching value in the product
column in the foreign table, the as-of link finds the row in the foreign table that has the
closest value less than or equal to the value in the date
column in
the base table. (Since the after
attribute is not specified, this is the
default behavior.)
The resultant table, after the as-of link is performed, shows the date and the medium associated with the product ad whose date preceded each transaction.
For example, the ad related to the tie
product that most closely preceded
the purchase on 07/18/00 was a
television ad on 07/17/00. The
ad related to the tie
product that most closely preceded the purchase on
07/16/00 was a newspaper ad on
07/01/00. If no product ad was run for a particular product
(e.g., socks
), the values for the date and medium are N/A.
Example: Reverse as-of links
The following example performs a reverse as-of link on the tables pub.doc.samples.as_of_links.advertisements and pub.doc.samples.as_of_links.sales to answer the question “What is the first purchase that followed this ad?” This type of link could be used for loyalty marketing, especially with email or promotions on a customer level. In other words, it shows how long it took a customer to respond to a promotion.
type="asof"
and
after="0"
in the <link>
operation.<base table="pub.doc.samples.as_of_links.advertisements"/> <link table2="pub.doc.samples.as_of_links.sales" col="product,date" type="asof" after="0" suffix="_trans" label="Trans"/> <willbe name="days_between_ad_trans" value="days(date;date_trans)" label="Days Between`Ad and`First Purchase"/> <col name="cust_id_trans" label="Customer Who`First Bought`After Ad"/> <col name="qty_trans" label="Qty Bought`First Trans`After"/>
The base table, pub.doc.samples.as_of_links.advertisements, contains information related to product ads that were run on certain dates via different mediums.
The foreign table, pub.doc.samples.as_of_links.sales, contains transactional sales information about products purchased over a range of dates.
For those rows in the base table that have a matching value in the product
column in the foreign table, the as-of link finds the row in the foreign table that has the
closest value greater than or equal to the value in the date
column
in the base table. (This behavior is the result of specifying after="0"
in
the <link>
operation.)
The resultant table, after the as-of link is performed, shows the date of the first transaction that occurred after each product ad along with the associated customer ID and the quantity of the product that they purchased. In addition, the days between the date of the ad and the transaction is calculated.