<link>

Link a foreign table to a base table or worksheet. Both tables must share a column on which to link.

Description

<link> defines a link between the current table and another table (i.e., the foreign table) such that, by default, all columns of the foreign table are logically added to the current table.

The tags contained within a <link> tag are applied to the  foreign table  prior to effecting the link, so that the current table is linked to the foreign table as transformed by those operations. The operations are applied to the foreign table in the order in which they appear.
Note: All tags following a tabulation (<tabu>) are applied to the results of the tabulation, not the original table.

When linking tables in the 1010data Insights Platform, the data type of the column in the base table must match the type of the column being linked from the foreign table.

For an overview of linking, see Linking tables and worksheets in the 1010data Insights Platform User's Guide.

Size limitations for <link>

When linking to a table, there are no limitations on the size of the base table, the foreign table, or the results of the <link> operation. However, when you link to a worksheet (a table with operations applied to it), the full worksheet is pulled into memory, so there is a limit on how large the worksheet can be, which is roughly 2GB.

A simple formula for determining the approximate memory footprint of a worksheet is:

(R*C*8)/1B

where:
  • R is the number of rows in the worksheet
  • C is the number of columns in the worksheet
  • 8 is the number of bytes for each value in the worksheet
  • 1B is one billion (the number of bytes in a gigabyte)

In general, the worksheet you are linking to should not exceed ~250MM cells (i.e., R*C < 250MM).

Syntax

<link table2="[FOREIGN_TABLE_NAME]"
      col="[LINK_COL_1,LINK_COL_2,...LINK_COL_N]"
      col2="[FOREIGN_COL_1,FOREIGN_COL_2,...FOREIGN_COL_N]"
      label="[LABEL]" 
      suffix="[SUFFIX]"
      type="[exact|select|asof|include|exclude|side]"
      shift="[NUMBER_OF_ROWS]" 
      after="[0|1]" 
      keepcols="[0|1]"
      cols="[COLUMN_NAME_1,COLUMN_NAME_2,...COLUMN_NAME_N]"
      expand="[0|1]" 
      nocache="[0|1]" 
      segby="[COLUMN_NAME]"
      materialize="[0|1]" 
      denormalize="[0|1]" 
      basenames="[0|1]"
      filename_suffix="[FILENAME_SUFFIX]"
      machs="[MACHINE_NAME_1,MACHINE_NAME_2,...MACHINE_NAME_N]"
      compression="[none|fast|faster|fastest|small|smaller|smallest]"
      job_responsibility="[0|1]">
        [1010data_QUERY]
</link>

If a [1010data_QUERY] is provided, the operations in the query are applied to the table specified by the table2 attribute before the link is performed.

Attributes

table2
The name or ID of the foreign table. (required)
Note: The name of a table is not the same as its title.

To find the name or ID of a table, view the table and choose Help > About this Table on the menu.

If table2="*", the state of the foreign table consists of the base table with all the operations up to the <link> applied to it.

col
A column name or a comma-separated list of the names of the columns in the current table on which to link. (required)

col may contain an expression. See Example: Expressions in col or col2 for an example. (Available as of version 12.35)

col is not required if type="side".

col2
A column name or a comma-separated list of the names of the columns in the foreign table on which to link.

col2 must have the same number of entries as col and, if omitted, is assumed to be the same as col.

col2 may contain an expression. See Example: Expressions in col or col2 for an example. (Available as of version 12.35)

label
The label to append to the headers of columns in the foreign table.

The default is none.

suffix
The suffix to append to the names of columns in the foreign table.

The default is none.

type
The type of link. The default is exact.

Valid Values for type:

exact
A standard link where a row in the current table is matched to the (first) row in the foreign table that has exactly the same values in the link columns (col and col2).
select
Perform a link and select.
asof
Perform an as-of link on a time/date column.

As-of links allow you to match rows in a less strict way when two tables (or worksheets) are linked together.

For a particular row in the current table, if no exact match can be made in the foreign table based on the values in the as-of link column, the row in the foreign table that has the closest value to the row in the current table is linked in. The after attribute determines whether to link in the row in the foreign table with the closest value greater than or less than the value in the current table.

When linking on multiple columns, the as-of link is performed on the last column listed in the col and col2 attributes. The other link columns are matched exactly.
Note: To perform an as-of link, it is essential that the last column of the foreign table be in ascending order within the remaining link columns. For example, if you link two tables on symbol, date, and time, then all values of time for a given symbol and date must appear in ascending order. You may also sort the entire table in ascending order by time.

In the user interface of the 1010data Insights Platform, as-of links are referred to as Link closest match before and Link closest match after. For more information, see Link types in the 1010data Insights Platform User's Guide.

include
Works like select except that no columns from the foreign table are adjoined; only the selection on matching rows is done.
exclude
The inverse of include. No columns from the foreign table are adjoined, and only rows that do not match any rows in the foreign table are selected.
side
A special type of link that "lines up" two tables with exactly the same number of rows and same segmentation; no linking columns are used in this case.
Note: You cannot perform a side link on a worksheet.
shift
Tells the <link> operation to shift n rows.
Note: This is only used for as-of links.

The default is 0.

after
When type="asof", the after attribute specifies whether to find a match from the foreign table whose value is either greater than or less than the value from the current table.

When after="1", the as-of link will match a row in the current table with a row in a foreign table whose value is the closest match less than or equal to the value in the current table. See Example: As-of links for an example.

When after="0", the as-of link will match a row in the current table with a row in a foreign table whose value is the closest match greater than or equal to the value in the current table. See Example: Reverse as-of links for an example.

The default value is 1.

keepcols
Whether to preserve (keepcols="1") or remove (keepcols="0") the linking column(s) specified by col2 in the foreign table after the link. 

The default is 0 (i.e., to remove the column(s), except that, for type="asof", the time column is always kept since it need not be identical to the matching column in the current table).

cols
The name(s) of the column(s) in the foreign table to adjoin to the current table. Accepts a comma-separated list.

The default, if cols is omitted, is to adjoin all columns from the foreign table except the linking columns specified by col2 (which may be preserved with keepcols).

excols
Accepts a comma-separated list of columns in the foreign table that should be excluded from the <link> operation.

(Available as of prod-9)

expand
Determines whether to expand this table to accommodate further matches in the foreign table after the first matching row (i.e., expand="1" produces a partial Cartesian product with all possible matches from the foreign table, according to the linking criteria, appearing in the table).

With expand="1" each row in the current table will be duplicated if necessary so that it appears as many times as there are matches to that row in the foreign table, with the first match adjoined to the first row, the second to the second, and so on. With expand="0", only the first match, if any, is adjoined, and rows in the current table are never duplicated.

Note: Use this feature with caution. While it can be very useful, performing this operation on very large data sets can be very resource intensive for the system and can cause your query/analysis to run slower.
fill_i (fill_f, fill_j, fill_a)
Specifies the integer, float, bigint, and string values, respectively, to use instead of NA for foreign-table fields of the respective types at unmatched rows. Each attribute, if present, must be a valid literal of the correct type. Without this attribute, the default fill value for unmatched rows is the appropriate NA for the data type.
Examples:
  • <link fill_i="0"/> will use "0" instead of NA for unmatched rows in a link of fields of type integer.
  • <link fill_f="0.0"/> will use "0.0" instead of NA for unmatched rows in a link of fields of type float.
  • <link fill_j="0"/> will use "0" instead of NA for unmatched rows in a link of fields of type bigint.
  • <link fill_a="not applicable"/> will use "not applicable" instead of NA for unmatched rows in a link of fields of type string.
nocache
(Advanced option) Specifies whether to cache this link. 

Generally, the system handles link caching. In some cases, specifically when linking a very small foreign table into a very large current table and using only one column from the foreign table, it may save memory to avoid the caching with nocache="1". This option will never change the result but may reduce query speed dramatically if used indiscriminately. 

The default is 0.

segby
(Advanced option) Specify specially sorted and segmented columns to be used to speed up the link between two large tables.

Ordinarily, this option is unnecessary as the system will locate an appropriate segby column if the information was specified at load time.

Note: Using this option with an inappropriate column can lead to incorrect results.
materialize
Indicates whether the link should be materialized as a prelink in the table.
Note: You must own the table to materialize a prelink in it.

When using materialize="1" without denormalize="1", <link> must appear before any ops that change the selection or sort order of the table, and the foreign table and link column(s) must be physical. Cannot be an as-of link.

The default is 0 (i.e., do not materialize).

Attributes when materialize="1"

The following attributes are only valid when materialize="1":

denormalize
Indicates whether the columns being linked should be materialized into the current table (respecting the cols attribute).

When denormalize="1", the link will not be prelinked.

The default is 0.

basenames
When denormalize="1", this specifies whether to use the base column names as file names for denormalized data or to always use unique file names.

The default is 0 (i.e., always use unique file names).

Note: This is advanced functionality that requires special privileges.
filename_suffix
When denormalize="1", this suffix is appended to the base column names to generate file names for denormalized data.

This allows you to get the same file names on two different link denormalizations without using basenames="1".

Note: This is advanced functionality that requires special privileges.
machs
A comma-separated list of machines to write the prelink to.

Every segment of the base table must exist on one of the machines listed in machs. In addition, machines that do not contain segments of the base table may not be included in machs.

machs cannot be used with denormalize="1".

Note: This is advanced functionality that requires special privileges.
compression
When denormalize="1", this specifies the type of compression to use.

Possible values are:

  • none
  • fast
  • faster
  • fastest
  • small
  • smaller
  • smallest

The default is faster.

Note: This is advanced functionality that requires special privileges.
job_responsibility
Setting job_responsibility="1" prevents synchronization of the table being materialized on an environment configured for High Availability (HA). This is assuming a job on the other HA sites will also be performing the same materialize, so synchronization will be unnecessary. Does nothing in an environment not configured for HA.

The default is 0.

(Available as of version 9.36)

Example

In this example, assume the "current" table is pub.demo.retail.item. This table contains transactional sales data and will be linked to the product master table, which contains additional information about every item code in the Sales Item Detail table.

<base table="pub.demo.retail.item"/>
<link table2="pub.demo.retail.prod" col="sku" col2="sku" 
 label="Product Master" suffix="_pm"/>

Example: Expressions in col or col2

In this example, the location column in the base table contains values in mixed case (e.g., Manhattan). The location column in the foreign table contains values in all upper case (e.g., MANHATTAN). Previously, the Insights Platform could not link these two columns because they would be evaluated as containing different values. Now, the col and col2 attributes accept expressions, such as lowercase(X).

The example uses lowercase(location) for both col and col2, which changes the values in both columns to all lower case and allows the Insights Platform to link these them.

<table cols="store,location">1,Manhattan;4,San Francisco;10,Detroit</table>
<link table2="default.lonely" col="lowercase(location)" col2="lowercase(location)" shift="0" expand="1">
<table cols="location,manager">MANHATTAN,JJONES; CHICAGO,SSTRANGE; SAN FRANCISCO,VMARS; DETROIT,MWALKER</table>
</link>
(Available as of version 12.35)

Example: As-of links

The following example performs an as-of link on the tables pub.doc.samples.as_of_links.sales and pub.doc.samples.as_of_links.advertisements to answer the question “What was the ad that preceded this purchase?”

Note: An as-of link is accomplished by setting type="asof" in the <link> operation.
<base table="pub.doc.samples.as_of_links.sales"/>
<link table2="pub.doc.samples.as_of_links.advertisements" 
 label="As-of Links" col="product,date" type="asof"/>

The base table, pub.doc.samples.as_of_links.sales, contains transactional sales information about products purchased over a range of dates.

The foreign table, pub.doc.samples.as_of_links.advertisements, contains information related to product ads that were run on certain dates via different mediums.

For those rows in the base table that have a matching value in the product column in the foreign table, the as-of link finds the row in the foreign table that has the closest value less than or equal to the value in the date column in the base table. (Since the after attribute is not specified, this is the default behavior.)

The resultant table, after the as-of link is performed, shows the date and the medium associated with the product ad whose date preceded each transaction.

For example, the ad related to the tie product that most closely preceded the purchase on 07/18/00 was a television ad on 07/17/00. The ad related to the tie product that most closely preceded the purchase on 07/16/00 was a newspaper ad on 07/01/00. If no product ad was run for a particular product (e.g., socks), the values for the date and medium are N/A.

Example: Reverse as-of links

The following example performs a reverse as-of link on the tables pub.doc.samples.as_of_links.advertisements and pub.doc.samples.as_of_links.sales to answer the question “What is the first purchase that followed this ad?” This type of link could be used for loyalty marketing, especially with email or promotions on a customer level. In other words, it shows how long it took a customer to respond to a promotion.

Note: A reverse as-of link is accomplished by setting type="asof" and after="0" in the <link> operation.
<base table="pub.doc.samples.as_of_links.advertisements"/>
<link table2="pub.doc.samples.as_of_links.sales" col="product,date" 
 type="asof" after="0" suffix="_trans" label="Trans"/>
<willbe name="days_between_ad_trans" value="days(date;date_trans)" 
 label="Days Between`Ad and`First Purchase"/>
<col name="cust_id_trans" label="Customer Who`First Bought`After Ad"/>
<col name="qty_trans" label="Qty Bought`First Trans`After"/>

The base table, pub.doc.samples.as_of_links.advertisements, contains information related to product ads that were run on certain dates via different mediums.

The foreign table, pub.doc.samples.as_of_links.sales, contains transactional sales information about products purchased over a range of dates.

For those rows in the base table that have a matching value in the product column in the foreign table, the as-of link finds the row in the foreign table that has the closest value greater than or equal to the value in the date column in the base table. (This behavior is the result of specifying after="0" in the <link> operation.)

The resultant table, after the as-of link is performed, shows the date of the first transaction that occurred after each product ad along with the associated customer ID and the quantity of the product that they purchased. In addition, the days between the date of the ad and the transaction is calculated.