Tables vs. worksheets

When you perform one or more operations on a table, it becomes a worksheet. It is important to understand the differences and implications of linking to a worksheet rather than a table.

Key concepts

  • A <link> containing no additional operations is a link to a table.
  • A <link> containing additional operations is a link to a worksheet.
  • In most cases, it is best to link to a table instead of a worksheet.

Discussion

A <link> containing no operations is a link to a table. Placing any operations within the opening and closing tags of a <link> creates a link to a worksheet.

If possible, it is normally best to link to a table rather than to a worksheet. When you link to a table, the evaluation is spread over multiple processes according to the segmentation of the table. For example, if your table has ten segments, ten processes will work on evaluating the <link>, one for each segment. However, if you perform a worksheet link, although the table might still have ten segments, it is evaluated all in memory, so there is only one process working on the entire evaluation of the <link>. Therefore, linking to a table rather than a worksheet will most likely be faster, and it will also use less memory.

There are, however, exceptions to this general guideline. If your foreign table is particularly large (i.e., on the order of tens of millions of rows or larger) and the operations performed on the table reduce it to a worksheet containing less than a million rows, linking to the worksheet would be optimal. Although you are performing the link in memory, there is significantly less data to search through in the foreign table in order to find a match for the base table.

Example

You have a table containing transaction data, filtered by date, and you want to only retain lines from the produce department. The transaction table doesn't contain the department description and you don't recall which department number is designated to produce. Therefore, you perform a link of type include to a worksheet where the Product Master (pub.doc.retail.altseg.sales_detail_sku) is the base table and a selection is made on the department description.

<base table="pub.doc.retail.altseg.sales_detail_sku"/>
<sel value="between(trans_date;20160801;20160831)"/>
<link table2="pub.doc.retail.altseg.products" col="dept" col2="dept"
type="include">
  <sel value="deptdesc='PRODUCE'"/>
</link>

Alternatively, you could link to the Product Master table and make a selection on the department description after the <link>.

<base table="pub.doc.retail.altseg.sales_detail_sku"/>
<sel value="between(trans_date;20160801;20160831)"/>
<link table2="pub.doc.retail.altseg.products" col="dept" col2="dept"/>
<sel value="deptdesc='PRODUCE'"/>

Both queries will arrive at the same answer, but in this case the second query, where you link to a table instead of a worksheet, has better performance.