Segby links

Linking on a column that your foreign table is both sorted and segmented by (i.e., sortseg), can make your <link> more efficient.

Key concepts

  • If your foreign table is sortseg, you can take advantage of segby links.
  • The system will recognize if the foreign table is sortseg on the link column.

Discussion

When you link two tables together, the system searches the foreign table to find a matching value for each row in the base table. This means that you are scanning the entire foreign table multiple times per <link> in order to find matches for every row in the base table. If your tables are large, it can make the process even longer, and in some cases the <link> might not complete.

If your foreign table is sorted and segmented (i.e., sortseg) on a particular column, it ensures that the values within each segment in that column are sorted. It also ensures that ranges of values within that column cannot overlap between segments. However, it doesn't ensure that the table is globally sorted.

For example, if you have a transaction table that is sortseg on transaction ID, you could have three segments with ranges from 1 to 20, 21 to 40, and 41 to 60, in that order.

You could also have the same ranges in a different order.

However, your ranges cannot be 1 to 25, 20 to 40, and 41 to 60, because of the overlapping values.

If your table is sortseg by transaction ID, it ensures that any given ID will only be present in one segment. In the example above, transaction ID 22 could be found in two different segments, which makes it an invalid sortseg.

When your table is sortseg, performing a <link> where the segby attribute denotes the sortseg column tells the system to search only one segment for a match instead of the whole table.

Normally, if a table is sortseg on the column that is being linked to, the system will know to only search the correct segment for a match to any given value without specifying the segby attribute.

While only your foreign table needs to be sortseg in order to take advantage of segby linking, further efficiency can be achieved if the base table is also sortseg on the linking column. In this situation, each segment in the base table will match up to only segment in the foreign table.

Example

Suppose you want to link together transaction and weather tables in order to analyze the impact of the weather on customers' purchases. You want to analyze this behavior over a long period of time, so you don't limit your data with any time period selections. First, you need to link your transaction table to the Store Master (pub.doc.retail.altseg.stores) in order to get location information.

<base table="pub.doc.retail.altseg.sales_detail_transid"/>
<link table2="pub.doc.retail.altseg.stores" col="store" col2="store"/>

The foreign table, Store Master, is not segmented by store. In fact, since there are only 10 rows in the table, everything is in one segment. Thus, the <link> is not and cannot be an optimized segby link.

Next, you need to link to the Observed Daily table (pub.demo.weather.wunderground.observed_daily) in order to get the weather information for each day. You need to link based on date, but you also need to link based on location, because each date in each table corresponds to multiple locations.

<willbe name="zipcode" value="string(zip)"/>
<link table2="pub.demo.weather.wunderground.observed_daily" 
col="zipcode,trans_date" col2="zipcode,date"/>

In this <link>, a segby link occurs. Observed Daily is sortseg on zipcode, and the system will recognize that one of the linking columns is indeed what the table is segmented by.