g_tshift_nearest(G;S;T;TS;X;N)

Returns the value of the nearest row that is at least a specified time period before (or after) the current row within a given group.

Function type

Vector only

Syntax

g_tshift_nearest(G;S;T;TS;X;N)
t_tshift_nearest(X;N)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

T integer or decimal The name of a column whose values are used as the basis of the time series

Row order is determined by T.

Note: T may not be omitted.
TS text A string representing the type of the time T

The choice of TS must match the type of values in T, or the results will be meaningless.

Valid types are:
  • 'D' for the date (YYYYMMDD) type (i.e., one period = one day)
  • 'Q' for the quarter (YYYYQ) type (i.e., one period = one quarter)
  • 'M' for the month (YYYYMM) type (i.e., one period = one month)
  • 'T' for the time (HHMMSS) type (i.e., one period = one second)
  • 'TS' for the date+time type (i.e., one period = one second)
  • 'TSn' for the date+time type construed with n decimal places of precision in the seconds place (i.e., one period = 1/10^n seconds)

If TS is omitted, then T is assumed to be an integer (e.g., a year or a period number, such that one period=1).

X any A column name
N integer The number of rows to shift within the group

N can be an expression that evaluates to an integer value.

Return Value

For every row in each group defined by G and ordered by T (and for those rows where S=1, if specified):
  • If N >= 0:
    • g_tshift_nearest returns the value of X for the row that is exactly N time periods after this row in the same group as this row.

      If there are multiple rows that are exactly N time periods after this row, g_tshift_nearest returns the value from the first row within that set of rows.
      Note: This behavior is identical to g_tshift.
    • If there are no rows that are exactly N time periods after this row, g_tshift_nearest returns the value of X for the nearest row that is at least N time periods after this row.

  • If N < 0:
    • g_tshift_nearest returns the value of X for the row that is exactly |N| time periods before this row in the same group as this row.

      If there are multiple rows that are exactly |N| time periods before this row, g_tshift_nearest returns the value from the first row within that set of rows.

      Note: This behavior is identical to g_tshift.
    • If there are no rows that are exactly |N| time periods before this row, g_tshift_nearest returns the value of X for the nearest row that is at least |N| time periods before this row.

Time periods are determined by TS with respect to the values in T.

The result is the same data type as X.

Note: N/As in X are treated like any other value; i.e. they are shifted.

If no rows in a group have valid (non-N/A) values for X, the result for every row of the group is N/A.

Sample Usage

<base table="pub.doc.samples.ref.func.g_func_time_series_sample_usage"/>
<willbe name="g_tshift_nearest_1" value="g_tshift_nearest(state;include;order;;value;-1)"/>
<willbe name="g_tshift_nearest_2" value="g_tshift_nearest(state city;include;order;;value;-1)"/>

Example

For this example, we'll use the sales data available in pub.demo.retail.item, which contains sales transactions from three stores.

Let's say we want to find out the total sales from the previous day. However, for the sake of example, let's suppose our store isn't open every day, so there will be gaps in our transactional data. We can't just use a function like g_tshift, because there will not always be an exact match for the previous day. What we really want is to find the total sales from the day that is nearest to the previous day. To do this, we can use g_tshift_nearest.

For simplicity, let's say we just want to look at the transactions from store 1. We'll do a simple <sel> operation to pare down the data.

<base table="pub.demo.retail.item"/>
<sel value="(store=1)"/>

(Obviously, this is not too critical for our sample table, which only contains 35 rows, but it will make our example a little easier to understand).

Now our table just shows the transactions from store 1:

Since we're interested in total sales for each day, let's use the g_sum function to calculate that. We'll add a computed column (using the <willbe> operator) grouping by date and totaling the items in the sales column.

<willbe name="total_sales" value="g_sum(date;;sales)" 
 label="Total Sales" format="dec:2"/>

This adds a new column to our table, which shows the total sales for each day:

In order to contrast and compare, let's first see what would happen if we used g_tshift here to find the total sales from the day before.

The g_tshift function has the form:

g_tshift(G;S;T;TS;X;N)

Since we already selected the rows for store 1, which is the group we're interested in, we can omit the G parameter. (If we hadn't done the <sel> operation earlier, we could have set G to store here in order to group the results by store.)

We can also omit the S parameter, since we want to consider all rows when applying the function.

We'll use the date column for our T parameter and set the TS parameter to 'D' (since the data in the date column is of the date type), and we'll set the N parameter to -1, which will allow us to shift back to the previous day in our table.

And, finally, since we're interested in the total sales for the previous day, we'll set the X parameter to the value in our computed column, total_sales.

So, our g_tshift function should look like:

g_tshift(;;date;'D';total_sales;-1)
Note: Even though we're omitting the G and S parameters, we still need to account for them in the function call; hence, the ";;" before date.

Let's add a computed column to our table with this g_tshift function using <willbe>:

<willbe name="previous_day_sales" 
 label="Previous Day`Total Sales" 
 value="g_tshift(;;date;'D';total_sales;-1)"
 format="type:num;dec:2"/>

Now our table should look something like:

Notice all the N/A values in our new column. That's because the only data we have for consecutive days is on 05/15/12 and 05/16/12. So we can see that g_tshift applied to the rows with the date 05/16/12 result in the total sales for the rows from the previous day, 05/15/12, which is -3.40.

Note: Although there are multiple rows with the date 05/15/12, g_tshift returns the value of X (total_sales) from the first row with that date.

All the other transactions are more than one day apart; therefore, g_tshift returns N/A because there is no exact match with respect to the time period specified.

To find the total sales for the day nearest to the previous day, we can use g_tshift_nearest, which has the form:

g_tshift_nearest(G;S;T;TS;X;N)

All of the parameters will be the same as what we used for g_tshift. Let's add another computed column for the g_tshift_nearest function:

<willbe name="nearest_day_sales" 
 label="Nearest Previous Day`Total Sales" 
 value="g_tshift_nearest(;;date;'D';total_sales;-1)"
 format="type:num;dec:2"/>

Our new computed column looks like:

Because there is an exact match with respect to the time period specified, the results from g_tshift_nearest for the rows with the date 05/16/12 are the same as the results from g_tshift (the first row with the date 05/15/12).

However, for those rows where there is no exact match with respect to the time period specified, g_tshift_nearest will return the value of X for the nearest row that is at least the specified time period away.

For instance, since we have no transactional data for 05/17/12, g_tshift_nearest for the rows with the date 05/18/12 will return the total sales from the nearest row with the date 05/16/12 (the closest previous day). Also, since there is no transactional data for 06/02/12, the results from g_tshift_nearest for the rows with the date 06/03/12 show the total sales from the nearest row with the date 05/18/12.

Note: There are N/A values for the results from both g_tshift and g_tshift_nearest for those transactions dated 05/15/12 because no transactions prior to that date exist in this table.

Additional Information

  • The t_ version of this function defaults the G and T/TS arguments, and omits the S argument. The defaults for G and T/TS are set at table load time based on the organization of the table.