g_tshift(G;S;T;TS;X;N)

Returns the value of the first row that is a specified time period before (or after) the current row within a given group.

Function type

Vector only

Syntax

g_tshift(G;S;T;TS;X;N)
t_tshift(X;N)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

T integer or decimal The name of a column whose values are used as the basis of the time series

Row order is determined by T.

Note: T may not be omitted.
TS text A string representing the type of the time T

The choice of TS must match the type of values in T, or the results will be meaningless.

Valid types are:
  • 'D' for the date (YYYYMMDD) type (i.e., one period = one day)
  • 'Q' for the quarter (YYYYQ) type (i.e., one period = one quarter)
  • 'M' for the month (YYYYMM) type (i.e., one period = one month)
  • 'T' for the time (HHMMSS) type (i.e., one period = one second)
  • 'TS' for the date+time type (i.e., one period = one second)
  • 'TSn' for the date+time type construed with n decimal places of precision in the seconds place (i.e., one period = 1/10^n seconds)

If TS is omitted, then T is assumed to be an integer (e.g., a year or a period number, such that one period=1).

X any A column name
N integer The number of rows to shift within the group

N can be an expression that evaluates to an integer value.

Return Value

For every row in each group defined by G and ordered by T (and for those rows where S=1, if specified):
  • If N >= 0:
    • g_tshift returns the value of X for the row that is exactly N time periods after this row in the same group as this row.

      If there are multiple rows that are exactly N time periods after this row, g_tshift returns the value from the first row within that set of rows.

    • If there are no rows that are exactly N time periods after this row, g_tshift returns N/A.

  • If N < 0:
    • g_tshift returns the value of X for the row that is exactly |N| time periods before this row in the same group as this row.

      If there are multiple rows that are exactly |N| time periods before this row, g_tshift returns the value from the first row within that set of rows.

    • If there are no rows that are exactly |N| time periods before this row, g_tshift returns N/A.

Time periods are determined by TS with respect to the values in T.

The result is the same data type as X.

Note: N/As in X are treated like any other value; i.e. they are shifted.

If no rows in a group have valid (non-N/A) values for X, the result for every row of the group is N/A.

Sample Usage

<base table="pub.doc.samples.ref.func.g_func_time_series_sample_usage"/>
<willbe name="g_tshift_1" value="g_tshift(state;include;order;;value;-1)"/>
<willbe name="g_tshift_2" value="g_tshift(state city;include;order;;value;-1)"/>

Example

For this example, we'll use the weather data available in pub.demo.weather.hourly91, which contains the 1991 hourly weather data for stations all across the United States.

Let's say we wanted to compare the hourly temperatures from one week to the previous week.

Our sample weather table includes all kinds of information such as the temperature, humidity, rainfall, pressure, and wind direction for each station. Since we're only interested in the temperature for this example, we can hide all the other columns using the <colord> operation:
<colord cols="id,date,hour,temp"/>
This will result in a table that looks like the following:

Now, to find out what the hourly temperature was the previous week, we can use the g_tshift function, which has the form:

g_tshift(G;S;T;TS;X;N)

Since this table contains information for all of the stations across the U.S., we'll want to group our data by station, so that the results will be specific to each station. Since we also want to compare the hourly temperatures, we'll also group by hour. So, our G parameter will consist of id and hour.

For our analysis, we want to include all rows in the table, so we'll omit the S parameter.

We'll use the date column for our T parameter and set the TS parameter to 'D' (since the data in the date column is of the date type), and we'll set the N parameter to -7, which will allow us to shift back to the previous week (7 days before) in our table.

And, finally, since we're interested in seeing what the temperature was, we'll set the X parameter to temp.

So, our g_tshift function should look like:
g_tshift(id hour;;date;'D';temp;-7)
Note: Even though we're omitting the S parameter, we still need to account for it in the function call; hence, the ";;" before date.
Let's add a computed column to our table with this g_tshift function using a <willbe> operation:
<willbe name="last_week_temp" label="Last Week`Temp`(Celsius)" value="g_tshift(id hour;;date;'D';temp;-7)" format="type:num"/>

Now our table should look something like:

Notice that there are no values in the new computed column for the first number of entries. This is because there is no data 7 days before these entries; therefore, the g_tshift function returns N/A as a result. If we scroll down through the table, we'll see data in the Last Week Temp column starting at 01/08/91 (since there exists valid data 7 days before this date):

You can see that the temperature at station 3103 on 01/08/91 at hour 6 was -1.7 degrees Celsius, and that our g_tshift function shows that the temperature 7 days before (on 01/01/91) at that same hour at the same station was -16.7, which matches with what we previously saw in the table:

g_tshift works well for data that has contiguous data in the column you specify for T. If the time series you are analyzing contains gaps between the time periods, you may consider using g_tshift_nearest.

Additional Information

  • The t_ version of this function defaults the G and T/TS arguments, and omits the S argument. The defaults for G and T/TS are set at table load time based on the organization of the table.