g_asof(G;S;O;X;Y;TX;TY;BA;L)

Returns the position of the row within a given group where all the values in one or more columns most closely match the values in another specified set of columns.

Function type

Vector only

Syntax

g_asof(G;S;O;X;Y;TX;TY;BA;L)
t_asof(X;Y;TX;TY;BA;L)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

O integer A space- or comma-separated list of column names that determine the row order within a particular group

If O is omitted, the order is the current display order of the table.

If any of the values in O are N/A, an error is returned.

X any A space- or comma-separated list of column names
Y any A space- or comma-separated list of column names

The values in the columns specified by Y are matched against the values in the columns specified by X.

Note: The number of the columns in Y must be the same as those in X, and the data types of the columns listed in Y should also match those listed in X.
TX text A string representing the type of the time-series column in X

The choice of TX must match the type of the time-series column in X, or the results will be meaningless.

Valid types are:
  • 'D' for the date (YYYYMMDD) type (i.e., one period = one day)
  • 'Q' for the quarter (YYYYQ) type (i.e., one period = one quarter)
  • 'M' for the month (YYYYMM) type (i.e., one period = one month)
  • 'T' for the time (HHMMSS) type (i.e., one period = one second)
  • 'TS' for the date+time type (i.e., one period = one second)
  • 'TSn' for the date+time type construed with n decimal places of precision in the seconds place (i.e., one period = 1/10^n seconds)

If TX is omitted, then X is assumed to be an integer (e.g., a year or a period number, such that one period=1).

TY text A string representing the type of the time-series column in Y

The choice of TY must match the type of the time-series column in Y, or the results will be meaningless.

Valid types are:
  • 'D' for the date (YYYYMMDD) type (i.e., one period = one day)
  • 'Q' for the quarter (YYYYQ) type (i.e., one period = one quarter)
  • 'M' for the month (YYYYMM) type (i.e., one period = one month)
  • 'T' for the time (HHMMSS) type (i.e., one period = one second)
  • 'TS' for the date+time type (i.e., one period = one second)
  • 'TSn' for the date+time type construed with n decimal places of precision in the seconds place (i.e., one period = 1/10^n seconds)

If TY is omitted, then Y is assumed to be an integer (e.g., a year or a period number, such that one period=1).

BA text Specifies, if there is no exact match, whether the values in Y should come before or after the values in X
BA can have either of the following values:
  • 'before'
  • 'after'

If BA is omitted, it defaults to 'after'.

L integer or decimal A whole-number value that specifies the number of rows to shift the result

If L is non-zero, it is added to the result.

If L is omitted, it defaults to 0.

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), the values in the column(s) specified by Y are searched for in the column(s) specified by X:
  • If BA is 'after', g_asof returns an integer value corresponding to the index of the last row in X whose values are less than or equal to Y.
  • If BA is 'before', g_asof returns an integer value corresponding to the index of the first row in X whose values are greater than or equal to Y.

The order is determined by O; if O is omitted, the order is determined by the current row order.

If there is no match, the result is N/A.

Note: Rows are indexed starting at 1, not 0.

Sample Usage

<base table="pub.doc.samples.ref.func.g_asof_sample_usage"/>
<willbe name="g_asof_1" value="g_asof(symbol;;;sendtime;srctime;'T';'T';'before';0)"/>
<willbe name="g_asof_2" value="g_asof(symbol;;;sendtime;srctime;'T';'T';'after';0)"/>
<willbe name="g_asof_3" value="g_asof(symbol;;;sendtime;srctime;'T';'T';'after';2)"/>

Example

Let's use the Weather Underground Observed Daily table (pub.demo.weather.wunderground.observed_daily) to illustrate the use of g_asof(G;S;O;X;Y;TX;TY;BA;L).

Let's say we want to find out, for each day in every zip code, the high temperature on the same day the year before. If there is no data for exactly one year before, we'll take the closest day before that.

We'll start out by opening the Observed Daily table, which looks something like this:

Since we're interested in the high temperature for each day in every zip code, let's perform a tabulation using <tabu>, grouping by both zipcode and date, and finding the high temperature (maxtempi).
<tabu label="High Temperatures" breaks="zipcode,date">
  <tcol source="maxtempi" fun="first" name="hightemp" label="Highest`Max`Temp`(F)"/>
</tabu>
Note: There is only one high temperature for each day in a particular zip code, so fun="first" will give us that high temperature, though we could have also specified last, lo, or hi for the fun attribute to get the same results.

Our table now looks similar to the following:

Since we want to compare the high temperatures from the year before, let's create a computed column that shifts the date back 12 months using the shiftmonths(X;Y;E) function:
<willbe name="date_last_year" value="shiftmonths(date;-12;0)"/>

This adds the following column to our table:

Now let's use our g_asof(G;S;O;X;Y;TX;TY;BA;L) function to find the row whose value in the date column is equal to or less than the current row's value in the date_last_year column. Let's specify the zipcode column for our G parameter so that we're grouping by zip code. We'll omit the S parameter so that the function considers all rows and not just a subset, and we'll specify the date column for our O parameter so that the order is based on the date. Since we want to compare the dates for the current year to the dates from a year ago, we'll specify date for our X parameter and date_last_year for our Y parameter. Because both of those columns have values in the form YYYYMMDD, we'll specify 'D' for both the TX and TY parameters. Since we want the index of the last row whose value in the date column is equal to or less than the current row's value in the date_last_year column, we'll specify 'after' for the BA parameter. Finally, since we don't want to offset our results at all, we'll specify 0 for L.

Let's create a computed column using the g_asof(G;S;O;X;Y;TX;TY;BA;L) function:
<willbe name="asof" value="g_asof(zipcode;;date;date;date_last_year;'D';'D';'after';0)"/>
Note: Because the last four parameters are the default values, we could have omitted those parameters and written the function call as: g_asof(zipcode;;date;date;date_last_year;;;;) and achieved the same results.

We should see something like the following:

Since there is no information in our table prior to 08/31/12, we see N/A values in the new asof column.

Click Rows > Find Row (Advanced)... and enter "asof<>NA" in the Expression field, then click Find Next to go to the first row that does not contain an N/A in the asof column. You should see something similar to the following:

We can see that the date column in the first row has the value 08/31/13 and that the date_last_year column is 08/31/12. We can also see that the asof column contains the value 1, which means that the first row in the group with the zip code 01001 has a value of 08/31/12 in its date column.

Let's create a computed column that will display the temperature from that date using the g_pick(G;S;O;X;Y) function. Since we want the high temperature associated with the index, we'll specify the hightemp column for X, and we'll specify our asof index for the Y parameter:
<willbe name="maxtemp_last_year" value="g_pick(zipcode;;date;hightemp;asof)"/>
Note: We're still grouping by zipcode and ordering by date.
For demonstration purposes, let's also create a computed column that will display the date from that row as well, so we can see which date we're getting the maxtemp_last_year values from:
<willbe name="maxtemp_date_last_year" value="g_pick(zipcode;;date;date;asof)"/>
Note: This call is almost identical to the previous function call, except that we're specifying date for the X parameter instead of hightemp,

We should see something like the following:

Let's go to the first row that does not have an N/A value in the maxtemp_last_year column. You can right-click on any cell in the maxtemp_last_year column that has an N/A value and click Find next row where maxtemp_last_year does not have the value N/A:

You should see something similar to the following:

In the first row, we can see that on 08/31/13, the high temperature was 80 degrees. We calculated that the prior year's date was 8/31/12, which makes sense. The results from g_asof(G;S;O;X;Y;TX;TY;BA;L) for that row returned that our matching row was row 1 in the zip code 01001 grouping. The results from g_pick(G;S;O;X;Y) for that index returned a high temperature of 91, and the date associated with that index was 08/31/12, which matches the date we calculated. So it looks like our table had the information for exactly one year before the date we were interested in.

Now let's see how g_asof(G;S;O;X;Y;TX;TY;BA;L) handles the situation if the table didn't have information for the exact same day one year before.

For example, in this table, there is a gap in the data from 09/26/12 to 11/06/12:

Click Rows > Find Row (Advanced)... and enter "date_last_year<>maxtemp_date_last_year" in the Expression field, then click Find Next to go to the first row where the date we calculated for the year before doesn't match the date we got from g_asof (which means that the exact date one year before didn't exist in the table, so g_asof gave us the row with the closest date to that). You should see something similar to the following:

From this screenshot, we can see that on 09/27/13, the high temperature was 64 degrees. We calculated the prior year's date as 09/27/12, which again makes sense. However, g_asof(G;S;O;X;Y;TX;TY;BA;L) returned an index of 27, which corresponds to the row that has the date 09/26/12, because there was no information for 09/27/12 (or, as we can see from the screenshot, for the next number of days). So g_asof(G;S;O;X;Y;TX;TY;BA;L) returns the information for 09/26/12, which is the last row whose value in the date column is equal to or less than the current row's value in the date_last_year column.

We can find the next row where date_last_year and maxtemp_date_last_year match again by clicking Rows > Find Row (Advanced)... and entering "date_last_year=maxtemp_date_last_year" in the Expression field, then clicking Find Next.

Since we have data for those dates, we see that the values in date_last_year and maxtemp_date_last_year match, and we can see the high temperature from those dates in the maxtemp_last_year column.

Additional Information

  • The t_ version of this function defaults the G argument and omits the S argument. The default for G is set at table load time based on the organization of the table.