g_position(G;S;O;X;Y)

Returns the position of the first row within a given group where all the values in one or more columns match the values in another specified set of columns.

Function type

Vector only

Syntax

g_position(G;S;O;X;Y)
t_position(X;Y)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

O integer A space- or comma-separated list of column names that determine the row order within a particular group

If O is omitted, the order is the current display order of the table.

If any of the values in O are N/A, an error is returned.

X any simple type A space- or comma-separated list of column names
Y any simple type A space- or comma-separated list of column names

The values in the columns specified by Y are matched against the values in the columns specified by X.

Note: The number of the columns in Y must be the same as those in X, and the data types of the columns listed in Y should also match those listed in X.

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), g_position returns an integer value corresponding to the index of the first row where the values in all the columns specified by Y match all the columns specified by X. The order is determined by O; if O is omitted, the order is determined by the current row order.

If there is no match, the result is N/A.

Note: Rows are indexed starting at 1, not 0.

Sample Usage

<base table="pub.doc.samples.ref.func.g_position_sample_usage"/>
<willbe name="g_position_1" value="g_position(state;include;order;value;index)"/>
<willbe name="g_position_2" value="g_position(state city;include;order;value;index)"/>

Example

Let's use the Weather Underground Observed Daily table (pub.demo.weather.wunderground.observed_daily) to illustrate the use of g_position(G;S;O;X;Y).

Let's say we want to find out, for each day in every zip code, the first day of the year that the high temperature was five degrees less than the high temperature of that day.

We'll start out by opening the Observed Daily table, which looks something like this:

For the sake of demonstration purposes, let's just look at the weather observations for 2013. We can do this by selecting the rows using the <sel> operation and using the year(X) function:
<sel value="year(date)=2013"/>
Since we're interested in the high temperature for each day in every zip code, let's perform a tabulation using <tabu>, grouping by both zipcode and date, and finding the high temperature (maxtempi).
<tabu label="High Temperatures" breaks="zipcode,date">
  <tcol source="maxtempi" fun="first" name="hightemp" label="Highest`Max`Temp`(F)"/>
</tabu>
Note: There is only one high temperature for each day in a particular zip code, so fun="first" will give us that high temperature, though we could have also specified last, lo, or hi for the fun attribute to get the same results.

Our table now looks similar to the following:

Since we want to find the first day of the year that the temperature was five degrees less than the high temperature for each day, let's create a computed column that will hold the value of this threshold:
<willbe name="threshold" value="hightemp-5"/>

This will give us a new column named threshold:

Now let's use g_position(G;S;O;X;Y) to find the first day of the year that the temperature was equal to our threshold. Since we want to group by zip code, we'll specify the zipcode column for our G parameter. We'll omit the S parameter, since we want the function to consider all rows and not just a subset. We'll specify the date column for the O parameter, so that our function considers the order by the date. Since we want to find the first value in the hightemp column, we'll specify that for X, and since we want to find the first value in that column equal to our threshold, we'll specify the threshold column for Y.

So our code should look like the following:
<willbe name="day_of_year" value="g_position(zipcode;;date;hightemp;threshold)"/>

We can see here that for the zip code 01001, the first day of the year that had a temperature that was five degrees less than the high temperature on 01/02/13 (28) was the 3rd day of the year (01/01/13), whose temperature was 23 degrees. If we look at 01/06/13, we can see that the first day of the year that had that a temperature that was five degrees less than the high temperature that day (42) was the 1st day of the year, whose temperature was 37.

To see the actual date, instead of just the day of the year, we could use the g_pick(G;S;O;X;Y) function and give it the index that we received from g_position(G;S;O;X;Y). Since we want the date associated with the index, we'll specify the date column for X, and we'll specify our day_of_year index for the Y parameter:
<willbe name="first_date" value="g_pick(zipcode;;date;date;day_of_year)"/>

Additional Information

  • The t_ version of this function defaults the G argument and omits the S argument. The default for G is set at table load time based on the organization of the table.