g_ntile(G;S;O;X;N)

Returns an integer representing the quantile interval (or "bucket") for each row that is a member of a given group based on the values in a particular column. Resultant buckets are of roughly equal size.

Function type

Vector only

Syntax

g_ntile(G;S;O;X;N)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

O integer or decimal A space- or comma-separated list of column names that determine the row order within a particular group
X any numeric type A column name
N integer The number of buckets to create for each unique value in group G

Return Value

For every unique value in the grouping column (G), and for those rows where S=1, if specified, g_ntile(G;S;O;X;N) returns an integer representing the bucket into which each row has been placed, based on values found in column X.

Example

In this example, g_ntile(G;S;O;X;N) provides a bucket value for each department within a group of stores based on the total sales for that department.

<base table="pub.doc.retail.altseg.sales_detail_transid"/>
<tabu breaks="store,dept" label="g_ntile Example">
 <tcol fun="sum" name="sales_by_dept" source="xsales"/>
</tabu>
<link table2="pub.doc.retail.altseg.products" col="dept" col2="dept"/>
<colord cols="store,deptdesc,sales_by_dept"/>
<sort cols="store,`sales_by_dept"/>
<willbe name="ntiles" value="g_ntile(store;;;sales_by_dept;3)"/>

The query above returns the following results:

Example: Creating percentiles

Percentiles are a use-case for quantiles. To break a population up into whole percentiles, simply pass the N parameter the value 100. In this example, a tabulation is used to summarize total sales per transaction. A percentile column is then created using g_ntile(G;S;O;X;N), indicating which percentile each transaction falls into vs. the entire population:
<base table="pub.doc.retail.salesdetail"/>
<tabu breaks="transid" label="Tabulation">
  <tcol source="xsales" fun="sum"/>
</tabu>
<willbe name="percentile_to_tenth" value="g_ntile(;;;sales_by_transaction;100)"/>
If a more granular breakdown is required, say, percentiles to the tenth of a percent, then simply pass 1000 to the N parameter and divide the output of the function by 10:
<base table="pub.doc.retail.salesdetail"/>
<tabu breaks="transid" label="Tabulation">
  <tcol source="xsales" fun="sum" name="sales_by_transaction" label="Sales by`Transaction"/>
</tabu>
<willbe name="percentile_to_tenth" value="g_ntile(;;;sales_by_transaction;1000)/10"/>

Additional Information

The following details should be considered when using g_ntile(G;S;O;X;N):
  • When there is a "tie" between two values, the value appearing first in the original order of the data, or the value appearing first in ordered specified with the O argument, will be used
  • Quantiles themselves (e.g., the "dividing line" between buckets, may or may not be values found in the set of rows being broken into buckets
  • The lowest numbers appear in the lowest bucket, and the highest numbers appear in the highest bucket
  • g_ntile is defined as follows: g_ntile(G;S;O;X;N) == int(-floor(-N*((1+g_cnt(G;S)-g_rank(G;S;O;X))/g_cnt(G;S))))