g_aucnt(G;S;X;Z)

Returns an aucnt object that can be used to compute the approximate unique count of values in a given column. (Available as of version 10.30)

Function type

Vector only

Description

The functions used in determining the approximate unique count of values in a given column are:
  • g_aucnt(G;S;X;Z)
  • g_aucnt_merge(G;S;M;N)
  • aucnt_estimate(M)

While these functions can be used for small tables, the motivating problem is to efficiently compute the approximate number of distinct values in a column with a large number of uniques, or across a table with a very large number of rows.

A common usage pattern is for g_aucnt(G;S;X;Z) to create an aucnt object that contains information about the unique values for each group in a given column. The aucnt objects from different groups can be merged together with g_aucnt_merge(G;S;M;N) to get a new aucnt object with information on the total number of uniques across the merged groups. The estimated number of unique values for any aucnt object can be extracted using the aucnt_estimate(M) function.

Syntax

g_aucnt(G;S;X;Z)

Input

Argument Type Description
G any A space- or comma-separated list of column names

Rows are in the same group if their values for all of the columns listed in G are the same.

If G is omitted, all rows are considered to be in the same group.

If any of the columns listed in G contain N/A, the N/A value is considered a valid grouping value.

S integer The name of a column in which every row evaluates to a 1 or 0, which determines whether or not that row is selected to be included in the calculation

If S is omitted, all rows will be considered by the function (subject to any prior row selections).

If any of the values in S are neither 1 nor 0, an error is returned.

X any A column name

A column containing values the function will use to create the aucnt object.

Z text and decimal A list of pairs of option names and option values

Currently, the only option you may specify for the Z parameter is:

'error_rate' value
The value must be greater than or equal to 0.005.

For example: 'error_rate' 0.005

The default is 0.01.

Return Value

For every row in each group defined by G (and for those rows where S=1, if specified), g_aucnt(G;S;X;Z) returns an aucnt object that contains a signature of the data in X.

Example

This example illustrates how to use the aucnt functions to estimate the unique number of SKUs inside the inventory table pub.doc.retail.altseg.inventory_date. The analysis first calculates the unique number of SKUs for each day in the table and then merges those results to find the unique SKUs across all days.

The unique number of SKUs for each day is calculated by running g_aucnt(G;S;X;Z) on the sku column, grouped by date (which is the column on which the table is segmented). This function generates an aucnt object for each date. The column aucnt_obj contains these results. The aucnt_estimate(M) function is then applied to the aucnt_obj column. This function estimates the number of unique SKUs in each aucnt object. The column aucnt_by_date contains these results.

<base table="pub.doc.retail.altseg.inventory_date"/>
<willbe name="aucnt_obj" value="g_aucnt(date;;sku;)"/>
<willbe name="aucnt_by_date" value="aucnt_estimate(aucnt_obj)"/>

The next step is to merge all of the aucnt objects together using g_aucnt_merge(G;S;M;N) and then estimate the number of unique SKU values across all days using aucnt_estimate(M).

Since the aucnt object in every row for each date is the same, g_first1(G;S;O) is used to reduce the number of rows to just the first row in each group. Furthermore, an empty merge (<merge/>) is required in order to place all of the aucnt objects into a single worksheet so that g_aucnt_merge(G;S;M;N) can operate on one group that consists of all dates.

The g_aucnt_merge(G;S;M;N) then merges all of the aucnt objects into one aucnt object. The column merge_all contains these results. Finally, the aucnt_estimate(M) function is then applied to the merge_all column, which estimates the number of unique SKUs in that aucnt object. The column aucnt_all_time contains these results.

<sel value="g_first1(date;;)"/>
<merge/>
<willbe name="merge_all" value="g_aucnt_merge(;;aucnt_obj;)"/>
<willbe name="aucnt_all_time" value="aucnt_estimate(merge_all)"/>

The complete code for this example is as follows:

<base table="pub.doc.retail.altseg.inventory_date"/>
<willbe name="aucnt_obj" value="g_aucnt(date;;sku;)"/>
<willbe name="aucnt_by_date" value="aucnt_estimate(aucnt_obj)"/>
<sel value="g_first1(date;;)"/>
<merge/>
<willbe name="merge_all" value="g_aucnt_merge(;;aucnt_obj;)"/>
<willbe name="aucnt_all_time" value="aucnt_estimate(merge_all)"/>