g_aucnt(G;S;X;Z)
Returns an aucnt
object that can be used to
compute the approximate unique count of values in a given column. (Available as of version 10.30)
Function type
Vector only
Description
g_aucnt(G;S;X;Z)
g_aucnt_merge(G;S;M;N)
aucnt_estimate(M)
While these functions can be used for small tables, the motivating problem is to efficiently compute the approximate number of distinct values in a column with a large number of uniques, or across a table with a very large number of rows.
A common usage pattern is for g_aucnt(G;S;X;Z)
to create an
aucnt
object that contains information about the unique values for each
group in a given column. The aucnt
objects from different groups can be
merged together with g_aucnt_merge(G;S;M;N)
to get a new
aucnt
object with information on the total number of uniques across the
merged groups. The estimated number of unique values for any aucnt
object
can be extracted using the aucnt_estimate(M)
function.
Syntax
g_aucnt(G;S;X;Z)
Input
Argument | Type | Description |
---|---|---|
G |
any | A space- or comma-separated list of column names Rows are in the same group
if their values for all of the columns listed in If If any of the columns listed in |
S |
integer | The name of a column in which every row evaluates to a 1 or 0, which determines
whether or not that row is selected to be included in the calculation If
If any of the values in
|
X |
any | A column name A column containing values the function will use to create the
|
Z |
text and decimal | A list of pairs of option names and option values Currently, the only option
you may specify for the
|
Return Value
For every row in each group defined by G
(and for those rows where
S
=1, if specified),
g_aucnt(G;S;X;Z)
returns an aucnt
object that contains a
signature of the data in X
.
Example
This example illustrates how to use the aucnt
functions to estimate the
unique number of SKUs inside the inventory table
pub.doc.retail.altseg.inventory_date. The analysis first calculates
the unique number of SKUs for each day in the table and then merges those results to find
the unique SKUs across all days.
The unique number of SKUs for each day is calculated by running
g_aucnt(G;S;X;Z)
on the sku
column, grouped by
date
(which is the column on which the table is segmented). This function
generates an aucnt
object for each date. The column
aucnt_obj
contains these results. The aucnt_estimate(M)
function is then applied to the aucnt_obj
column. This function estimates
the number of unique SKUs in each aucnt
object. The column
aucnt_by_date
contains these results.
<base table="pub.doc.retail.altseg.inventory_date"/>
<willbe name="aucnt_obj" value="g_aucnt(date;;sku;)"/>
<willbe name="aucnt_by_date" value="aucnt_estimate(aucnt_obj)"/>
The next step is to merge all of the aucnt
objects together using
g_aucnt_merge(G;S;M;N)
and then estimate the number of unique SKU values
across all days using aucnt_estimate(M)
.
Since the aucnt
object in every row for each date is the same,
g_first1(G;S;O)
is used to reduce the number of rows to just the first
row in each group. Furthermore, an empty merge (<merge/>
) is required in
order to place all of the aucnt
objects into a single worksheet so that
g_aucnt_merge(G;S;M;N)
can operate on one group that consists of all
dates.
The g_aucnt_merge(G;S;M;N)
then merges all of the aucnt
objects into one aucnt
object. The column merge_all
contains these results. Finally, the aucnt_estimate(M)
function is then
applied to the merge_all
column, which estimates the number of unique SKUs
in that aucnt
object. The column aucnt_all_time
contains
these results.
<sel value="g_first1(date;;)"/>
<merge/>
<willbe name="merge_all" value="g_aucnt_merge(;;aucnt_obj;)"/>
<willbe name="aucnt_all_time" value="aucnt_estimate(merge_all)"/>
The complete code for this example is as follows:
<base table="pub.doc.retail.altseg.inventory_date"/> <willbe name="aucnt_obj" value="g_aucnt(date;;sku;)"/> <willbe name="aucnt_by_date" value="aucnt_estimate(aucnt_obj)"/> <sel value="g_first1(date;;)"/> <merge/> <willbe name="merge_all" value="g_aucnt_merge(;;aucnt_obj;)"/> <willbe name="aucnt_all_time" value="aucnt_estimate(merge_all)"/>