Counting based on conditions

You can determine how many times a specified value occurs in a table based on predetermined and conditional requirements.

Difficulty

Objective

You want to determine how many customers bought a specific item during a promotion and then repurchased that item at a later date. You know how to do this in SQL using SELECT, DISTINCT, and JOIN, and you would like to produce the same results using 1010data.

SQL solution

SELECT DISTINCT bght_on_pr.customer
FROM
  (SELECT customer,transid,sku FROM sales_item_detail WHERE date=20150105 AND sku=406444)
  AS bght_on_pr
  JOIN
  (SELECT customer,transid,sku FROM sales_item_detail WHERE date>20150105 AND sku=406444)
  AS bght_after
  ON bght_on_pr.customer=bght_after.customer;

Using two different SELECT statements combined with an AS, you can retrieve temporary store results from the Sales Detail table, in two temporary tables. One table contains customers that purchased the item on the promotion date, and the other contains customers that purchased the item after the promotion date. Using JOIN and matching up customer ID's will produce a list of customers that bought the item both on the promotion and again afterwards. However, you must use DISTINCT to obtain a list of unique customer ID's.

With 1010data, you can obtain the same results without the use of temporary tables.

1010data Macro Language solution

<base table="pub.doc.retail.altseg.sales_detail_transid"/>
<sel value="(sku=406444)&(trans_date>=20140105)"/>
<willbe name="bght_on_pr" value="trans_date=20150105"/>
<willbe name="bght_after" value="trans_date>20150105"/>
<merge/>
<sel value="g_or(customer;;bght_on_pr) & g_or(customer;;bght_after)"/>
<sel value="g_first1(customer;;)"/>

With 1010data, you can use g_functions to produce the same results. This is the preferred method because g_functions take advantage of how the data is stored. However, your table needs to be properly segmented in order to use g_functions. The Sales Item Detail table is segmented by customer and transaction ID. In order to determine the number of customers that bought the item both on the promotion and afterwards, the table would need to be only segmented by customer.

In order to work around this restriction you perform an empty merge, <merge/> on the table. This will condense the table into a single segment, and then you can use the necessary groupings within each g_function. However, you should only use this method on tables smaller than ten million rows, or the system will have difficulty completing the query.

Two g_or(G;S;X) functions are used to determine if any customer bought the item on the promotion and then bought it again afterwards. g_first1(G;S;O) then selects only the first instance of every customer to avoid duplicates and to obtain a list of customers that meet your requirements.

Alternate 1010data Macro Language solution

<base table="pub.doc.retail.altseg.sales_detail_transid"/>
<sel value="(sku=406444)&(trans_date>=20140105)"/>
<willbe name="bght_on_pr" value="trans_date=20150105"/>
<willbe name="bght_after" value="trans_date>20150105"/><tabu label="Tabulation on Sales Detail" breaks="customer">
  <tcol source="bght_on_pr" fun="sum" name="sum_on_pr" label="Bought on Promotion"/>
  <tcol source="bght_after" fun="sum" name="sum_after_pr" label="Bought After Promotion"/>
</tabu>
<sel value="(sum_on_pr<>0)&(sum_after_pr<>0)"/>

If your table is not properly segmented, and therefore you are unable to use g_functions, there is another approach which instead preforms a tabulation. First, the data is limited to the transactions that occurred on or after the promotion date and include the specified item, in this example the promotion occurred on January 5th, 2015 and the SKU of the item is 406444. Two flag columns are created, one to determine if the customer bought the item on the date of the promotion and one to determine if the customer purchased the same item again afterwards.

Performing a tabulation which breaks on customer, will group together the same customers and display how many times they bought the item on the promotion date and after the promotion date. Finally, you select only the customers that bought the item both on the promotion date and at least one time afterwards.

Further reading

If you would like to learn more about the functions and operations discussed in this recipe, click on the links below:

g_or(G;S;X)

g_first1(G;S;O)

<tabu>