Performing an affinity analysis
You can determine the relationship of two items based on the amount of times the items were bought in the same transaction, the number of transactions that contained each item individually and the amount of transaction over all.
Difficulty
Objective
You want to determine the affinity, or the relationship, between two items. Let's say, you are analyzing retail transactions and you want to know how likely it is for a certain item to be purchased with a different item. For example, hot dogs and hot dog buns have a high affinity. Or you are the owner if an ice cream shop and you want to know which flavor and which topping are most commonly ordered together.
Solution
<base table="pub.doc.retail.altseg.sales_detail_transid"/> <sel value="between(trans_date;20150101;20160101)"/> <sel value="xsales>0"/> <note>Flag Primary Products</note> <willbe name="sel" value="group=65"/> <sel value="g_or(customer trans_date transid; ;sel)"/> <willbe name="unique_trans" value="g_last1(customer trans_date transid;;)"/> <tabu breaks="sku" label="Tabulation"> <tcol source="transid" fun="ucnte" name="bskts_both" label="# Baskets`with Both"/> <tcol source="unique_trans" fun="sum" name="total_trans"/> </tabu> <willbe name="bskts_prod" value="g_sum(;;total_trans)" label="# Baskets`with Product"/> <willbe name="bskts_both_pct" value="bskts_both/bskts_prod*100" label="% of Baskets`with Both" format="dec:2"/> <note>Total Baskets of Co-Purchase Product</note> <link table2="pub.doc.retail.altseg.sales_detail_transid" col="sku" col2="sku"> <sel value="between(trans_date;20150101;20160101)"/> <sel value="xsales>0"/> <tabu breaks="sku" label="Tabulation"> <tcol source="transid" fun="ucnte" name="bskts_co" label="# Baskets with`Co-Purchase Product"/> </tabu> </link> <willbe name="linker" value="1"/> <note>All Baskets in time/store range selection</note> <link table2="pub.doc.retail.altseg.sales_detail_transid" col="linker" col2="linker"> <sel value="between(trans_date;20150101;20160101)"/> <sel value="xsales>0"/> <tabu label="Tabulation"> <tcol source="transid" fun="ucnte" name="bskts_all" label="Total # Baskets`in Sample"/> </tabu> <willbe name="linker" value="1"/> </link> <willbe name="bskt_aff" label="Affinity" value="min(9999;int(bskts_both_pct/(bskts_co/bskts_all*100)*100))"/> <sel value="bskts_both_pct>=.1"/> <sel value="(bskts_both_pct<>100)&(bskts_co<>bskts_both)"/> <sort col="bskt_aff" dir="down"/> <sort col="sku" dir="up"/> <note>Brings in SKU information - remove if not wanted</note> <link table2="pub.doc.retail.altseg.products" col="sku" col2="sku"/> <col name="dept" fixed="1"/> <col name="group" fixed="1"/> <colord cols="dept,group,sku,description,bskts_prod,bskts_co, bskts_both,bskts_all,bskts_both_pct,bskt_aff"/>
Discussion
Determining the relationship between items is very useful, especially in retail. If you can
figure out what items are commonly bought together you can use these insights to make more
effective decisions. This recipe uses demo retail sales data to determine the relationship
between an item bought within group 65
to every other item bought in that
basket. Below is the formula used to calculate affinity and the basis for our actions in
this recipe.
The solution first flags every row that contains an item from group 65
and
then, using g_or(G;S;X)
, selects all baskets that contain at least one of
these items. A tabulation is completed by breaking on the item SKU to determine the number
of unique transactions and the number of baskets that contain both the current item and an
item from group 65
. By calculating the sum of unique transactions we can
then determine how many total baskets there were that contained an item from group
65
. Thus, the percentage of baskets containing both items can then be
obtained.
From here, we need to link in additional tables. Within each link, we can perform
additional calculations. These include tabulations to determine the total number of baskets
in our sample and the number of baskets that contain each additional item purchased with the
item in group 65
. A final column uses all of our computed information to
determine the affinity score for each item pair.
Further reading
If you would like to learn more about the functions and operations discussed in this recipe, click on the links below: