Retail analysis examples#

Download Jupyter Notebook

This section presents basic examples of retail data, to show how familiar it is for Python users, and to demonstrate performance on a large dataset.

To perform TenFrame operations on retail data, first import the py1010, tenFrame, pandas and numpy libraries, and create a 1010data session:

import py1010
import tenFrame as tf
import pandas as pd
import numpy as np
session = py1010.Session("1010data URL", "USERNAME", "PASSWORD", py1010.POSSESS)

Example 1: Sales data rows#

This example shows the total number of sales rows and the number of rows with known customers (stripping the NA values).

sales = tf.TenFrame(session, "retaildemo.retail.sales_detail")
# without downsampling, describe() below takes about 6 minutes.  Plan accordingly.
# sales = sales.sample(frac=1e-3)
print(f"Total sales rows: {len(sales):,}")
# Sales linked with customer data, filtered to rows where such data exists.
salescust = tf.merge(sales, "retaildemo.retail.customers", on="customer_key")
salescust.dropna(subset=["customer_key"], inplace=True)
print(f"Rows with known customers: {len(salescust):,}")
# Sales linked with product data
salesprod = tf.merge(sales, "retaildemo.retail.products", on="sku")
_images/RetailExample_1.png

Example 2: Sales data overview#

We can use tenFrame.TenFrame.describe() to show an overview of the sales data.

%%time
sales.describe(exclude=["transid", "date", "tme", "sku", "store", "customer_key", "promo_flag"])
_images/RetailExample_2.png

Example 3: Sales data detail#

The following shows an overview at a slightly more granular level, with customer data.

%%time
salescust.groupby("gender").describe(exclude=["transid", "date", "tme", "sku", "store", "customer_key", "promo_flag", "birthdate", "customerid", "tier"])
# (the short and very very wide format is what pandas does as well)
_images/RetailExample_3.png

Example 4: Count of transactions by date#

This example shows the total number of transactions for each date.

%%time
r = sales.groupby("date").transid.cnt()
q = r.iloc[0]       # Something to make the time shown honest.
r
_images/RetailExample_4.png

Example 5: Total sales by date#

This example shows the total dollar amount of sales for each date.

%%time
r = sales.groupby("date")['xsales'].sum()
q = r.iloc[0]
r
_images/RetailExample_5.png

Example 6: Transactions, total sales, and sales per transaction, by customer#

This example breaks down the number of transactions, total sales, and the average sales per transaction, by customer number.

%%time
r = salescust.groupby("customer_key").agg({"transid":"count", "xsales":"sum"})
r['sales_per_trans'] = r['xsales_sum'] / r['transid_count']
q = r.iloc[0]
r
_images/RetailExample_6.png

Charting examples#

The following code snippets demonstrates the types of charts available in TenFrame. See tenFrame.TenFrame.Plotter for a complete description of the plot types in TenFrame.

Chart 1: Bar chart on total sales by gender#

salescust.groupby("gender")['xsales'].sum().plot(kind="bar", x="gender", y="xsales_sum")
_images/GenderBarChart.png

Chart 2: Histogram on total sales by age#

salescust[['age']].plot(kind="hist")
_images/AgeHistogram.png

Chart 3: Bar chart on number of transactions by age group#

salescust.groupby("age_group").transid.cnt().sort_values("age_group").plot(kind="bar", x="age_group", y="transid_cnt")
_images/AgeGroupBarChart.png

Chart 4: Bar chart on total sales by age group#

salescust.groupby("age_group").xsales.sum().sort_values("age_group").plot(kind="bar", x="age_group", y="xsales_sum")
_images/TotalSalesBarChart.png

Chart 5: Line graph of total sales by date#

sales.groupby("date").xsales.sum().plot(kind="line", x="date", y="xsales_sum")
_images/TotalSalesbyDateLineGraph.png

Chart 6: Histogram of total sales by date#

sales.groupby("date").xsales.sum()[['xsales_sum']].plot(kind="hist")
_images/TotalSalesbyDateHistogram.png

Chart 7: Comparison of total sales of two stores#

rr = sales[sales.store == 937].groupby("date").xsales.sum(name="store_937")
rr2 = rr.merge(sales[sales.store == 728].groupby("date").xsales.sum(name="store_728"), on="date")
rr2.plot(kind="line", x="date", y=["store_937", "store_728"])
_images/StoreCompChart.png