Time comparison analysis by groups
You can compare aggregated data for two different time periods, even if that data is contained in the same column.
Difficulty
Objective
You want to perform an analysis that compares the data from one time period to a previous
time period for multiple different categories. For instance, retailers often want to compare
this year’s sales to last year’s sales. Additionally, you want these sales to be further
broken down by group. Often, especially in time-ordered data sets, the values you want to
compare are located in the same column but not in the same row. You know how to do this in
SQl by using JOIN
and UNION
, and you would like to produce
the same results using 1010data.
SQL solution
SELECT results.month_2015 AS `Month`,results.group_2015 AS `Group`, results.sum_sales_2015 AS `2015 Sum of Sales`, results.sum_sales_2016 AS `2016 Sum of Sales`, results.sum_sales_2016 - results.sum_sales_2015 AS YOY, (results.sum_sales_2016 - results.sum_sales_2015)/results.sum_sales_2015 AS `YOY Perc` FROM ( SELECT t1.`MONTH(date)` AS month_2015 ,t1.group AS group_2015,t1.`ROUND(SUM(xsales),2)` AS sum_sales_2015,IFNULL(t2.`MONTH(date)`, t1.`MONTH(date)`) AS month_2016,IFNULL(t2.group,t1.group) AS group_2016,IFNULL(t2.`ROUND(SUM(xsales),2)`,0) AS sum_sales_2016 FROM (SELECT MONTH(date),YEAR(date),`group`, ROUND(SUM(xsales),2) FROM sales_item_detail WHERE YEAR(date)=2015 GROUP BY MONTH(date),YEAR(date),`group`) AS t1 LEFT JOIN (SELECT MONTH(date),YEAR(date),`group`, ROUND(SUM(xsales),2) FROM sales_item_detail WHERE YEAR(date)=2016 GROUP BY MONTH(date),YEAR(date),`group`) AS t2 ON t1.group = t2.group AND t1.`MONTH(date)` = t2.`MONTH(date)` UNION SELECT IFNULL(t1.`MONTH(date)`,t2.`MONTH(date)`) AS month_2014,IFNULL(t1.group,t2.group) AS group_2015, IFNULL(t1.`ROUND(SUM(xsales),2)`,0) AS xsales_2015,t2.`MONTH(date)`,t2.group,t2.`ROUND(SUM(xsales),2)` FROM (SELECT MONTH(date),YEAR(date),`group`, ROUND(SUM(xsales),2) FROM sales_item_detail WHERE YEAR(date)=2015 GROUP BY MONTH(date),YEAR(date),`group`) AS t1 RIGHT JOIN (SELECT MONTH(date),YEAR(date),`group`, ROUND(SUM(xsales),2) FROM sales_item_detail WHERE YEAR(date)=2016 GROUP BY MONTH(date),YEAR(date),`group`) AS t2 ON t1.group = t2.group AND t1.`MONTH(date)` = t2.`MONTH(date)` ) AS results ORDER BY results.group_2015 ASC;
Using SQL, in order to utilize results from executed select statements in additional select
statements, you need to nest the statements into one query. Additionally, each of these
nested select statements need to be labeled as a temporary table, using AS
.
With 1010data, you can execute queries consecutively, using the results of the previous
query as a starting point for the next one, without having to save your results or nest your
statements.
Another drawback of using SQL to complete this problem is that when you join tables
together using LEFT JOIN
and RIGHT JOIN
, values from the
joining table that do not have a match, are given a value of NULL
. With
1010data, you use a <link>
instead of a JOIN
to combine
tables in this way and values that do not match are given a value of 0. This makes it
possible to still use these values in aggregations, where as aggregate functions in SQL
ignore NULL
values.
FULL OUTER
JOIN
is not possible. Therefore, a LEFT JOIN
and a
RIGHT JOIN
are combined with a UNION
to produce the
same affect.1010data Macro Language solution
<base table="pub.doc.retail.altseg.sales_detail_transid"/> <willbe name="month" label="Month" value="month(trans_date)"/> <willbe name="year" label="Year" value="year(trans_date)" format="type:nocommas"/> <tabu label="Tabulation on Sales Detail" breaks="group,month,year"> <break col="group" sort="up"/> <break col="month" sort="up"/> <break col="year" sort="up"/> <tcol source="xsales" fun="sum" name="sum" label="Sum of`Extended`Sales"/> </tabu> <willbe name="last_year" label="Last Years Sum" value="g_rshift(month group;;;sum;-1)" format="dec:2"/> <willbe name="yoy" label="YOY" value="sum-last_year" format="dec:2"/> <willbe name="yoy_perc" label="YOY Percentage" value="yoy/last_year" format="type:pct;dec:2"/>
Performing a time comparison analysis allows you to examine how your data changes from one time period to another. Additionally, the summarized values for each time period should be displayed in adjacent columns to allow for easy comparison between the two.
This analogue presents two methods. The most efficient way is to use the 1010data function
g_rshift(G;S;O;X;N)
to shift rows according to time period. This function
is used to create a new column that, along with additional columns, can then be used to
determine the percent loss or gain from one year to the next.
However, first two new columns representing the month and year are created to be used along
with Group as break columns in a tabulation over each
month/year/group combination. (Note that the sample data set,
pub.doc.retail.salesdetail, only contains information for January of
2014 and 2015.) After the tabulation, the g_rshift(G;S;O;X;N)
function is
used to shift the rows in the worksheet so that the same month from both years in the
analysis are in adjacent rows. Once the aggregations are in the same row, calculating the
difference between them, and ultimately the percentage change, is simple.
Alternate 1010data Macro Language solution
<base table="pub.doc.retail.altseg.sales_detail_transid"/> <willbe name="month" label="Month" value="month(date)"/> <willbe name="fy2015" label="FY2014" value="year(date)=2015"/> <willbe name="fy2016" label="FY2015" value="year(date)=2016"/> <tabu label="YOY Monthly Group Sales" breaks="month,group"> <tcol source="xsales" fun="sumwd" name="sum2015" weight="fy2015" label="2015 Sum of Monthly Group Sales"/> <tcol source="xsales" fun="sumwd" name="sum2016" weight="fy2016" label="2016 Sum of Monthly Group Sales"/> </tabu> <willbe name="yoy" label="YOY" value="sum2016-sum2015" format="dec:2"/> <willbe name="yoy_perc" label="YOY Percentage" value="yoy/sum2015" format="type:pct;dec:2"/> <sort col="group" dir="up"/>
Another solution is to compute a weighted sum in a tabulation using reference columns, one for each time period, to place the desired values in new columns in the same row. Before completing a tabulation, you must create the reference columns. One column is created to distinguish the month and two additional columns are created to distinguish which fiscal year each transaction belongs to.
A tabulation then uses the Month column along with Group to break on and group the records by. The fiscal year flag columns are then used as weights for the dot product in the calculation. Once the aggregations are in the same row, calculating the difference between them, and ultimately the percentage change, is simple.
Further reading
If you would like to learn more about the functions and operations discussed in this recipe, click on the links below: