Denormalizing tables

In SQL it is very common to have smaller tables that are used as reference tables to a larger data set. This allows the data to be stored easier and accessed faster. However, if you want to perform any calculations that involve these reference tables in addition to your main table, you must first combine them, or denormalize them.

Difficulty

Objective

You would like to know the sum of sales of fruit during the summer of 2015 for each of your stores. Additionally, you would like to know the location of each of these stores. However, the necessary information to complete this analysis is contained in three different tables. Therefore, you need to combine or denormalize the tables first.

SQL solution

SELECT * 
FROM sales_item_detail 
LEFT JOIN product_master   
  ON sales_item_detail.sku=product_master.sku 
LEFT JOIN store_master 
  ON sales_item_detail.store=store_master.store;

Denormalizing the tables so that all information relevant to the problem is in one table is relatively simple. You just need two JOIN's to combine all three tables. However, if you then want to perform an analysis on this denormalized table it gets more complicated. You could either create a temporary table from this query to then use in your next query, or you could add additional logic to the current query to get your desired results.

SELECT sales_item_detail.store,
       city,
       state,
       SUM(xsales) AS sum_sales 
FROM sales_item_detail 
  LEFT JOIN product_master   
    ON sales_item_detail.sku=product_master.sku 
  LEFT JOIN store_master 
    ON sales_item_detail.store=store_master.store 
WHERE groupdesc='FRUIT' AND date<=20150621 AND date>=20150922 
GROUP BY sales_item_detail.store;

In the above query, additional logic is added to the prior query in order to obtain the desired results presented in the objective. However, when you submit this new query, the tables need to be recombined and the results you already obtained are not utilized. With 1010data, you can denormalize your tables and then run additional queries without resubmitting previous work.

1010data Macro Language solution

<base table="pub.doc.retail.altseg.sales_detail_transid"/>
<sel value="between(trans_date;20150621;20150922)"/>
<link table2="pub.doc.retail.altseg.products" col="sku" col2="sku" 
type="include">
  <sel value="groupdesc='FRUIT'"/>
</link>
<tabu label="Tabulation on Sales Detail" breaks="store">
  <break col="store" sort="up"/>
  <tcol fun="sum" source="xsales" label="Sum of`Extended`Sales"/>
</tabu>
<link table2="pub.doc.retail.altseg.stores" col="store" col2="store" 
cols="city,state"/>

First a selection is made to include only those transactions that occur in the summer of 2015. In order to further select the products from the "Fruit Group," you need to link in the Product Master table. You could simply link in this table and make a selection afterwards, but for this solution the selection is done within the link operation and then type="include" is specified. This link type will not bring in any additional columns from the Product Master, but it will perform a selection on the base table based on the data present in the foreign table.

Then, you can perform a tabulation to obtain the sum of sales for fruit during the summer of 2015 by store, and link in the Store Master table to add location information to the worksheet.

Alternate 1010data Macro Language solution

<base table="pub.doc.retail.altseg.sales_detail_transid"/>
<sel value="between(trans_date;20150621;20150922)"/>
<link table2="pub.doc.retail.altseg.products" col="sku" col2="sku" 
cols="groupdesc"/>
<link table2="pub.doc.retail.altseg.stores" col="store" col2="store" 
cols="city,state"/>
<sel value="groupdesc='FRUIT'"/>
<tabu label="Tabulation on Sales Detail" breaks="store,city,state">
  <break col="store" sort="up"/>
  <tcol fun="sum" source="xsales" label="Sum of`Extended`Sales"/>
</tabu>

This solution denormalizes the tables first, and then performs the necessary operations to obtain the desired calculation. The important thing to remember is that with either solution, the state of your session from the previous query is used going forward, so that operations do not need to be resubmitted.

Further reading

If you would like to learn more about the functions and operations discussed in this recipe, click on the links below:

<link>