Workflow

As mentioned in the introduction, the way you think about your data while using SQL and 1010data needs to be different, the same goes for the way you think about your work flow. Let's look at a simple problem and discuss how you would go about solving the problem in both SQL and 1010data.

The Problem

You are in charge of engaging customers at a grocery store and you need to find out some information on the customers that visit your store. First, you want to know when was the first time each customer shopped at your store and how many days are in between each customer's first visit and their next visit. You also want to know how much they spend on each visit.

SQL Work Flow

Before beginning work, you need to think of what you want your end result to be and how you are going to get there. For this problem, you know that you want to end up with one entry for each customer, with one column displaying the date of their first visit to the store and then another displaying the next sequential visit. Each of these entries should also contain a summary of the amount the customer spent during each visit, and there should be a column to communicate how many days have passed since the first visit.

Since you want to compute a summary of sales for each customer by visit date, your first calculation should compute a sum of sales for each customer and each date. Your code might look something like the following:

SELECT customer,date,SUM(xsales) AS sum_sales 
FROM sales_item_detail 
GROUP BY customer,date;

From these results, you need to find the earliest date for each customer, as well as the next sequential date. However, you can't simply enter a new query that uses these results unless you created a new table to store them. Instead, you need to combine your existing query with additional queries in order to obtain the next round of information. In this particular case, you need to combine two additional queries to obtain both the first and second date.

SELECT customer,MIN(first_date),sum_sales 
FROM (
  SELECT customer,sum_sales,MIN(date) AS first_date 
  FROM (
    SELECT customer,date,SUM(xsales) AS sum_sales 
    FROM sales_item_detail GROUP BY customer,date
  ) AS sum_results 
  GROUP BY customer
) AS first_results;

The bold sections illustrate what was added to the original query. By nesting the original query inside another SELECT statement, you can select the first date that a customer shopped at the store from the results containing the sum of sales information.

SELECT first_results.customer,MIN(first_date),first_results.sum_sales,
MIN(second_date),sum_results2.sum_sales 
FROM (
  SELECT customer,sum_sales,MIN(date) AS first_date 
  FROM (
    SELECT customer,date,SUM(xsales) AS sum_sales 
    FROM sales_item_detail GROUP BY customer,date
  ) AS sum_results 
  GROUP BY customer
) AS first_results 
LEFT JOIN (
SELECT customer,MIN(date) AS second_date,SUM(xsales) AS sum_sales  
FROM sales_item_detail 
GROUP BY customer,date
) AS sum_results2 
ON first_results.customer = sum_results2.customer 
WHERE first_results.first_date <> sum_results2.second_date 
GROUP BY customer;

Then, you can join these results to the results from an additional query where you can find the next sequential date that each customer shopped at the store. However, if a customer did not shop at the store a second time, that customer's record is not included in the resultant table.

After you have the dates of the first and second visits of each customer, you need to add an additional column to your previous query that determines the number of days between these visits.

SELECT first_results.customer,MIN(first_date),first_results.sum_sales,
MIN(second_date),sum_results2.sum_sales,DATEDIFF(second_date,first_date) AS days_between 
FROM (
  SELECT customer,sum_sales,MIN(date) AS first_date 
  FROM (
    SELECT customer,date,SUM(xsales) AS sum_sales 
    FROM sales_item_detail GROUP BY customer,date
  ) AS sum_results 
  GROUP BY customer
) AS first_results 
LEFT JOIN (
SELECT customer,MIN(date) AS second_date,SUM(xsales) AS sum_sales  
FROM sales_item_detail 
GROUP BY customer,date
) AS sum_results2 
ON first_results.customer = sum_results2.customer 
WHERE first_results.first_date <> sum_results2.second_date 
GROUP BY customer;

Your resultant table should look like the following image.

As you can tell by this example, you not only need to know how you would like your results to look, but you also need to determine how to connect all of the different parts of your query in order to obtain this result

1010data Work Flow

The work flow for solving the same problem in 1010data is very different. First, to obtain the sum of sales for each customer and each date, you can perform a tabulation with customer and date used as the break columns.

<tabu label="Sum of Sales per Customer and Date" breaks="customer,trans_date">
  <break col="customer" sort="up"/>
  <tcol source="xsales" fun="sum" name="sum_sales" label="Sum of`Extended`Sales"/>
</tabu>

You can determine the first date that a customer shopped at the store with a simple select statement, but before that is done, you want to find the next sequential date that the customer shopped after their first visit.

<willbe name="next_date" label="Date of Next Visit" value="g_rshift(customer;;trans_date;trans_date;1)"/>

Using the same logic, you can create a column that contains the sales for this second visit.

<willbe name="next_date_sales" label="Sales of Next Visit" value="g_rshift(customer;;trans_date;sum_sales;1)"/>

And an additional column that calculates the days in between these visits.

<willbe name="days_between" label="Days in`Between Visits" value="days(trans_date;next_date)"/>

Lastly, you just need to select the rows where the Date column contains the customer's first visit.

<sel value="g_first1(customer;;trans_date)"/>

This table contains records for all customers, those that did not shop at the store a second time just have NA values in the respective columns.

With this particular example, the same order of steps was not followed using SQL and 1010data. In the SQL work flow, the first date each customer shopped at the store was determined second, after the sum of sales was calculated. However, in 1010data, the first date was not selected till the very end. If you had selected only the first date for each customer at the beginning using 1010data, you would have had to follow a similar approach to SQL- linking tables. Waiting to perform this selection allows you to gather all other necessary information first and eliminates the need to link back in the same information.

When using1010data, each time you add on to your query, you don't have to rerun the previous steps. Additionally, you don't have to nest and join your queries together in order to produce your desired end result. Each new step uses the results of the previous steps as it's base without needing to create a temporary table to store the results. It is important to consider this work flow while constructing your 1010data queries, both in order to take advantage of the system and to create efficient queries.