Lazy evaluation
Certain operations in 1010data are not evaluated until the data from those operations are needed by another part of the query.
Key concepts
<willbe>
and<link>
operations are lazily evaluated in 1010data.- Results of
<willbe>
operations are not cached throughout a query but are cached within a single operation.
Discussion
Within 1010data, there exists the concept of lazy evaluation. This means that certain
expressions or operations are not evaluated until the data is needed. For example, you have
created a computed column, c
, that adds the value of column
a
to the value of column b
. However, the value of
c
is not calculated at the time the <willbe>
operation appears in the code.
Suppose in your original table you have 15 columns, but you can only see 10 of them in your
current view. When you create column c
, it will be the 16th column in the
table, and it will not be in your view; therefore the values for column c
do not need to be calculated yet.
However, if you scroll to the right in the table so that column c
is in
your view, those values will then need to be calculated. Yet, they will only be calculated
for the number of rows shown in your current view.
When you create a <willbe>
, you are telling the system that when this
column is referenced in the future, here is how to calculate its value. One important thing
to note is that the results of a <willbe>
are not cached throughout the
rest of the query, but they are cached within a single operation. If you select rows where
column c
contains values greater than 6
, all of the values
in that column will have to be calculated in order to be used in the selection expression,
but those values are not cached.
Therefore, if you then scroll over to column c
so that it is once again in
view, those values have to be recalculated.
However, if you perform a tabulation and create two result columns, one that is the sum of
c
, and one that is the average of c
, the value of
c
is only calculated once, because the results for the column are cached
while it is still performing the tabulation operation.
An exception to consider is the use of g_functions. If a g_function is used in the value
expression of a <willbe>
and that column is evaluated when you scroll to
bring it into view, there needs to be further evaluation than just the current rows.
G_functions operate on groups denoted by the G
argument. Because the groups
could extend beyond what is shown, all of the rows for each group shown in the current view
are evaluated. In addition, while the values of the g_functions are not cached throughout
the rest of the query, the groups are cached. Since the use of one g_function often leads to
the use of others with the same G
argument, it is helpful to cache these
groups for future use, and it is significantly less expensive than caching the values
themselves.
<willbe>
is not the only operation that adheres to lazy evaluation.
<link>
operations also follow this principle. When you use
<link>
, you are not actually combining the columns of the tables at
that moment; you are telling the system where to find the values from the columns in the
foreign table when they need to be referenced. For example, suppose you link the columns of
two tables, a and b, but you don't use any of the
columns from table b in the rest of your query. In this case, the full
<link>
will never be evaluated. However, if you perform a selection
that uses a column from table b in the expression, the
<link>
will then need to be evaluated in order to find the rows that
need to be kept in the worksheet.
The exceptions to this rule are <link>
operations of type
select
, include
, or exclude
. When these
types of links are performed, you are making a selection on the base table, and thus the
<link>
needs to be evaluated so you know which rows to keep in the
worksheet.
Example
Knowing how and when 1010data evaluates certain expressions is useful in creating an optimized version of your query. This can be illustrated through a basic basket analysis query where making subtle changes can improve your query performance.
Starting with the Sales Detail table
(pub.doc.retail.altseg.sales_detail_transid), a selection is made on
trans_date
. Next, a boolean column, product
, is created
to denote rows containing the specified SKU. The column product
is then
used as the X
argument in the g_or(G;S;X)
function to
create another boolean column, trans_prod
, in order to show which
transactions contained that SKU.
<base table="pub.doc.retail.altseg.sales_detail_transid"/> <sel value="year(trans_date)=2015"/> <willbe name="product" value="sku='476666'"/> <willbe name="trans_prod" value="g_or(transid;;product)"/>
Remember that a <willbe>
is not evaluated until the data is needed.
Looking at the rest of the query, the column trans_prod
is used three
times.
<base table="pub.doc.retail.altseg.sales_detail_transid"/> <sel value="year(trans_date)=2015"/> <willbe name="product" value="sku='476666'"/> <willbe name="trans_prod" value="g_or(transid;;product)"/> <sel value="trans_prod"/> <willbe name="trans_prod_fs" value="g_first1(transid;trans_prod;)"/> <willbe name="sales_trans_prod" value="g_sum(transid;product;xsales)*trans_prod_fs"/> <willbe name="department" value="dept=35"/> <willbe name="trans_dept" value="g_or(transid;;department)"/> <willbe name="trans_both" value="trans_prod*trans_dept"/> <willbe name="trans_both_fs" value="g_first1(transid sku;trans_both;)"/> <willbe name="sales_both" value="g_sum(transid sku;trans_both;xsales)*trans_both_fs"/> <tabu breaks="group" label="Tabulation"> <tcol fun="sum" name="bskts_both" source="trans_both_fs" label="Baskets`Containing Both"/> <tcol fun="sum" name="bskt_cnt_prod" source="trans_prod_fs" label="Baskets Containing`Product"/> <tcol fun="sum" name="sales_prod" source="sales_trans_prod" label="Sales of`Product"/> <tcol fun="sum" name="sales_both" source="sales_both" label="Sales of`Both"/> </tabu>
The column trans_prod
is present in a selection statement, as well as in
the value expression of two other computed columns, trans_prod_fs
and
trans_both
.
Since the two columns that use the value of trans_prod
are also lazily
evaluated, those values are not calculated until they are used as the source columns for two
of the <tcol>
operations in the tabulation at the end of the query.
Recall that the results of a <willbe>
are not cached throughout the
query, but the results are cached within a single operation. trans_prod
is
evaluated once when used in the selection expression, and the results are not cached.
Therefore, its value needs to be recalculated when the columns using
trans_prod
are evaluated in the tabulation. Since you are within a single
operation when trans_prod_fs
and trans_both
are evaluated,
the results are cached and therefore, trans_prod
is only evaluated once for
the two columns. Thus, the value of trans_prod
is evaluated twice, in
total, throughout the query.
However, the query can be optimized so that trans_prod
is only evaluated
once. Consider the changes shown below in bold.
<base table="pub.doc.retail.altseg.sales_detail_transid"/> <sel value="year(trans_date)=2015"/> <willbe name="product" value="sku='476666'"/> <sel value="g_or(transid;;product)"/> <willbe name="trans_prod" value="1"/> <willbe name="trans_prod_fs" value="g_first1(transid;trans_prod;)"/> <willbe name="sales_trans_prod" value="g_sum(transid;product;xsales)*trans_prod_fs"/> <willbe name="department" value="dept=35"/> <willbe name="trans_dept" value="g_or(transid;;department)"/> <willbe name="trans_both" value="trans_prod*trans_dept"/> <willbe name="trans_both_fs" value="g_first1(transid sku;trans_both;)"/> <willbe name="sales_both" value="g_sum(transid sku;trans_both;xsales)*trans_both_fs"/> <tabu breaks="group" label="Tabulation"> <tcol fun="sum" name="bskts_both" source="trans_both_fs" label="Baskets`Containing Both"/> <tcol fun="sum" name="bskt_cnt_prod" source="trans_prod_fs" label="Baskets Containing`Product"/> <tcol fun="sum" name="sales_prod" source="sales_trans_prod" label="Sales of`Product"/> <tcol fun="sum" name="sales_both" source="sales_both" label="Sales of`Both"/> </tabu>
With the above changes, the same g_function is used, but it is used in the value expression
of the selection statement instead of in the value expression of the
<willbe>
. The trans_prod
column is then created with
the value of 1
. This subtle difference might seem trivial. Both queries use
a <sel>
and a <willbe>
operation, but the order as
well as the contents of the value expressions are reversed. Since the selection expression
no longer uses trans_prod
in the value expression,
trans_prod
is only evaluated once.
The number of times that trans_prod
is evaluated is not exactly the reason
why one query performs better than the other. There is a little more to it. Not only do you
have to consider the number of times an expression is evaluated, but also what that
expression is. Both queries use the expression g_or(transid;;product)
but
in different ways. The first query uses it in the creation of trans_prod
,
and the second query uses it in a selection. Thus, this g_function is evaluated twice in the
first query, when the column trans_prod
is evaluated, and once in the
second query, in the selection expression. The second evaluation in the second query is
avoided by using 1
as the value of trans_prod
. You are
able to do this because g_or
produces a 1
or a
0
based on it evaluating to true or false, and when you make a selection
using the results of this function, you are only selecting rows with a value of
1
.
Due to the number of times g_or(transid;;product)
is evaluated in both
queries, the second query is the better approach. Although g_functions are efficient, with
large amounts of data you can still see decreased run times when making optimizations such
as these.