Query Parameters#
The 1010data platforms supports queries that can be “parameterized” to some extent, to make them more general and more generally usable. So one could write an XML query that computes some statistics given, say, an individual store-ID out of a huge table of sales data, and then call the same query with different store-IDs at need, without having to edit the query. TenFrame incorporates this functionality, so you can develop TenFrames (which are, after all, really 1010data queries) with parameters in them, and then save them to the 1010data object tree and perhaps call them via other interfaces.
It should be noted that this feature’s usefulness is somewhat limited in the face of how much you can use and control parameters in python itself, and how much more simply you can do so. You can easily define a function that takes a store-ID as a parameter and builds a TenFrame that computes statistics for it, with no need for the kind of parameterization being described here. This feature is more for developing queries that can be called or used from other interfaces, perhaps the 1010data web GUI in the macro-language workshop, or even via simple calls to the web API. So don’t be discouraged and think you have to understand this feature in order to do parameterized things in tenFrame! Most of the time you can just do normal things in python the way you would be used to doing them with pandas. This is an advanced topic for making parameterized queries you can save to the server.
The point of tenFrame is to prevent you, the user, from having to learn our XML language in order to do things, but unfortunately at least to understand what is happening, you will need to read up at least a little on how parameters work in macro-language. A look at this page should at least give you some idea.
Param Objects#
As you perform tenFrame operations on a TenFrame and its columns, the
operations you do become a 1010data query in our XML macro-language. We’ve
already seen, for example, how a pandas-like selection like fr[fr.storeid
== 1023]
becomes a 1010data XML operator like <sel value="storeid =
1023"/>
. So you can see how also the “parameters” of your operations
become incorporated into the query: the “1023” that you gave as the value
to select on obviously has to become part of the query as well, and it
does.
Query parameters work by replacing these values that you use with special
Param
objects, which cause the resulting query to be built in such a
way that they remain parameters which can be set using 1010data’s ordinary
query parameter machinery, as described in the page linked above. We’ve
seen how
sales = tf.TenFrame(session, "retaildemo.retail.sales_detail")
sales = sales[sales.storeid == 1023]
sales.printprettyXML()
results in the XML
<?xml version="1.0" ?>
<macro>
<base table="retaildemo.retail.sales_detail"/>
<sel value="(storeid) = (1023)"/>
</macro>
which should make a certain amount of sense even without knowing our XML
language: we specify the base table, and then we do a selection on it. If
we want to parameterize the ID we search for, we just have to substitute a
Param
object in its place:
sales = tf.TenFrame(None, "retaildemo.retail.sales_detail")
sales = sales[sales.storeid == sales.Param("store_id", 1023)]
sales.printprettyXML()
which results in the XML
<?xml version="1.0" ?>
<macro>
<library>
<block name="block" store_id="1023">
<sel value="(storeid) = ({@store_id})"/>
</block>
</library>
<base table="retaildemo.retail.sales_detail"/>
<call block="block" store_id="1023"/>
</macro>
This isn’t really as mystifying as it might look, once you read some of the
basics of 1010data blocks. The “body” of the query, that is,
everything after setting the base table (which in our case is just a
selection operation) has been removed to a <block>
(named "block"
)
inside the query’s <library>
, and replaced by a call to invoke that
block with the supplied value for the variable store_id
. Inside the
block, the variable is surrounded by @{
and }
, as explained in the
1010data documentation, to make it expand to its value.
Constructing the Param object is done with frame.Param()
. Note that
this is called on an existing TenFrame: parameters are always part of some
TenFrame and should not be considered to exist independently of them.
(There is an exception to this which we’ll see below, for the anomalous but
very common case in which the base table name is itself a parameter. But
it’s true for the most part.) Even though .Param()
is a method, it’s
spelled with a capital letter, because it’s acting like a constructor,
creating a Param object that’s specific to this TenFrame.
To construct a parameter, you must supply a name and a default value. The
default value is necessary for several reasons, but the one that is most
relevant to you, the user, is because this way your TenFrame will run
normally, with the supplied default value, and you can continue to work
with it and develop it just as you would any other query (TenFrame). (The
default value is not strictly necessary; it defaults to the empty string
""
, but that is likely to cause you only trouble if you rely on it.)
The actual signature of the .Param
method looks like this:
Param(self, name, value=””, *, quoted=None, separator=’,’)
The keyword-only parameters, quoted=
and separator=
, have to do
with some of the finer points of parameter-handling, which we will have to
look at.
Types of Parameter#
Parameters come in several types, which are mostly easy to distinguish by the type you pass in as the default value when creating it:
numeric
string (or column, that is, TenSeries)
list, which may be one of a. list of numeric b. list of string (or column, that is, TenSeries)
So tenFrame examines the default value you pass in on construction and determines which of these types you want. If you pass in an int or a float value, you are making a numeric parameter. If you pass in a string value, you are making a string parameter. If you pass in a list, then the type of list is determined by the type of its first element. It turns out that a list of mixed numbers and strings does not really make much sense as a parameter (there aren’t any constructions in the 1010data macro language that you would be using that can handle such a mixed list), so a list only has one type. If you pass in an empty list, it is considered to be a list of strings.
String parameters are by default quoted, i.e. they appear as string
literals when the macro-code is expanded before it is interpreted, with
quotes around them (supplied by the 1010data qv(X) function, which
handles corner-cases such as strings containing quotes). To disable this
and use the strings unquoted (for example, if they are column names), you
can specify quoted=False
in the .Param()
call. If you use a
TenSeries (a column), it is considered to be a string (the column name) but
without quoting (i.e., quoted=False
without you having to say so.)
Numeric parameters are not quoted (unless you say quoted=True
for some
reason).
List parameters are converted into strings separated by the separator
character (default ","
) because that’s how 1010data language handles
lists. You can use another character if you expect your values to contain
commas, but do not use the single-quote character '
as your separator
character! Lists of strings (i.e., quoted strings and not column names)
are subjected to the 1010data str_to_lst(S;D) function, which splits
the list on the given delimiter and considers its constituents to be
literals. Lists of strings with quoted=False
(likewise lists of
TenSeries, which become their column names) are not so wrapped and thus
remain effectively unquoted. Lists of numeric values (as determined by the
first member, as mentioned above) are not quoted either.
Base Table as Parameter#
There is one exception case that is also quite common and needs to be
mentioned: the case wherein the base table name is itself a parameter.
The problem is that you need the base table name to construct the TenFrame,
and you need the TenFrame to call the .Param()
method on. The solution
is simply to use |tf.that|__ to create the parameter and pass that in to
the TenFrame constructor:
frm = tf.TenFrame(session, tf.that.Param("basetable", "retaildemo.retail.sales_detail"))
This is considered a special case, and you don’t need to specify
quoted=False
, and also in this case the <base>
operator will be
moved into the <block>
, whereas normally it isn’t.
Param Everything!#
Here’s an example of a query that parameterizes practically everything:
b = tf.TenFrame(session, tf.that.Param("basetable", "retaildemo.retail.sales_detail"))
b = b.merge(b.Param("prodtable", "retaildemo.retail.products", quoted=False), on=b.Param("linkcol", "sku", quoted=False))
datecol = b.Param("datecol", "date", quoted=False)
storecol = b.Param("storecol", "store", quoted=False)
col2sum = b.Param("col2sum", "xsales", quoted=False)
b = b[(b[datecol].yrmo() == b.Param("yrmo", 201212)) & (b[storecol] == b.Param("store", 728))]
b = b.groupby(b.Param("grp", "category", quoted=False))[col2sum].sum()
It winds up with nine parameters, allowing the caller to customize it many ways:
- basetable
The name of the sales detail table to start with. Defaults to
retaildemo.retail.sales_detail
.- prodtable
The name of the table which holds product information, which has to be merged with sales detail table in order to provide information on the products. Defaults to
retaildemo.retail.products
.- linkcol
The name of the column on which to join the basetable and the prodtable. Defaults to
sku
.- datecol
The name of the column containing the date of transactions. Defaults to
date
.- storecol
The name of the column containing store numbers. Defaults to
store
.- col2sum
The name of the column which is to be summed up by the query. Defaults to
xsales
.- yrmo
The six-digit integer specifying the month (YYYYMM) we want to look at and limit our query to. Defaults to
201212
.- store
The store number we want to look at. Defaults to
728
.- grp
The name of the column we want to group by when doing the sum. Defaults to
category
.
The yrmo
and store
parameters are integers, and all the rest are
strings. In this case, since we are using the parameters for the names of
tables and columns and such, and not as string constants, it turns out that
we want quoted=False
for all of them (though we don’t have to specify
it for basetable
, since that’s a special case.)
The XML generated looks like this:
<macro>
<library>
<block name="block" basetable="retaildemo.retail.sales_detail" prodtable="retaildemo.retail.products" linkcol="sku" datecol="date" storecol="store" col2sum="xsales" yrmo="201212" store="728" grp="category">
<base table="{@basetable}"/>
<link table2="{@prodtable}" col="{@linkcol}" type="exact"/>
<sel value="(((yrmo(({@datecol}))) = ({@yrmo})) & (({@storecol}) = ({@store})))"/>
<tabu breaks="{@grp}">
<tcol name="{@col2sum}_sum" source="{@col2sum}" fun="sum"/>
</tabu>
</block>
</library>
<call block="block" basetable="retaildemo.retail.sales_detail" prodtable="retaildemo.retail.products" linkcol="sku" datecol="date" storecol="store" col2sum="xsales" yrmo="201212" store="728" grp="category"/>
</macro>