<sel>

Select the rows relevant to your analysis.

Description

<sel> is applied to a table or worksheet of some arbitrary number of rows and produces a worksheet containing only those rows that were not eliminated by the value expression. The expression passed to the value attribute must evaluate to either 0 or 1. For instance, the expression:

sales_price>5.00

where sales_price is a column in a table or worksheet, will be evaluated for each row in the table. If the value of sales_price for that row is greater than 5.00, the expression evaluates to 1 and the row is selected. If the value of sales_price is less than or equal to 5.00, the expression evaluates to 0 and the row is eliminated from the worksheet (but not the physical table).

Expressions without comparisons can also be used for selecting rows. For example, column foo is column containing a 0 or 1, and column bar is a column containing 0 or 1. An expression can be written that will only select the rows where both columns contain a 1, as follows:

<sel value="foo*bar"/>

The selection expression above will only select rows that have the value 1 for both foo and bar.

Syntax

<sel value="[EXPR]"/>

This construction selects the rows for which [EXPR] is true.

Alternate syntax

(Available as of 11.06)

The alternate <sel> syntax allows for selecting a disjunction of multiple boolean expressions by specifying multiple <and> and <or> child elements, as follows:

<sel>
  <or value= [EXPR_1]/>
  <or>
  <and value="[EXPR_2]"/>
  <and value="[EXPR_3]"/>
  </or>
  ...
</sel>

The construction above selects the rows for which either [EXPR_1] or [EXPR_2] and [EXPR_3] are true.

Each <sel>, <and>, or <or> tag may contain zero or more <or> tags or zero or more <and> tags. While <and> and <or> may not be mixed at a given nesting level within a single tag, you may have a complex selection that includes both <and>s and <or>s. In this way, complex selections may be built up from multiple smaller expressions. See the example section.

This alternate syntax may be more convenient for disjunctive selections, especially those constructed by block code. In addition, using this syntax may provide a memory or efficiency improvement compared to evaluating one long expression.

Including a value="[EXPR_0]" as an attribute of the <sel> tag itself is permitted; it will be treated as an additional <or> or <and> clause.

An <and/> or <or/> tag with neither a value attribute nor contents has no effect.

Attributes

value
Accepts a Macro Language expression which is used to filter the current dataset.
spflag
Indicates whether special values should be treated as N/As when evaluating the value expression.

The default is 0 (i.e., do not treat special values as N/As).

expand
Expand the number of rows in a table based on the selection expression.

For <sel value="N" expand="1"/>, where N is a column containing integer values, or an expression (i.e., an expression involving two or more columns that results in a vector of integer values) , expand="1" will create a copy of each row according to the value of N.

For rows where N is 0, the row will still be eliminated.

If N is negative, an error will be returned.

The system variable ii_(B) can be used to provide an enumeration of duplicate rows for each original row after the expansion, where B specifies a starting base number for the enumeration. For more information and a list of other system variables, see System variables.

See Example: expand="1" below for additional detail.

sample
Accepts an integer value of 1 or 0 that specifies whether or not to generalize the selection based on the value of the value attribute.

When sample="1", a row will appear in the post-selection result with a probability equal to the value of the value attribute. The value attribute can be an expression (or number) that evaluates at each row to a float between 0 and 1 or may also be N/A, which is equivalent to 0.

For instance, <sel sample="1" value="0.1"/> should select approximately 10% of the rows uniformly across the entire table.

The default is 0.

Note: The optional attribute seed can be specified to change the random seed for the sample selection.

(Available as of version 10.17)

seed
Accepts a decimal number that specifies the random seed for the sample selection.

By default, the seed is based on the query hash so that the identical query will deterministically create the same sample.

If seed="0", a seed is non-deterministically generated when the query is run. Because of caching, the sample won't change for the current session unless the cache is cleared, but runs of the same query with seed="0" in different sessions will in general take a different sample.

(Available as of version 10.17)

Example

In the following example, a selection is performed on pub.demo.retail.item to select those rows where values in the store column are equal to 1.

<base table="pub.demo.retail.item"/>
<sel value="store=1"/>

Example: expand="1"

In this example, the value in the units column is used to determine the factor of row expansion for each row in the table pub.demo.retail.item. Therefore, a row where units=3 will be expanded to three rows, a row where units=2 will be expanded to two rows, etc. Before the expansion, a selection is performed to make sure that the units column does not contain negative values. A computed column that references the system variable ii_() displays the enumeration of duplicate rows for each original row after the expansion.

<base table="pub.demo.retail.item"/>
<sel value="units>=0"/>
<sel value="units" expand="1"/>
<willbe name="enum_expand" value="ii_()"/>

Example: Escape a string used in a selection expression

In the following example, the qv(X) function is used to properly escape a string used in a selection expression. To search for items in the description column of pub.doc.retail.product that contain the string 9", the qv(X) function must be used to properly escape it.

<base table="pub.doc.retail.product"/>
<let search_string="9&quot;">
  <sel value="(contains(description;{qv(@search_string)}))"/>
</let>

If the qv(X) function was not specified, the query would have resulted in an error stating that the " was a bad character.

Example: Create a string of quoted values from a comma-separated list

The following example illustrates how to use the str_to_lst(S;D) and qv(X) functions to turn a comma-separated list into a string of quoted values that can be used in a selection expression.

<base table="pub.doc.retail.store"/>
<set divs="West,North"/>
<let divs_quoted="{qv(str_to_lst(@divs;','))}">
  <sel value="divisiondesc={@divs_quoted}"/>
</let>

The variable divs is a comma-separated string consisting of two elements: West and North. The string is first turned into a list-value using the function str_to_lst(S;D), which is then passed to the qv(X) function. The qv(X) function converts a list-value into a string of quoted values, which can be used in a <sel> operation. The string of quoted values is assigned to the variable divs_quoted, which is declared in a <let> statement. Inside the <let>, the <sel> operation then references the value of the divs_quoted variable.

The code, after expansion, is:

<base table="pub.doc.retail.store"/>
<sel value="divisiondesc='West''North'"/>

The result is shown below:

Example: Selecting a sample of rows

The following example demonstrates how to select half of an ordered set of rows with a recency bias. Because the <sel> operation specifies seed="0", the query may result in a different sample each time it is run either after clearing the cache or in a new session.

<base table="pub.demo.retail.item"/>
<willbe name="rownum" value="rn_"/>
<willbe name="prob" value="i_/n_"/>
<sel value="i_/n_" sample="1" seed="0"/>

Example: Select a disjunction of multiple boolean expressions using <or>

The following example uses the alternate syntax for <sel> to select a disjunction of multiple boolean expressions using <or> elements. (Available as of 11.06)

The query selects rows for which the value expression of any <or> tag is true.

<base table="pub.demo.retail.item"/>
<sel>
  <or value="store=1"/>
  <or value="date=20120515"/>
</sel>

Example: Select a conjunction of multiple boolean expressions using <and>

The following example uses the alternate syntax for <sel> to select a conjunction of multiple boolean expressions using <and> elements. (Available as of 11.06)

The query selects rows for which the value expressions of all <and> tags are true.

<base table="pub.demo.retail.item"/>
<sel>
  <and value="store=1"/>
  <and value="date=20120515"/>
</sel>

Example: Mix <and> and <or> tags within <sel>

The following code shows how to nest <and> and <or> selections. In this example, the expression is included within the <sel> tag.

<base table="pub.demo.weather.stations"/>
<sel value="tz=-7">
<or value="state='NY'">
<and value="elev>100"/>
</or>
<or value="state='CA'">
<and value="elev<100"/>
</or>
<or value="state='TX'"/>
</sel>
The rows returned include tz=-7, state='NY' and elev>100, state='CA' and elev<100, or state='TX'.