<find>

Efficiently finds (without selecting) the first n rows, between the specified row numbers, where the specified Boolean value expression is true.

Description

This operation identifies the first n rows in a table, based upon a Boolean expression, constrained by an upper and lower bound. The upper bound and lower bound default to 0 if omitted and are not themselves included. Because the search is carried out on only as many segments required to satisfy the request, <find> has the potential to be much faster than <sel>. This may also result in less memory needed to cache a <find> operation than a <sel> operation.

If the search condition used is a commonly occurring word or phrase, and the table is large and widely distributed, this can result in high parallelism which incurs the overhead of many subprocesses needing to coordinate and report back. The attribute maxpar="n" specifies the maximum number of subprocesses to be enlisted to search for the condition.

Syntax

<find value="[EXPRESSION]" after="[AFTER_ROW_NUM]" before="[BEFORE_ROW_NUM]"
        count="[N]">

Alternate syntax

(Available as of 12.24)

An alternate syntax of <find> (containing a tree of <or> tags) is also supported. In this form, the optional attribute spflag="1" is available which indicates whether special values should be treated as N/As when evaluating the value expression or executing multiple successive <find> statements, each of which caches its results.

Shown here, this syntax allows for finding a disjunction of multiple Boolean expressions by specifying multiple <or> child elements.

<find>
  <or value="[EXPR_1]"/>
  <or value="[EXPR_2]"/>
  ...
</find>

This construction finds the rows for which any of [EXPR_1], [EXPR_2], etc. are true.

This is logically equivalent to <find value="([EXPR_1])|([EXPR_2])|..."/>, but it may be more convenient for disjunctive criteria, especially those constructed by block code. In addition, using this syntax may provide a memory or efficiency improvement compared to evaluating one long expression or executing multiple successive <find> statements, each of which caches its results.

Including a value="[EXPR_0]" as an attribute of the <find> tag itself is permitted; it will be treated as an additional <or> clause.

<find value="[EXPR_0]">
  <or value="[EXPR_1]"/>
  <or value="[EXPR_2]"/>
  ...
</find>

Each <or> tag may contain one or more <or> tags. In this way, complex searches may be built up from multiple smaller expressions.

Note: All tags within <find> must either be <and> or <or>; they may not be mixed.

Alternate syntax

(Available as of 12.24)

An alternate syntax of <find> (containing a tree of <and> tags) is also supported. In this form, the optional attribute spflag="1" is available which indicates whether special values should be treated as N/As when evaluating the value expression or executing multiple successive <find> statements, each of which caches its results.

Shown here, this syntax allows for finding a disjunction of multiple Boolean expressions by specifying multiple <and> child elements.

<find>
  <and value="[EXPR_1]"/>
  <and value="[EXPR_2]"/>
  ...
</find>

This construction selects the rows for which all of [EXPR_1], [EXPR_2], etc. are true.

This is logically equivalent to <find value="([EXPR_1])&([EXPR_2])&..."/>, but it may be more convenient for conjunctive criteria, especially those constructed by block code. In addition, using this syntax may provide a memory or efficiency improvement compared to evaluating one long expression or executing multiple successive <find> statements, each of which caches its results.

Including a value="[EXPR_0]" as an attribute of the <find> tag itself is permitted; it will be treated as an additional <and> clause.

<find value="[EXPR_0]">
  <and value="[EXPR_1]"/>
  <and value="[EXPR_2]"/>
  ...
</find>

Each <and> tag may contain one or more <and> tags. In this way, complex selections may be built up from multiple smaller expressions.

Note: All tags within <find> must either be <and> or <or>; they may not be mixed.

Attributes

value
Specifies an expression that determines which rows are searched for. The expression specified by the value attribute must evaluate to either 0 or 1.

The row numbers in which the Boolean expression evaluates to 1 are returned; rows in which the Boolean expression evaluates to 0 are ignored.

after
Indicates the starting point from which <find> searches for matches. The row number specified here is excluded from the search set.

Defaults to 0 (i.e. just before the first row, row one) if omitted, so that the first row will be found if the expression is satisfied there.

before
Indicates the stopping point before which <find> searches for matches. The row number specified here is excluded from the search set.

Defaults to a number 1-greater than the number of rows in the table or worksheet on which the operation is applied. This ensures that all rows are included in the search space.

count
Specifies the number of rows to find within the given range of rows <find> is operating on.

[N] defaults to 1, and a negative [N] may be specified, in which case the search proceeds backwards from [BEFORE_ROW_NUM] (or the end of the table) and the results are returned in descending order.

maxpar

Accepts a positive integer indicating the maximum number of subprocesses that will be put to work once searching for the condition.

The default is no maximum, whereas maxpar="1" ensures that segments are searched sequentially, one at a time. This is theoretically fastest if the requested number of results is guaranteed to be found in the first segment.

maxpar="auto" can be used to mitigate this; it starts maximum parallelism at 1 and then doubles it at each step when the requested row count is not yet satisfied. This is very effective in minimizing overhead for frequent conditions while maximizing parallelism for rare conditions.

softcount
Specifies the number of rows [M] (larger than [N]) to return if they are found while performing the search.

Used in conjunction with count, or if count is omitted with the default [N]=1, [M] should be a positive number greater than the absolute value of [N]. Searches only as far as necessary to find [N] rows, but return a larger number [M] of rows if they are found in the course of that search.

Example

In pub.demo.weather.stations, find the first ten occurrences where state = 'NY'.

The result is a tabulation-like table of a single column row containing row numbers, with as many rows as there were search results (i.e. between 0 and 10 results, inclusive).

<base table="pub.demo.weather.stations"/>
<find value="state='NY'" count="10"/>
Find - Example 1 Result

Example: Find the last three rows

In pub.demo.weather.stations, find the last three rows before row 100 where state='GA'.

<base table="pub.demo.weather.stations"/>
<find count="-3" before="100" value="state='GA'"/>

Example 2 Image

Example: Alternate syntax

In pub.demo.retail.item, find the first 15 rows matching any of three different values from the account column.

<base table="pub.demo.retail.item"/>
<find count="15">
  <or value="account=957"/>
  <or value="account=478"/>
  <or value="account=709"/>
</find>
Alternate Syntax Results Example