Select rows

Selecting rows is one of the most basic tasks in the 1010data Insights Platform. Use row selection to isolate data that is important to your analysis and create more efficient queries.

Often, the first step to answering an analytical question is to narrow down the data you are working with so you see only the pieces of information that are relevant to your analysis. While there are many ways to do this, the most basic is a row selection.

The purpose of row selection is to reduce a large collection of general data to a smaller collection of data specific to the question being asked. For example, a question might be "Which store in our chain had the highest sales in December?" Another could be "What were the highest and lowest sales figures for a single transaction?" In both cases, you start with all of your data to answer a question that only requires a subset of that data.

When you work with very large data sets like those typically found in the Insights Platform, the order in which you perform your operations on the data makes a difference in how fast the system can complete those operations. If you make the largest selection that eliminates the most data first, then the subsequent operations you perform finish faster because the platform does not need to work with as much information. As a best practice, if your data contains columns with date- or time-related information, that is generally the best place to start. However, the largest selection may be something based on something else entirely depending on your data and the analysis you want to perform.

This tutorial explains how to perform a series of row selections in the Trillion-Row Spreadsheet (TRS) to reduce the amount of information in the Sales Item Detail data set.

To select rows:

  1. From the menu bar, select Tools > Trillion-Row Spreadsheet.
    The Insights Platform displays the TRS window.

  2. Open the Sales Item Detail table (pub.demo.retail.item).
    The name of the Sales Item Detail table is pub.demo.retail.item. The location of the table is reflected in the name. In this case, the table is located in the following folder: Published Data > Demo > Retail.
    TRS displays the New operation panel in the analysis pane on the left and the Sales Item Detail table in the results pane on the right.

    The table contains 35 rows of data, of which only a limited number display in the image below.

    For the purposes of this tutorial, a smaller data set was created so you can see the changes being made to the data. Even though performing real analyses require much larger data sets, all the principles in this tutorial apply.

  3. In the New operation panel, click Select.
    TRS displays the Simple Comparisons options in the Select rows panel.

  4. Clear the Simple Comparisons option.
    TRS displays the expression field.

When selecting rows in the Insights Platform, you should always make the selection that eliminates the largest amount of data first. This almost always means your first selection should be a date range. To do this, enter an expressing that uses the between function to limit the data from May 15, 2012 through May 17, 2012. An expression is a calculation, similar to a formula in Excel, you create to help you analyze your data.

  1. In the expression field, enter the following: between(date;

    If you click outside the expression field before entering a valid expression, you'll notice that the expression field becomes outlined in red. This indicates that the expression is not valid as entered. In this case, it is not valid because it is not yet complete.

Now that the between function for the date column is entered, you will specify the beginning and ending dates. Dates are entered in the YYYYMMDD format.

Note: A row selection specified using the between function is inclusive, meaning rows with the values entered in the corresponding fields are included in the results.

  1. In the expression field, after between(date;, enter the following: 20120515;20120517)
    The expression field should contain the following: between(date;20120515;20120517)

    Notice that the expression field is now outlined in green. This indicates that the completed expression is valid.

  2. Press Enter.
    TRS displays the results of your selection.

    Congratulations, you just performed your first row selection in the Insights Portal! As you can see in the image above, the amount of data in the table is reduced by more than half. Also, notice in the analysis timeline, the row selection you just performed displays in step 2 of the timeline.

What if you want to further limit your data to a particular store? To do this, create another row selection and enter additional comparison criteria. In this case, you can enter a store number to limit the data to the rows that identify a specific store.

  1. In the New operation panel, click Select.
    TRS displays the Simple Comparisons options in the Select rows panel.

  2. In the Select rows panel, do the following:
    1. From the first drop-down list, select Store.
    2. From the second drop-down list, select the equal sign (=) character.
    3. In the last field, enter 2.
  3. Do one of the following:
    • Press Enter.
    • Click anywhere in the panel, outside of the field.

    These selections, outlined in red above, indicate that you want to see only rows for store 2. If the second comparison option, the not equal to () character, is selected, all of the stores in the worksheet except store 2 are displayed.

    If you want to include more than one value (more than a single store in the example above), add additional criteria by clicking the Add Value () icon and then enter the additional values.

    The example above displays results for both store 1 and store 3. Searching for multiple values is useful when working with information like product numbers.

    Now that all of the information you need for your results is entered, you can perform the selection.

  4. Click the Submit operation () icon.
    TRS displays the results of your selection.

    In just a few minutes, you narrowed the data down from 35 rows to 5. That is less than one-seventh of the original data. Also, notice that the timeline indicates the selections you already performed (outlined in red below). This can help prevent duplicating effort when making additional selections.

You can also quickly perform a basic row selection directly from the worksheet by right-clicking any cell and choosing an option that appears in the pop-up menu.

  1. Right-click the cell in the first row of the Date column and point to Select rows.
    TRS displays a menu of selection options.

By right-clicking a cell in the worksheet, you can choose from several row selection options that pertain to that cell. Take a look at the available options when you right-click an Account cell.

  1. Right-click the cell in the first row of the Account column and point to Select rows.
    TRS displays a menu of selection options.

    As you can see, when you right-click a cell in the Account column, the options are related to Account and not Date. If you need to perform a simple selection, this is a great way to quickly select rows.

  2. When you are finished with this tutorial, close the TRS window.
Row selection is an incredibly powerful tool you can use when working with your data. Using this simple operation, you can eliminate unnecessary data and focus on what's pertinent. Feel free to make up a few row selections of your own and try them out on any of the tables in the Published > Demo folder. It will make a big difference as you learn how to work with the TRS, and ultimately, make your analyses faster and more accurate.

For more information about selecting rows, see Selecting Rows in the 1010data Insights Platform User's Guide.