Selecting rows is one of the most basic tasks in the 1010data Insights Platform. Use row selection to isolate data that is important to your analysis and create more efficient queries.
Often, the first step to answering an analytical question is to narrow down the data you are working with so you see only the pieces of information that are relevant to your analysis. While there are many ways to do this, the most basic is a row selection.
The purpose of row selection is to reduce a large collection of general data to a smaller collection of data specific to the question being asked. For example, a question might be "Which store in our chain had the highest sales in December?" Another could be "What were the highest and lowest sales figures for a single transaction?" In both cases, you start with all of your data to answer a question that only requires a subset of that data.
When you work with very large data sets like those typically found in the Insights Platform, the order in which you perform your operations on the data makes a difference in how fast the system can complete those operations. If you make the largest selection that eliminates the most data first, then the subsequent operations you perform finish faster because the platform does not need to work with as much information. As a best practice, if your data contains columns with date- or time-related information, that is generally the best place to start. However, the largest selection may be something based on something else entirely depending on your data and the analysis you want to perform.
This tutorial explains how to perform a series of row selections in the Trillion-Row Spreadsheet (TRS) to reduce the amount of information in the Sales Item Detail data set.
To select rows:
The table contains 35 rows of data, of which only a limited number display in the image below.
For the purposes of this tutorial, a smaller data set was created so you can see the changes being made to the data. Even though performing real analyses require much larger data sets, all the principles in this tutorial apply.
When selecting rows in the Insights Platform, you should always make the selection that eliminates the largest amount of data first. This almost always means your first selection should be a date range. To do this, enter an expressing that uses the between function to limit the data from May 15, 2012 through May 17, 2012. An expression is a calculation, similar to a formula in Excel, you create to help you analyze your data.
If you click outside the expression field before entering a valid expression, you'll notice that the expression field becomes outlined in red. This indicates that the expression is not valid as entered. In this case, it is not valid because it is not yet complete.
Now that the between function for the date column is entered, you will specify the beginning and ending dates. Dates are entered in the YYYYMMDD format.
Notice that the expression field is now outlined in green. This indicates that the completed expression is valid.
Congratulations, you just performed your first row selection in the Insights Portal! As you can see in the image above, the amount of data in the table is reduced by more than half. Also, notice in the analysis timeline, the row selection you just performed displays in step 2 of the timeline.
What if you want to further limit your data to a particular store? To do this, create another row selection and enter additional comparison criteria. In this case, you can enter a store number to limit the data to the rows that identify a specific store.
These selections, outlined in red above, indicate that you want to see only rows for store 2. If the second comparison option, the not equal to (≠) character, is selected, all of the stores in the worksheet except store 2 are displayed.
If you want to include more than one value (more than a single store in the example above), add additional criteria by clicking the Add Value () icon and then enter the additional values.
The example above displays results for both store 1 and store 3. Searching for multiple values is useful when working with information like product numbers.
Now that all of the information you need for your results is entered, you can perform the selection.
In just a few minutes, you narrowed the data down from 35 rows to 5. That is less than one-seventh of the original data. Also, notice that the timeline indicates the selections you already performed (outlined in red below). This can help prevent duplicating effort when making additional selections.
You can also quickly perform a basic row selection directly from the worksheet by right-clicking any cell and choosing an option that appears in the pop-up menu.
By right-clicking a cell in the worksheet, you can choose from several row selection options that pertain to that cell. Take a look at the available options when you right-click an Account cell.
As you can see, when you right-click a cell in the Account column, the options are related to Account and not Date. If you need to perform a simple selection, this is a great way to quickly select rows.
For more information about selecting rows, see Selecting Rows in the 1010data Insights Platform User's Guide.