Summarizations and tabulations
Some of the basic ways to analyze your data include quick summaries, tabulations, and cross tabulations.
Oftentimes, you may want to determine the total of all the values in a particular column or the average of those values. Or perhaps you want to find the highest or lowest value in a column. A quick summary allows you to perform calculations, called summarizations, on one or more columns in a 1010data Insights Platform table to ascertain these quantities very easily.
Using tabulations, you can summarize the values in a column (or columns) based on the values in another column (or columns) and display those results in a tabular format.
For example, suppose a table has a row for every employee and columns Name, Title, Department, and Salary. Using a tabulation, you can compute the number of people and average salary for each department. The result may look something like the example below:
Department | Count | Average Salary |
---|---|---|
228 | 73,946 | |
Finance | 35 | 61,778 |
Marketing | 41 | 87,562 |
Sales | 67 | 92,962 |
Engineering | 34 | 69,276 |
Technology | 51 | 49,481 |
The cells in the top row beneath the column headings are column aggregates. In this example, the total number of employees (which corresponds to the number of rows in the original table) is 228, and the average salary across all departments is 73,946.
A cross tabulation allows you to summarize the values in a column based on the values in two or more other columns and display the result as a matrix.
In the above example, you could use a cross tabulation to compute the average salary in each department for each title. This would result in a table with one row for each department and one column for each title:
Title | VP | AVP | Manager | Grunt | |
---|---|---|---|---|---|
Department | 73,946 | 143,356 | 118,979 | 85,345 | 85,345 |
Finance | 61,778 | 95,774 | 85,812 | 70,553 | 48,944 |
Marketing | 87,562 | 127,344 | 127,664 | 100,359 | 70,686 |
Sales | 92,962 | 155,863 | 125,268 | 100,945 | 74,314 |
Engineering | 69,276 | 116,181 | 94,189 | 78,594 | 53,997 |
Technology | 49,481 | 79,178 | 67,080 | 56,285 | 39,657 |
In this example, you can see that the average salary for AVPs in Engineering is 94,189. The cells in the first row beneath the column headings and the first column next to the row headings are aggregates. The first row of data contains the average salaries across all departments for each title (e.g., the average salary for all VPs is 143,356). Similarly, the first column of data contains the average salaries across all titles for each department (e.g., the average salary for all people in the Finance department is 61,778). The top left-most cell of data is the average for all employees across all departments and titles, which is 73,946.
Certain types of summarizations require a second column to perform the computation. For instance, a weighted average requires two columns: the column on which you are calculating the average and the column that contains the weight. Other examples of summarizations that involve two columns of data include the correlation, dot product, or covariance of two columns.
Missing values in the data are handled automatically in the Insights Platform. For example, if you are computing the average value of a column that has some missing (N/A) values, those values are ignored when computing the average.
Quick summaries, tabulations, and cross tabulations can be performed easily using the web
interface. In addition, the Macro Language XML code functions (in particular, G_Functions)
provide a comprehensive set of tools for performing similar and more advanced operations on data
in the Insights Platform. For instance, you can obtain the same information from the tabulation
in the above example using a very simple G_Function. To find the average salary for each
department, you could create a computed column and use the g_avg
function as its
value expression. The results of the cross tabulation could be determined in a similar way.