Summarizations and Tabulations

Some of the basic ways to analyze your data using 1010data include quick summaries, tabulations, and cross tabulations.

Oftentimes, you may want to determine the total of all the values in a particular column or the average of those values. Or perhaps you want to find the highest or lowest value in a column. A quick summary allows you to perform calculations on one or more columns in a 1010data table to ascertain these quantities very easily.

Using tabulations, you can summarize the values in a column (or columns) based on the values in another column (or columns) and display those results in a tabular format.

For example, suppose a table has a row for every employee and columns Name, TitleDepartment, and Salary. Using a tabulation, we can compute the number of people and average salary for each department. The result will look something like:

Department Count Average Salary
228 73,946
Finance 35 61,778
Marketing 41 87,562
Sales 67 92,962
Engineering 34 69,276
Technology 51 49,481

The cells in the top row beneath the column headings are column aggregates. In this example, the total number of employees (which corresponds to the number of rows in the original table) is 228, and the average salary across all departments is 73,946.

A cross tabulation allows you to summarize the values in a column based on the values in two or more other columns and display the result as a matrix.

In our above example, we could use a cross tabulation to compute the average salary in each department for each title. This would result in a table with one row for each department and one column for each title:

Title VP AVP Manager Grunt
Department 73,946 143,356 118,979 85,345 85,345
Finance 61,778 95,774 85,812 70,553 48,944
Marketing 87,562 127,344 127,664 100,359 70,686
Sales 92,962 155,863 125,268 100,945 74,314
Engineering 69,276 116,181 94,189 78,594 53,997
Technology 49,481 79,178 67,080 56,285 39,657

In this example, we can see that the average salary for AVPs in Engineering is 94,189. The cells in the first row beneath the column headings and the first column next to the row headings are aggregates. The first row of data contains the average salaries across all departments for each title (e.g., the average salary for all VPs is 143,356). Similarly, the first column of data contains the average salaries across all titles for each department (e.g., the average salary for all people in the Finance department is 61,778). The top left-most cell of data is the average for all employees across all departments and titles, which is 73,946.

Certain types of summarizations require a second column to perform the computation. For instance, a weighted average requires two columns: the column on which you are calculating the average and the column that contains the weight. Other examples of summarizations that involve two columns of data include the correlation, dot product, or covariance of two columns.

Missing values in the data are handled automatically in 1010data. For example, if you are computing the average value of a column that has some missing (N/A) values, those values are ignored when computing the average.

Quick summaries, tabulations, and cross tabulations can be performed easily using the 1010data web interface. In addition, the 1010data Macro Language functions (in particular, G_Functions) provide a comprehensive set of tools for performing similar and more advanced operations on data in 1010data. For instance, you can obtain the same information from the tabulation in the above example using a very simple G_Function: To find out the average salary for each department, you could create a computed column and use the g_avg function as its value expression. The results of the cross tabulation could be determined in a similar way. And this is just the tip of the iceberg when it comes to G_Functions. The potential for data discovery using these powerful tools is limitless.