A pivot (or crosstab, or contingency) table aggregates sets of column values into rows of statistics, and pivots target value statistics on partitioning criteria defined by any available data.
-- ------------------------------------------------------------------------
-- Group column statistics in rows
-- http://www.artfulsoftware.com/infotree/qrytip.php?id=77
-- ------------------------------------------------------------------------
-- A pivot (or crosstab, or contingency) table aggregates sets of column
-- values into rows of statistics, and pivots target value statistics on
-- partitioning criteria defined by any available data.
-- Spreadsheet applications have intuitive point-and-click interfaces for
-- generating pivot tables. RDBMSs generally do not. The task looks difficult
-- in SQL, though, only until you have coded a few.
-- If you ported the Microsoft sample database Northwind to your MySQL database
-- (as described in chapter 11 of Get It Done with MySQL), you can execute this
-- example step by step. Even if you haven't ported Northwind, the example is
-- easy to follow.
-- Amongst the tables in the Northwind database are:
-- employees(employeeID, lastname, firstname, ...)
-- orders(orderID, customerId, employeeID, orderdate, ...)
-- There are nine employees, and 803 orders dated from 1996 through 1998.
-- Each order points to an employeeID. Suppose we wish to report counts of
-- orders taken by employees pivoted on year--how would we proceed?
-- We do the pivot table 3-step:
-- 1. Write the basic aggregating query, a GROUP BY query to aggregate the
-- data on desired variables (in this case, employee and year).
-- 2. Write the pivoting query as an outer query that creates a column for
-- each pivot value (year) from #1 written as an inner subquery.
-- (Sometimes it is more efficient to write the #results of #1 to a temp table
-- and write #2 to refer to the temp table.)
-- 3. Fix a ROLLUP display glitch by encapsulating #2 in a new outer query that
-- labels the ROLLUP row meaningfully.
-- Here are the three steps in more detail:
-- 1. Group the joined counts by the two criteria, employee and order year,
-- yielding one result row per employee per year:
SELECT
CONCAT(firstname, ' ', lastname) AS 'Employee',
YEAR (OrderDate) AS col,
COUNT(*) AS DATA
FROM
Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY
e.employeeID,
YEAR (o.OrderDate);
-- +------------------+------+------+
-- | Employee | col | Data |
-- +------------------+------+------+
-- | Nancy Davolio | 1996 | 26 |
-- | Nancy Davolio | 1997 | 55 |
-- | Nancy Davolio | 1998 | 42 |
-- | Andrew Fuller | 1996 | 16 |
-- | Andrew Fuller | 1997 | 41 |
-- | Andrew Fuller | 1998 | 39 |
-- | Janet Leverling | 1996 | 18 |
-- | Janet Leverling | 1997 | 71 |
-- | Janet Leverling | 1998 | 38 |
-- | Margaret Peacock | 1996 | 31 |
-- | Margaret Peacock | 1997 | 81 |
-- | Margaret Peacock | 1998 | 44 |
-- | Steven Buchanan | 1996 | 11 |
-- | Steven Buchanan | 1997 | 18 |
-- | Steven Buchanan | 1998 | 13 |
-- | Michael Suyama | 1996 | 15 |
-- | Michael Suyama | 1997 | 33 |
-- | Michael Suyama | 1998 | 19 |
-- | Robert King | 1996 | 11 |
-- | Robert King | 1997 | 36 |
-- | Robert King | 1998 | 25 |
-- | Laura Callahan | 1996 | 19 |
-- | Laura Callahan | 1997 | 54 |
-- | Laura Callahan | 1998 | 31 |
-- | Anne Dodsworth | 1996 | 5 |
-- | Anne Dodsworth | 1997 | 19 |
-- | Anne Dodsworth | 1998 | 19 |
-- +------------------+------+------+
-- Nine employees for three years yield 27 aggregated rows.
-- 2. We want one summary row per employee, and one count column for each year
-- when an employee took an order. We pivot the rows of the above resultset
-- on year by querying the above resultset, defining a colunn for every year
-- found, for example:
SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
-- grouping the result by row WITH ROLLUP to provide a row of column
-- sums at the bottom. This gives the following query:
SELECT
Employee,
SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
SUM( data ) AS Total -- sums across years by employee
FROM (
SELECT -- the query from step #1
CONCAT(firstname,' ',lastname) AS 'Employee',
YEAR(OrderDate) AS 'col',
COUNT(*) AS Data
FROM Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY e.employeeID, YEAR(o.OrderDate)
) AS stats
GROUP BY employee WITH ROLLUP;
-- +------------------+------+------+------+-------+
-- | Employee | 1996 | 1997 | 1998 | Total |
-- +------------------+------+------+------+-------+
-- | Andrew Fuller | 16 | 41 | 39 | 96 |
-- | Anne Dodsworth | 5 | 19 | 19 | 43 |
-- | Janet Leverling | 18 | 71 | 38 | 127 |
-- | Laura Callahan | 19 | 54 | 31 | 104 |
-- | Margaret Peacock | 31 | 81 | 44 | 156 |
-- | Michael Suyama | 15 | 33 | 19 | 67 |
-- | Nancy Davolio | 26 | 55 | 42 | 123 |
-- | Robert King | 11 | 36 | 25 | 72 |
-- | Steven Buchanan | 11 | 18 | 13 | 42 |
-- | NULL | 152 | 408 | 270 | 830 |
-- +------------------+------+------+------+-------+
-- 3. The result of #2 is correct except that sums ought not to be reported as
-- NULL! We fix that bit of weirdness by writing query #2 as a derived table,
-- and having the new outer query alias the yearly sums row:
SELECT
IFNULL( employee, 'SUMS') AS Employee, `1996`, `1997`, `1998` Total
FROM (
SELECT
Employee,
SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
SUM( data ) AS Total
FROM (
SELECT
CONCAT(firstname,' ',lastname) AS 'Employee',
YEAR(OrderDate) AS 'col',
COUNT(*) AS Data
FROM Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY e.employeeID, YEAR(o.OrderDate)
) AS stats
GROUP BY employee WITH ROLLUP
) AS stats2;
-- +------------------+------+------+-------+
-- | Employee | 1996 | 1997 | Total |
-- +------------------+------+------+-------+
-- | Andrew Fuller | 16 | 41 | 39 |
-- | Anne Dodsworth | 5 | 19 | 19 |
-- | Janet Leverling | 18 | 71 | 38 |
-- | Laura Callahan | 19 | 54 | 31 |
-- | Margaret Peacock | 31 | 81 | 44 |
-- | Michael Suyama | 15 | 33 | 19 |
-- | Nancy Davolio | 26 | 55 | 42 |
-- | Robert King | 11 | 36 | 25 |
-- | Steven Buchanan | 11 | 18 | 13 |
-- | SUMS | 152 | 408 | 270 |
-- +------------------+------+------+-------+
-- (Thanks to Chris Gates for the correction.) With multiple statistics and pivot
-- layers, a pivot table query can get complex, but following this
-- 3-step will keep things clear.