Skip to main content

A pivot (or crosstab, or contingency) table aggregates sets of column values into rows of statistics, and pivots target value statistics on partitioning criteria defined by any available data.

-- ------------------------------------------------------------------------
-- Group column statistics in rows
-- http://www.artfulsoftware.com/infotree/qrytip.php?id=77
-- ------------------------------------------------------------------------

-- A pivot (or crosstab, or contingency) table aggregates sets of column
-- values into rows of statistics, and pivots target value statistics on
-- partitioning criteria defined by any available data.

-- Spreadsheet applications have intuitive point-and-click interfaces for
-- generating pivot tables. RDBMSs generally do not. The task looks difficult
-- in SQL, though, only until you have coded a few.

-- If you ported the Microsoft sample database Northwind to your MySQL database
-- (as described in chapter 11 of Get It Done with MySQL), you can execute this
-- example step by step. Even if you haven't ported Northwind, the example is
-- easy to follow.

-- Amongst the tables in the Northwind database are:

--  employees(employeeID, lastname, firstname, ...)
--  orders(orderID, customerId, employeeID, orderdate, ...)

-- There are nine employees, and 803 orders dated from 1996 through 1998.
-- Each order points to an employeeID. Suppose we wish to report counts of
-- orders taken by employees pivoted on year--how would we proceed?

-- We do the pivot table 3-step:

-- 1. Write the basic aggregating query, a GROUP BY query to aggregate the
-- data on desired variables (in this case, employee and year).

-- 2. Write the pivoting query as an outer query that creates a column for
-- each pivot value (year) from #1 written as an inner subquery.
-- (Sometimes it is more efficient to write the #results of #1 to a temp table
-- and write #2 to refer to the temp table.)

-- 3. Fix a ROLLUP display glitch by encapsulating #2 in a new outer query that
-- labels the ROLLUP row meaningfully.

-- Here are the three steps in more detail:

-- 1. Group the joined counts by the two criteria, employee and order year,
-- yielding one result row per employee per year:

SELECT
  CONCAT(firstname, ' ', lastname) AS 'Employee',
  YEAR (OrderDate) AS col,
  COUNT(*) AS DATA
FROM
  Employees e
JOIN Orders o ON e.EmployeeID = o.EmployeeID
GROUP BY
  e.employeeID,
  YEAR (o.OrderDate);

-- +------------------+------+------+
-- | Employee         | col  | Data |
-- +------------------+------+------+
-- | Nancy Davolio    | 1996 |   26 |
-- | Nancy Davolio    | 1997 |   55 |
-- | Nancy Davolio    | 1998 |   42 |
-- | Andrew Fuller    | 1996 |   16 |
-- | Andrew Fuller    | 1997 |   41 |
-- | Andrew Fuller    | 1998 |   39 |
-- | Janet Leverling  | 1996 |   18 |
-- | Janet Leverling  | 1997 |   71 |
-- | Janet Leverling  | 1998 |   38 |
-- | Margaret Peacock | 1996 |   31 |
-- | Margaret Peacock | 1997 |   81 |
-- | Margaret Peacock | 1998 |   44 |
-- | Steven Buchanan  | 1996 |   11 |
-- | Steven Buchanan  | 1997 |   18 |
-- | Steven Buchanan  | 1998 |   13 |
-- | Michael Suyama   | 1996 |   15 |
-- | Michael Suyama   | 1997 |   33 |
-- | Michael Suyama   | 1998 |   19 |
-- | Robert King      | 1996 |   11 |
-- | Robert King      | 1997 |   36 |
-- | Robert King      | 1998 |   25 |
-- | Laura Callahan   | 1996 |   19 |
-- | Laura Callahan   | 1997 |   54 |
-- | Laura Callahan   | 1998 |   31 |
-- | Anne Dodsworth   | 1996 |    5 |
-- | Anne Dodsworth   | 1997 |   19 |
-- | Anne Dodsworth   | 1998 |   19 |
-- +------------------+------+------+

-- Nine employees for three years yield 27 aggregated rows.

-- 2. We want one summary row per employee, and one count column for each year
-- when an employee took an order. We pivot the rows of the above resultset
-- on year by querying the above resultset, defining a colunn for every year
-- found, for example:

  SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',

-- grouping the result by row WITH ROLLUP to provide a row of column
-- sums at the bottom. This gives the following query:

  SELECT
    Employee,
    SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
    SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
    SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
    SUM( data ) AS Total    -- sums across years by employee
  FROM (
    SELECT                  -- the query from step #1
      CONCAT(firstname,' ',lastname) AS 'Employee',
      YEAR(OrderDate) AS 'col',
      COUNT(*) AS Data
    FROM Employees e
    JOIN Orders o ON e.EmployeeID = o.EmployeeID
    GROUP BY e.employeeID, YEAR(o.OrderDate)
  ) AS stats
  GROUP BY employee WITH ROLLUP;

-- +------------------+------+------+------+-------+
-- | Employee         | 1996 | 1997 | 1998 | Total |
-- +------------------+------+------+------+-------+
-- | Andrew Fuller    |   16 |   41 |   39 |    96 |
-- | Anne Dodsworth   |    5 |   19 |   19 |    43 |
-- | Janet Leverling  |   18 |   71 |   38 |   127 |
-- | Laura Callahan   |   19 |   54 |   31 |   104 |
-- | Margaret Peacock |   31 |   81 |   44 |   156 |
-- | Michael Suyama   |   15 |   33 |   19 |    67 |
-- | Nancy Davolio    |   26 |   55 |   42 |   123 |
-- | Robert King      |   11 |   36 |   25 |    72 |
-- | Steven Buchanan  |   11 |   18 |   13 |    42 |
-- | NULL             |  152 |  408 |  270 |   830 |
-- +------------------+------+------+------+-------+

-- 3. The result of #2 is correct except that sums ought not to be reported as
-- NULL! We fix that bit of weirdness by writing query #2 as a derived table,
-- and having the new outer query alias the yearly sums row:

SELECT
  IFNULL( employee, 'SUMS') AS Employee, `1996`, `1997`, `1998` Total
FROM (
  SELECT
    Employee,
    SUM( CASE col WHEN '1996' THEN data ELSE 0 END ) AS '1996',
    SUM( CASE col WHEN '1997' THEN data ELSE 0 END ) AS '1997',
    SUM( CASE col WHEN '1998' THEN data ELSE 0 END ) AS '1998',
    SUM( data ) AS Total
  FROM (
    SELECT
      CONCAT(firstname,' ',lastname) AS 'Employee',
      YEAR(OrderDate) AS 'col',
      COUNT(*) AS Data
    FROM Employees e
    JOIN Orders o ON e.EmployeeID = o.EmployeeID
    GROUP BY e.employeeID, YEAR(o.OrderDate)
  ) AS stats
  GROUP BY employee WITH ROLLUP
) AS stats2;

-- +------------------+------+------+-------+
-- | Employee         | 1996 | 1997 | Total |
-- +------------------+------+------+-------+
-- | Andrew Fuller    |   16 |   41 |    39 |
-- | Anne Dodsworth   |    5 |   19 |    19 |
-- | Janet Leverling  |   18 |   71 |    38 |
-- | Laura Callahan   |   19 |   54 |    31 |
-- | Margaret Peacock |   31 |   81 |    44 |
-- | Michael Suyama   |   15 |   33 |    19 |
-- | Nancy Davolio    |   26 |   55 |    42 |
-- | Robert King      |   11 |   36 |    25 |
-- | Steven Buchanan  |   11 |   18 |    13 |
-- | SUMS             |  152 |  408 |   270 |
-- +------------------+------+------+-------+

-- (Thanks to Chris Gates for the correction.) With multiple statistics and pivot
-- layers, a pivot table query can get complex, but following this
-- 3-step will keep things clear.