Skip to main content

Statistically, the median is the middle value, the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half.

-- ---------------------------------------------------------------------------
-- Rank order

-- from the Artful Common Queries page
-- http://www.artfulsoftware.com/infotree/qrytip.php?id=444
-- ---------------------------------------------------------------------------

-- Statistically, the median is the middle value, the value that is smaller
-- than that found in half of all remaining rows, and larger than that found
-- in the other half:

  SELECT
    a.hours AS Median
  FROM
    BulbLife AS a,
    bulbLife AS b
  GROUP BY
    a.Hours
  HAVING
    Sum(Sign(1 - Sign(b.Hours - a.Hours))) = Floor((Count(*) + 1) / 2);

-- This formula works, but it doesn't scale—it's O(n*n)—and it's awkward to use.

-- So we posted a MySQL implementation of Torben Mogenson's algorithm for
-- calculating the median (http://ndevilla.free.fr/median/median/node20.html).
-- It's said to be amongst the fastest, but it proved too slow.

-- Then Joe Wynne offered an algorithm which looks correct and does scale.
-- Here it is as a MySQL stored procedure:

DROP PROCEDURE IF EXISTS Median;
DELIMITER |
CREATE PROCEDURE Median( tbl CHAR(64), col CHAR(64), OUT res DOUBLE )
BEGIN
  DECLARE arg CHAR(64);
  SET @sql = CONCAT( 'SELECT ((COUNT(*))/2) INTO @c FROM ', tbl );
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
  DROP PREPARE stmt;
  SET @a = CONVERT(FLOOR(@c), SIGNED);
  IF @a = @c THEN
    BEGIN
      SET @a = @a-1;
      SET @b = 2;
      SET arg = CONCAT( 'AVG(', col, ')' );
    END;
  ELSE
    BEGIN
      SET @b = 1;
      SET arg = col;
    END;
  END IF;
  SET @sql = CONCAT('SELECT ', arg, ' INTO @res FROM (SELECT ', col, ' FROM ', tbl,
                    ' ORDER BY ', col, ' LIMIT ?,?) as tmp');
  PREPARE stmt FROM @sql;
  EXECUTE stmt USING @a, @b;
  DROP PREPARE stmt;
  SET res=@res;
END;
|
DELIMITER ;

-- What we really need is a perfectly general function that scales and can
-- be dropped into queries. As it turns out, we can have two of those three
-- requirements—scalability and drop-in-ability. We can't have generality
-- because MySQL functions can't invoke PREPARE. Still, two of three
-- ain't bad. Stay tuned.