Statistically, the median is the middle value, the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half.
-- ---------------------------------------------------------------------------
-- Rank order
-- from the Artful Common Queries page
-- http://www.artfulsoftware.com/infotree/qrytip.php?id=444
-- ---------------------------------------------------------------------------
-- Statistically, the median is the middle value, the value that is smaller
-- than that found in half of all remaining rows, and larger than that found
-- in the other half:
SELECT
a.hours AS Median
FROM
BulbLife AS a,
bulbLife AS b
GROUP BY
a.Hours
HAVING
Sum(Sign(1 - Sign(b.Hours - a.Hours))) = Floor((Count(*) + 1) / 2);
-- This formula works, but it doesn't scale—it's O(n*n)—and it's awkward to use.
-- So we posted a MySQL implementation of Torben Mogenson's algorithm for
-- calculating the median (http://ndevilla.free.fr/median/median/node20.html).
-- It's said to be amongst the fastest, but it proved too slow.
-- Then Joe Wynne offered an algorithm which looks correct and does scale.
-- Here it is as a MySQL stored procedure:
DROP PROCEDURE IF EXISTS Median;
DELIMITER |
CREATE PROCEDURE Median( tbl CHAR(64), col CHAR(64), OUT res DOUBLE )
BEGIN
DECLARE arg CHAR(64);
SET @sql = CONCAT( 'SELECT ((COUNT(*))/2) INTO @c FROM ', tbl );
PREPARE stmt FROM @sql;
EXECUTE stmt;
DROP PREPARE stmt;
SET @a = CONVERT(FLOOR(@c), SIGNED);
IF @a = @c THEN
BEGIN
SET @a = @a-1;
SET @b = 2;
SET arg = CONCAT( 'AVG(', col, ')' );
END;
ELSE
BEGIN
SET @b = 1;
SET arg = col;
END;
END IF;
SET @sql = CONCAT('SELECT ', arg, ' INTO @res FROM (SELECT ', col, ' FROM ', tbl,
' ORDER BY ', col, ' LIMIT ?,?) as tmp');
PREPARE stmt FROM @sql;
EXECUTE stmt USING @a, @b;
DROP PREPARE stmt;
SET res=@res;
END;
|
DELIMITER ;
-- What we really need is a perfectly general function that scales and can
-- be dropped into queries. As it turns out, we can have two of those three
-- requirements—scalability and drop-in-ability. We can't have generality
-- because MySQL functions can't invoke PREPARE. Still, two of three
-- ain't bad. Stay tuned.