有11家门店数据,要求每天所有门店的各个指标的中位数
1.第一种做法,使用PERCENTILE_CONT() 函数 SQL SERVER 2012 版本及以上
PERCENTILE_CONT 函数简介
PERCENTILE_CONT 是 SQL 中的窗口函数,用于计算连续百分位数(基于线性插值)。适用于需要获取数据分布中特定百分位值的场景,如统计工资的中位数或 90% 分位数。
语法结构
PERCENTILE_CONT(percentile) WITHIN GROUP (ORDER BY sort_expression)
OVER ([PARTITION BY partition_expression])
- percentile:介于 0 和 1 之间的数值,指定所需的百分位(如 0.5 表示中位数)。
- sort_expression:用于排序的列或表达式。
- PARTITION BY:可选,按指定列分组计算百分位数。
基础用法示例
计算整个表中工资列的中位数(50% 分位数):
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary
FROM employees;
分组计算示例
按部门分组计算工资的 75% 分位数:
SELECT department_id,PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS percentile_75
FROM employees
GROUP BY department_id;
窗口函数用法
结合 OVER 和 PARTITION BY 计算每个部门的工资中位数:
SELECT employee_id,department_id,salary,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) OVER (PARTITION BY department_id) AS dept_median_salary
FROM employees;
多百分位计算
单次查询中计算多个百分位值:
SELECT PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY salary) AS percentile_25,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS percentile_50,PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) AS percentile_75
FROM employees;
注意事项
- PERCENTILE_CONT 在 Oracle、PostgreSQL 和 SQL Server 中支持,但 MySQL 需使用 PERCENTILE_CONT 的替代方案。
- 结果为浮点数,即使输入是整数类型。
- 空值(NULL)会被自动忽略。
- 与 PERCENTILE_DISC 的区别:PERCENTILE_CONT 返回插值结果,PERCENTILE_DISC 返回实际存在的值。
解决代码:
WITH test_basedata AS (-- 模拟11家门店的指标数据(sdate统一为20250814,覆盖不同指标值,方便验证中位数)SELECT '20250814' AS sdate, 'SH001' AS shopid, 0.25 AS profit_rate, 89.5 AS avg_customer_price, 2.1 AS avg_num_purchases, 0.85 AS salevalue_finishrate, 0.12 AS abysku_rate, 35 AS dsiUNION ALL SELECT '20250814', 'SH002', 0.32, 105.2, 1.8, 1.02, 0.08, 28UNION ALL SELECT '20250814', 'SH003', 0.18, 76.9, 2.5, 0.68, 0.15, 42UNION ALL SELECT '20250814', 'SH004', 0.41, 120.7, 1.5, 1.25, 0.05, 22UNION ALL SELECT '20250814', 'SH005', 0.29, 98.3, 2.3, 0.95, 0.10, 30UNION ALL SELECT '20250814', 'SH006', 0.35, 112.1, 1.9, 1.10, 0.07, 25UNION ALL SELECT '20250814', 'SH007', 0.22, 82.4, 2.4, 0.75, 0.13, 38UNION ALL SELECT '20250814', 'SH008', 0.38, 109.6, 1.7, 1.08, 0.06, 26UNION ALL SELECT '20250814', 'SH009', 0.27, 92.8, 2.2, 0.88, 0.11, 33UNION ALL SELECT '20250814', 'SH010', 0.31, 10.5, 2.0, 0.99, 0.09, 29UNION ALL SELECT '20250814', 'SH011', 0.42, 10.5, 2.0, 0.99, 0.09, 29
),
daily_medians AS (SELECTdistinct sdate,-- 计算中位数PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY profit_rate) OVER(PARTITION BY sdate) AS median_profit_rate,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY avg_customer_price) OVER(PARTITION BY sdate) AS median_avg_customer_price,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY avg_num_purchases) OVER(PARTITION BY sdate) AS median_avg_num_purchases,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salevalue_finishrate) OVER(PARTITION BY sdate) AS median_salevalue_finishrate,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY abysku_rate) OVER(PARTITION BY sdate) AS median_abysku_rate,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY dsi) OVER(PARTITION BY sdate) AS median_dsiFROM test_basedata -- GROUP BY sdate,profit_rate,avg_customer_price,avg_num_purchases,salevalue_finishrate,abysku_rate,dsi
)
SELECT * FROM daily_medians;
2.第二种做法,通用版本
解决代码:
1.首先利用窗口函数,求出总行数以及每个指标的排序
2.再利用ceiling 向上取整函数,和floor 向下取整函数 去求中位数的位置,并且返回对应的中位数数值
3.最后两数加和,取平均值
代码功能说明
这段SQL代码用于计算多个门店在不同业务指标上的中位数值。通过创建临时测试数据表test_basedata
,模拟了11家门店在利润、客单价、购买次数等指标上的数据,并采用窗口函数等方法计算各指标的中位数。
数据结构说明
临时表test_basedata
包含以下字段:
sdate
:日期标识(固定为20250814)shopid
:门店编号profit_rate
:利润率avg_customer_price
:客单价avg_num_purchases
:平均购买次数salevalue_finishrate
:销售目标完成率abysku_rate
:缺货率dsi
:库存周转天数
核心计算逻辑
中位数计算原理: 当数据量为奇数时取中间值,偶数时取中间两个值的平均数。代码通过CEILING
和FLOOR
函数分别定位中间位置的上下界。
计算过程分解:
- 内层查询使用
ROW_NUMBER()
为每个指标按值排序并标记排名 - 中间层查询通过
MAX(CASE WHEN...)
提取中间位置对应的指标值 - 外层查询对偶数的中间位置取平均值得到最终中位数
关键函数说明
ROW_NUMBER() OVER (PARTITION BY...ORDER BY...)
窗口函数,对分组后的数据按指定字段排序并生成序号CEILING(total_count/2.0)
向上取整,用于定位中位数上界位置(如10个值时返回5,11返回6(5.5向上取整为6))FLOOR(total_count/2.0)+1
向下取整后+1,用于定位中位数下界位置(如10个值时返回6,11返回6(5.5向下取整为5,5+1=6))COUNT(*) OVER (PARTITION BY sdate)
计算每个日期的总记录数,用于判断奇偶性
结果输出说明
最终输出包含:
- 原始日期字段
sdate
median_profit_rate
:利润率中位数median_avg_customer_price
:客单价中位数median_avg_num_purchases
:平均购买次数中位数median_salevalue_finishrate
:销售完成率中位数median_abysku_rate
:缺货率中位数median_dsi
:库存周转天数中位数
特殊处理说明
对于偶数个数据点(如示例中的11条记录):
- 利润率中位数 = (第6名的值 + 第6名的值)/2
- 实际会取相同值,等同于直接取第6名的值
对于奇数个数据点(如10条记录):
- 会正确计算第5名和第6名指标值的平均数
解决代码:
WITH test_basedata AS (-- 模拟10家门店的指标数据(sdate统一为20250814,覆盖不同指标值,方便验证中位数)SELECT '20250814' AS sdate, 'SH001' AS shopid, 0.25 AS profit_rate, 89.5 AS avg_customer_price, 2.1 AS avg_num_purchases, 0.85 AS salevalue_finishrate, 0.12 AS abysku_rate, 35 AS dsiUNION ALL SELECT '20250814', 'SH002', 0.32, 105.2, 1.8, 1.02, 0.08, 28UNION ALL SELECT '20250814', 'SH003', 0.18, 76.9, 2.5, 0.68, 0.15, 42UNION ALL SELECT '20250814', 'SH004', 0.41, 120.7, 1.5, 1.25, 0.05, 22UNION ALL SELECT '20250814', 'SH005', 0.29, 98.3, 2.3, 0.95, 0.10, 30UNION ALL SELECT '20250814', 'SH006', 0.35, 112.1, 1.9, 1.10, 0.07, 25UNION ALL SELECT '20250814', 'SH007', 0.22, 82.4, 2.4, 0.75, 0.13, 38UNION ALL SELECT '20250814', 'SH008', 0.38, 109.6, 1.7, 1.08, 0.06, 26UNION ALL SELECT '20250814', 'SH009', 0.27, 92.8, 2.2, 0.88, 0.11, 33UNION ALL SELECT '20250814', 'SH010', 0.31, 10.5, 2.0, 0.99, 0.09, 29UNION ALL SELECT '20250814', 'SH011', 0.42, 10.5, 2.0, 0.99, 0.09, 29
)
SELECT sdate,-- 计算中位数(偶数时取平均值)(median_profit_rate_high + median_profit_rate_low) / 2.0 AS median_profit_rate,(median_avg_customer_price_high + median_avg_customer_price_low) / 2.0 AS median_avg_customer_price,(median_avg_num_purchases_high + median_avg_num_purchases_low) / 2.0 AS median_avg_num_purchases,(median_salevalue_finishrate_high + median_salevalue_finishrate_low) / 2.0 AS median_salevalue_finishrate,(median_abysku_rate_high + median_abysku_rate_low) / 2.0 AS median_abysku_rate,(median_dsi_high + median_dsi_low) / 2.0 AS median_dsi
FROM (SELECT sdate,MAX(CASE WHEN profit_rate_rank = CEILING(total_count/2.0) THEN profit_rate END) AS median_profit_rate_high,MAX(CASE WHEN profit_rate_rank = FLOOR(total_count/2.0)+1 THEN profit_rate END) AS median_profit_rate_low,MAX(CASE WHEN avg_customer_price_rank = CEILING(total_count/2.0) THEN avg_customer_price END) AS median_avg_customer_price_high,MAX(CASE WHEN avg_customer_price_rank = FLOOR(total_count/2.0)+1 THEN avg_customer_price END) AS median_avg_customer_price_low,MAX(CASE WHEN avg_num_purchases_rank = CEILING(total_count/2.0) THEN avg_num_purchases END) AS median_avg_num_purchases_high,MAX(CASE WHEN avg_num_purchases_rank = FLOOR(total_count/2.0)+1 THEN avg_num_purchases END) AS median_avg_num_purchases_low,MAX(CASE WHEN salevalue_finishrate_rank = CEILING(total_count/2.0) THEN salevalue_finishrate END) AS median_salevalue_finishrate_high,MAX(CASE WHEN salevalue_finishrate_rank = FLOOR(total_count/2.0)+1 THEN salevalue_finishrate END) AS median_salevalue_finishrate_low,MAX(CASE WHEN abysku_rate_rank = CEILING(total_count/2.0) THEN abysku_rate END) AS median_abysku_rate_high,MAX(CASE WHEN abysku_rate_rank = FLOOR(total_count/2.0)+1 THEN abysku_rate END) AS median_abysku_rate_low,MAX(CASE WHEN dsi_rank = CEILING(total_count/2.0) THEN dsi END) AS median_dsi_high,MAX(CASE WHEN dsi_rank = FLOOR(total_count/2.0)+1 THEN dsi END) AS median_dsi_low,total_countFROM (SELECT sdate,profit_rate,avg_customer_price,avg_num_purchases,salevalue_finishrate,abysku_rate,dsi,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY profit_rate) AS profit_rate_rank,COUNT(*) OVER (PARTITION BY sdate) AS total_count,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY avg_customer_price) AS avg_customer_price_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY avg_num_purchases) AS avg_num_purchases_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY salevalue_finishrate) AS salevalue_finishrate_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY abysku_rate) AS abysku_rate_rank,ROW_NUMBER() OVER (PARTITION BY sdate ORDER BY dsi) AS dsi_rankFROM test_basedata) ranked_dataGROUP BY sdate, total_count
) median_calculation