LAG() 和 LEAD() 的用法:
这两个函数用于访问结果集中当前行之前或之后的行数据,而无需使用自连接。
LAG(column, offset, default_value) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
)
LEAD(column, offset, default_value) OVER (
[PARTITION BY partition_expression]
ORDER BY order_expression
)
| 参数 | 说明 |
|---|---|
column |
要访问的列名 |
offset |
偏移量(默认1,可选) |
default_value |
当没有前/后行时的默认值(可选) |
PARTITION BY |
分区子句,在每个分区内独立计算 |
ORDER BY |
排序子句,决定行的前后顺序 |
CREATE TABLE sales (
sale_date DATE,
product VARCHAR(50),
amount DECIMAL(10,2)
);
INSERT INTO sales VALUES
('2024-01-01', 'A', 100),
('2024-01-02', 'A', 150),
('2024-01-03', 'A', 120),
('2024-01-01', 'B', 200),
('2024-01-02', 'B', 180),
('2024-01-03', 'B', 220);
-- 获取前一天的销售额
SELECT
sale_date,
product,
amount,
LAG(amount) OVER (PARTITION BY product ORDER BY sale_date) as prev_amount,
-- 计算日环比增长率
ROUND((amount - LAG(amount) OVER (PARTITION BY product ORDER BY sale_date))
/ LAG(amount) OVER (PARTITION BY product ORDER BY sale_date) * 100, 2) as growth_rate
FROM sales
ORDER BY product, sale_date;
-- 获取前两天(偏移量为2)的数据
SELECT
sale_date,
product,
amount,
LAG(amount, 2) OVER (PARTITION BY product ORDER BY sale_date) as amount_2days_ago,
LEAD(amount, 2, 0) OVER (PARTITION BY product ORDER BY sale_date) as amount_2days_later
FROM sales;
SELECT
sale_date,
product,
amount,
-- 当没有前一天数据时,使用0作为默认值
LAG(amount, 1, 0) OVER (PARTITION BY product ORDER BY sale_date) as prev_amount,
-- 当没有后一天数据时,使用NULL作为默认值
LEAD(amount, 1, NULL) OVER (PARTITION BY product ORDER BY sale_date) as next_amount
FROM sales;
-- 计算月度环比
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', sale_date) as month,
SUM(amount) as total_amount
FROM sales
GROUP BY DATE_TRUNC('month', sale_date)
)
SELECT
month,
total_amount,
LAG(total_amount) OVER (ORDER BY month) as prev_month_amount,
ROUND((total_amount - LAG(total_amount) OVER (ORDER BY month))
/ LAG(total_amount) OVER (ORDER BY month) * 100, 2) as mom_growth
FROM monthly_sales;
-- 查找销售额连续下降的产品
WITH sales_trend AS (
SELECT
sale_date,
product,
amount,
LAG(amount) OVER (PARTITION BY product ORDER BY sale_date) as prev_amount
FROM sales
)
SELECT *
FROM sales_trend
WHERE amount < prev_amount; -- 当前销售额小于前一天
-- 计算用户连续登录的时间间隔
SELECT
user_id,
login_date,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date) as last_login,
DATEDIFF(day,
LAG(login_date) OVER (PARTITION BY user_id ORDER BY login_date),
login_date) as days_between_logins
FROM user_logins;
-- 向前填充缺失值
SELECT
date,
COALESCE(value,
LAG(value) IGNORE NULLS OVER (ORDER BY date)) as filled_value
FROM data_with_nulls;
| 函数 | 方向 | 是否必须排序 | 典型用途 |
|---|---|---|---|
| LAG() | 向前看(之前) | 是 | 访问前一行的值 |
| LEAD() | 向后看(之后) | 是 | 访问后一行的值 |
| FIRST_VALUE() | 分区第一行 | 可选 | 获取分区第一个值 |
| LAST_VALUE() | 分区最后一行 | 可选 | 获取分区最后一个值 |
PARTITION BY 和 ORDER BY 的列有索引
避免多层嵌套:尽量减少窗口函数的嵌套层级
使用CTE或子查询:复杂计算分步骤进行
限制窗口范围:使用 ROWS BETWEEN 子句限制计算范围
COALESCE() 或默认值参数
分区边界:LAG() 和 LEAD() 不会跨越分区边界
性能影响:大量数据时,窗口函数可能有性能开销
-- 计算3天的移动平均
SELECT
sale_date,
product,
amount,
AVG(amount) OVER (
PARTITION BY product
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3days,
-- 使用LAG和LEAD手动计算
(LAG(amount, 2, 0) OVER (PARTITION BY product ORDER BY sale_date) +
LAG(amount, 1, 0) OVER (PARTITION BY product ORDER BY sale_date) +
amount) / 3.0 as manual_moving_avg
FROM sales;
这两个函数是时间序列分析、趋势计算和数据对比的强大工具,掌握它们能大大简化许多复杂的SQL查询。