按类型组合连续日期时间间隔

时间:2022-10-21
本文介绍了按类型组合连续日期时间间隔的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着跟版网的小编来一起学习吧!

问题描述

假设我们有这样一个表:

声明@periods 表(s 日期,日期,ttinyint);

具有按开始日期排序的无间隙日期间隔

 插入@periods 值('2013-01-01', '2013-01-02', 3),('2013-01-02', '2013-01-04', 1),('2013-01-04', '2013-01-05', 1),('2013-01-05', '2013-01-06', 2),('2013-01-06', '2013-01-07', 2),('2013-01-07', '2013-01-08', 2),('2013-01-08', '2013-01-09', 1);

所有日期间隔都有不同的类型 (t).

需要组合相同类型的日期间隔,并且不会被其他类型的间隔打破(所有间隔按开始日期排序).

所以结果表应该是这样的:

 s |电子 |吨------------|------------|-----2013-01-01 |2013-01-02 |32013-01-02 |2013-01-05 |12013-01-05 |2013-01-08 |22013-01-08 |2013-01-09 |1

任何想法如何在没有光标的情况下做到这一点?

<小时>

我有一个可行的解决方案:

声明@periods 表(s datetime 主键聚集,日期时间,ttinyint,period_number int);插入@periods (s, e, t) 值('2013-01-01', '2013-01-02', 3),('2013-01-02', '2013-01-04', 1),('2013-01-04', '2013-01-05', 1),('2013-01-05', '2013-01-06', 2),('2013-01-06', '2013-01-07', 2),('2013-01-07', '2013-01-08', 2),('2013-01-08', '2013-01-09', 1);声明@t tinyint = null;声明@PeriodNumber int = 0;声明@anchor 日期;更新@periods设置 period_number = @PeriodNumber,@PeriodNumber = 案例当@t <>吨然后@PeriodNumber + 1别的@PeriodNumber结尾,@t = t,@锚= s选项(maxdop 1);选择s = 分钟(s),e = max(e),t = min(t)从@句号通过...分组period_number订购s;

但我怀疑我是否可以依赖 UPDATE 语句的这种行为?

我使用 SQL Server 2008 R2.

<小时>

感谢 Daniel 和这篇文章:http://www.sqlservercentral.com/文章/T-SQL/68467/

我发现上面的解决方案中遗漏了三件重要的事情:

  1. 表上必须有聚集索引
  2. 必须有锚变量和聚集列的调用
  3. 更新语句应该由一个处理器执行,即没有并行性

我已根据这些规则更改了上述解决方案.

解决方案

由于您的范围是连续的,因此问题本质上变成了 间隙和岛屿 一.如果您有一个标准来帮助您区分具有相同 t 值的不同序列,您可以使用该标准对所有行进行分组,然后只需取 MIN(s), MAX(e) 每个组.

获得此类标准的一种方法是使用两个 ROW_NUMBER 调用.考虑以下查询:

SELECT*,rnk1 = ROW_NUMBER() OVER(ORDER BY s),rnk2 = ROW_NUMBER() OVER(PARTITION BY t ORDER BY s)发件人@句号;

对于您的示例,它将返回以下集合:

s e t rnk1 rnk2---------- ---------- -- -- ---- ----2013-01-01 2013-01-02 3 1 12013-01-02 2013-01-04 1 2 12013-01-04 2013-01-05 1 3 22013-01-05 2013-01-06 2 4 12013-01-06 2013-01-07 2 5 22013-01-07 2013-01-08 2 6 32013-01-08 2013-01-09 1 7 3

rnk1rnk2 排名的有趣之处在于,如果您从另一个中减去一个,您将获得与 t,唯一标识每个具有相同t的不同行序列:

s e t rnk1 rnk2 rnk1 - rnk2---------- ---------- -- ---- ---- ------------2013-01-01 2013-01-02 3 1 1 02013-01-02 2013-01-04 1 2 1 12013-01-04 2013-01-05 1 3 2 12013-01-05 2013-01-06 2 4 1 32013-01-06 2013-01-07 2 5 2 32013-01-07 2013-01-08 2 6 3 32013-01-08 2013-01-09 1 7 3 4

知道了这一点,您可以轻松地应用分组和聚合.这是最终查询的样子:

WITH 分区 AS (选择*,g = ROW_NUMBER() OVER ( ORDER BY s)- ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)发件人@句号)选择s = MIN(s),e = MAX(e),吨从分区通过...分组吨,G;

如果您愿意,可以在 在 SQL Fiddle 使用此解决方案.>

Say we have such a table:

declare @periods table (
    s date, 
    e date,
    t tinyint
);

with date intervals without gaps ordered by start date (s)

insert into @periods values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

All date intervals have different types (t).

It is required to combine date intervals of the same type where they are not broken by intervals of the other types (having all intervals ordered by start date).

So the result table should look like:

      s     |      e     |  t
------------|------------|-----
 2013-01-01 | 2013-01-02 |  3
 2013-01-02 | 2013-01-05 |  1
 2013-01-05 | 2013-01-08 |  2
 2013-01-08 | 2013-01-09 |  1

Any ideas how to do this without cursor?


I've got one working solution:

declare @periods table (
    s datetime primary key clustered, 
    e datetime,
    t tinyint,
    period_number int   
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

declare @t tinyint = null;  
declare @PeriodNumber int = 0;
declare @anchor date;

update @periods
    set  period_number = @PeriodNumber, 
    @PeriodNumber = case
                        when @t <> t
                            then  @PeriodNumber + 1
                        else
                            @PeriodNumber
                    end,
    @t = t,
    @anchor = s
option (maxdop 1);

select 
    s = min(s),
    e = max(e),
    t = min(t)
from 
    @periods    
group by 
    period_number
order by 
    s;

but I doubt if I can rely on such a behavior of UPDATE statement?

I use SQL Server 2008 R2.


Edit:

Thanks to Daniel and this article: http://www.sqlservercentral.com/articles/T-SQL/68467/

I found three important things that were missed in the solution above:

  1. There must be clustered index on the table
  2. There must be anchor variable and call of the clustered column
  3. Update statement should be executed by one processor, i.e. without parallelism

I've changed the above solution in accordance with these rules.

解决方案

Since your ranges are continuous, the problem essentially becomes a gaps-and-islands one. If only you had a criterion to help you to distinguish between different sequences with the same t value, you could group all the rows using that criterion, then just take MIN(s), MAX(e) for every group.

One method of obtaining such a criterion is to use two ROW_NUMBER calls. Consider the following query:

SELECT
  *,
  rnk1 = ROW_NUMBER() OVER (               ORDER BY s),
  rnk2 = ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
;

For your example it would return the following set:

s           e           t   rnk1  rnk2
----------  ----------  --  ----  ----
2013-01-01  2013-01-02  3   1     1
2013-01-02  2013-01-04  1   2     1
2013-01-04  2013-01-05  1   3     2
2013-01-05  2013-01-06  2   4     1
2013-01-06  2013-01-07  2   5     2
2013-01-07  2013-01-08  2   6     3
2013-01-08  2013-01-09  1   7     3

The interesting thing about the rnk1 and rnk2 rankings is that if you subtract one from the other, you will get values that, together with t, uniquely identify every distinct sequence of rows with the same t:

s           e           t   rnk1  rnk2  rnk1 - rnk2
----------  ----------  --  ----  ----  -----------
2013-01-01  2013-01-02  3   1     1     0
2013-01-02  2013-01-04  1   2     1     1
2013-01-04  2013-01-05  1   3     2     1
2013-01-05  2013-01-06  2   4     1     3
2013-01-06  2013-01-07  2   5     2     3
2013-01-07  2013-01-08  2   6     3     3
2013-01-08  2013-01-09  1   7     3     4

Knowing that, you can easily apply grouping and aggregation. This is what the final query might look like:

WITH partitioned AS (
  SELECT
    *,
    g = ROW_NUMBER() OVER (               ORDER BY s)
      - ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
  FROM @periods
)
SELECT
  s = MIN(s),
  e = MAX(e),
  t
FROM partitioned
GROUP BY
  t,
  g
;

If you like, you can play with this solution at SQL Fiddle.

这篇关于按类型组合连续日期时间间隔的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!

上一条:如何将存储过程中的多个输出抓取到临时表中 下一条:T-SQL 算术溢出:哪一列?

相关文章

最新文章