There's a lot of different posts out there on solving the "gaps and islands" problem, but I recently struck a slight variation that caused quite a bit of thinking, searching and more thinking.
The problem I had related to creating a unique timeline view of data from temporal, history, tables, but in this case the values I needed to display would revert back to previous values, and I needed to retain this sequence. Here's a simple example:
You can see the values for the Manager change from Sam to Mike then back to Sam for Bobs Bikes, and from Mandy to Sam and back to Mandy again for Mandy's Mowers. The output I needed was:
The solutions I initially thought of, and further searches, would group the managers together and loose the sequencing, then I stumbled across a post (which I can't re-find) which, while not exactly what I wanted, set me on the right track. Here's what I came up with:
First up was to union together the history table and current/live table data together into a temp table and add 3 extra columns to the temp table;- a column to hold a hash of the row data I wanted to compare. This was across multiple columns so using a hash made this easier than trying to compare each column multiple times
- a numeric column to hold a unique identity id for each row
- sequence start column, which I'll explain shortly. I also added a sequence end but ended up not using that in the final solution
with cte as(
select *,
hashbytes('sha2_256', concat(CompanyName, '|', Manager)) as sha2,
row_number() over(order by CompanyName,validfrom) as rm
from #t
)
update cte set rn = rm, sha = sha2;
with cte as(
select t1.* , case when t1.sha <> t2.sha or t2.sha is null then 1 else 0 end as is_start
from #t t1
left join #t t2 on t1.CompanyName = t2.CompanyName and t1.rn = t2.rn + 1
)
update cte set seq_start = is_start;
select *, sum(seq_start) over(partition by CompanyName order by ValidFrom rows unbounded preceding) as grp
from #t
rows between current row and unbounded following
part to work correctlywith cte as(
select *, sum(seq_start) over(partition by CompanyName order by ValidFrom rows unbounded preceding) as grp
from #t
)
select distinct CompanyName, Manager,
FIRST_VALUE(ValidFrom) over(partition by CompanyName, grp order by ValidFrom) as ValidFrom,
LAST_VALUE(ValidTo) over(partition by CompanyName, grp order by ValidTo rows between current row and unbounded following) as ValidTo
from cte
order by CompanyName, ValidFrom