Sorry for the long post, but I have provided copy & paste sample data and a possible solution approach below. The relevant part of the question is in the upper part of the post (above the horizontal rule).
I have the following table
Dt customer_id buy_time money_spent
-------------------------------------------------
2000-01-04 100 11:00:00.00 2
2000-01-05 100 16:00:00.00 1
2000-01-10 100 13:00:00.00 4
2000-01-10 100 14:00:00.00 3
2000-01-04 200 09:00:00.00 10
2000-01-06 200 10:00:00.00 11
2000-01-06 200 11:00:00.00 5
2000-01-10 200 08:00:00.00 20
and want a query to get this result set
Dt Dt_next customer_id buy_time money_spent
-------------------------------------------------------------
2000-01-04 2000-01-05 100 11:00:00.00 2
2000-01-05 2000-01-10 100 16:00:00.00 1
2000-01-10 NULL 100 13:00:00.00 4
2000-01-10 NULL 100 14:00:00.00 3
2000-01-04 2000-01-06 200 09:00:00.00 10
2000-01-06 2000-01-10 200 10:00:00.00 11
2000-01-06 2000-01-10 200 11:00:00.00 5
2000-01-10 NULL 200 08:00:00.00 20
That is: I want for each costumer (customer_id
) and each day (Dt
) the next day the same customer has visited (Dt_next
).
I have already one query that gives the latter result set (data and query enclosed below the horizontal rule). However, it involves a left outer join
and two dense_rank
aggregate functions. This approach seems a bit clumsy to me and I think that there should be a better solution. Any pointers to alternative solutions highly appreciated! Thank you!
BTW: I am using SQL Server 11 and the table has >>1m entries.
My query:
select
customer_table.Dt
,customer_table_lead.Dt as Dt_next
,customer_table.customer_id
,customer_table.buy_time
,customer_table.money_spent
from
(
select
#customer_data.*
,dense_rank() over (partition by customer_id order by customer_id asc, Dt asc) as Dt_int
from #customer_data
) as customer_table
left outer join
(
select distinct
#customer_data.Dt
,#customer_data.customer_id
,dense_rank() over (partition by customer_id order by customer_id asc, Dt asc)-1 as Dt_int
from #customer_data
) as customer_table_lead
on
(
customer_table.Dt_int=customer_table_lead.Dt_int
and customer_table.customer_id=customer_table_lead.customer_id
)
Sample data:
create table #customer_data (
Dt date not null,
customer_id int not null,
buy_time time(2) not null,
money_spent float not null
);
insert into #customer_data values ('2000-01-04',100,'11:00:00',2);
insert into #customer_data values ('2000-01-05',100,'16:00:00',1);
insert into #customer_data values ('2000-01-10',100,'13:00:00',4);
insert into #customer_data values ('2000-01-10',100,'14:00:00',3);
insert into #customer_data values ('2000-01-04',200,'09:00:00',10);
insert into #customer_data values ('2000-01-06',200,'10:00:00',11);
insert into #customer_data values ('2000-01-06',200,'11:00:00',5);
insert into #customer_data values ('2000-01-10',200,'08:00:00',20);
Try this query:
select cd.Dt
, t.Dt_next
, cd.customer_id
, cd.buy_time
, cd.money_spent
from (
select Dt
, LEAD(Dt) OVER (PARTITION BY customer_id ORDER BY Dt) AS Dt_next
, customer_id
from (
select distinct Dt, customer_id
from #customer_data
) t
) t
inner join #customer_data cd on t.customer_id = cd.customer_id and t.Dt = cd.Dt
Why field money_spent
has float type? You may have problems with calculations. Convert it to decimal type.