lag to get first non null value since the previous null value

Francesco Rinaldi picture Francesco Rinaldi · Nov 21, 2017 · Viewed 8.4k times · Source

Below is an example of what I'm trying to achieve in a Redshift Database.

I have a variable current_value and I want to create a new column value_desired that is:

  • the same as current_value if the previous row is null
  • equal to the last preceding non-null value if the previous row is non-null

It sounds like an easy task but I haven't found a way to do it yet.

row_numb     current_value   value_desired
1
2
3            47              47
4
5            45              45
6
7
8            42              42
9            41              42
10           40              42
11           39              42
12           38              42
13
14           36              36
15
16
17           33              33
18           32              33

I've tried with the LAG() function but I can only get the previous value (not the first in the "non-null" block), here is my take:

SELECT *
    , CASE WHEN current_value is not null and LAG(current_value) is null THEN current_value
           WHEN current_value is not null and LAG(current_value) is not null 
            THEN LAG(current_value)
      ELSE NULL END AS value_desired
  FROM test1

Any help is much appreciated, thanks.

Answer

Jon Scott picture Jon Scott · Nov 21, 2017

Here is the correct answer, which gives the right results. There are a few clever tricks here, i suggest you take a careful look through and let me know what needs clarifications

create test data, as per your question.

drop table if exists test_table ;
create table test_table (row_num int,current_value int);
insert into test_table(row_num, current_value)
values
  (1,null),
(2,null),
(3,47),
(4,null),
(5,45),
(6,null),
(7,null),
(8 ,42),
(9 ,41),
(10,40  ),
(11,39 ),
(12,38 ),
(13,null),
(14,36),
(15,null),
(16,null),
(17 ,33),
(18,32  )
;

Then run this code

SELECT DISTINCT
  j1.row_num,
  CASE WHEN j1.current_value IS NULL
    THEN NULL
  ELSE
    last_value(j2.current_value)
    OVER (
      PARTITION BY j1.row_num
      ORDER BY j2.row_num
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) END AS value_desired
FROM test_table AS j1
  LEFT JOIN (SELECT
               row_num,
               current_value,
               lag(current_value, 1)
               OVER (
                 ORDER BY row_num ) AS prev_cval
             FROM test_table) AS j2
    ON j1.row_num >= j2.row_num AND j2.current_value IS NOT NULL
       AND j2.prev_cval IS NULL
ORDER BY j1.row_num;