3 Month Moving Average - Redshift SQL

user2427023 · Mar 20, 2016 · Viewed 11.7k times

I am trying to calculate a 3-month moving average (3MMA) from some data that I have, using Redshift SQL or Domo BeastMode (if anyone is familiar with that).

The data is daily, but needs to be displayed by month. So the quotes/revenue need to be summed by month, and then a 3MMA needs to be calculated over the three preceding months (excluding the current month).

So, if the quote was in April, I would need the average of Jan, Feb, Mar.

The input data looks like this:

Quote Date MM/DD/YYYY     Revenue
3/24/2015                 61214
8/4/2015                  22983
9/3/2015                  30000
9/15/2015                 171300
9/30/2015                 112000

And I need the output to look something like this:

Month               Revenue             3MMA
Jan 2015            =Sum of Jan Rev     =(Oct14 + Nov14 + Dec14) / 3
Feb 2015            =Sum of Feb Rev     =(Nov14 + Dec14 + Jan15) / 3
Mar 2015            =Sum of Mar Rev     =(Dec14 + Jan15 + Feb15) / 3
Apr 2015            =Sum of Apr Rev     =(Jan15 + Feb15 + Mar15) / 3
May 2015            =Sum of May Rev     =(Feb15 + Mar15 + Apr15) / 3

If anyone is able to help, I would be extremely grateful! I have been stuck on this for quite a while and have no idea what I'm doing when it comes to SQL lol.

Cheers, Logan.

Answer

Gordon Linoff · Mar 20, 2016

You can do this using aggregation and window functions:

select date_trunc('month', quotedate) as mon,
       sum(revenue) as mon_revenue,
       avg(sum(revenue)) over (order by date_trunc('month', quotedate)
                               rows between 2 preceding and current row) as revenue_3mon
from t
group by date_trunc('month', quotedate) 
order by mon;

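Note: this uses avg(), so for the first and second rows the window holds only 1 and 2 months, and the average divides by 1 and 2 respectively. It also assumes that you have at least one record for each month; the window counts rows, not calendar months, so a missing month would pull in an older one.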
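The question asks for the average of the three months before the current one, whereas the frame above ends at the current row. A minimal sketch of that variant, assuming the same table t with columns quotedate and revenue; the alias revenue_3mma is only illustrative:

```sql
-- Sketch only: assumes table t(quotedate, revenue) as in the query above.
select mon,
       mon_revenue,
       -- frame covers the three prior monthly rows and skips the current one
       avg(mon_revenue) over (order by mon
                              rows between 3 preceding and 1 preceding) as revenue_3mma
from (select date_trunc('month', quotedate) as mon,
             sum(revenue) as mon_revenue
      from t
      group by date_trunc('month', quotedate)
     ) m
order by mon;
```

With this frame the first row has no preceding rows, so its average is NULL, and the second and third rows average over 1 and 2 months. Depending on the type of revenue, a cast may be needed to keep fractional results.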

EDIT:

I wonder if there is an issue with aggregation functions mixed with analytic functions in Redshift. Does the following work any better?

select m.*,
       avg(mon_revenue) over (order by mon rows between 2 preceding and current row) as revenue_3mon
from (select date_trunc('month', quotedate) as mon,
             sum(revenue) as mon_revenue
      from t
      group by date_trunc('month', quotedate) 
     ) m
order by mon;
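Both queries count rows rather than calendar months, so they rely on every month having at least one record. The sample data in the question has gaps (nothing between March and August 2015), so one possible alternative is a self-join on a calendar range that always divides by 3, as in the question's expected output. This is a sketch under assumptions: the same table t(quotedate, revenue), a hypothetical CTE name monthly, and months without quotes counted as zero revenue.

```sql
-- Sketch only: assumes table t(quotedate, revenue); "monthly" is an illustrative name.
with monthly as (
    select date_trunc('month', quotedate) as mon,
           sum(revenue) as mon_revenue
    from t
    group by date_trunc('month', quotedate)
)
select m.mon,
       m.mon_revenue,
       -- months with no rows contribute 0; the divisor is always 3
       coalesce(sum(p.mon_revenue), 0) / 3.0 as revenue_3mma
from monthly m
left join monthly p
       on p.mon >= dateadd(month, -3, m.mon)   -- start of the 3-month lookback
      and p.mon <  m.mon                       -- excludes the current month
group by m.mon, m.mon_revenue
order by m.mon;
```

With the sample data, September 2015 would get 22983 / 3 = 7661, because August is the only month from June to August with revenue. Months that have no quotes at all still do not appear as output rows; that would need a separate calendar table.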