Weighted trendline

Tams picture Tams · Jun 18, 2012 · Viewed 20.9k times · Source

Excel produces scatter diagrams for sets of pair values. It also gives the option of producing a best fit trendline and formula for the trendline. It also produces bubble diagrams which take into consideration a weight provided with each value. However, the weight has no influence on the trendline or formula. Here is an example set of values, with their mappings and weights.

    Value Map     Weight
    0       1      10
    1       2      10
    2       5      10
    3       5      20
    4       6      20
    5       1      1

With Excel's trendline, the mapping for value 5 has too much influence on the formula. Is there any way to produce a formula that reflects the respective weights?

As a help, I've introduced a weighted average for five consecutive values. But is their a better way?

Answer

lori_m picture lori_m · Jun 19, 2012

With data in A2:C7, based on the standard weighted least squares formula, you can try:

=LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0)

entered with CTRL+SHIFT+ENTER in E2:F2 or any 2x1 range. This also returns {1.1353,1.4412}.

For Rsquared you can enter:

=INDEX(LINEST((B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0,1),3,1)

Explanation of formula

Consider first a normal regression of y on X using LINEST. If const = TRUE the regressor matrix is the augmented matrix consisting of a column of ones followed by the regressor columns i.e. X'=(1,X). If const = FALSE the regressor matrix is simply X so running the regression with a column of ones included gives the same estimates as running without a column of ones and setting const=TRUE.

Now consider a weighted least squares regression. The regression is now Wy on WX'=(W1,WX) where W is the diagonal matrix consisting of the square root of the weights. Since there is not a column of ones present, we must set const = FALSE and use two columns in the regressor matrix.

Rsquared Calculation

Setting stats to TRUE in the LINEST output of the first formula we get in the third and fifth rows:

SSres = 59.76
SSreg(u) = 1461.24
SSTot(u) = 1521
Rsq(u) = 1 - 59.76/1521 = 0.9607 

Note these values are uncentered versions (u) since the const=FALSE (refer to MS Help on LINEST for further info.) For the centered versions (c) we need to subtract the weighted average as below:

SSTot(c) =SUMPRODUCT(C2:C7*(B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))^2) = 244.93
Rsq(c) = 1 - 59.76/244.93 = 0.756