R-squared value in Excel with the 'addtrendline' function?

Lucile picture Lucile · Jun 30, 2014 · Viewed 18k times · Source

I have been struggling a while on that; In short, I can't find the equations Excel uses for R2.

  • Here are my data :
    x: 1 2 3 4 5 6 7 8 9 10
    y: 4 9 1 2 1 1 8 5 5 1

  • I plot the data, fit a power law function ('add trendline') and use the 'add trendline > options > Display R-squared value on chart'

Value displayed :
R2 = 0.03008.

Problem 1

If I calculate it in excel using 'RSQ()' function (taking the values of the parameters Excel has found for the fitting function), or by hand using the definition (wikipedia)...
R2 = 0.0272

Problem 2

In Matlab, using the 'fit' function, the parameters for the fitting function (and then of course the R2) are not the ones that EXCEL has found.

Questions:

So here is my main question :
How does Excel computes the R2 in the 'add trendline' function, as it is obviously not the one from the definition (wikipedia)?

and the bonus question:
Why Excel and Matlab don't end up with the same fitting parameters?

thanks a lot!

%%%%%% EDIT BELOW! %%%%
As an answer to a comment; Here is the Matlab code I use:

%% R-squared with the fit function
% use the fit function in Matlab, yobs being the data
[param, results] = fit(x,yobs,'power1');   

% R-squared from the fit function :
r_sq_from_fit = results.rsquare;

%% here I calculate "by hand" the R-squared, from the general definition (wikipedia!)
% calculates the fitting data yfit
yfit = (p_powerlaw.a).*x.^p_powerlaw.b;

% mean of the yobs, total sum of squares, and residual sum of squares
yobs_mean = mean(yobs);
SStot = sum((yobs-yobs_mean).^2);
SSres = sum((yobs-yfit).^2);

r_sq_hand = 1-SSres/SStot;

I find the same values wether I get R-squared from the fit function in Matlab or I calculate it "by hand". Matlab seems to be consistent and apparently uses the strict definition of R-squared in its function...

However; when I compare:

  • the R-square value given by Excel from RSQ() function
  • and the value I obtain by hand calculating R-square from the definition (taking of course the yfit values that Excel returned me, not the one Matlab returns, as Excel and Matlab don't agree on the fitting parameters!)

...I obtain different values! Excel : 0.027, as I said before, and hand calculated : -0.1109 (!)

Answer

David Sustr picture David Sustr · Jul 5, 2015

This question (how does the "add trendline" in Excel really works?) also puzzled me for a longer time, because in a research I need to be sure about an origin of my numbers. Because I havent found too much about this on internet, so I tryed several vays of manual R^2 (coefficient of determination) evaluation in order to obtain the same results as Excel.

I made a same observation like it was mentioned in question. When one uses "add trendline" for linear (and also logaritmic) function fitting, the resulting R^2 and regresion parameters are identical to manually calculated parameters. But when one uses "add trendline" for other nonlinear function fitting (for example exponential), the resulting R^2 and regresion parameters differs from manually calculated parameters.

The solution of this problem was already partially mentioned in discussion here. It seems that in order to fit nonlinear trend into provided data, Excel primary linearize the problem. So for example to fit exponential function y=a* exp(b * x) it primarily transforms it into function ln(y)=ln(a)+b*x. Then the relation between ln(y) and x is linear. Afterwards the linearized function is fitted into transformed data using usual strategy of minimization of sum of squared residuals. Thus the regresion parameters ln(a) and b are obtained. Also R^2 is calculated from linearized form. Because it is linear dependency, RSQ() function may be used by Excel for calculation of R^2.

When one follows this procedure manually, then the resulting regresion parameters and R^2 values are identical with those provided by Excel's "add trendline".

So generally the regresion parameters and R^2 values provided by Excel's "add trendline" in case of non-linear regresion are not true nonlinear but most probably obtained after linearization of problem. As a consequence, these parameters slightly differ from parameters calculated directly without any transformations.

Note about R^2: As far as I understand now, the R^2 for linear case (better denoted by a small letter: r^2) is calculated as a square of corelation coefficient. (RSQ()=CORREL()^2=PEARSON()^2). Because of this relation the values of r^2 can range only between 0 - 1. One set of input knowns can be altered by intercept and/or slope without change of r^2 value. On the other hand the R^2 for a nonlinear case (better denoted by capital leter) is defined a different way (see Wikipedia). It's values are not bottom limited to value 0 but maximum value 1 still indicate the best fit. Alteration of one set of input knowns by intercept and/or slope changes the R^2 value. R^2 works same well also in linear case.