Does anyone know of an efficient way to do multiple linear regression in C#, where the number of simultaneous equations may be in the thousands (with 3 or 4 different inputs)? After reading this article on multiple linear regression, I tried implementing it with a matrix equation:
Matrix y = new Matrix(
    new double[,]{{745},
                  {895},
                  {442},
                  {440},
                  {1598}});
Matrix x = new Matrix(
    new double[,]{{1, 36, 66},
                  {1, 37, 68},
                  {1, 47, 64},
                  {1, 32, 53},
                  {1, 1, 101}});
Matrix b = (x.Transpose() * x).Inverse() * x.Transpose() * y;
for (int i = 0; i < b.Rows; i++)
{
    Trace.WriteLine("INFO: " + b[i, 0].ToDouble());
}
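Written out, the matrix expression in the code above is the usual normal-equations solution for ordinary least squares. The symbols n (number of equations) and p (number of columns in x, including the column of 1s) are my own notation, not from the article:

$$
\hat{b} = (X^{\top} X)^{-1} X^{\top} y, \qquad X \in \mathbb{R}^{n \times p},\; y \in \mathbb{R}^{n},\; \hat{b} \in \mathbb{R}^{p}
$$

This is equivalent to solving the p × p linear system

$$
(X^{\top} X)\, \hat{b} = X^{\top} y
$$

In the example above n = 5 and p = 3, so the matrix being inverted is 3 × 3. Its size depends on the number of inputs, not on the number of equations.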
However, this implementation does not scale well to thousands of equations due to the matrix inversion operation. I can call the R language and use that, but I was hoping there would be a pure .NET solution that scales to these large data sets.
Any suggestions?
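One pure .NET possibility, offered as an unbenchmarked sketch: with only 3 or 4 inputs, the products x-transpose times x and x-transpose times y can be accumulated in a single pass over the rows, leaving a tiny linear system that can be solved by Gaussian elimination without ever forming an explicit inverse. The names `NormalEquations` and `Fit` are made up for this illustration, and the 1e-12 pivot threshold is an arbitrary choice that assumes reasonably scaled data.

```csharp
using System;

public static class NormalEquations
{
    // Ordinary least squares fit of y ≈ x * b.
    // x has n rows and p columns (include a column of 1s for an intercept).
    public static double[] Fit(double[,] x, double[] y)
    {
        int n = x.GetLength(0);
        int p = x.GetLength(1);
        if (y.Length != n)
            throw new ArgumentException("x and y must have the same number of rows.");

        // Accumulate the augmented p x (p + 1) system [x^T x | x^T y] in one pass.
        double[,] a = new double[p, p + 1];
        for (int r = 0; r < n; r++)
        {
            for (int i = 0; i < p; i++)
            {
                for (int j = 0; j < p; j++)
                    a[i, j] += x[r, i] * x[r, j];
                a[i, p] += x[r, i] * y[r];
            }
        }

        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++)
        {
            int pivot = col;
            for (int row = col + 1; row < p; row++)
                if (Math.Abs(a[row, col]) > Math.Abs(a[pivot, col]))
                    pivot = row;

            // Arbitrary threshold (an assumption): treat tiny pivots as singular.
            if (Math.Abs(a[pivot, col]) < 1e-12)
                throw new InvalidOperationException("x^T x is singular or nearly singular.");

            if (pivot != col)
            {
                for (int k = 0; k <= p; k++)
                {
                    double tmp = a[col, k];
                    a[col, k] = a[pivot, k];
                    a[pivot, k] = tmp;
                }
            }

            for (int row = col + 1; row < p; row++)
            {
                double factor = a[row, col] / a[col, col];
                for (int k = col; k <= p; k++)
                    a[row, k] -= factor * a[col, k];
            }
        }

        // Back substitution on the upper-triangular system.
        double[] b = new double[p];
        for (int i = p - 1; i >= 0; i--)
        {
            double sum = a[i, p];
            for (int j = i + 1; j < p; j++)
                sum -= a[i, j] * b[j];
            b[i] = sum / a[i, i];
        }
        return b;
    }
}
```

Calling `NormalEquations.Fit(x, y)` with plain `double[,]` and `double[]` versions of the data above would return the coefficients in the same order as the columns of x. The work grows linearly with the number of rows. One caveat: forming x-transpose times x squares the condition number of the problem, so for badly conditioned inputs a QR- or SVD-based solver from a numerical library is the safer choice.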
EDIT #1:
I have settled on using R for the time being. Using statconn (downloaded here), I have found this approach to be both fast and relatively easy to use. Here is a small code snippet; it really isn't much code at all to use the R statconn library (note: this is not all the code!).
_StatConn.EvaluateNoReturn(string.Format("output <- lm({0})", equation));
object intercept = _StatConn.Evaluate("coefficients(output)['(Intercept)']");
parameters[0] = (double)intercept;
for (int i = 0; i < xColCount; i++)
{
    object parameter = _StatConn.Evaluate(string.Format("coefficients(output)['x{0}']", i));
    parameters[i + 1] = (double)parameter;
}
For the record, I recently found the ALGLIB library which, whilst not having much documentation, has some very useful functions, such as linear regression, which is one of the things I was after.
Sample code (this is old and unverified, just a basic example of how I was using it). I was using linear regression on a time series with 3 entries (called 3min/2min/1min) and then the finishing value (Final).
public void Foo(List<Sample> samples)
{
    int nAttributes = 3; // columns, in order: 1min, 2min, 3min
    int nSamples = samples.Count;
    double[,] tsData = new double[nSamples, nAttributes];
    double[] resultData = new double[nSamples];
    // Copy each sample into one row of inputs plus its target value.
    for (int i = 0; i < samples.Count; i++)
    {
        tsData[i, 0] = samples[i].Tminus1min;
        tsData[i, 1] = samples[i].Tminus2min;
        tsData[i, 2] = samples[i].Tminus3min;
        resultData[i] = samples[i].Final;
    }
    double[] weights = null;
    int fitResult = 0;
    alglib.lsfit.lsfitreport rep = new alglib.lsfit.lsfitreport();
    alglib.lsfit.lsfitlinear(resultData, tsData, nSamples, nAttributes, ref fitResult, ref weights, rep);
    // weights[k] is the fitted coefficient for column k of tsData.
    Dictionary<string, double> labelsAndWeights = new Dictionary<string, double>();
    labelsAndWeights.Add("1min", weights[0]);
    labelsAndWeights.Add("2min", weights[1]);
    labelsAndWeights.Add("3min", weights[2]);
}