Simple multidimensional curve fitting

user64258 picture user64258 · Feb 9, 2009 · Viewed 22.1k times · Source

I have a bunch of data, generally in the form a, b, c, ..., y

where y = f(a, b, c...)

Most of them are three and four variables, and have 10k - 10M records. My general assumption is that they are algebraic in nature, something like:

y = P1 a^E1 + P2 b^E2 + P3 c^E3

Unfortunately, my last statistical analysis class was 20 years ago. What is the easiest way to get a good approximation of f? Open source tools with a very minimal learning curve (i.e. something where I could get a decent approximation in an hour or so) would be ideal. Thanks!

Answer

David Z picture David Z · Feb 9, 2009

In case it's useful, here's a Numpy/Scipy (Python) template to do what you want:

from numpy import array
from scipy.optimize import leastsq

def __residual(params, y, a, b, c):
    p0, e0, p1, e1, p2, e2 = params
    return p0 * a ** e0 + p1 * b ** e1 + p2 * c ** e2 - y

# load a, b, c
# guess initial values for p0, e0, p1, e1, p2, e2
p_opt = leastsq(__residual,  array([p0, e0, p1, e1, p2, e2]), args=(y, a, b, c))
print 'y = %f a^%f + %f b^%f %f c^%f' % map(float, p_opt)

If you really want to understand what's going on, though, you're going to have to invest the time to scale the learning curve for some tool or programming environment - I really don't think there's any way around that. People don't generally write specialized tools for doing things like 3-term power regressions exclusively.