I have a bunch of data, generally in the form a, b, c, ..., y
where y = f(a, b, c...)
Most of the datasets have three or four variables and 10k - 10M records each. My general assumption is that the relationships are algebraic in nature, something like:
y = P1 a^E1 + P2 b^E2 + P3 c^E3
Unfortunately, my last statistical analysis class was 20 years ago. What is the easiest way to get a good approximation of f? Open source tools with a very minimal learning curve (i.e. something where I could get a decent approximation in an hour or so) would be ideal. Thanks!
In case it's useful, here's a NumPy/SciPy (Python) template to do what you want:
from numpy import array
from scipy.optimize import leastsq

def __residual(params, y, a, b, c):
    # unpack the six free parameters and return the pointwise error
    p0, e0, p1, e1, p2, e2 = params
    return p0 * a ** e0 + p1 * b ** e1 + p2 * c ** e2 - y

# load y, a, b, c as NumPy arrays
# guess initial values for p0, e0, p1, e1, p2, e2
p_opt, ier = leastsq(__residual, array([p0, e0, p1, e1, p2, e2]), args=(y, a, b, c))
print('y = %f a^%f + %f b^%f + %f c^%f' % tuple(p_opt))
If you really want to understand what's going on, though, you're going to have to invest the time to climb the learning curve for some tool or programming environment - I really don't think there's any way around that. People don't generally write specialized tools that do nothing but, say, three-term power regressions.