matrices are not aligned Error: Python SciPy fmin_bfgs

SaB · Jan 6, 2012 · Viewed 8.6k times

Problem Synopsis: When attempting to use the scipy.optimize.fmin_bfgs minimization function, it throws a

derphi0 = np.dot(gfk, pk) ValueError: matrices are not aligned

error. According to my error checking, this occurs at the very end of the first iteration through fmin_bfgs, just before any values are returned or any calls to callback.

Configuration: Windows Vista, Python 3.2.2, SciPy 0.10, IDE = Eclipse with PyDev

Detailed Description: I am using scipy.optimize.fmin_bfgs to minimize the cost of a simple logistic regression implementation (converting from Octave to Python/SciPy). The cost function is implemented in cost_arr and the gradient is computed in gradient_descent_arr.

I have manually tested and fully verified that cost_arr and gradient_descent_arr work properly and return all values correctly. I also verified that the proper parameters are passed to fmin_bfgs. Nevertheless, when run, I get the ValueError: matrices are not aligned. According to my review of the source, the exact error occurs in the line_search_wolfe1 function in scipy.optimize.linesearch (MINPACK's Wolfe line and scalar searches, as supplied by SciPy).
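For background (not from the original post): derphi0 is the directional derivative of the objective along the search direction, i.e. the scalar product gfk · pk. With 1d (m,) arrays, np.dot returns that scalar, which is what the line search expects. A small sketch, using the values reported in the addendum below:

import numpy as np

gfk = np.array([-12.00921659, -11.26284221])  # gradient as a 1d (2,) array
pk = np.array([12.00921659, 11.26284221])     # search direction as a 1d (2,) array

derphi0 = np.dot(gfk, pk)  # a scalar; negative here, as expected for a descent direction
print(derphi0)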

Notably, if I use scipy.optimize.fmin instead, the fmin function runs to completion.

Exact Error:

File "D:\Users\Shannon\Programming\Eclipse\workspace\SBML\sbml\LogisticRegression.py", line 395, in fminunc_opt

optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)   

File "C:\Python32x32\lib\site-packages\scipy\optimize\optimize.py", line 533, in fmin_bfgs old_fval,old_old_fval)
File "C:\Python32x32\lib\site-packages\scipy\optimize\linesearch.py", line 76, in line_search_wolfe1 derphi0 = np.dot(gfk, pk) ValueError: matrices are not aligned

I call the optimization function with: optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, callback=self.callback_fmin_bfgs, retall=True)

I have spent a few days trying to fix this and cannot seem to determine what is causing the "matrices are not aligned" error.

ADDENDUM: 2012-01-08 I worked with this a lot more and seem to have narrowed down the issues (but am baffled on how to fix them). First, fmin (using just fmin) works with these cost and gradient functions. Second, the cost and gradient functions both return the expected values when tested in a single iteration of a manual implementation (NOT using fmin_bfgs). Third, I added error code to optimize.linesearch, and the error is thrown at line_search_wolfe1 in the line derphi0 = np.dot(gfk, pk). Here, according to my tests, pk = [[ 12.00921659] [ 11.26284221]] and gfk = [[-12.00921659] [-11.26284221]], both of shape (2, 1). Note: the error is thrown on the very first iteration through fmin_bfgs (i.e., fmin_bfgs never even completes a single iteration or update).
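Those shapes pinpoint the failure: np.dot cannot multiply a (2, 1) array by another (2, 1) array, since the inner dimensions (1 and 2) do not match. A minimal reproduction of the crash (the exact message wording varies across NumPy versions; older releases said "matrices are not aligned"):

import numpy as np

pk = np.array([[12.00921659], [11.26284221]])     # (2, 1) column array
gfk = np.array([[-12.00921659], [-11.26284221]])  # (2, 1) column array

derphi0 = np.dot(gfk, pk)  # raises ValueError: (2,1) x (2,1) is not a valid product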

I appreciate ANY guidance or insights.

My Code Below (logging, documentation removed):

Assume theta = 2x1 ndarray (Actual: theta Info Size=(2, 1))
Assume X = 100x2 ndarray (Actual: X Info Size=(2, 100))
Assume y = 100x1 ndarray (Actual: y Info Size=(100, 1))

def cost_arr(self, theta, X, y):
    theta = scipy.resize(theta, (2, 1))
    m = scipy.shape(X)
    m = 1 / m[1]  # use m[1] because this is the number of training examples (columns of X)
    logging.info(__name__ + "cost_arr reports m = " + str(m))
    z = scipy.dot(theta.T, X)  # must transpose the vector theta
    hypthetax = self.sigmoid(z)
    yones = scipy.ones(scipy.shape(y))
    hypthetaxones = scipy.ones(scipy.shape(hypthetax))
    costright = scipy.dot((yones - y).T, (scipy.log(hypthetaxones - hypthetax)).T)
    costleft = scipy.dot((-1 * y).T, (scipy.log(hypthetax)).T)
    return m * (costleft - costright)  # (1/m) * (-y'*log(h) - (1-y)'*log(1-h))


def gradient_descent_arr(self, theta, X, y):
    theta = scipy.resize(theta, (2, 1))
    m = scipy.shape(X)
    m = 1 / m[1]  # use m[1] because this is the number of training examples (columns of X)
    x = scipy.dot(theta.T, X)  # must transpose the vector theta
    sig = self.sigmoid(x)
    sig = sig.T - y
    grad = scipy.dot(X, sig)
    grad = m * grad
    return grad

def fminunc_opt_bfgs(self, initialtheta, X, y, maxnumit):
    myargs = (X, y)
    optcost = scipy.optimize.fmin_bfgs(self.cost_arr, initialtheta, fprime=self.gradient_descent_arr, args=myargs, maxiter=maxnumit, retall=True, full_output=True)
    return optcost
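Note that self.sigmoid is referenced above but not shown; a minimal sketch, assuming the standard elementwise logistic function:

def sigmoid(self, z):
    # standard logistic function, applied elementwise to an ndarray
    return 1.0 / (1.0 + scipy.exp(-z))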

Answer

SaB · Jan 27, 2012

In case anyone else encounters this problem ....

1) ERROR 1: As noted in the comments, I incorrectly returned the value from my gradient as a multidimensional (m,n) or (m,1) array. fmin_bfgs seems to require a 1d array output from the gradient (that is, you must return an (m,) array and NOT an (m,1) array). Use scipy.shape(myarray) to check the dimensions if you are unsure of the return value.

The fix involved adding:

grad = numpy.ndarray.flatten(grad)

just before returning the gradient from your gradient function. This "flattens" the array from (m,1) to (m,). fmin_bfgs can take this as input.
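A quick check of what the fix does to the array's shape (illustrative values; any (m,1) gradient behaves the same):

import numpy as np

grad = np.array([[12.00921659], [11.26284221]])
print(np.shape(grad))            # (2, 1) -- the shape that breaks fmin_bfgs

grad = np.ndarray.flatten(grad)  # equivalently: grad.flatten()
print(np.shape(grad))            # (2,)   -- the 1d shape fmin_bfgs expects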

2) ERROR 2: Remember, fmin_bfgs seems to be intended for NONlinear functions. In my case, the sample I was initially working with was a LINEAR function. This appears to explain some of the anomalous results even after the flatten fix mentioned above. For LINEAR functions, fmin, rather than fmin_bfgs, may work better.
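For comparison, a call through the derivative-free fmin (which ran to completion in the question) might look like this, reusing the names from the code above; fmin takes no fprime argument because it does not use gradients:

optcost = scipy.optimize.fmin(self.cost_arr, initialtheta, args=(X, y), maxiter=maxnumit, full_output=True)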

QED