Lucas Kanade python numpy implementation uses enormous amount of memory

mpiekarz picture mpiekarz · Jan 14, 2013 · Viewed 8.4k times · Source

I was working on Optical Flow script using Lucas Kanade method, as University project. While it works well, there is something I can't figure out. It uses few MB of memory at start, but that amount increases rapidly every second. By the time it computes OF for 1 frame of 480p movie, it uses about 1GB. When it reaches 1.9GB, it suddenly stops and stays there even if left for few hours.

I tried running the script on a different PC and on it, it 'only' uses 1GB.

This is really odd behaviour as, according to my calculations, it should use much less than 100MB.

The most suprising thing for me was, that after script computes one frame, I printed amount of objects that garbage collector is watching and it was around 2 million, then printed it again after forcing collection and it was exactly the same. I waited for 2nd frame to be computed (meantime memory usage raised by ~1GB) and script printed the number of objects being watched by GC - exactly the same number close to 2 million. So what does it mean? That numpy is written in C and has memory leaks?

I would really like to understand this behaviour.

Here is the code: http://pastebin.com/WSi7akY4

Answer

Jaime picture Jaime · Jan 14, 2013

While it doesn't explain your memory issues, your implementation is, to put it mildly, suboptimal. Not only are you not using numpy to its fullest capabilities, but the flow of your algorithm is also not very good at avoiding repeated calculations. I think you are simply running your system out of resources, not because something is wrong in python or numpy, but because you are creating way too many unnecessary lists of lists of lists...

After taking a look at wikipedia's entry on the Lucas-Kanade algorithm, I rewrote your main function as follows:

def lucas_kanade_np(im1, im2, win=2):
    assert im1.shape == im2.shape
    I_x = np.zeros(im1.shape)
    I_y = np.zeros(im1.shape)
    I_t = np.zeros(im1.shape)
    I_x[1:-1, 1:-1] = (im1[1:-1, 2:] - im1[1:-1, :-2]) / 2
    I_y[1:-1, 1:-1] = (im1[2:, 1:-1] - im1[:-2, 1:-1]) / 2
    I_t[1:-1, 1:-1] = im1[1:-1, 1:-1] - im2[1:-1, 1:-1]
    params = np.zeros(im1.shape + (5,)) #Ix2, Iy2, Ixy, Ixt, Iyt
    params[..., 0] = I_x * I_x # I_x2
    params[..., 1] = I_y * I_y # I_y2
    params[..., 2] = I_x * I_y # I_xy
    params[..., 3] = I_x * I_t # I_xt
    params[..., 4] = I_y * I_t # I_yt
    del I_x, I_y, I_t
    cum_params = np.cumsum(np.cumsum(params, axis=0), axis=1)
    del params
    win_params = (cum_params[2 * win + 1:, 2 * win + 1:] -
                  cum_params[2 * win + 1:, :-1 - 2 * win] -
                  cum_params[:-1 - 2 * win, 2 * win + 1:] +
                  cum_params[:-1 - 2 * win, :-1 - 2 * win])
    del cum_params
    op_flow = np.zeros(im1.shape + (2,))
    det = win_params[...,0] * win_params[..., 1] - win_params[..., 2] **2
    op_flow_x = np.where(det != 0,
                         (win_params[..., 1] * win_params[..., 3] -
                          win_params[..., 2] * win_params[..., 4]) / det,
                         0)
    op_flow_y = np.where(det != 0,
                         (win_params[..., 0] * win_params[..., 4] -
                          win_params[..., 2] * win_params[..., 3]) / det,
                         0)
    op_flow[win + 1: -1 - win, win + 1: -1 - win, 0] = op_flow_x[:-1, :-1]
    op_flow[win + 1: -1 - win, win + 1: -1 - win, 1] = op_flow_y[:-1, :-1]
    return op_flow

It uses two nested calls to np.cumsum and the exclusion-inclusion principle to compute the windowed parameters. Since the system of equations to solve at each point is only 2x2, it uses Cramer's rule, to vectorize the solving.

For comparison, I renamed your lucas_kanade function as lucas_kanade_op with a single change to the last statement, so that it returns a numpy array:

def lucas_kanade_op(im1, im2, win=2) :
    ...
    return np.array(opfl)

I timed both approaches, (and checked that they both output the same) and no surprises, taking advantage of numpy provides a tremendous boost:

rows, cols = 100, 100
im1 = np.random.rand(rows, cols)
im2 = np.random.rand(rows, cols)
ans1 = lucas_kanade_op(im1, im2)
ans2 = lucas_kanade_np(im1, im2)
np.testing.assert_almost_equal(ans1,ans2)

import timeit
print 'op\'s time:', timeit.timeit('lucas_kanade_op(im1, im2)',
                                   'from __main__ import lucas_kanade_op, im1, im2',
                                   number=1)
print 'np\'s time:', timeit.timeit('lucas_kanade_np(im1, im2)',
                                   'from __main__ import lucas_kanade_np, im1, im2',
                                   number=1)

This prints out:

op's time: 5.7419579567
np's time: 0.00256002154425

So that's a x2000 speed increase, for a smallish 100x100 image. I did not dare test your approach for a full-size 480p image, but the function above can handle around 5 calculations on a random 854x480 array per second, with no issues whatsoever.

I'd recommend you rewrite your code in a way similar to what is proposed above, taking advantage of numpy to its fullest. Posting your full code to Code Review would be a good starting point. But there really is no point in looking for stray references to objects when your code is so inefficient to begin with!