Kalman filter in computer vision: the choice of Q and R noise covariances

cyberdyne · Jan 20, 2014 · Viewed 9.4k times

I have read several works on using the Kalman filter for object tracking in computer vision, but I can't find any reference on how to choose: 1) the process noise covariance Q; 2) the measurement noise covariance R. So far I have understood that the model is the equation of motion (some use acceleration as a state variable, others use only position and velocity), but nobody is clear about the choice of Q and R, including this example by MathWorks: http://www.mathworks.it/it/help/vision/examples/using-kalman-filter-for-object-tracking.html Recently I found this page: http://blog.cordiner.net/2011/05/03/object-tracking-using-a-kalman-filter-matlab/ but the Q and R assignment is not clear there either. Can anyone help me, please?

Answer

Dima · Jan 21, 2014

R is the covariance matrix of the measurement noise, which is assumed to be Gaussian. In the context of tracking objects in video, it describes your detection error. Let's say you are using a face detector to detect faces, and then you want to track them using the Kalman filter. You run the detector, you get a bounding box for each face, and then you use the Kalman filter to track the centroid of each box. The R matrix must describe how uncertain you are about the location of the centroid. So in this case, the diagonal entries of R for the x and y coordinates should correspond to an error of a few pixels (strictly, the entries are variances, so they are the square of that error). If your measurements also include velocity, then you need to estimate the uncertainty of the velocity measurement as well, and take the units into account. If your position is measured in pixels and your velocity in pixels per frame, then the diagonal entries of R must reflect that.
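One way to get a starting value for R, assuming you have a few frames with hand-labeled centroids to compare against, is to take the sample variance of the detector's error. This is only a sketch: the data and the function name `estimate_R` are made up for illustration. Note that the diagonal entries are variances, so their units are pixels squared.

```python
from statistics import variance

# Hypothetical data: detected vs. hand-labeled centroids (x, y), in pixels.
detected = [(101.0, 52.0), (103.5, 49.0), (98.0, 50.5), (100.5, 47.5), (97.0, 51.0)]
ground_truth = [(100.0, 50.0)] * 5

def estimate_R(detected, ground_truth):
    """Return a diagonal 2x2 measurement noise covariance, in pixels^2.

    Uses the sample variance of the error about its mean, so it assumes
    the detector is roughly unbiased and that x and y errors are uncorrelated.
    """
    err_x = [d[0] - g[0] for d, g in zip(detected, ground_truth)]
    err_y = [d[1] - g[1] for d, g in zip(detected, ground_truth)]
    return [[variance(err_x), 0.0],
            [0.0, variance(err_y)]]

R = estimate_R(detected, ground_truth)
print(R)
```

With only a handful of labeled frames the estimate is rough, so treat it as an initial guess to be tuned rather than a final value.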

Q is the covariance of the process noise. Simply put, Q specifies how much the actual motion of the object deviates from your assumed motion model. If you are tracking cars on a road, then the constant velocity model should be reasonably good, and the entries of Q should be small. If you are tracking people's faces, they are not likely to move with a constant velocity, so you need to crank up Q. Again, you need to be aware of the units in which your state variables are expressed.
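For a constant-velocity model, one common way to build Q is the discrete white-noise acceleration model. This is an assumption on my part, not something the examples linked in the question necessarily use. It reduces the choice to a single parameter, `sigma_a`, which says how much unmodeled acceleration you expect per step. The sketch below handles one coordinate with state [position, velocity]; the function name and the numbers are illustrative.

```python
def q_constant_velocity(dt, sigma_a):
    """Process noise Q for the state [position, velocity] along one axis.

    Assumes the discrete white-noise acceleration model: the unmodeled
    acceleration is zero-mean with standard deviation sigma_a
    (pixels/frame^2) and is constant within each step of length dt (frames).
    """
    var_a = sigma_a ** 2
    return [[var_a * dt**4 / 4, var_a * dt**3 / 2],
            [var_a * dt**3 / 2, var_a * dt**2]]

# Cars on a road: close to constant velocity, so a small sigma_a (made-up value).
Q_car = q_constant_velocity(dt=1.0, sigma_a=0.1)

# Faces: erratic motion, so a much larger sigma_a (made-up value).
Q_face = q_constant_velocity(dt=1.0, sigma_a=2.0)

print(Q_car)
print(Q_face)
```

The units come out consistently: the position entry is in pixels squared and the velocity entry in (pixels per frame) squared. If you prefer a diagonal Q, keeping only the diagonal entries is a reasonable simplification to start from.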

So this is the intuition. In practice you start with some reasonable initial guess for R and Q, and then you tune them experimentally. So setting R and Q is a bit of an art. Also, in most cases using diagonal matrices for R and Q is sufficient.
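To get a feel for the tuning, here is a minimal sketch of a scalar Kalman filter. It uses a random-walk model with a directly measured state, which is simpler than the constant-velocity model discussed above, and the function name and numbers are illustrative. It iterates the covariance recursion and reports the gain the filter settles on for a given Q and R.

```python
def steady_state_gain(q, r, steps=200):
    """Iterate the scalar Kalman covariance recursion and return the final gain.

    Model assumed here: x[k] = x[k-1] + w, z[k] = x[k] + v,
    with Var(w) = q and Var(v) = r.
    """
    p = 1.0  # initial estimate variance (arbitrary starting guess)
    k = 0.0
    for _ in range(steps):
        p = p + q          # predict: uncertainty grows by the process noise
        k = p / (p + r)    # Kalman gain: weight given to the new measurement
        p = (1.0 - k) * p  # update: uncertainty shrinks after the measurement
    return k

# Same R, increasing Q: the gain moves from near 0 toward 1.
for q in (0.01, 1.0, 100.0):
    print(f"Q={q:<6} R=4.0  gain={steady_state_gain(q, 4.0):.3f}")
```

A small gain means the filter trusts its motion model and smooths heavily; a gain close to 1 means it essentially follows the detections. In this scalar case the settled gain depends only on the ratio of Q to R, which is why tuning usually amounts to moving one of them relative to the other.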

Here is an example that uses vision.KalmanFilter in MATLAB for tracking multiple people.