OpenCV line fitting algorithm

krzych · Jul 30, 2012

I'm trying to understand the algorithm behind OpenCV's fitLine().

This is the relevant fragment of OpenCV code: the icvFitLine2D function.

I can see that a random function chooses points for the approximation, then the code computes the distances from the points to the line fitted through the chosen points, then it chooses other points and tries to minimize the distance with the chosen distType.

Can someone explain what happens from that point on, without heavy mathematics and assuming little knowledge of statistics? The OpenCV code comments and variable names do not help me understand this code.

Answer

HugoRune · Mar 3, 2013

(This is an old question, but the subject piqued my curiosity)

OpenCV's fitLine implements two different mechanisms.

If the parameter distType is set to CV_DIST_L2, then a standard unweighted least squares fit is used.
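For reference, here is a minimal Python sketch of a (weighted) least-squares line fit. It assumes, from my reading of fitLine2D_wods, that the fit minimizes perpendicular distances using the weighted centroid and second moments; the function name fit_line_weighted is hypothetical. With weights=None every point has weight 1, which corresponds to the plain CV_DIST_L2 case. The result uses the (vx, vy, x0, y0) layout that fitLine returns for 2D points: a unit direction vector plus a point on the line.

```python
import math

def fit_line_weighted(points, weights=None):
    """Weighted orthogonal (total) least-squares fit of a 2D line.

    Returns (vx, vy, x0, y0): unit direction and the weighted centroid.
    weights=None means every point gets weight 1 (ordinary L2 fit).
    """
    if weights is None:
        weights = [1.0] * len(points)
    w = sum(weights)
    if w <= 0:
        raise ValueError("weights must have a positive sum")
    # Weighted first and second raw moments.
    x = sum(wi * px for wi, (px, _) in zip(weights, points)) / w
    y = sum(wi * py for wi, (_, py) in zip(weights, points)) / w
    x2 = sum(wi * px * px for wi, (px, _) in zip(weights, points)) / w
    y2 = sum(wi * py * py for wi, (_, py) in zip(weights, points)) / w
    xy = sum(wi * px * py for wi, (px, py) in zip(weights, points)) / w
    # Central moments (entries of the weighted covariance matrix).
    dx2 = x2 - x * x
    dy2 = y2 - y * y
    dxy = xy - x * y
    # Angle of the principal axis: the direction that minimizes the
    # weighted sum of squared perpendicular distances.
    t = math.atan2(2 * dxy, dx2 - dy2) / 2
    return (math.cos(t), math.sin(t), x, y)

# Points on y = 2x + 1: the direction comes out proportional to (1, 2).
print(fit_line_weighted([(0, 1), (1, 3), (2, 5), (3, 7)]))
```

Giving a point weight 0 removes it from the fit entirely, which is exactly how the robust variant below restricts its first fit to a random subset.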

If one of the other distTypes is used (CV_DIST_L1, CV_DIST_L12, CV_DIST_FAIR, CV_DIST_WELSCH, CV_DIST_HUBER), then the procedure is a RANSAC-like fit combined with iteratively reweighted least squares (an M-estimator):

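  • Repeat at most 20 times:
    • Pick 10 random points, do a least squares fit only for them
    • Repeat at most 30 times:
      • Calculate the weights for all points, using the current line and the chosen distType
      • Do a weighted least squares fit for all points
  • Return the best found linefit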
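The "pick 10 random points" step amounts to building a 0/1 weight vector that is then handed to a weighted least-squares fit. A minimal Python sketch; initial_weights is a hypothetical name, and drawing distinct indices with Python's random.Random.sample merely stands in for OpenCV's own RNG:

```python
import random

def initial_weights(n, rng, k=10):
    """Weight 1.0 for min(n, k) randomly chosen distinct points, 0.0 for the rest."""
    weights = [0.0] * n
    for i in rng.sample(range(n), min(n, k)):
        weights[i] = 1.0
    return weights

rng = random.Random(0)
w = initial_weights(25, rng)
print(sum(w))  # 10.0
```

With fewer than 10 points, every point is selected, so the first fit is then simply an ordinary least-squares fit of all of them.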

Here is a more detailed description in pseudocode:

repeat at most 20 times:

    RANSAC (line 371)
     - pick 10 random points, 
     - set their weights to 1, 
     - set all other weights to 0

    least squares weighted fit (fitLine2D_wods, line 381)
     - fit only the 10 picked points to the line, using least-squares

    repeat at most 30 times: (line 382)
     - stop if the difference between the found solution and the previous found solution is less than DELTA  (line 390 - 406)
       (the angle difference must be less than adelta, and the distance between the line centers must be less than rdelta)
     - stop if the sum of squared distances between the found line and the points is less than EPSILON (line 407)
       (The unweighted sum of squared distances is used here ==> the standard L2 norm)

        re-calculate the weights for *all* points (line 412)
         - using the given norm (CV_DIST_L1 / CV_DIST_L12 / CV_DIST_FAIR / ...)
         - normalize the weights so their sum is 1
         - special case, to catch errors: if for some reason all weights are zero, set all weights to 1

        least squares weighted fit (fitLine2D_wods, line 437)
         - fit *all* points to the line, using weighted least squares

    if the last found solution is better than the current best solution (line 440)
        save it as the new best
        (The unweighted sum of squared distances is used here ==> the standard L2 norm)

        if the distance between the found line and the points is less than EPSILON
             break

return the best solution
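Putting the pseudocode together, here is a minimal, self-contained Python sketch of the robust branch. It is not OpenCV's code: all function names are hypothetical, only a Huber weight is implemented (in the usual IRLS form, which is an assumption on my part), eps is an arbitrary placeholder for EPSILON, and adelta/rdelta default to 0.01, the value the fitLine documentation suggests for its reps and aeps accuracy parameters (I assume these correspond). Following the reading above, candidate lines are compared by the unweighted sum of squared distances.

```python
import math
import random

def fit_line_weighted(points, weights):
    """Weighted orthogonal least-squares fit; returns (vx, vy, x0, y0)."""
    w = sum(weights)
    x = sum(wi * px for wi, (px, _) in zip(weights, points)) / w
    y = sum(wi * py for wi, (_, py) in zip(weights, points)) / w
    dx2 = sum(wi * (px - x) ** 2 for wi, (px, _) in zip(weights, points)) / w
    dy2 = sum(wi * (py - y) ** 2 for wi, (_, py) in zip(weights, points)) / w
    dxy = sum(wi * (px - x) * (py - y) for wi, (px, py) in zip(weights, points)) / w
    t = math.atan2(2 * dxy, dx2 - dy2) / 2   # angle of the principal axis
    return (math.cos(t), math.sin(t), x, y)

def distances(points, line):
    """Unsigned perpendicular distance of each point to the line."""
    vx, vy, x0, y0 = line
    return [abs(vy * (px - x0) - vx * (py - y0)) for px, py in points]

def huber_weight(r, c=1.345):
    """Assumed IRLS weight for the Huber norm: 1 near the line, c/r further out."""
    return 1.0 if r < c else c / r

def robust_fit_line(points, weight_fn=huber_weight, seed=0, outer=20, inner=30,
                    eps=1e-6, adelta=0.01, rdelta=0.01):
    rng = random.Random(seed)
    n = len(points)
    best, best_err = None, math.inf
    for _ in range(outer):
        # RANSAC-style start: weight 1 for (at most) 10 random points, 0 otherwise.
        weights = [0.0] * n
        for i in rng.sample(range(n), min(n, 10)):
            weights[i] = 1.0
        line = fit_line_weighted(points, weights)
        prev = None
        for _ in range(inner):
            if prev is not None:
                # Converged when both the angle and the centre barely moved.
                cos_a = max(-1.0, min(1.0, line[0] * prev[0] + line[1] * prev[1]))
                shift = max(abs(line[2] - prev[2]), abs(line[3] - prev[3]))
                if math.acos(cos_a) < adelta and shift < rdelta:
                    break
            r = distances(points, line)
            if sum(d * d for d in r) < eps:   # unweighted L2 error
                break
            # Re-weight *all* points with the chosen norm, normalized to sum 1.
            weights = [weight_fn(d) for d in r]
            total = sum(weights)
            weights = [wi / total for wi in weights] if total > 0 else [1.0] * n
            prev = line
            line = fit_line_weighted(points, weights)
        # Keep the candidate with the smallest unweighted sum of squared
        # distances (recomputed for the final line here, a simplification).
        err = sum(d * d for d in distances(points, line))
        if err < best_err:
            best, best_err = line, err
            if err < eps:
                break
    return best

pts = [(float(x), 0.0) for x in range(40)] + [(20.0, 10.0)]   # one outlier
vx, vy, x0, y0 = robust_fit_line(pts)
print(y0 + vy / vx * (20.0 - x0))   # y of the fitted line at x = 20
```

For comparison, an unweighted fit of the same points puts the line at roughly y = 0.24 near x = 20 (the outlier alone shifts the centroid by 10/41), while the reweighted fit stays much closer to the inliers on y = 0.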

The weights are calculated depending on the chosen distType. According to the manual, the weight of point i is inversely proportional to $\rho(r_i)$, i.e. $w_i = 1/\rho(r_i)$, where $r_i$ is the distance between point i and the line and $\rho$ is:

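distType=CV_DIST_L1: $\rho(r) = r$

distType=CV_DIST_L12: $\rho(r) = 2\left(\sqrt{1 + \frac{r^2}{2}} - 1\right)$

distType=CV_DIST_FAIR: $\rho(r) = C^2\left(\frac{r}{C} - \log\left(1 + \frac{r}{C}\right)\right)$, with $C = 1.3998$

distType=CV_DIST_WELSCH: $\rho(r) = \frac{C^2}{2}\left(1 - \exp\left(-\left(\frac{r}{C}\right)^2\right)\right)$, with $C = 2.9846$

distType=CV_DIST_HUBER: $\rho(r) = \frac{r^2}{2}$ if $r < C$, otherwise $\rho(r) = C\left(r - \frac{C}{2}\right)$, with $C = 1.345$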
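A small Python sketch of these functions. rho implements the ρ(r) definitions from the fitLine documentation with its default constants. irls_weight does not take a literal 1/ρ(r): since ρ(0) = 0 for every norm, that would be infinite for a point lying exactly on the line. Instead it uses the textbook M-estimator weight w(r) = ρ'(r)/r. That substitution is my assumption, so compare it with the weight functions in the OpenCV source before relying on it. The string names "L1", "L12", etc. are hypothetical stand-ins for the CV_DIST_* constants.

```python
import math

# Default constants C from the fitLine documentation.
C_FAIR, C_WELSCH, C_HUBER = 1.3998, 2.9846, 1.345

def rho(dist_type, r):
    """rho(r) for each distType, as documented."""
    if dist_type == "L1":
        return r
    if dist_type == "L12":
        return 2.0 * (math.sqrt(1.0 + r * r / 2.0) - 1.0)
    if dist_type == "FAIR":
        return C_FAIR ** 2 * (r / C_FAIR - math.log(1.0 + r / C_FAIR))
    if dist_type == "WELSCH":
        return C_WELSCH ** 2 / 2.0 * (1.0 - math.exp(-(r / C_WELSCH) ** 2))
    if dist_type == "HUBER":
        return r * r / 2.0 if r < C_HUBER else C_HUBER * (r - C_HUBER / 2.0)
    raise ValueError(dist_type)

def irls_weight(dist_type, r, tiny=1e-9):
    """Standard IRLS weight w(r) = rho'(r) / r, written out per norm."""
    if dist_type == "L1":
        return 1.0 / max(r, tiny)            # rho'(r) = 1
    if dist_type == "L12":
        return 1.0 / math.sqrt(1.0 + r * r / 2.0)
    if dist_type == "FAIR":
        return 1.0 / (1.0 + r / C_FAIR)
    if dist_type == "WELSCH":
        return math.exp(-(r / C_WELSCH) ** 2)
    if dist_type == "HUBER":
        return 1.0 if r < C_HUBER else C_HUBER / r
    raise ValueError(dist_type)

# Points far from the line (r = 10) get much smaller weights than near ones.
for t in ("L1", "L12", "FAIR", "WELSCH", "HUBER"):
    print(t, round(irls_weight(t, 0.5), 4), round(irls_weight(t, 10.0), 4))
```

The norms differ mainly in how fast the weight falls off with distance: L1, L12 and FAIR decay roughly like 1/r, HUBER is flat up to C and then decays like C/r, and WELSCH decays exponentially, so distant outliers are almost ignored.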

Unfortunately, I do not know which distType is best suited to which kind of data; perhaps someone else can shed some light on that.


Something interesting I noticed: the chosen norm is only used for the iterative reweighting. The best solution among the ones found is always picked according to the L2 norm (the line for which the unweighted sum of squared distances is minimal). I am not sure this is correct.
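To make the concern concrete, here is a tiny hand-made example in Python (all helper names are hypothetical) where the two criteria disagree: the L2 score prefers a line pulled toward a single outlier, while the L1 score (sum of absolute distances) prefers the line through the inliers. It only illustrates the scoring question; it does not reproduce OpenCV's code.

```python
def distances(points, line):
    """Unsigned perpendicular distances; line = (vx, vy, x0, y0) with unit (vx, vy)."""
    vx, vy, x0, y0 = line
    return [abs(vy * (px - x0) - vx * (py - y0)) for px, py in points]

def l2_score(points, line):
    return sum(d * d for d in distances(points, line))

def l1_score(points, line):
    return sum(distances(points, line))

# Ten points on y = 0 plus one outlier at (5, 10).
points = [(float(x), 0.0) for x in range(10)] + [(5.0, 10.0)]
line_a = (1.0, 0.0, 0.0, 0.0)   # y = 0: passes through all inliers
line_b = (1.0, 0.0, 0.0, 1.0)   # y = 1: pulled toward the outlier

print(l2_score(points, line_a), l2_score(points, line_b))  # 100.0 91.0
print(l1_score(points, line_a), l1_score(points, line_b))  # 10.0 19.0
```

Under the L2 criterion line_b wins, even though line_a is the one a robust norm such as L1 would prefer.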