I've seen endpoint error (EPE) used as a metric for determining how close an estimated flow is to a ground-truth flow, but I have a few questions about it and was hoping someone could enlighten me:
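Endpoint error is calculated by comparing an estimated optical flow vector $\mathbf{v}_{est} = (\Delta x_{est}, \Delta y_{est})$ with a ground-truth optical flow vector $\mathbf{v}_{gt} = (\Delta x_{gt}, \Delta y_{gt})$.
Endpoint error is defined as the Euclidean distance between these two:
$$\mathrm{EPE} = \lVert \mathbf{v}_{est} - \mathbf{v}_{gt} \rVert_2 = \sqrt{(\Delta x_{est} - \Delta x_{gt})^2 + (\Delta y_{est} - \Delta y_{gt})^2}$$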
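As a quick illustration of the formula for a single vector, here is a minimal sketch. The function name `endpoint_error` is hypothetical, and flow vectors are assumed to be `(dx, dy)` pairs.

```python
import math

def endpoint_error(v_est, v_gt):
    """Euclidean distance between an estimated flow vector and its ground truth.

    Both arguments are (dx, dy) pairs (an assumption of this sketch).
    """
    du = v_est[0] - v_gt[0]  # horizontal component difference
    dv = v_est[1] - v_gt[1]  # vertical component difference
    return math.hypot(du, dv)

print(endpoint_error((3.0, 4.0), (0.0, 0.0)))  # 5.0
```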
For a given frame in the video you will usually have many such vectors (typically one per pixel), and the common quality measure of an optical flow estimate is the average endpoint error over all of them.
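A minimal NumPy sketch of the frame-level average, assuming both flow fields are stored as `(H, W, 2)` arrays of per-pixel `(dx, dy)`. The function name `average_epe` and the optional `valid` mask are assumptions for illustration. The mask is meant for datasets that leave some pixels without ground truth.

```python
import numpy as np

def average_epe(flow_est, flow_gt, valid=None):
    """Mean endpoint error over one frame.

    flow_est, flow_gt: arrays of shape (H, W, 2) holding (dx, dy) per pixel.
    valid: optional boolean array of shape (H, W); only True pixels are averaged.
    """
    # Per-pixel Euclidean distance between estimated and ground-truth vectors.
    epe = np.linalg.norm(flow_est - flow_gt, axis=-1)  # shape (H, W)
    if valid is not None:
        epe = epe[valid]
    return float(epe.mean())
```

For example, if one pixel of a 2x2 estimate is off by `(3, 4)` and the rest are exact, the average EPE is `5 / 4 = 1.25`.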
Note that you need annotated video with ground-truth flow, or you cannot calculate this measure. The classic datasets to use are the Middlebury optical flow sets. For a long, rich dataset with such ground truth (albeit rendered), see for example the MPI Sintel dataset.
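To compute EPE you first need to load the ground truth. Middlebury and MPI Sintel distribute flow in the `.flo` format. That format has a float32 magic number 202021.25 (the bytes `PIEH`), then int32 width and height, then row-major float32 `(dx, dy)` pairs, all little-endian. Below is a minimal sketch of a reader and writer under that layout. The function names are hypothetical, and the dataset's own README remains the authoritative description of the format.

```python
import numpy as np

FLO_MAGIC = 202021.25  # the bytes b"PIEH" interpreted as a little-endian float32

def parse_flo(buf):
    """Decode .flo bytes into an (H, W, 2) float32 array of (dx, dy)."""
    magic = np.frombuffer(buf, dtype="<f4", count=1, offset=0)[0]
    if magic != FLO_MAGIC:
        raise ValueError("not a .flo file (bad magic number)")
    w, h = (int(n) for n in np.frombuffer(buf, dtype="<i4", count=2, offset=4))
    # Raises ValueError if the buffer holds fewer than 2*w*h floats.
    data = np.frombuffer(buf, dtype="<f4", count=2 * w * h, offset=12)
    return data.reshape(h, w, 2)

def flo_bytes(flow):
    """Encode an (H, W, 2) flow array using the same .flo layout."""
    h, w, _ = flow.shape
    header = np.array([FLO_MAGIC], dtype="<f4").tobytes()
    header += np.array([w, h], dtype="<i4").tobytes()
    return header + np.ascontiguousarray(flow, dtype="<f4").tobytes()

def read_flo(path):
    """Read a .flo file from disk."""
    with open(path, "rb") as f:
        return parse_flo(f.read())
```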
Another common error measure is the interpolation error. It has the benefit of not needing any ground-truth flow. Interpolation error is computed by using the optical flow to extrapolate ("warp") the current frame forward. The extrapolated image is then compared with the actual next frame of the video.
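Here is a minimal sketch of that procedure. It assumes grayscale frames as `(H, W)` arrays and flow as an `(H, W, 2)` array of `(dx, dy)` from the current frame to the next. The warping step is simple nearest-neighbour forward splatting, which is an assumption of this sketch. Target pixels that receive no value (holes) are excluded, and the score is the RMS difference over the covered pixels. Benchmark implementations may interpolate, resolve collisions and fill holes differently, so treat the numbers as illustrative only. The function names are hypothetical.

```python
import numpy as np

def forward_warp_nearest(frame, flow):
    """Push each pixel of `frame` to round(x + dx, y + dy).

    Returns the warped image and a boolean mask of target pixels that received
    a value. When several source pixels land on the same target, only one is kept.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.rint(xs + flow[..., 0]).astype(int)
    yt = np.rint(ys + flow[..., 1]).astype(int)
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)  # drop pixels pushed off-frame
    warped = np.zeros((h, w), dtype=np.float64)
    filled = np.zeros((h, w), dtype=bool)
    warped[yt[inside], xt[inside]] = frame[inside]
    filled[yt[inside], xt[inside]] = True
    return warped, filled

def interpolation_error(frame0, frame1, flow):
    """RMS difference between warped frame0 and the real frame1 on covered pixels."""
    warped, filled = forward_warp_nearest(frame0, flow)
    if not filled.any():
        return float("nan")
    diff = warped[filled] - frame1[filled].astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```

If the next frame is the current frame shifted one pixel to the right and the flow is `(1, 0)` everywhere, the error on the covered pixels is 0.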
Interpolation error can be a good measure of how well the optical flow will serve video encoding, while endpoint error can be a good measure of how well it will serve computer vision tasks such as shape from motion and the like.