To track object on video frame, first of all I extract image frames from video and save those images to a folder. Then I am supposed to process those images to find an object. Actually I do not know if this is a practical thing, because all the algorithm did this for one step. Is this correct?
Well, your approach will consume a lot of space on your disk depending on the size of the video and the size of the frames, plus you will spend a considerable amount of time reading frames from the disk.
Have you tried to perform real-time video processing instead? If your algorithm is not too slow, there are some posts that show the things that you need to do:
I trust you are capable of converting code from the C interface to the C++ interface.