OpenCV: reading frames from VideoCapture advances the video to bizarrely wrong location

timgeb · Jun 11, 2017

(I will put a 500 reputation bounty on this question as soon as it's eligible - unless the question got closed.)

Problem in one sentence

Reading frames from a VideoCapture advances the video much further than it's supposed to.

Explanation

I need to read and analyze frames from a 100 fps (according to cv2 and VLC media player) video between certain time-intervals. In the minimal example that follows I am trying to read all the frames for the first ten seconds of a three minute video.

I am creating a cv2.VideoCapture object from which I read frames until the desired position in milliseconds is reached. In my actual code each frame is analyzed, but that fact is irrelevant in order to showcase the error.

Checking the current frame and millisecond position of the VideoCapture after reading the frames yields correct values, so the VideoCapture thinks it is at the right position - but it is not. Saving an image of the last read frame reveals that my iteration is grossly overshooting the destination time by over two minutes.

What's even more bizarre is that if I manually set the millisecond position of the capture with VideoCapture.set to 10 seconds (the same value VideoCapture.get returns after reading the frames) and save an image, the video is at (almost) the right position!

Demo video file

In case you want to run the MCVE, you need the demo.avi video file. You can download it HERE.

MCVE

This MCVE is carefully crafted and commented. Please leave a comment under the question if anything remains unclear.

If you are using OpenCV 3, replace every occurrence of the prefix cv2.cv.CV_ with cv2. (for example, cv2.cv.CV_CAP_PROP_FPS becomes cv2.CAP_PROP_FPS). The problem occurs in both versions for me.

import cv2

# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
      .format(fps, pos_msec, pos_frames))

# get first frame and save as picture
_, frame = cap.read()
cv2.imwrite('first_frame.png', frame)

# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read()
    # in the actual code, the frame is now analyzed

# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)

# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after iteration: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1 # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10

# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)

# print properties  again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)

MCVE output

The print statements produce the following output.

cv2 version = 2.4.9.1
initial attributes: fps = 100.0, pos_msec = 0.0, pos_frames = 0.0
attributes after iteration: pos_msec = 10010.0, pos_frames = 1001.0
attributes after setting msec pos manually: pos_msec = 10010.0, pos_frames = 1001.0

As you can see, all properties have the expected values.

imwrite saves the following pictures.

[image: first_frame.png]

[image: after_iteration.png]

[image: after_setting.png]

You can see the problem in the second picture. The target of 9:26:15 (real time clock in picture) is missed by over two minutes. Setting the target time manually (third picture) sets the video to (almost) the correct position.

What am I doing wrong and how do I fix it?

Tried so far

cv2 2.4.9.1 @ Ubuntu 16.04
cv2 2.4.13 @ Scientific Linux 7.3 (three computers)
cv2 3.1.0 @ Scientific Linux 7.3 (three computers)

Creating the capture with

cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_FFMPEG)

and

cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_GSTREAMER)

in OpenCV 3 (version 2 does not seem to have the apiPreference argument). Using cv2.CAP_GSTREAMER takes extremely long (about 2-3 minutes to run the MCVE) but both api-preferences produce the same incorrect images.

When using ffmpeg directly to read frames (credit to this tutorial) the correct output images are produced.

import numpy as np
import subprocess as sp
import pylab

# video properties
path = './demo.avi'
resolution = (593, 792)
framesize = resolution[0]*resolution[1]*3

# set up pipe
FFMPEG_BIN = "ffmpeg"
command = [FFMPEG_BIN,
           '-i', path,
           '-f', 'image2pipe',
           '-pix_fmt', 'rgb24',
           '-vcodec', 'rawvideo', '-']
pipe = sp.Popen(command, stdout = sp.PIPE, bufsize=10**8)

# read first frame and save as image
raw_image = pipe.stdout.read(framesize)
image = np.fromstring(raw_image, dtype='uint8')
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('first_frame_ffmpeg_only.png')
pipe.stdout.flush()

# forward 1000 frames
for _ in range(1000):
    raw_image = pipe.stdout.read(framesize)
    pipe.stdout.flush()

# save frame 1001
image = np.fromstring(raw_image, dtype='uint8')
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('frame_1001_ffmpeg_only.png')

pipe.terminate()

This produces the correct result! (Correct timestamp 9:26:15)

[image: frame_1001_ffmpeg_only.png]

Additional information

In the comments I was asked for my cvconfig.h file. I only seem to have this file for cv2 version 3.1.0 under /opt/opencv/3.1.0/include/opencv2/cvconfig.h.

HERE is a paste of this file.

In case it helps, I was able to extract the following video information with VideoCapture.get.

brightness 0.0
contrast 0.0
convert_rgb 0.0
exposure 0.0
format 0.0
fourcc 1684633187.0
fps 100.0
frame_count 18000.0
frame_height 593.0
frame_width 792.0
gain 0.0
hue 0.0
mode 0.0
openni_baseline 0.0
openni_focal_length 0.0
openni_frame_max_depth 0.0
openni_output_mode 0.0
openni_registration 0.0
pos_avi_ratio 0.01
pos_frames 0.0
pos_msec 0.0
rectification 0.0
saturation 0.0

Answer

Leon · Jun 14, 2017

Your video file data contains just 1313 non-duplicate frames (i.e. between 7 and 8 frames per second of duration):

$ ffprobe -i demo.avi -loglevel fatal -show_streams -count_frames|grep frame
has_b_frames=0
r_frame_rate=100/1
avg_frame_rate=100/1
nb_frames=18000
nb_read_frames=1313        # !!!
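
These two counts are enough for a rough estimate of how far 1001 reads actually advance the video. The sketch below assumes the frames with data are spread evenly over the file, which the real video need not satisfy, so treat the result as a ballpark figure only (the function name is made up for illustration):

```python
total_frames = 18000   # nb_frames reported by ffprobe
real_frames = 1313     # nb_read_frames reported by ffprobe
fps = 100.0            # nominal frame rate

duration_s = total_frames / fps        # 180.0 seconds
real_fps = real_frames / duration_s    # roughly 7.3 frames with data per second


def estimated_position_s(reads):
    """Rough video time reached after `reads` successful cap.read() calls.

    Assumption: frames with data are evenly distributed over the video.
    """
    return reads / real_fps


reads = 1001  # the first frame plus the 1000 reads in the loop
print('position according to the frame counter: {:.2f} s'.format(reads / fps))
print('estimated actual position: {:.1f} s'.format(estimated_position_s(reads)))
```

Under that assumption the estimate comes out at about 137 s, which is in line with the overshoot of more than two minutes seen in the question.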

Converting the avi file with ffmpeg reports 16697 duplicate frames (for some reason 10 additional frames are added and 16697=18010-1313).

$ ffmpeg -i demo.avi demo.mp4
...
frame=18010 fps=417 Lsize=3705kB time=03:00.08 bitrate=168.6kbits/s dup=16697
#                                                                   ^^^^^^^^^
...

BTW, the converted video (demo.mp4) does not show the problem being discussed: OpenCV processes it correctly.
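
If converting first is an acceptable workaround, the ffmpeg call above can be driven from Python before opening the capture. This is only a sketch: the helper names are made up, and it assumes an ffmpeg binary is available on the PATH.

```python
import subprocess


def build_convert_command(src, dst, ffmpeg_bin='ffmpeg'):
    """Return the command line used above: ffmpeg -i SRC DST."""
    return [ffmpeg_bin, '-i', src, dst]


def convert(src, dst):
    """Re-encode src to dst; raises CalledProcessError if ffmpeg fails."""
    subprocess.check_call(build_convert_command(src, dst))


if __name__ == '__main__':
    # Show the command without running it.
    print(' '.join(build_convert_command('demo.avi', 'demo.mp4')))
```

After convert('demo.avi', 'demo.mp4') the capture would be created with cv2.VideoCapture('demo.mp4') instead of the original file.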

In this case the duplicate frames are not physically present in the avi file, instead each duplicate frame is represented by an instruction to repeat the previous frame. This can be checked as follows:

$ ffplay -loglevel trace demo.avi
...
[ffplay_crop @ 0x7f4308003380] n:16 t:2.180000 pos:1311818.000000 x:0 y:0 x+w:792 y+h:592
[avi @ 0x7f4310009280] dts:574 offset:574 1/100 smpl_siz:0 base:1000000 st:0 size:81266
video: delay=0.130 A-V=0.000094
    Last message repeated 9 times
video: delay=0.130 A-V=0.000095
video: delay=0.130 A-V=0.000094
video: delay=0.130 A-V=0.000095
[avi @ 0x7f4310009280] dts:587 offset:587 1/100 smpl_siz:0 base:1000000 st:0 size:81646
[ffplay_crop @ 0x7f4308003380] n:17 t:2.320000 pos:1393538.000000 x:0 y:0 x+w:792 y+h:592
video: delay=0.140 A-V=0.000091
    Last message repeated 4 times
video: delay=0.140 A-V=0.000092
    Last message repeated 1 times
video: delay=0.140 A-V=0.000091
    Last message repeated 6 times
...

In the above log, frames with actual data are represented by the lines starting with "[avi @ 0xHHHHHHHHHHH]". The "video: delay=xxxxx A-V=yyyyy" messages indicate that the last frame must be displayed for xxxxx more seconds.
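
The gap between two frames with data can be read off the dts values in the "[avi @ ...]" lines. A small sketch, assuming the "1/100" in those lines is the stream time base (so one dts tick is 1/100 s); the regular expression and function name are illustrative only:

```python
import re

# The two "[avi @ ...]" lines copied from the trace above.
log_lines = [
    '[avi @ 0x7f4310009280] dts:574 offset:574 1/100 smpl_siz:0 base:1000000 st:0 size:81266',
    '[avi @ 0x7f4310009280] dts:587 offset:587 1/100 smpl_siz:0 base:1000000 st:0 size:81646',
]

DTS_RE = re.compile(r'^\[avi @ 0x[0-9a-f]+\] dts:(\d+) ')


def data_frame_gaps(lines, ticks_per_second=100):
    """Seconds between consecutive frames that carry picture data."""
    dts = [int(m.group(1)) for m in map(DTS_RE.match, lines) if m]
    return [(b - a) / float(ticks_per_second) for a, b in zip(dts, dts[1:])]


print(data_frame_gaps(log_lines))  # prints: [0.13]
```

The 0.13 s gap (13 ticks) matches the delay=0.130 messages logged between the two data frames.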

cv2.VideoCapture() skips such duplicate frames, reading only frames that have real data. Here is the corresponding (though, slightly edited) code from the 2.4 branch of opencv (note, BTW, that underneath ffmpeg is used, which I verified by running python under gdb and setting a breakpoint on CvCapture_FFMPEG::grabFrame):

bool CvCapture_FFMPEG::grabFrame()
{
    ...
    int count_errs = 0;
    const int max_number_of_attempts = 1 << 9; // !!!
    ...
    // get the next frame
    while (!valid)
    {
        ...
        int ret = av_read_frame(ic, &packet);
        ...        
        // Decode video frame
        avcodec_decode_video2(video_st->codec, picture, &got_picture, &packet);
        // Did we get a video frame?
        if(got_picture)
        {
            //picture_pts = picture->best_effort_timestamp;
            if( picture_pts == AV_NOPTS_VALUE_ )
                picture_pts = packet.pts != AV_NOPTS_VALUE_ && packet.pts != 0 ? packet.pts : packet.dts;
            frame_number++;
            valid = true;
        }
        else
        {
            // So, if the next frame doesn't have picture data but is
            // merely a tiny instruction telling to repeat the previous
            // frame, then we get here, treat that situation as an error
            // and proceed unless the count of errors exceeds
            // max_number_of_attempts (1 << 9 = 512)!
            if (++count_errs > max_number_of_attempts)
                break;
        }
    }
    ...
}
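
The effect of that loop can be illustrated with a toy model in Python. This is not OpenCV code: the packet stream is a plain list in which 'F' stands for a frame with picture data and 'r' for a repeat instruction, and the max_number_of_attempts limit is left out for brevity.

```python
def read_skipping_repeats(stream, n_reads):
    """Mimic the loop above: each read consumes packets until one has data.

    Returns (frame_number, packets_consumed). packets_consumed is the
    position in display frames, which is what the caller actually cares about.
    """
    frame_number = 0
    packets_consumed = 0
    for _ in range(n_reads):
        while packets_consumed < len(stream):
            packet = stream[packets_consumed]
            packets_consumed += 1
            if packet == 'F':       # corresponds to got_picture
                frame_number += 1
                break
            # Repeat instruction: skipped without advancing frame_number.
    return frame_number, packets_consumed


# One data frame followed by 12 repeats, like the gap between dts 574 and 587.
stream = list(('F' + 'r' * 12) * 5)
print(read_skipping_repeats(stream, 3))  # prints: (3, 27)
```

After three reads the counter reports frame 3 although 27 display frames have gone by, which is the same kind of mismatch between pos_frames and the saved image described in the question.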