(I will put a 500 reputation bounty on this question as soon as it's eligible, unless the question gets closed.)
Problem in one sentence
Reading frames from a VideoCapture advances the video much further than it's supposed to.
Explanation
I need to read and analyze frames from a 100 fps (according to cv2 and VLC media player) video between certain time intervals. In the minimal example that follows I am trying to read all the frames for the first ten seconds of a three-minute video.
I am creating a cv2.VideoCapture object from which I read frames until the desired position in milliseconds is reached. In my actual code each frame is analyzed, but that fact is irrelevant for showcasing the error.
Checking the current frame and millisecond position of the VideoCapture after reading the frames yields correct values, so the VideoCapture thinks it is at the right position - but it is not. Saving an image of the last read frame reveals that my iteration is grossly overshooting the destination time by over two minutes.
What's even more bizarre is that if I manually set the millisecond position of the capture with VideoCapture.set to 10 seconds (the same value VideoCapture.get returns after reading the frames) and save an image, the video is at (almost) the right position!
Demo video file
In case you want to run the MCVE, you need the demo.avi video file. You can download it HERE.
MCVE
This MCVE is carefully crafted and commented. Please leave a comment under the question if anything remains unclear.
If you are using OpenCV 3 you have to replace all instances of cv2.cv.CV_ with cv2. (the problem occurs in both versions for me).
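(Side note, not part of the MCVE below: if you want one snippet that runs unchanged on both versions, the constants can be resolved once at the top. This is just an illustration; the hasattr check is simply my way of telling the two APIs apart.)

import cv2

# OpenCV 2.x exposes the constants as cv2.cv.CV_CAP_PROP_*,
# OpenCV 3.x exposes them directly as cv2.CAP_PROP_*
if hasattr(cv2, 'cv'):      # OpenCV 2.x
    CAP_PROP_FPS = cv2.cv.CV_CAP_PROP_FPS
    CAP_PROP_POS_MSEC = cv2.cv.CV_CAP_PROP_POS_MSEC
    CAP_PROP_POS_FRAMES = cv2.cv.CV_CAP_PROP_POS_FRAMES
else:                       # OpenCV 3.x
    CAP_PROP_FPS = cv2.CAP_PROP_FPS
    CAP_PROP_POS_MSEC = cv2.CAP_PROP_POS_MSEC
    CAP_PROP_POS_FRAMES = cv2.CAP_PROP_POS_FRAMES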
import cv2

# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
       .format(fps, pos_msec, pos_frames))

# get first frame and save as picture
_, frame = cap.read()
cv2.imwrite('first_frame.png', frame)

# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read()
    # in the actual code, the frame is now analyzed

# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)

# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after reading: pos_msec = {}, pos_frames = {}'
       .format(pos_msec, pos_frames))

# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1  # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10

# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)

# print properties again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
       .format(pos_msec, pos_frames))

# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)
MCVE output
The print statements produce the following output.
cv2 version = 2.4.9.1
initial attributes: fps = 100.0, pos_msec = 0.0, pos_frames = 0.0
attributes after reading: pos_msec = 10010.0, pos_frames = 1001.0
attributes after setting msec pos manually: pos_msec = 10010.0, pos_frames = 1001.0
As you can see, all properties have the expected values. imwrite saves the following pictures.
You can see the problem in the second picture. The target of 9:26:15 (real time clock in picture) is missed by over two minutes. Setting the target time manually (third picture) sets the video to (almost) the correct position.
What am I doing wrong and how do I fix it?
Tried so far
cv2 2.4.9.1 @ Ubuntu 16.04
cv2 2.4.13 @ Scientific Linux 7.3 (three computers)
cv2 3.1.0 @ Scientific Linux 7.3 (three computers)
Creating the capture with
cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_FFMPEG)
and
cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_GSTREAMER)
in OpenCV 3 (version 2 does not seem to have the apiPreference argument). Using cv2.CAP_GSTREAMER takes extremely long (about 2-3 minutes to run the MCVE), but both API preferences produce the same incorrect images.
When using ffmpeg directly to read frames (credit to this tutorial), the correct output images are produced.
import numpy as np
import subprocess as sp
import pylab

# video properties
path = './demo.avi'
resolution = (593, 792)
framesize = resolution[0]*resolution[1]*3

# set up pipe
FFMPEG_BIN = "ffmpeg"
command = [FFMPEG_BIN,
           '-i', path,
           '-f', 'image2pipe',
           '-pix_fmt', 'rgb24',
           '-vcodec', 'rawvideo', '-']
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

# read first frame and save as image
raw_image = pipe.stdout.read(framesize)
image = np.fromstring(raw_image, dtype='uint8')
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('first_frame_ffmpeg_only.png')
pipe.stdout.flush()

# forward 1000 frames
for _ in range(1000):
    raw_image = pipe.stdout.read(framesize)
    pipe.stdout.flush()

# save frame 1001
image = np.fromstring(raw_image, dtype='uint8')
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('frame_1001_ffmpeg_only.png')
pipe.terminate()
This produces the correct result! (Correct timestamp 9:26:15)
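(Since the actual goal is frames between certain time intervals, the same pipe approach can also be asked to decode only the wanted window by adding ffmpeg's -ss and -t options. Below is a sketch of a drop-in replacement for the command list above, using the 0-10 s window as an example; whether input seeking is frame-accurate on this particular avi is something I have not verified.)

FFMPEG_BIN = "ffmpeg"
path = './demo.avi'
# -ss before -i seeks the input, -t limits the decoded duration,
# so only the frames of the wanted window come through the pipe
command = [FFMPEG_BIN,
           '-ss', '0',
           '-i', path,
           '-t', '10',
           '-f', 'image2pipe',
           '-pix_fmt', 'rgb24',
           '-vcodec', 'rawvideo', '-']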
Additional information
In the comments I was asked for my cvconfig.h file. I only seem to have this file for cv2 version 3.1.0, under /opt/opencv/3.1.0/include/opencv2/cvconfig.h.
HERE is a paste of this file.
In case it helps, I was able to extract the following video information with VideoCapture.get (a sketch of how such a dump can be generated follows the list).
brightness 0.0
contrast 0.0
convert_rgb 0.0
exposure 0.0
format 0.0
fourcc 1684633187.0
fps 100.0
frame_count 18000.0
frame_height 593.0
frame_width 792.0
gain 0.0
hue 0.0
mode 0.0
openni_baseline 0.0
openni_focal_length 0.0
openni_frame_max_depth 0.0
openni_output_mode 0.0
openni_registration 0.0
pos_avi_ratio 0.01
pos_frames 0.0
pos_msec 0.0
rectification 0.0
saturation 0.0
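(A dump like the one above can be generated with a loop along the following lines. This is only a sketch assuming the OpenCV 2.x constant names, and the property list is not exhaustive, e.g. the openni_* entries are omitted.)

import cv2

cap = cv2.VideoCapture('demo.avi')
# query a selection of CV_CAP_PROP_* constants by name and print their values
props = ['BRIGHTNESS', 'CONTRAST', 'CONVERT_RGB', 'EXPOSURE', 'FORMAT',
         'FOURCC', 'FPS', 'FRAME_COUNT', 'FRAME_HEIGHT', 'FRAME_WIDTH',
         'GAIN', 'HUE', 'MODE', 'POS_AVI_RATIO', 'POS_FRAMES', 'POS_MSEC',
         'RECTIFICATION', 'SATURATION']
for name in props:
    print '{} {}'.format(name.lower(), cap.get(getattr(cv2.cv, 'CV_CAP_PROP_' + name)))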
Your video file data contains just 1313 non-duplicate frames (i.e. between 7 and 8 frames per second of duration):
$ ffprobe -i demo.avi -loglevel fatal -show_streams -count_frames|grep frame
has_b_frames=0
r_frame_rate=100/1
avg_frame_rate=100/1
nb_frames=18000
nb_read_frames=1313 # !!!
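If you need to detect such files from Python, the same check can be scripted, e.g. as in the rough sketch below (it assumes a single video stream, so the key/value pairs are simply collected into one dict):

import subprocess as sp

# same ffprobe call as above; compare the declared frame count with the
# number of frames that actually contain picture data
out = sp.check_output(['ffprobe', '-i', 'demo.avi', '-loglevel', 'fatal',
                       '-show_streams', '-count_frames'])
info = dict(line.split('=', 1) for line in out.splitlines() if '=' in line)
print 'nb_frames = {}, nb_read_frames = {}'.format(info['nb_frames'],
                                                   info['nb_read_frames'])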
Converting the avi file with ffmpeg reports 16697 duplicate frames (for some reason 10 additional frames are added, and 16697 = 18010 - 1313).
$ ffmpeg -i demo.avi demo.mp4
...
frame=18010 fps=417 Lsize=3705kB time=03:00.08 bitrate=168.6kbits/s dup=16697
# ^^^^^^^^^
...
BTW, the video converted this way (demo.mp4) is devoid of the problem being discussed, that is, OpenCV processes it correctly.
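(So, if modifying the input is acceptable, a simple workaround is to re-encode the file once and let OpenCV work on the copy. A rough sketch; the output name demo_reencoded.mp4 is just an example:)

import subprocess as sp
import cv2

# re-encode once, so that every "repeat previous frame" instruction
# becomes a real frame in the output file
sp.check_call(['ffmpeg', '-y', '-i', 'demo.avi', 'demo_reencoded.mp4'])

# the counting logic from the MCVE then behaves as expected on the copy
cap = cv2.VideoCapture('demo_reencoded.mp4')
for _ in range(1000):
    _, frame = cap.read()
cv2.imwrite('after_iteration_reencoded.png', frame)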
In this case the duplicate frames are not physically present in the avi file, instead each duplicate frame is represented by an instruction to repeat the previous frame. This can be checked as follows:
$ ffplay -loglevel trace demo.avi
...
[ffplay_crop @ 0x7f4308003380] n:16 t:2.180000 pos:1311818.000000 x:0 y:0 x+w:792 y+h:592
[avi @ 0x7f4310009280] dts:574 offset:574 1/100 smpl_siz:0 base:1000000 st:0 size:81266
video: delay=0.130 A-V=0.000094
Last message repeated 9 times
video: delay=0.130 A-V=0.000095
video: delay=0.130 A-V=0.000094
video: delay=0.130 A-V=0.000095
[avi @ 0x7f4310009280] dts:587 offset:587 1/100 smpl_siz:0 base:1000000 st:0 size:81646
[ffplay_crop @ 0x7f4308003380] n:17 t:2.320000 pos:1393538.000000 x:0 y:0 x+w:792 y+h:592
video: delay=0.140 A-V=0.000091
Last message repeated 4 times
video: delay=0.140 A-V=0.000092
Last message repeated 1 times
video: delay=0.140 A-V=0.000091
Last message repeated 6 times
...
In the above log, frames with actual data are represented by the lines starting with "[avi @ 0xHHHHHHHHHHH]". The "video: delay=xxxxx A-V=yyyyy" messages indicate that the last frame must be displayed for xxxxx more seconds.
cv2.VideoCapture() skips such duplicate frames, reading only frames that have real data. Here is the corresponding (though slightly edited) code from the 2.4 branch of OpenCV (note, BTW, that ffmpeg is used underneath, which I verified by running python under gdb and setting a breakpoint on CvCapture_FFMPEG::grabFrame):
bool CvCapture_FFMPEG::grabFrame()
{
    ...
    int count_errs = 0;
    const int max_number_of_attempts = 1 << 9; // !!!
    ...
    // get the next frame
    while (!valid)
    {
        ...
        int ret = av_read_frame(ic, &packet);
        ...
        // Decode video frame
        avcodec_decode_video2(video_st->codec, picture, &got_picture, &packet);

        // Did we get a video frame?
        if (got_picture)
        {
            //picture_pts = picture->best_effort_timestamp;
            if( picture_pts == AV_NOPTS_VALUE_ )
                picture_pts = packet.pts != AV_NOPTS_VALUE_ && packet.pts != 0 ? packet.pts : packet.dts;
            frame_number++;
            valid = true;
        }
        else
        {
            // So, if the next frame doesn't have picture data but is
            // merely a tiny instruction to repeat the previous frame,
            // we end up here, treat that situation as an error and
            // keep going, unless the number of consecutive errors
            // exceeds max_number_of_attempts (1 << 9 = 512)
            if (++count_errs > max_number_of_attempts)
                break;
        }
    }
    ...
}
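(If you want to confirm this skipping behaviour from Python without resorting to gdb, you can simply count how many frames cap.read() actually delivers before it fails. On a file like demo.avi the count should land near the 1313 frames that carry real picture data, rather than the nominal 18000. A sketch:)

import cv2

cap = cv2.VideoCapture('demo.avi')
delivered = 0
while True:
    ret, _ = cap.read()
    if not ret:
        break
    delivered += 1
# expected to be close to the number of non-duplicate frames, not frame_count
print 'frames delivered by cap.read():', delivered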