I have a raw H.264 stream from an IP camera packed in RTP frames. I want to get the raw H.264 data into a file so I can convert it with ffmpeg.
I found out that the data in my raw H.264 file has to look like this:
00 00 01 [SPS]
00 00 01 [PPS]
00 00 01 [NALByte]
[PAYLOAD RTP Frame 1] // Payload always without the first 2 Bytes -> NAL
[PAYLOAD RTP Frame 2]
[... until PAYLOAD Frame with Mark Bit received] // From here it's a new Video Frame
00 00 01 [NAL BYTE]
[PAYLOAD RTP Frame 1]
....
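The layout above could be produced by something like the following sketch. It assumes the camera fragments large NAL units as FU-A packets (RFC 6184, type 28), which would explain why the first 2 payload bytes are skipped. `appendPayload` and `out` are made-up names, and aggregation packets such as STAP-A are not handled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append one RTP payload (the bytes after the RTP header) to an Annex B
// byte stream collected in `out`. Hypothetical helper, not a library API.
inline void appendPayload(std::vector<std::uint8_t>& out,
                          const std::uint8_t* payload, std::size_t size) {
    static const std::uint8_t kStartCode[] = {0x00, 0x00, 0x01};
    if (payload == nullptr || size == 0) return;

    const std::uint8_t type = payload[0] & 0x1F;
    if (type >= 1 && type <= 23) {
        // Single NAL unit packet (e.g. SPS, PPS, small slices):
        // write the start code followed by the payload unchanged.
        out.insert(out.end(), kStartCode, kStartCode + 3);
        out.insert(out.end(), payload, payload + size);
    } else if (type == 28 && size >= 2) {
        // FU-A fragment: byte 0 is the FU indicator, byte 1 the FU header.
        const std::uint8_t fuIndicator = payload[0];
        const std::uint8_t fuHeader = payload[1];
        if (fuHeader & 0x80) {
            // Start bit set: begin a new NAL unit and rebuild its header
            // byte from the indicator (F, NRI bits) and the header (type).
            out.insert(out.end(), kStartCode, kStartCode + 3);
            out.push_back(static_cast<std::uint8_t>((fuIndicator & 0xE0) |
                                                    (fuHeader & 0x1F)));
        }
        // Every fragment contributes its data without the first 2 bytes.
        out.insert(out.end(), payload + 2, payload + size);
    }
    // Other packet types are ignored in this sketch.
}
```

Note that in this sketch a new start code is triggered by the FU-A start bit rather than by the marker bit of the previous packet, so it would also cope with a video frame that consists of several NAL units.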
So I get the SPS and the PPS from the Session Description Protocol (SDP) out of my preceding RTSP communication. Additionally, the camera sends the SPS and the PPS in two single messages before starting with the video stream itself.
So I capture the messages in this order:
1. Preceding RTSP Communication here ( including SDP with SPS and PPS )
2. RTP Frame with Payload: 67 42 80 28 DA 01 40 16 C4 // This is the SPS
3. RTP Frame with Payload: 68 CE 3C 80 // This is the PPS
4. RTP Frame with Payload: ... // Video Data
Then come some frames with payload, and at some point an RTP frame with the marker bit = 1. This means (if I got it right) that I have a complete video frame. After this I write the prefix sequence (00 00 01) and the NAL byte from the payload again and go on with the same procedure.
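To check the marker bit, the fixed RTP header (RFC 3550) has to be parsed first. A minimal sketch of that step, with `RtpInfo` and `parseRtpHeader` as hypothetical names, and ignoring RTP padding:

```cpp
#include <cstddef>
#include <cstdint>

// Result of looking at one RTP packet. Hypothetical type for this sketch.
struct RtpInfo {
    bool valid;                 // header looked like RTP version 2
    bool marker;                // marker bit (M) from the second header byte
    std::size_t payloadOffset;  // index of the first payload byte
};

inline RtpInfo parseRtpHeader(const std::uint8_t* pkt, std::size_t size) {
    RtpInfo info{false, false, 0};
    if (pkt == nullptr || size < 12) return info;  // fixed header is 12 bytes
    if ((pkt[0] >> 6) != 2) return info;           // version must be 2

    // Skip the optional CSRC list (4 bytes per entry).
    const std::size_t csrcCount = pkt[0] & 0x0F;
    std::size_t offset = 12 + 4 * csrcCount;

    // Skip the optional header extension: 2 bytes profile, 2 bytes length
    // (in 32-bit words), then the extension data itself.
    if (pkt[0] & 0x10) {
        if (size < offset + 4) return info;
        const std::size_t extWords =
            (static_cast<std::size_t>(pkt[offset + 2]) << 8) | pkt[offset + 3];
        offset += 4 + 4 * extWords;
    }
    if (size < offset) return info;

    info.valid = true;
    info.marker = (pkt[1] & 0x80) != 0;  // top bit of byte 1
    info.payloadOffset = offset;
    return info;
}
```

The payload that starts at `payloadOffset` is what the NAL handling would then look at.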
Now my camera sends me the SPS and the PPS again after every 8 complete video frames (again in two RTP frames, as seen in the example above). I know that especially the PPS can change in between streaming, but that's not the problem.
My questions are now:
1. Do I need to write the SPS/PPS every 8th Video Frame?
If my SPS and my PPS don't change, it should be enough to have them written at the very beginning of my file and nothing more?
2. How to distinguish between SPS/PPS and normal RTP Frames?
In my C++ code which parses the transmitted data I need to distinguish between the RTP frames with normal payload and the ones carrying the SPS/PPS. How can I tell them apart? Okay, the SPS/PPS frames are usually way smaller, but that's not a safe thing to rely on. If I ignore them I need to know which data I can throw away, and if I need to write them I need to put the 00 00 01 prefix in front of them. Or is it a fixed rule that they occur every 8th video frame?
As I remember, nal_unit_type is the lower 5 bits of the 1st byte of the RTP payload.
nal_unit_type = frame[0] & 0x1f;
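Building on that, a small classifier along these lines could tell SPS/PPS packets apart from video data. The type values are from the H.264 spec (7 = SPS, 8 = PPS, 5 = IDR slice, 1 = non-IDR slice) and RFC 6184 (24 = STAP-A, 28 = FU-A). `NalKind` and `classifyPayload` are names invented for this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Coarse categories for an RTP/H.264 payload. Hypothetical enum.
enum class NalKind { Sps, Pps, IdrSlice, NonIdrSlice, StapA, FuA, Other };

// Classify an RTP payload (the bytes after the RTP header) by the
// lower 5 bits of its first byte.
inline NalKind classifyPayload(const std::uint8_t* payload, std::size_t size) {
    if (payload == nullptr || size == 0) return NalKind::Other;
    const std::uint8_t nalUnitType = payload[0] & 0x1F;
    switch (nalUnitType) {
        case 1:  return NalKind::NonIdrSlice;  // coded slice, non-IDR picture
        case 5:  return NalKind::IdrSlice;     // coded slice, IDR picture
        case 7:  return NalKind::Sps;          // sequence parameter set
        case 8:  return NalKind::Pps;          // picture parameter set
        case 24: return NalKind::StapA;        // aggregation packet
        case 28: return NalKind::FuA;          // fragmentation unit
        default: return NalKind::Other;
    }
}
```

With the payloads from the question, 67 42 80 ... gives 0x67 & 0x1f = 7 (SPS) and 68 CE 3C 80 gives 0x68 & 0x1f = 8 (PPS), so the check does not depend on packet size or on the parameter sets arriving every 8th video frame.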