I want to understand how video and audio decoding works, especially the timing and synchronization (how to get 30 fps video, how to couple that with audio, etc.). I don't want to know ALL the details, just the essence of it. I want to be able to write a high-level simplification of an actual video/audio decoder.
Could you give me some pointers? Actual C/C++ source code for an MPEG-2 video/audio decoder would be the fastest way to understand these things, I think.
Reading source code from a codec that works seems like the right way to go. I suggest the following:
http://www.mpeg.org/MPEG/video/mssg-free-mpeg-software.html
Given that it's mentioned on the mpeg.org website, I'd say you'll find what you need there.
In the past I've had some time to work on decoding MPEG videos (no audio though), and the principles are quite simple. Some images are stored complete (I-frames), some intermediary images are described relative to the closest complete ones (P-frames), and the rest are described using the closest complete/intermediary images around them (B-frames).
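To make those dependencies concrete, here is a minimal sketch in C++. It assumes a sequence written in display order as a string of `I` (complete image), `P` (described relative to an earlier image) and `B` (described using the images around it). The function name `referenceFrames` and the string representation are my own, purely for illustration; a real decoder reads this information from the bitstream, and the actual prediction rules have more cases than this.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not taken from any real decoder): for the frame at
// position `i` of a sequence given in display order, e.g. "IBBPBBP", return
// the positions of the frames it needs before it can be decoded.
std::vector<int> referenceFrames(const std::string& gop, int i) {
    const int n = static_cast<int>(gop.size());
    assert(i >= 0 && i < n);

    std::vector<int> refs;
    const char type = gop[i];
    if (type == 'I') {
        return refs;  // a complete image depends on nothing
    }

    // P and B both need the nearest earlier I or P frame.
    for (int j = i - 1; j >= 0; --j) {
        if (gop[j] != 'B') { refs.push_back(j); break; }
    }

    // B additionally needs the nearest later I or P frame.
    if (type == 'B') {
        for (int j = i + 1; j < n; ++j) {
            if (gop[j] != 'B') { refs.push_back(j); break; }
        }
    }
    return refs;
}
```

For `"IBBPBBP"`, the B-frame at position 1 needs positions 0 and 3. That is why a decoder has to decode the frame at position 3 before the frames at positions 1 and 2, even though it is displayed after them.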
One time slot, one image: at 30 fps, each slot lasts 1/30 of a second. But recent codecs are much more complicated, I guess!
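The "one time slot, one image" idea can be sketched as a small function that computes when a given frame should appear on screen. This is only an illustration under my own assumptions: the name `presentationTime` is made up, and the frame rate is passed as a fraction so that rates like 30000/1001 can be expressed too.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: time at which frame number `frameIndex` (starting at 0)
// should be displayed, for a frame rate of fpsNum / fpsDen frames per second.
std::chrono::microseconds presentationTime(long long frameIndex,
                                           int fpsNum, int fpsDen = 1) {
    assert(frameIndex >= 0 && fpsNum > 0 && fpsDen > 0);
    // Compute from the frame index every time instead of adding a rounded
    // per-frame duration, so rounding errors do not accumulate.
    const long long us = frameIndex * 1000000LL * fpsDen / fpsNum;
    return std::chrono::microseconds(us);
}
```

At 30 fps, frame 1 is due after 33333 microseconds and frame 30 after exactly one second. A playback loop would then wait until "start time + `presentationTime(n, 30)`" before showing frame `n`, rather than sleeping a fixed 33 ms after each frame, which would drift.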
EDIT: synchronization
I am no expert in synchronizing audio and video, but the issue seems to be dealt with using a sync layer (see there for a definition).
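As a rough illustration of what such a layer makes possible, here is a sketch of one common approach: every video frame carries a presentation timestamp, the audio playback position serves as the reference clock, and the player decides per frame whether to wait, show, or drop. This is an assumption on my part about how a simple player could use the timestamps, not something taken from the MPEG reference code. The names `Action` and `syncVideoToAudio` and the tolerance parameter are hypothetical. The timestamps are assumed to be in 90 kHz ticks, the unit MPEG-2 systems streams use for presentation timestamps.

```cpp
#include <cassert>
#include <cstdint>

enum class Action { Wait, Show, Drop };

// Hypothetical decision function: compare the timestamp of the next video
// frame with the current audio position (both in 90 kHz ticks, by assumption).
Action syncVideoToAudio(std::int64_t videoPts, std::int64_t audioClock,
                        std::int64_t tolerance) {
    assert(tolerance >= 0);
    const std::int64_t diff = videoPts - audioClock;
    if (diff > tolerance) {
        return Action::Wait;  // frame is early: keep the previous image up
    }
    if (diff < -tolerance) {
        return Action::Drop;  // frame is late: skip it to catch up with audio
    }
    return Action::Show;      // close enough: display it now
}
```

With a tolerance of 3000 ticks (one frame duration at 30 fps), a frame stamped 99000 while the audio is at 90000 yields `Wait`, and a frame stamped 80000 yields `Drop`. Audio is used as the reference here because glitches in sound are usually more noticeable than a repeated or skipped image.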