I fully understand the size of the NV12 format as described in another question.
I have now read two sources on how the UV plane is stored in this format. The first is https://msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx
NV12
All of the Y samples appear first in memory as an array of unsigned char values with an even number of lines. The Y plane is followed immediately by an array of unsigned char values that contains packed U (Cb) and V (Cr) samples. When the combined U-V array is addressed as an array of little-endian WORD values, the LSBs contain the U values, and the MSBs contain the V values. NV12 is the preferred 4:2:0 pixel format for DirectX VA. It is expected to be an intermediate-term requirement for DirectX VA accelerators supporting 4:2:0 video. The following illustration shows the Y plane and the array that contains packed U and V samples.
What I understand from this is that, in the UV plane, each U sample and each V sample is stored in its own byte.
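In code, that byte-level reading of the MSDN text would look roughly like the Python sketch below. It assumes a tightly packed frame (no row padding) with even width and height; the function name `nv12_sample` is made up for illustration and does not come from either source.

```python
def nv12_sample(buf: bytes, width: int, height: int, x: int, y: int):
    """Return (Y, U, V) for pixel (x, y) of a tightly packed NV12 frame."""
    y_size = width * height                 # the Y plane comes first, one byte per pixel
    luma = buf[y * width + x]
    # One (U, V) byte pair per 2x2 block of pixels; a chroma row holds
    # width / 2 pairs, which is again `width` bytes.
    uv_off = y_size + (y // 2) * width + (x // 2) * 2
    return luma, buf[uv_off], buf[uv_off + 1]


# 4x2 frame: 8 Y bytes followed by the pairs U0, V0, U1, V1
frame = bytes(range(8)) + bytes([100, 200, 101, 201])
print(nv12_sample(frame, 4, 2, 3, 1))       # (7, 101, 201)

# Read as a little-endian 16-bit word, the low byte is U and the high byte is V.
word = int.from_bytes(frame[8:10], "little")
print(word & 0xFF, word >> 8)               # 100 200
```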
Then I read about this on wikipedia: https://wiki.videolan.org/YUV#NV12
It says:
NV12
Related to I420, NV12 has one luma "luminance" plane Y and one plane with U and V values interleaved. In NV12, chroma planes (blue and red) are subsampled in both the horizontal and vertical dimensions by a factor of 2. For a 2x2 group of pixels, you have 4 Y samples and 1 U and 1 V sample. It can be helpful to think of NV12 as I420 with the U and V planes interleaved. Here is a graphical representation of NV12. Each letter represents one bit:
For 1 NV12 pixel: YYYYYYYY UVUV
For a 2-pixel NV12 frame: YYYYYYYYYYYYYYYY UVUVUVUV
For a 50-pixel NV12 frame: Y*8*50 (UV)*2*50
For a n-pixel NV12 frame: Y*8*n (UV)*2*n
What I understand here is that U and V are interleaved bit by bit within each byte, so each byte of the UV plane would contain 4 U bits and 4 V bits interleaved.
Can anyone clarify my doubt?
To verify this (or at least verify that there is no interleaving at the bit level), one can use ffmpeg, which is a widely used video tool. I did the following experiment:
1. Make a file containing ASCII text
2. Use ffmpeg to read it as an I420 video frame of some small size
3. Use ffmpeg to convert it to NV12 format
Here is an example command line for (2) and (3):
ffmpeg -s 96x4 -i example_i420.yuv -pix_fmt nv12 example_nv12.yuv
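The input file for this command only has to be exactly one frame long. A 96x4 frame in I420 takes 96*4 luma bytes plus two 48x2 chroma planes, i.e. 576 bytes. Here is a small Python sketch that builds such a file from ASCII text; the helper names and the filler text are my own choices for illustration, not necessarily what produced the output shown here.

```python
from itertools import cycle, islice


def make_ascii_i420(text: str, width: int, height: int) -> bytes:
    """Repeat ASCII text until it fills exactly one I420 frame."""
    frame_size = width * height * 3 // 2    # Y plane + quarter-size U and V planes
    return bytes(islice(cycle(text.encode("ascii")), frame_size))


def write_ascii_i420(path: str, text: str, width: int, height: int) -> None:
    """Write the frame to disk so that ffmpeg can read it as raw video."""
    with open(path, "wb") as f:
        f.write(make_ascii_i420(text, width, height))
```

Calling `write_ascii_i420("example_i420.yuv", "Lorem ipsum dolor sit amet, ", 96, 4)` would produce a 576-byte file matching the `-s 96x4` option above.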
Here is what I got in the output:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sutnett uirn acduilppias cqiunig oeflfiitc,i as edde sdeor uenitu smmooldl itte mapnoirm iindc iedsitd ulnatb ourtu ml.a bLoorree me ti pdsoulmo rdeo lmoarg nsai ta laimqeuta,. cUotn seenci
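The first 96*4 = 384 bytes (up to and including "non proident, s") are the luma plane; the chroma (U and V) samples are the scrambled part after that, starting with "utnett uirn". It is evident that these are the same values (ASCII letters), just in a different order. If any bit-interleaving were performed, I would get different values.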
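The same reordering can be reproduced without ffmpeg. The following Python sketch interleaves the U and V planes byte by byte, assuming a tightly packed I420 frame with even dimensions; `i420_to_nv12` is a name I made up, not an ffmpeg function.

```python
def i420_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Byte-interleave the U and V planes of a tightly packed I420 frame."""
    y_size = width * height
    c_size = y_size // 4                    # each chroma plane is (width/2) x (height/2)
    y = frame[:y_size]
    u = frame[y_size:y_size + c_size]
    v = frame[y_size + c_size:y_size + 2 * c_size]
    uv = bytearray(2 * c_size)
    uv[0::2] = u                            # even offsets hold U
    uv[1::2] = v                            # odd offsets hold V
    return y + bytes(uv)


# 4x2 frame: 8 Y bytes, then U = "un" and V = "te"
i420 = b"YYYYYYYY" + b"un" + b"te"
nv12 = i420_to_nv12(i420, 4, 2)
print(nv12)                                 # b'YYYYYYYYutne'
print(sorted(nv12) == sorted(i420))         # True: same bytes, different order
```

With U beginning "unt in ..." and V beginning "tetur ...", byte interleaving gives "utnett u...", which is the pattern seen in the ffmpeg output.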
So the description in the VLC wiki (BTW it's not Wikipedia) is incorrect. Someone with the name "Edwardw" added the "illustration" mentioning pixels here, and later changed it to "bits" here. I hope someone changes it to be less misleading (the wiki requires registration so I cannot edit it).