Text encoding in ID3v2.3 tags

unicode encoding hex ascii id3

phanteh · Mar 25, 2012 · Viewed 9.2k times · Source

Thanks to this site and a few others, I've created some simple code to read ID3v2.3 tags from MP3 files. Doing so has been a great learning experience as I previously had no knowledge of hex / byte / binary etc.

I can successfully read data, but have come across an issue that I believe is related to the encoding used. I've realized that text frames have a byte at the beginning of the 'text' that describes the encoding used, and potentially more information in the next 2 bytes...

Example: Data from frame TIT2 starts with the byte $03 (hex) before the actual text. This text displays correctly using Encoding.ASCII.GetString, albeit with an additional character at the beginning.

In another MP3, data from TIT2 starts with $01 and is followed by $FF $FE, which I believe has something to do with Unicode? The text itself is broken up, though: there is a $00 between every text character, and this stops the data from being displayed in Windows Forms (as soon as a $00 is encountered, the text just stops, so I get the first character and that's it). I've tried using Encoding.Unicode.GetString, but that just seems to return gibberish.

Printing this data to a console seems to work, with spaces between each char, so the reading of the data is working properly.

I've been reading the official documentation for ID3v2.3 but I guess I'm just not clued-up enough to understand the text encoding section.

Any replies or links to articles that may be of help would be much appreciated!

Regards Ross

Answer

Just to add one more comment: the first byte of a text frame is the text encoding code, which takes these values:

00 – ISO-8859-1 (Latin-1, a superset of ASCII).

01 – UCS-2 (UTF-16 encoded Unicode with BOM), in ID3v2.2 and ID3v2.3.

02 – UTF-16BE encoded Unicode without BOM, in ID3v2.4.

03 – UTF-8 encoded Unicode, in ID3v2.4.

from: http://en.wikipedia.org/wiki/ID3
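
As a rough illustration of the table above, here is a minimal Python sketch that reads the encoding byte and decodes the rest of the frame body accordingly. The function name `decode_text_frame` and the `ENCODINGS` mapping are made up for this example, and it assumes `body` is the frame's content only (the bytes after the 10-byte frame header). The key point is that the first byte is skipped before decoding. Passing it to a UTF-16 decoder along with the text shifts every 2-byte unit by one byte, which is a likely cause of the gibberish described in the question.

```python
# Hypothetical helper: maps the ID3v2 text-encoding byte to a Python codec name.
ENCODINGS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",     # Python's "utf-16" codec reads and strips the BOM ($FF $FE or $FE $FF)
    0x02: "utf-16-be",  # ID3v2.4 only, no BOM
    0x03: "utf-8",      # ID3v2.4 only
}


def decode_text_frame(body: bytes) -> str:
    """Decode the content of a text frame such as TIT2.

    `body` is assumed to hold only the frame content:
    one encoding byte followed by the encoded text.
    """
    if not body:
        return ""
    codec = ENCODINGS.get(body[0])
    if codec is None:
        raise ValueError(f"unknown text encoding byte: {body[0]:#04x}")
    # Skip the encoding byte, decode the rest, and drop any trailing terminator.
    return body[1:].decode(codec).rstrip("\x00")


# The two cases from the question:
print(decode_text_frame(b"\x03Hello"))                # Hello
print(decode_text_frame(b"\x01\xff\xfeH\x00i\x00"))   # Hi
```

In .NET the same idea would apply: pick the decoder from the first byte and call GetString on the remaining bytes only.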
