Text encoding in ID3v2.3 tags

phanteh · Mar 25, 2012

Thanks to this site and a few others, I've created some simple code to read ID3v2.3 tags from MP3 files. Doing so has been a great learning experience as I previously had no knowledge of hex / byte / binary etc.

I can successfully read data, but have come across an issue that I believe is related to the encoding used. I've realized that text frames have a byte at the beginning of the 'text' that describes the encoding used, and potentially more information in the next 2 bytes...

Example: the data from frame TIT2 starts with the byte $03 (hex) before the actual text. The text displays correctly using Encoding.ASCII.GetString, albeit with an additional character at the beginning.

In another MP3, the data from TIT2 starts with $01 followed by $FF $FE, which I believe has to do with Unicode? The text itself is broken up, though: there is a $00 between every text character, and this stops the data from being displayed in Windows Forms (as soon as a $00 is encountered the text just stops, so I get the first character and that's it). I've tried using Encoding.Unicode.GetString, but that just seems to return gibberish.

Printing this data to a console seems to work, with spaces between each char, so the reading of the data is working properly.

I've been reading the official documentation for ID3v2.3 but I guess I'm just not clued-up enough to understand the text encoding section.

Any replies or links to articles that may be of help would be much appreciated!

Regards Ross

Answer

houqp · Nov 14, 2012

Just to add one more comment: the first byte of a text frame is the text encoding code, with these values:

00 – ISO-8859-1 (Latin-1, a superset of ASCII).

01 – UCS-2 (UTF-16 encoded Unicode with BOM), in ID3v2.2 and ID3v2.3.

02 – UTF-16BE encoded Unicode without BOM, in ID3v2.4.

03 – UTF-8 encoded Unicode, in ID3v2.4.

from: http://en.wikipedia.org/wiki/ID3
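
To make the table above concrete, here is a minimal sketch in Python of how a text frame body could be decoded. The names ID3_TEXT_CODECS and decode_text_frame are made up for this example, and it assumes you have already cut the frame body out of the tag (everything after the 10-byte frame header). It is an illustration, not a full tag parser.

```python
# Hypothetical lookup table: ID3v2 text-encoding byte -> Python codec name.
ID3_TEXT_CODECS = {
    0x00: "iso-8859-1",
    0x01: "utf-16",     # this codec reads the BOM ($FF $FE or $FE $FF) and drops it
    0x02: "utf-16-be",  # ID3v2.4 only, no BOM
    0x03: "utf-8",      # ID3v2.4 only
}


def decode_text_frame(body: bytes) -> str:
    """Decode the body of a text frame such as TIT2 (hypothetical helper)."""
    if not body:
        return ""
    codec = ID3_TEXT_CODECS.get(body[0])
    if codec is None:
        raise ValueError("unknown text encoding byte: 0x%02x" % body[0])
    # Skip the encoding byte itself, otherwise it ends up in the text
    # (or, for UTF-16, shifts every 2-byte unit by one byte).
    text = body[1:].decode(codec)
    # Some writers append a terminating NUL; drop it if present.
    return text.rstrip("\x00")


print(decode_text_frame(b"\x01\xff\xfeH\x00i\x00"))  # $01 + BOM + UTF-16LE text
print(decode_text_frame(b"\x03Hi"))                  # $03 + UTF-8 text
```

In .NET the matching encodings should be Encoding.GetEncoding("ISO-8859-1"), Encoding.Unicode (UTF-16 little-endian), Encoding.BigEndianUnicode and Encoding.UTF8. The same two points apply there: drop the encoding byte before decoding, and for $01 check the BOM to pick the byte order. Leaving the $01 byte in front of UTF-16 data is one plausible reason for the gibberish described in the question.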