Is there a field in which PDF files specify their encoding?

Louis Thibault · May 18, 2012

I understand that it is impossible to determine the character encoding of arbitrary string data just by looking at the data. This is not my question.

My question is: Is there a field in a PDF file where, by convention, the encoding scheme is specified (e.g. UTF-8)? This would be roughly analogous to <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> in HTML.

Thank you very much in advance, Blz

Answer

Mattias Wadman · May 18, 2012

A quick look at the PDF specification suggests that a single PDF file can contain several different encodings; have a look at page 86. A PDF library with some kind of low-level access should therefore be able to tell you the encoding used for a given string. But if you just want the text and don't care about the internal encodings, I would suggest letting the library take care of the conversions for you.
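
To illustrate why there is no single file-wide encoding field, here is a minimal Python sketch of how the encoding of one PDF text string (for example a Title entry in the document information) can be recognised from its leading bytes. As far as I know, the specification marks UTF-16BE text strings with the byte order mark FE FF, PDF 2.0 additionally allows UTF-8 marked with EF BB BF, and anything else is PDFDocEncoding. The function name decode_pdf_text_string is made up for this example, and Latin-1 is only a rough stand-in for PDFDocEncoding, which differs from it for some byte values. Text drawn on a page is a separate matter: it is encoded per font, which this sketch does not cover.

```python
import codecs
from typing import Tuple


def decode_pdf_text_string(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding label) for the raw bytes of a PDF text string.

    Hypothetical helper for illustration; not part of any PDF library.
    """
    # A leading FE FF byte order mark signals UTF-16BE.
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be"), "UTF-16BE"
    # A leading EF BB BF marks UTF-8 (assumption: PDF 2.0 style strings).
    if raw.startswith(codecs.BOM_UTF8):
        return raw[3:].decode("utf-8"), "UTF-8"
    # No marker: PDFDocEncoding. Latin-1 is used here as an approximation
    # only; the two encodings do not agree for every byte value.
    return raw.decode("latin-1"), "PDFDocEncoding (approximated as Latin-1)"


if __name__ == "__main__":
    for sample in (b"\xfe\xff\x00H\x00i", b"\xef\xbb\xbfHi", b"Hi"):
        print(decode_pdf_text_string(sample))
```

In practice a PDF library performs this check for you, which is why letting the library do the conversion is usually the simpler route.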