Binary file I/O in Python: where to start?

DrBloodmoney · Jun 9, 2009

As a self-taught Python hobbyist, how would I go about learning to import and export binary files using standard formats?

I'd like to implement a script that takes ePub ebooks (XHTML + CSS in a zip) and converts them to the Mobipocket (PalmDOC) format so that the Amazon Kindle can read them (as part of a larger project that I'm working on).

There is already an awesome open-source project for managing ebook libraries: Calibre. I wanted to try implementing this on my own as a learning/self-teaching exercise. I started looking at their Python source code and realized that I have no idea what is going on. Of course, the big danger in being self-taught at anything is not knowing what you don't know.

In this case, I know that I don't know much about these binary files and how to work with them in Python code (struct?). But I think I'm probably missing a lot of knowledge about binary files in general, and I'd like some help understanding how to work with them. Here is a detailed overview of the mobi/palmdoc headers. Thanks!

Edit: No question, good point! Do you have any tips on how to gain a basic knowledge of working with binary files? Python-specific would be helpful but other approaches could also be useful.

TOM:Edited as question, added intro / better title

Answer

tom10 · Jun 9, 2009

You should probably start with the struct module, as you pointed to in your question, and of course, open the file in binary mode ('rb' for reading, 'wb' for writing).
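To make that concrete, here is a minimal sketch of a round trip through a binary file with struct. The 10-byte layout (a 4-byte magic string, a 16-bit version, a 32-bit record count) and the names HEADER_FORMAT, write_header, read_header and demo are made up for illustration; this is not the real Mobipocket header.

```python
import os
import struct
import tempfile

# Hypothetical layout, for illustration only (NOT the real Mobipocket header):
#   4-byte magic, unsigned 16-bit version, unsigned 32-bit record count.
# The ">" prefix means big-endian with no padding between fields.
HEADER_FORMAT = ">4sHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 2 + 4 bytes


def write_header(path, magic, version, count):
    # "wb" opens the file for writing raw bytes, with no text encoding.
    with open(path, "wb") as f:
        f.write(struct.pack(HEADER_FORMAT, magic, version, count))


def read_header(path):
    # "rb" returns bytes objects exactly as they are stored on disk.
    with open(path, "rb") as f:
        raw = f.read(HEADER_SIZE)
    if len(raw) != HEADER_SIZE:
        raise ValueError("file too short for header")
    return struct.unpack(HEADER_FORMAT, raw)


def demo():
    # Write a header to a temporary file, read it back, then clean up.
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_header(path, b"DEMO", 1, 42)
        return read_header(path)
    finally:
        os.remove(path)
```

Calling demo() should hand back the same three values that were written. The byte-order prefix matters: without ">" or "<", struct uses the machine's native order and alignment, which may not match the file format.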

Basically you just start at the beginning of the file and pick it apart piece by piece, reading each field in the order and size the format description gives. It's a hassle, but not a huge problem. If the files are compressed or encrypted, things can get more difficult. It's helpful if you start with a file that you know the contents of so you're not guessing all the time.
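Picking a file apart piece by piece might look roughly like the sketch below. The field layout is an assumption: it follows the 78-byte Palm Database header as it is commonly documented, so check it against the header overview you linked before relying on it. The helper names (read_fields, parse_pdb_header, make_sample) are hypothetical, and the sample bytes are built in memory so the expected contents are known in advance.

```python
import io
import struct


def read_fields(stream, fmt):
    """Read exactly the number of bytes fmt describes and unpack them."""
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("wanted %d bytes, got %d" % (size, len(data)))
    return struct.unpack(fmt, data)


def parse_pdb_header(stream):
    # ASSUMED layout of the Palm Database header (big-endian);
    # verify against the format description before trusting it.
    (name,) = read_fields(stream, ">32s")            # null-padded name
    attributes, version = read_fields(stream, ">HH")
    created, modified, backed_up = read_fields(stream, ">III")
    mod_number, app_info, sort_info = read_fields(stream, ">III")
    db_type, creator = read_fields(stream, ">4s4s")
    uid_seed, next_list = read_fields(stream, ">II")
    (num_records,) = read_fields(stream, ">H")
    return {
        "name": name.split(b"\0", 1)[0],   # drop the null padding
        "type": db_type,
        "creator": creator,
        "num_records": num_records,
        "header_size": stream.tell(),      # bytes consumed so far
    }


def make_sample():
    # A header with known contents, so the parser can be checked.
    return struct.pack(
        ">32sHHIIIIII4s4sIIH",
        b"Test_Book", 0, 0, 0, 0, 0, 0, 0, 0,
        b"BOOK", b"MOBI", 0, 0, 3,
    )


info = parse_pdb_header(io.BytesIO(make_sample()))
```

Because read_fields works on any file-like object, the same parser can be pointed at a real file opened with open(path, "rb") once it agrees with the known sample.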

Try it a bit, and maybe you'll evolve more specific questions.