How To Make a "Corrupt" File

user807566 · May 2, 2012

Suppose, during testing, you wish to test how the software handles a "corrupt" file.

I have two questions:

1. In general, how do you define a "corrupt" file? In other words, what constitutes a corrupt file?

As an example:

Suppose you need to test a "corrupt" .pdf file.

One suggestion is to simply take a .zip file, change the extension, and test with that. However, I would argue that you are not testing how the program handles a "corrupt .pdf file," but rather, how it handles a .zip file.

Another suggestion is to open the file and insert/delete random bytes. This suggestion is okay, but there are a few problems:

  • It is possible (albeit unlikely) that the sections which are modified or removed are inconsequential. For example, you may simply delete a section of a huge string, which would modify the data, but not necessarily corrupt the file.
  • It is possible that the file can be modified in such a way that the program will refuse to read the file. For example, if the .pdf header is deleted, then maybe the API (or whatever you are using) won't get past that point and the file cannot be tested at all.
  • Similar to the first bullet: If the file is modified dramatically enough, then there is an argument that the resulting file is no longer the same format as the original. So, again, if you were to delete the .pdf header, then maybe that file is no longer a .pdf file. So attempting to test it does not test a corrupt .pdf file, but instead tests some odd variation of a .pdf file.
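
A minimal sketch of the "modify random bytes" suggestion, written so that it sidesteps the second problem above by leaving the start of the file untouched. The function name `corrupt_bytes`, the 16-byte protected prefix, and the default of 8 changes are all assumptions made for illustration; nothing here is specific to .pdf, and it does not guarantee that the changed bytes matter to the parser (the first problem still applies).

```python
import random


def corrupt_bytes(data: bytes, n_changes: int = 8,
                  protect_prefix: int = 16, seed: int = 0) -> bytes:
    """Return a copy of data with n_changes bytes altered.

    The first protect_prefix bytes (where a format header usually
    lives) are left intact so a reader can still get past the header.
    """
    if len(data) <= protect_prefix:
        raise ValueError("input too short to corrupt outside the protected prefix")

    rng = random.Random(seed)  # seeded so a failing test case can be reproduced
    out = bytearray(data)

    # Never ask for more positions than exist outside the protected prefix.
    n_changes = min(n_changes, len(out) - protect_prefix)

    # sample() picks distinct positions, so exactly n_changes bytes differ.
    for pos in rng.sample(range(protect_prefix, len(out)), n_changes):
        out[pos] ^= rng.randrange(1, 256)  # XOR with a nonzero value always changes the byte

    return bytes(out)
```

Reading a real file with `open(path, "rb").read()`, passing the bytes through this function, and writing the result to a new file gives a repeatable test input; varying `seed` gives a family of them.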

2. Once a corrupt file is defined, how do you go about creating one?


Here is what I have been thinking so far:

A "corrupt file" is a file that correctly meets the specifications of the file format, but which contains data/bytes that are inherently flawed.

The only example I could think of was if you changed the encoding of the file somehow. You could then possibly apply this method to files of arbitrary format.

Thanks for reading.

Answer

Bonny Bonev · May 2, 2012

A file's format is identified by two things. 1. The file's extension, which should tell the OS what format the file is. 2. The MIME type of the document (typically detected from the file's contents rather than its name). Many documents have the wrong extension (.avi, .jpg), but the MIME type tells you what they actually are.

How do you corrupt a document? You can't just add random bytes or something (a .txt file, for example, will not be corrupted that way).

There are actually 2 things you need to do.

First, you change the MIME type of the file; then you can possibly add some random bytes. If the MIME type is different from (not similar to) what the extension implies, for example "text/html" for an .avi file, the program that handles that MIME type won't be able to recognize the file.
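
A rough sketch of what a mismatch between content and extension can look like in practice. It assumes the type is detected from the file's leading signature bytes, which is only one way tools do it; the `SIGNATURES` table is a deliberately tiny subset, and `sniff_extension` and `replace_signature` are hypothetical helper names, not part of any library.

```python
# Leading byte signatures for a few common formats (small illustrative subset).
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".zip": b"PK\x03\x04",
    ".png": b"\x89PNG\r\n\x1a\n",
}


def sniff_extension(data: bytes):
    """Return the extension whose signature matches the start of data, or None."""
    for ext, magic in SIGNATURES.items():
        if data.startswith(magic):
            return ext
    return None


def replace_signature(data: bytes, new_magic: bytes) -> bytes:
    """Overwrite the leading bytes with another format's signature.

    The length is preserved as long as data is at least as long as new_magic.
    """
    return new_magic + data[len(new_magic):]


# Usage: content that starts as a PDF is relabeled so it sniffs as a ZIP.
pdf_like = b"%PDF-1.4\n" + b"body " * 20
mismatched = replace_signature(pdf_like, SIGNATURES[".zip"])
```

Saving `mismatched` under a .pdf name produces a file whose extension and detected type disagree, which can then be combined with random byte changes as described above.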

However, different test scenarios may require creating different "versions" of corrupted files.
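
As a sketch of what such "versions" might be, the hypothetical helper below derives a few variants from one valid file. The variant names and the particular choices (cut at the midpoint, 32 bytes of trailing garbage) are assumptions for illustration, not a standard set.

```python
def make_variants(data: bytes) -> dict:
    """Build several differently damaged copies of data, keyed by a short label."""
    half = len(data) // 2
    return {
        # The file ends early, as with an interrupted download or copy.
        "truncated": data[:half],
        # Same length as the original, but the second half is zeroed out.
        "zero_filled_tail": data[:half] + bytes(len(data) - half),
        # The original content is intact, with junk appended after it.
        "trailing_garbage": data + b"\xff" * 32,
        # A zero-length file.
        "empty": b"",
    }
```

Each variant exercises a different failure path in the reader, so it is usually worth running the software under test against all of them rather than picking one.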

Hope it helps.