Python: Ignore 'Incorrect padding' error when base64 decoding

FunLovinCoder · May 31, 2010

I have some base64-encoded data that I want to convert back to binary, even if its padding is wrong. If I use

base64.decodestring(b64_string)

it raises an 'Incorrect padding' error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit and miss so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data

Answer

Simon Sapin · Mar 21, 2012

As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the '=' characters at the end of base64 encoded data) is "lossless":

"From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits."
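To make that arithmetic concrete, here is a minimal sketch of how the padding follows from the digit count. The helper name padding_for is hypothetical and only for illustration. Every 4 digits encode 3 bytes, so a remainder of 2 digits means "==", a remainder of 3 means "=", and a remainder of 1 cannot occur in valid data:

```python
import base64


def padding_for(num_digits):
    """Return the '=' padding implied by a count of base64 digits.

    Hypothetical helper, for illustration only.
    """
    if num_digits % 4 == 1:
        # A single leftover digit carries only 6 bits, less than one byte,
        # so the data is truncated rather than merely unpadded.
        raise ValueError("invalid number of base64 digits")
    return b"=" * (-num_digits % 4)


# Strip the padding from freshly encoded data, then restore it.
for raw in (b"a", b"ab", b"abc"):
    encoded = base64.b64encode(raw)
    stripped = encoded.rstrip(b"=")
    restored = stripped + padding_for(len(stripped))
    print(raw, stripped, restored == encoded)
```

Each line printed should end in True, since the restored string matches what b64encode produced in the first place.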

So if this is really the only thing "wrong" with your base64 data, the padding can just be added back. I came up with this to be able to parse "data" URLs in WeasyPrint, some of which were base64 without padding:

import base64
import re

def decode_base64(data, altchars=b'+/'):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.

    """
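    # Normalize: strip padding, whitespace and any byte outside the alphabet.
    data = re.sub(rb'[^a-zA-Z0-9' + re.escape(altchars) + rb']+', b'', data)
    missing_padding = len(data) % 4
    if missing_padding:
        # A remainder of 1 means truncated data; b64decode still raises then.
        data += b'=' * (4 - missing_padding)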
    return base64.b64decode(data, altchars)

Tests for this function: weasyprint/tests/test_css.py#L68
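For the "data" URL case mentioned above, a rough sketch of the same idea might look like the following. The function name decode_data_url_payload is hypothetical and this is not WeasyPrint's actual code. It assumes the URL ends its header with ";base64", uses the standard alphabet, and has no percent-encoding in the payload:

```python
import base64


def decode_data_url_payload(url):
    """Decode the payload of a base64 'data:' URL, padding being optional.

    Hypothetical helper, not WeasyPrint's actual implementation.
    """
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    data = payload.encode("ascii")
    # Restore any padding that was stripped from the end of the payload.
    data += b"=" * (-len(data) % 4)
    return base64.b64decode(data)


print(decode_data_url_payload("data:text/plain;base64,SGVsbG8"))  # b'Hello'
```

Unlike decode_base64 above, this sketch does not strip stray characters first, so it only covers the case where missing padding is the sole problem.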