Convert binary to ASCII and vice versa

sbrichards picture sbrichards · Sep 13, 2011 · Viewed 249.8k times · Source

Using this code to take a string and convert it to binary:

bin(reduce(lambda x, y: 256*x+y, (ord(c) for c in 'hello'), 0))

this outputs:

0b110100001100101011011000110110001101111

Which, if I put it into this site (on the right hand site) I get my message of hello back. I'm wondering what method it uses. I know I could splice apart the string of binary into 8's and then match it to the corresponding value to bin(ord(character)) or some other way. Really looking for something simpler.

Answer

jfs picture jfs · Sep 13, 2011

For ASCII characters in the range [ -~] on Python 2:

>>> import binascii
>>> bin(int(binascii.hexlify('hello'), 16))
'0b110100001100101011011000110110001101111'

In reverse:

>>> n = int('0b110100001100101011011000110110001101111', 2)
>>> binascii.unhexlify('%x' % n)
'hello'

In Python 3.2+:

>>> bin(int.from_bytes('hello'.encode(), 'big'))
'0b110100001100101011011000110110001101111'

In reverse:

>>> n = int('0b110100001100101011011000110110001101111', 2)
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
'hello'

To support all Unicode characters in Python 3:

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int.from_bytes(text.encode(encoding, errors), 'big'))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode(encoding, errors) or '\0'

Here's single-source Python 2/3 compatible version:

import binascii

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int(binascii.hexlify(text.encode(encoding, errors)), 16))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))

def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    return int2bytes(n).decode(encoding, errors)

def int2bytes(i):
    hex_string = '%x' % i
    n = len(hex_string)
    return binascii.unhexlify(hex_string.zfill(n + (n & 1)))

Example

>>> text_to_bits('hello')
'0110100001100101011011000110110001101111'
>>> text_from_bits('110100001100101011011000110110001101111') == u'hello'
True