Convert a very large number from decimal string to binary representation?

frodo picture frodo · Jun 13, 2012 · Viewed 11.4k times · Source

I have a very big number, on the order of a thousand decimal digits, and I have to convert this to its binary representation. The numbers are stored as strings.

Since few languages have a basic data type to handle numbers this big, I see no easy way to turn this into an integral value for which I could convert it.

Could someone please help me out here? What would be a viable approach for doing this?

Answer

paxdiablo picture paxdiablo · Jun 13, 2012

If this is a genuine problem, there are plenty of BigNum libraries out there to assist, such as the MPIR library.

If it's something where you can't use a third-party library, it's still relatively easy. You don't actually need a complex BigNum library for this, you only need one operation: divide by two.

Here's how you do it. Start with an empty stack of binary digits. Then loop until the number is "0" (yes, that's still a string). If the last digit of the number is odd, push 1 on to the stack, otherwise push 0. Then divide the number by two and restart the loop.

Once the loop is finished (number is "0"), pop the digits off the stack one at a time and print them. There you go.


Oh, yeah, the divide-by-two, that is a rather important piece of the puzzle :-)

Let's start with "12345". Here's the process you follow, in pseudo-code.

Set next_additive to 0.
For every digit in number (starting at the left):
    Set additive to next_additive.
    If the digit is odd, set next_additive to 5, else set it to 0.
    Divide the digit by two (truncating) then add additive.
Remove leading zero if necessary (if it starts with 0 but is not just 0).

This can be done by processing the actual string one character at a time.

  • Starting with 1 (from 12345), additive is 0, number is odd, so next_additive is 5. Divide 1 by 2 and add additive of 0, you get 0: 02345.

  • Next digit 2, additive is 5, number is even, so next_additive is 0. Divide 2 by 2 and add additive of 5, you get 6: 06345.

  • Next digit 3, additive is 0, number is odd, so next_additive is 5. Divide 3 by 2 and add additive of 0, you get 1: 06145.

  • Next digit 4, additive is 5, number is even, so next_additive is 0. Divide 4 by 2 and add additive of 5, you get 7: 06175.

  • Next digit 5, additive is 0, number is odd, so next_additive is 5. Divide 5 by 2 and add additive of 0, you get 2: 06172.

  • Strip off leading zeros: 6172. Ignore the next additive since you're truncating the result.

And there you have it: 12345 / 2 = 6172.


By way of example, here's a Python approach to implementing this algorithm as follows. First the support routine for checking if a string-number is odd (keep in mind this isn't meant to be Pythonic code, it's just to show how it could be done - there's almost certainly better ways to do this in Python but that won't necessarily map well to another language):

def oddsToOne(s):
    if s.endswith('1'): return 1
    if s.endswith('3'): return 1
    if s.endswith('5'): return 1
    if s.endswith('7'): return 1
    if s.endswith('9'): return 1
    return 0

Then another support routine for dividing a string-number by two:

def divByTwo(s):
    new_s = ''
    add = 0

    for ch in s:
        new_dgt = (ord(ch) - ord('0')) // 2 + add
        new_s = '%s%d' % (new_s, new_dgt)
        add = oddsToOne(ch) * 5

    if new_s != '0' and new_s.startswith('0'):
        new_s = new_s[1:]

    return new_s

And, finally, some actual code to make a binary string from the decimal string:

num = '12345'

if num == '0':
    stack = '0'
else:
    stack = ''
    while num != '0':
        stack = '%d%s'%(oddsToOne(num), stack)
        num = divByTwo (num)

print(stack)

Note that if you wanted to actually use this to populate real bits (rather than make a string of bits), it's a simple matter to change what happens in the if and else clauses.

As stated, it's probably not the most efficient or beautiful Python code you could come up with but it's simply meant to show the process, not be some well-engineered production-ready piece of code. The output is (with some added stuff below to show what's going on):

12345
11000000111001
||      |||  |
||      |||  +-     1
||      ||+----     8
||      |+-----    16
||      +------    32
|+-------------  4096
+--------------  8192
                =====
                12345

Because this works on the string representation of the numbers, there is no arbitrary numeric limit such as the size of a 64-bit integer. Some example values are (slightly reformatted into 32-digit chunks for readability):

123456781234567812345678
=> 11010001001001001101100000011011
   01110110001110110010110110101111
   0111101001110

99999999999999999999999999999999
99999999999999999999999999999999
99999999999999999999999999999999
9999
=> 10010010010011010110100100101100
   10100110000110111110011101011000
   01011001001111000010011000100110
   01110000010111111001110001010110
   01110010000001000111000100001000
   11010011111001010101010110010010
   00011000010001010100000101110100
   01111000011111111111111111111111
   11111111111111111111111111111111
   11111111111111111111111111111111
   1111111111111