How do uppercase and lowercase letters differ by only one bit?

Vibhakar SInha picture Vibhakar SInha · Aug 25, 2010 · Viewed 67.1k times · Source

I have found one example in Data and Communication Networking book written by Behrouza Forouzan regarding upper- and lowercase letters which differ by only one bit in the 7 bit code.

For example, character A is 1000001 (0x41) and character a is 1100001 (0x61).The difference is in bit 6, which is 0 in uppercase letters and 1 in lowercase letters. If we know the code for one case, we can easily find the code for the other by adding or subtracting 32 in decimal, or we can just flip the sixth bit.

What does all this mean?

I have found myself very confused with all these things. Could someone provide examples of how these things really work out?

Answer

ErikE picture ErikE · Aug 25, 2010

Let's use a case that you'll find more familiar: base 10.

  1. Suppose we have a base 10 computer, where each 10bit stores a value from 0 to 9, and a 10byte is 5 10bits long, so that each byte can store 100,000 values (0 through 99,999).

  2. You wish to assign letters to particular positions in a 10byte so that this computer can communicate text data with other computers. One way you could do this would be like so:

     00101 A    00201 a
     00102 B    00202 b
     00103 C    00203 c
     00104 D    00204 d
     00105 E    00205 e
     00106 F    00206 f
     00107 G    00207 g
     00108 H    00208 h
     00109 I    00209 i
     00110 J    00210 j
     00111 K    00211 k
     00112 L    00212 l
     00113 M    00213 m
     00114 N    00214 n
     00115 O    00215 o
     00116 P    00216 p
     00117 Q    00217 q
     00118 R    00218 r
     00119 S    00219 s
     00120 T    00220 t
     00121 U    00221 u
     00122 V    00222 v
     00123 W    00223 w
     00124 X    00224 x
     00125 Y    00225 y
     00126 Z    00226 z
    
  3. Do you see that each lower case letter differs from the upper case letter by only a single 10bit digit, in the 3rd column from the right? It didn't have to be designed this way. It was simply convenient, because then any time we want to adjust the case of a letter we can simply modify one of the digits (10bits) without caring what the rest of the number is or bothering with twenty-six different transformations when we can do one. We couldn't have chosen the second digit because instead of being 100 apart, they'd be only 10 apart and would overlap.

  4. Now, in base 2 it is exactly the same, but instead of each bit representing 0-9, it can only represent 0-1. Using eight 2-bits gives us only 256 possible combinations, 0-255. The ASCII codes for the upper and lower case letters in binary look like this:

     01000001 A        01100001 a
     01000010 B        01100010 b
     01000011 C        01100011 c
     01000100 D        01100100 d
     01000101 E        01100101 e
     01000110 F        01100110 f
     01000111 G        01100111 g
     01001000 H        01101000 h
     01001001 I        01101001 i
     01001010 J        01101010 j
     01001011 K        01101011 k
     01001100 L        01101100 l
     01001101 M        01101101 m
     01001110 N        01101110 n
     01001111 O        01101111 o
     01010000 P        01110000 p
     01010001 Q        01110001 q
     01010010 R        01110010 r
     01010011 S        01110011 s
     01010100 T        01110100 t
     01010101 U        01110101 u
     01010110 V        01110110 v
     01010111 W        01110111 w
     01011000 X        01111000 x
     01011001 Y        01111001 y
     01011010 Z        01111010 z
    

    Just the same as before, they differ by only one 2bit digit, here in the 6th column from the right. We couldn't have used a digit any farther to the right (smaller) because then the lists would have overlapped (2^5 = 32 and accordingly we used all bits 0 through 5, but 2^4 = 16, which could not cover the 26 letters of the alphabet).

  5. Just to fill things out a little, here's an example of what those binary values mean. Let's take the one for G. To understand what 01000111 means in binary:

      Pos:   7  6  5  4  3  2  1  0
      Bit:   0  1  0  0  0  1  1  1
      Val: 128 64 32 16  8  4  2  1
     Mult:   0 64  0  0  0  4  2  1
      Add: 64 + 4 + 2 + 1 = 71, which is the ASCII code for G.
    

    Doing the same thing for the letter G in the special base 10 system I constructed above:

       Pos:     4    3    2    1    0
     10Bit:     0    0    1    0    7
       Val: 10000 1000  100   10    1
      Mult:     0    0  100    0    7
       Add: 100 + 7 = 107, which is my special 10ASCII code for G.
    

    Look back at the "Val" row for binary. Do you see that starting from the right, each value is double the previous one? Doubling each time we get 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and so on. This is how a binary digit's position determines its value, just like a decimal digit's position determines its value with powers of 10: 1, 10, 100, 1000, 10000, 100000, and so on.

    I realize this seems silly because all I did was convert 107 to 107... but 107 isn't just a number, it is a shorthand form for:

     1 hundreds + 0 tens + 7 ones.
    

    Another way we could represent that is

     0 x 10^4 + 0 x 10^3 + 1 x 10^2 + 0 x 10^1 + 7 x 10^0.
    

    Similarly, 01000111 isn't just a binary number, it is a shorthand form for

     0 x 2^7 + 1 x 2^6 + 0 x 2^5 + 0 x 2^4 + 0 x 2^3 + 1 x 2^2 + 1 x 2^1 + 1 x 2^0
    

    Which is what I already showed you:

     0 + 64 + 0 + 0 + 0 + 4 + 2 + 1
     = 64 + 4 + 2 + 1
     = 71
    

Also, you may have been wondering what 0x41 and 0x61 meant. The 0x part indicates that the digits to follow are to be understood as hexadecimal, which is base 16. There are only 10 digits in our number system, so we need 6 more digits somehow. Thus, hexadecimal uses the digits 0-9 and treats the letters A-F as the remaining digits, where A is 10 up through F as 15. Hexadecimal is very convenient for computers because 16 is a power of 2, and an 8-bit byte thus takes exactly two hex digits to encode (and each hex digit encodes exactly four binary digits). Taking 0x41, expanding 4 to its binary representation 0100 and expanding 1 to its binary representation 0001 you get 01000001, which you can see is the code for A as shown. To convert it to decimal it is 4 x 16 + 1 x 1 = 65. We multiply the 4 by 16 because each successive hexadecimal digit leftward is 16 times the previous digit, following the same pattern as I showed you above for base 2 and 10.

I hope this is sufficient for you to understand a little bit more about binary and ASCII codes.

Note 1: The reason for 8 bits in a byte instead of 2 as you might think is that 8 is a better balance, as a 2-bit "byte" would only encode 4 values, and transmitting the upper and lower case letters of the alphabet alone would require 3 bytes! There is nothing inherent in binary that forces the choice of 8 bits per byte, except that 8 is also a power of 2 which makes a lot of the math involved in working with binary information simpler and things align on edges better.

In the early days of computing different systems had many different byte lengths, including 7, 9, or other numbers!) Nowadays the computing world settled on 8 as a standard and useful of bits in a byte (though, note that text sometimes require 2 – 4 bytes to fully represent all the possible characters.

I am sure that choosing something like 6 bits per byte would have worked out awkwardly, and would not have made good use of the full range of values available.

Note 2: My system of five bits in a 10byte is based on the impracticality of using ten 10bits per byte, which yields a really huge number that would waste a lot of storage space. I chose five because ten is evenly divisible by it, which would undoubtedly be useful. (Originally, my answer used ten 10bits per 10byte, but it was too darned big!)