Hashing a string with Sha256

Nattfrosten picture Nattfrosten · Sep 14, 2012 · Viewed 258k times · Source

I try to hash a string using SHA256, I'm using the following code:

using System;
using System.Security.Cryptography;
using System.Text;
 public class Hash
    {
    public static string getHashSha256(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        SHA256Managed hashstring = new SHA256Managed();
        byte[] hash = hashstring.ComputeHash(bytes);
        string hashString = string.Empty;
        foreach (byte x in hash)
        {
            hashString += String.Format("{0:x2}", x);
        }
        return hashString;
    }
}

However, this code gives significantly different results compared to my friends php, as well as online generators (such as This generator)

Does anyone know what the error is? Different bases?

Answer

Quuxplusone picture Quuxplusone · Sep 14, 2012

Encoding.Unicode is Microsoft's misleading name for UTF-16 (a double-wide encoding, used in the Windows world for historical reasons but not used by anyone else). http://msdn.microsoft.com/en-us/library/system.text.encoding.unicode.aspx

If you inspect your bytes array, you'll see that every second byte is 0x00 (because of the double-wide encoding).

You should be using Encoding.UTF8.GetBytes instead.

But also, you will see different results depending on whether or not you consider the terminating '\0' byte to be part of the data you're hashing. Hashing the two bytes "Hi" will give a different result from hashing the three bytes "Hi". You'll have to decide which you want to do. (Presumably you want to do whichever one your friend's PHP code is doing.)

For ASCII text, Encoding.UTF8 will definitely be suitable. If you're aiming for perfect compatibility with your friend's code, even on non-ASCII inputs, you'd better try a few test cases with non-ASCII characters such as é and and see whether your results still match up. If not, you'll have to figure out what encoding your friend is really using; it might be one of the 8-bit "code pages" that used to be popular before the invention of Unicode. (Again, I think Windows is the main reason that anyone still needs to worry about "code pages".)