SQL Server 2014 Hashbytes of a nvarchar(max) result is nvarchar(max)

John picture John · Jul 30, 2016 · Viewed 11.5k times · Source

Using SQL Server 2014 I have a table that has a nvarchar(max) column called [ASCII File] which can contain an ASCII text file of many K. I then want to do a MD5 hashbytes on that file and the resultant hash should always be 20 bytes.

Well when I do a select of hashbytes('MD5', [ASCII File]) I get query completed with errors

Msg 8152, Level 16, State 10, Line 4
String or binary data would be truncated.

I get the same message when I try

left(hashbytes('MD5', [ASCII File]), 50)

I get the same message when I try

convert(varchar(50), hashbytes('MD5', [ASCII File]))

It seems like since the column I am doing the hashbytes on is nvarchar(max), the result of the hashbytes function also is nvarchar(max).

Can you tell me how I can get the result to be the expected 20 long and not something so long it has to be truncated?

Answer

Solomon Rutzky picture Solomon Rutzky · Jul 30, 2016

It seems like since the field I am doing the hashbytes on is nvarchar(max) the result of the hashbytes is nvarchar(max).

No, that is not possible, especially since the return value of HASHBYTES is a VARBINARY. Also, since your tests were just SELECT statements and not an INSERT statement, there is no way for the return value to get a truncation error. The truncation error is coming from the input value. As stated in that linked MSDN page for HASHBYTES (for SQL Server 2012 and 2014):

Allowed input values are limited to 8000 bytes. The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.

That really says it all: the input is limited to 8000 bytes, and the output is a fixed number of bytes, based on the specified algorithm.

The updated documentation, for SQL Server 2016 (which has removed the 8000 byte limitation), states:

For SQL Server 2014 and earlier, allowed input values are limited to 8000 bytes.

You can run a simple test:

DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT HASHBYTES('MD5', @Test);

Returns:

50000

Msg 8152, Level 16, State 10, Line 3
String or binary data would be truncated.

If you want to pass in more than 8000 bytes to a hash function in a version of SQL Server prior to 2016, then you need to use SQLCLR. You can either write your own function, or you can download and install the Free version of the SQL# SQLCLR library (which I created), and use the Util_Hash and Util_HashBinary functions:

DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT SQL#.Util_Hash('MD5', CONVERT(VARBINARY(MAX), @Test));
SELECT SQL#.Util_HashBinary('MD5', CONVERT(VARBINARY(MAX), @Test));

Returns:

50000
40752EB301B41EEAEB309348CE9711D6
0x40752EB301B41EEAEB309348CE9711D6

UPDATE

In the case of using a VARCHAR(MAX) column or variable but with 8000 or fewer characters (or an NVARCHAR(MAX) column or variable with 4000 or fewer characters), there will be no issue and everything will work as expected:

DECLARE @Test VARCHAR(MAX) = REPLICATE('t', 5000);
SELECT LEN(@Test) AS [Characters], 
       HASHBYTES('MD5', @Test) AS [MD5];

Returns:

5000    0x6ABFBA10B49157F2EF8C85862B6E6313