2010-09-15 148 views
12

SQL Server 2008 and HashBytes

I have a large nvarchar that I want to pass to the HashBytes function. I get the error:

"String or binary would be truncated. Cannot insert the value NULL into column 'colname', table 'table'; column does not allow nulls. UPDATE fails. The statement has been terminated."

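Being ever resourceful, I discovered this was due to the HashBytes function having a maximum input limit of 8000 bytes. Further searching turned up a "solution" in which my large varchar is split into pieces, each piece is hashed separately, and the pieces are combined afterwards with this user-defined function: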

function [dbo].[udfLargeHashTable] (@algorithm nvarchar(4), @InputDataString varchar(MAX)) 
RETURNS varbinary(MAX) 
AS 
BEGIN 
DECLARE 
    @Index int, 
    @InputDataLength int, 
    @ReturnSum varbinary(max), 
    @InputData varbinary(max) 

SET @ReturnSum = 0 
SET @Index = 1 
SET @InputData = convert(binary,@InputDataString) 
SET @InputDataLength = DATALENGTH(@InputData) 

WHILE @Index <= @InputDataLength 
BEGIN 
    SET @ReturnSum = @ReturnSum + HASHBYTES(@algorithm, SUBSTRING(@InputData, @Index, 8000)) 
    SET @Index = @Index + 8000 
END 
RETURN @ReturnSum 
END 

I call it with:

set @ReportDefinitionHash=convert(int,dbo.[udfLargeHashTable]('SHA1',@ReportDefinitionForLookup)) 

Where @ReportDefinitionHash is an int and @ReportDefinitionForLookup is the varchar.

Passing a simple string such as 'test' through my UDF produces a different int than a normal call to HashBytes does.

Any advice on this issue?

+0

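Basically, you don't want to aggregate your hashed strings, so the return type should be varbinary(20). Then try running the following: 'select hashbytes('sha1','test'),hashbytes('sha1',N'test')' (you'll be pleasantly surprised) :) – 2010-09-15 14:52:04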
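The surprise in that comment comes from the bytes that actually get hashed: a varchar literal is hashed one byte per character, while an N'...' literal is hashed as UTF-16 little-endian. Below is a minimal Python sketch of that effect, using hashlib as a stand-in for HASHBYTES. The helper name sha1_hex is made up for illustration, and an ASCII-compatible code page is assumed for the varchar case.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as uppercase hex, the way SQL Server prints it."""
    return hashlib.sha1(data).hexdigest().upper()


text = "test"

# varchar 'test': one byte per character (assumes an ASCII-compatible code page)
varchar_bytes = text.encode("ascii")

# nvarchar N'test': UTF-16 little-endian, two bytes per character for this text
nvarchar_bytes = text.encode("utf-16-le")

# Same characters, different byte sequences, so the digests differ.
print(len(varchar_bytes), sha1_hex(varchar_bytes))
print(len(nvarchar_bytes), sha1_hex(nvarchar_bytes))
```

Any wrapper that declares its parameter as nvarchar(max) will therefore disagree with hashbytes() called on a varchar value, even for short ASCII strings.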

Answers

9

Just use this function (taken from Hashing large data strings with a User Defined Function):

create function dbo.fn_hashbytesMAX 
    (@string nvarchar(max) 
    , @Algo varchar(10) 
    ) 
    returns varbinary(20) 
as 
/************************************************************ 
* 
* Author:  Brandon Galderisi 
* Last modified: 15-SEP-2009 (by Denis) 
* Purpose:  uses the system function hashbytes as well 
*     as sys.fn_varbintohexstr to split an 
*     nvarchar(max) string and hash in 8000 byte 
*     chunks hashing each 8000 byte chunk,, 
*     getting the 40 byte output, streaming each 
*     40 byte output into a string then hashing 
*     that string. 
* 
*************************************************************/ 
begin 
    declare @concat  nvarchar(max) 
       ,@NumHash  int 
       ,@HASH   varbinary(20) 
    set @NumHash = ceiling((datalength(@string)/2)/(4000.0)) 
    /* HashBytes only supports 8000 bytes so split the string if it is larger */ 
    if @NumHash>1 
    begin 
                 -- # * 4000 character strings 
      ;with a as (select 1 as n union all select 1) -- 2 
       ,b as (select 1 as n from a ,a a1)  -- 4 
       ,c as (select 1 as n from b ,b b1)  -- 16 
       ,d as (select 1 as n from c ,c c1)  -- 256 
       ,e as (select 1 as n from d ,d d1)  -- 65,536 
       ,f as (select 1 as n from e ,e e1)  -- 4,294,967,296 = 17+ TRILLION characters 
       ,factored as (select row_number() over (order by n) rn from f) 
       ,factors as (select rn,(rn*4000)+1 factor from factored) 

      select @concat = cast((
      select right(sys.fn_varbintohexstr 
         (
         hashbytes(@Algo, substring(@string, factor - 4000, 4000)) 
         ) 
         , 40) + '' 
      from Factors 
      where rn <= @NumHash 
      for xml path('') 
     ) as nvarchar(max)) 


      set @HASH = dbo.fn_hashbytesMAX(@concat ,@Algo) 
    end 
    else 
    begin 
      set @HASH = convert(varbinary(20), hashbytes(@Algo, @string)) 
    end 

return @HASH 
end 

And the results are as follows:

select 
hashbytes('sha1', N'test') --native function with nvarchar input 
,hashbytes('sha1', 'test') --native function with varchar input 
,dbo.fn_hashbytesMAX('test', 'sha1') --Galderisi's function which casts to nvarchar input 
,dbo.fnGetHash('sha1', 'test') --your function 

Output:

0x87F8ED9157125FFC4DA9E06A7B8011AD80A53FE1 
0xA94A8FE5CCB19BA61C4C0873D391E987982FBBD3 
0x87F8ED9157125FFC4DA9E06A7B8011AD80A53FE1 
0x00000000AE6DBA4E0F767D06A97038B0C24ED720662ED9F1 
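The fourth row is 24 bytes long and starts with four zero bytes, which is consistent with two details of the UDF in the question: 'SET @ReturnSum = 0' seeds the varbinary with the int 0, and 'convert(binary, ...)' without a length behaves like binary(30), so the input is padded before it is hashed. Here is a rough Python model of those two effects. It is a sketch under those assumptions, not a reimplementation of the UDF, and it has not been checked against the exact digest shown above.

```python
import hashlib

data = b"test"

# Assumption: convert(binary, ...) with no length acts like binary(30),
# so the value is zero-padded on the right to 30 bytes before hashing.
padded = data.ljust(30, b"\x00")

# Assumption: SET @ReturnSum = 0 stores the int 0 as four zero bytes.
return_sum = (0).to_bytes(4, "big")
return_sum += hashlib.sha1(padded).digest()

# Assumption: converting a longer varbinary to int keeps the trailing four
# bytes, modelled here as a signed big-endian integer.
as_int = int.from_bytes(return_sum[-4:], "big", signed=True)

print(len(return_sum))  # 4-byte prefix plus 20-byte SHA-1 digest
print(return_sum.hex().upper())
print(as_int)
```

Under this model the int the question ends up with depends only on the tail of the padded hash, so it cannot match an int derived from hashbytes('sha1', 'test').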
+0

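I think there's a bug here. Calling 'dbo.fn_hashbytesMAX()' with larger values produces the same hash value. It seems to me that the '@string' parameter type needs to be 'nvarchar(max)' rather than 'varchar(max)', otherwise halving the 'datalength()' result makes no sense. In effect, 'datalength(@string)/2' means it only hashes half of the substrings. – Rory 2013-09-11 12:01:29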

+0

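I see that the function as originally published took 'nvarchar(max)' input and was later changed. Anyone using it should either change the '@string' data type back to 'nvarchar(max)' or change the code so it works correctly (which probably means changing the other nvarchar to varchar and removing the '/2', but you'd want to test that) – Rory 2013-09-11 12:58:57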

+0

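I've edited the answer per my previous comments - it now uses nvarchar for the calculation. If you pass a varchar value, it will not output the same value as hashbytes(), because the parameter is converted to nvarchar first. Changed to return varbinary, so calling it with the md5 algorithm returns the correct length. – Rory 2013-09-21 13:51:40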

1

You could write a SQL CLR function:

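// Required namespaces; the method itself must sit inside a public class in the CLR assembly.
using System.Data.SqlTypes;
using System.Security.Cryptography;
using System.Text;

[Microsoft.SqlServer.Server.SqlFunction] 
public static SqlBinary BigHashBytes(SqlString algorithm, SqlString data) 
{ 
    // Look up the hash implementation by name, e.g. "md5" or "sha1".
    var algo = HashAlgorithm.Create(algorithm.Value); 

    // UTF-8 yields the same bytes as a varchar value only for ASCII text.
    var bytes = Encoding.UTF8.GetBytes(data.Value); 

    return new SqlBinary(algo.ComputeHash(bytes)); 
} 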

It can then be called from SQL like this:

--these return the same value 
select HASHBYTES('md5', 'test stuff') 
select dbo.BigHashBytes('md5', 'test stuff') 

BigHashBytes is only necessary if the length will exceed 8K.

+0

1) String data in SQL Server is stored as UTF-16 Little Endian, which is what .NET calls "Unicode". 2) Since SqlString can give you the Unicode byte[] via [SqlString.GetUnicodeBytes](https://msdn.microsoft.com/en-us/library/system.data.sqltypes.sqlstring.getunicodebytes.aspx), you don't have to bother with 'Encoding'. – 2015-05-29 17:48:19

14

If you can't create a function and have to use something that already exists in the database:

sys.fn_repl_hash_binary(cast('some really long string' as varbinary(max))) 

sys.fn_repl_hash_binary can be made to work using the syntax above.

Taken from: http://www.sqlnotes.info/2012/01/16/generate-md5-value-from-big-data/

+0

Note: only works on SQL Server 2008 and later – Rory 2013-09-11 12:07:21

+0

Doesn't work if you have utf-8 data - 'NVARCHAR' strings – gotqn 2014-04-23 08:37:45

+1

SQL Server doesn't use utf-8 strings. I have no problems with NVARCHAR strings. – 2014-10-06 18:48:58

0

This can also be used as a function body:

DECLARE @A NVARCHAR(MAX) = N'test' 

DECLARE @res VARBINARY(MAX) = 0x 
DECLARE @position INT = 1 
     ,@len INT = DATALENGTH(@A) / 2 -- length in characters: SUBSTRING on NVARCHAR counts characters, DATALENGTH counts bytes

WHILE 1 = 1 
BEGIN 
    SET @res = @res + HASHBYTES('SHA2_256', SUBSTRING(@A, @position, 4000)) 
    SET @position = @position+4000 
    IF @Position > @len 
     BREAK 
END 

SELECT HASHBYTES('SHA2_256',@res) 

The idea is to hash each 4000-character part of the NVARCHAR(MAX) string and concatenate the results, then hash that concatenated result.
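The same hash-of-hashes idea can be sketched outside SQL Server. The Python below is an illustration only: chunked_sha256 and CHUNK_CHARS are made-up names, the text is encoded as UTF-16 little-endian on the assumption that this matches NVARCHAR storage, and chunks are counted in characters (Python counts code points, so text outside the Basic Multilingual Plane may split differently than in T-SQL).

```python
import hashlib

CHUNK_CHARS = 4000  # characters per chunk, as in the T-SQL loop


def chunked_sha256(text: str, chunk_chars: int = CHUNK_CHARS) -> bytes:
    """Hash each chunk of text, concatenate the digests, then hash the result."""
    digests = b""
    position = 0
    while True:
        chunk = text[position:position + chunk_chars]
        # Assumption: NVARCHAR data is hashed as UTF-16 little-endian bytes.
        digests += hashlib.sha256(chunk.encode("utf-16-le")).digest()
        position += chunk_chars
        if position >= len(text):
            break
    return hashlib.sha256(digests).digest()


short_text = "test"
long_text = "x" * 10000  # three chunks: 4000 + 4000 + 2000 characters

print(chunked_sha256(short_text).hex())
print(chunked_sha256(long_text).hex())
```

Note that the result never equals a plain single-pass hash of the input, not even for short strings, because the final step always hashes a digest rather than the text itself. Values produced this way are only comparable with other values produced the same way.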

1

Tested and working: select master.sys.fn_repl_hash_binary(someVarbinaryMaxValue). And it's not complicated :)

0

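It seems the simplest way is to write a recursive hashing algorithm that parses the input text value into sub-varchar(8000) segments. I arbitrarily chose to cut the input string into 7500-character segments. The hashing algorithm returns varbinary(20), which can easily be converted to varchar(20).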

ALTER FUNCTION [dbo].[BigHash] 
( 
    @TextValue nvarchar(max) 
) 

RETURNS varbinary(20) 

AS 
BEGIN 

    if @TextValue is null -- "= null" never evaluates to true under default ANSI_NULLS settings
     return hashbytes('SHA1', 'null') 


    Declare @FirstPart as varchar(7500) 
    Declare @Remainder as varchar(max) 

    Declare @RemainderHash as varbinary(20) 
    Declare @BinaryValue as varbinary(20) 

    Declare @TextLength as integer 


    Set @TextLength = len(@TextValue) 

    if @TextLength > 7500 
     Begin 
      Set @FirstPart = substring(@TextValue, 1, 7500)   

      Set @Remainder = substring(@TextValue, 7501, @TextLength - 7500)   

      Set @RemainderHash = dbo.BigHash(@Remainder) 

      Set @BinaryValue = hashbytes('SHA1', @FirstPart + convert(varchar(20), @RemainderHash, 2)) 

      return @BinaryValue 

     End 
    else 
     Begin 
      Set @FirstPart = substring(@TextValue, 1, @TextLength)      
      Set @BinaryValue = hashbytes('SHA1', @FirstPart) 

      return @BinaryValue 
     End 


    return null 

END 
6

I've taken the accepted answer and modified it a bit with the following improvements:

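  1. No longer a recursive function
  2. Now schema bound
  3. No longer relies on undocumented stored procedures
  4. Two versions: one for nvarchar, one for varchar
  5. Returns the same data size as HASHBYTES, leaving it to the end user to convert to a smaller value based on the algorithm used. This lets the functions support future algorithms that return more data.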

With these changes, the functions can now be used in persisted computed columns, because they are now marked as deterministic when created.

CREATE FUNCTION dbo.fnHashBytesNVARCHARMAX 
(
    @Algorithm VARCHAR(10), 
    @Text NVARCHAR(MAX) 
) 
RETURNS VARBINARY(8000) 
WITH SCHEMABINDING 
AS 
BEGIN 
    DECLARE @NumHash INT; 
    DECLARE @HASH VARBINARY(8000); 
    SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    /* HashBytes only supports 8000 bytes so split the string if it is larger */ 
    WHILE @NumHash > 1 
    BEGIN 
     -- # * 4000 character strings 
     WITH a AS 
     (SELECT 1 AS n UNION ALL SELECT 1), -- 2 
     b AS 
     (SELECT 1 AS n FROM a, a a1),  -- 4 
     c AS 
     (SELECT 1 AS n FROM b, b b1),  -- 16 
     d AS 
     (SELECT 1 AS n FROM c, c c1),  -- 256 
     e AS 
     (SELECT 1 AS n FROM d, d d1),  -- 65,536 
     f AS 
     (SELECT 1 AS n FROM e, e e1),  -- 4,294,967,296 = 17+ TRILLION characters 
     factored AS 
     (SELECT ROW_NUMBER() OVER (ORDER BY n) rn FROM f), 
     factors AS 
     (SELECT rn, (rn * 4000) + 1 factor FROM factored) 
     SELECT @Text = CAST 
      (
       (
        SELECT CONVERT(VARCHAR(MAX), HASHBYTES(@Algorithm, SUBSTRING(@Text, factor - 4000, 4000)), 1) 
        FROM factors 
        WHERE rn <= @NumHash 
        FOR XML PATH('') 
       ) AS NVARCHAR(MAX) 
      ); 

     SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    END; 
    SET @HASH = CONVERT(VARBINARY(8000), HASHBYTES(@Algorithm, @Text)); 
    RETURN @HASH; 
END;
GO

CREATE FUNCTION dbo.fnHashBytesVARCHARMAX 
(
    @Algorithm VARCHAR(10), 
    @Text VARCHAR(MAX) 
) 
RETURNS VARBINARY(8000) 
WITH SCHEMABINDING 
AS 
BEGIN 
    DECLARE @NumHash INT; 
    DECLARE @HASH VARBINARY(8000); 
    SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    /* HashBytes only supports 8000 bytes so split the string if it is larger */ 
    WHILE @NumHash > 1 
    BEGIN 
     -- # * 8000 character strings
     WITH a AS 
     (SELECT 1 AS n UNION ALL SELECT 1), -- 2 
     b AS 
     (SELECT 1 AS n FROM a, a a1),  -- 4 
     c AS 
     (SELECT 1 AS n FROM b, b b1),  -- 16 
     d AS 
     (SELECT 1 AS n FROM c, c c1),  -- 256 
     e AS 
     (SELECT 1 AS n FROM d, d d1),  -- 65,536 
     f AS 
     (SELECT 1 AS n FROM e, e e1),  -- 4,294,967,296 = 17+ TRILLION characters 
     factored AS 
     (SELECT ROW_NUMBER() OVER (ORDER BY n) rn FROM f), 
     factors AS 
     (SELECT rn, (rn * 8000) + 1 factor FROM factored) 
     SELECT @Text = CAST 
     (
      (
       SELECT CONVERT(VARCHAR(MAX), HASHBYTES(@Algorithm, SUBSTRING(@Text, factor - 8000, 8000)), 1) 
       FROM factors 
       WHERE rn <= @NumHash 
       FOR XML PATH('') 
      ) AS NVARCHAR(MAX) 
     ); 

     SET @NumHash = CEILING(DATALENGTH(@Text)/(8000.0)); 
    END; 
    SET @HASH = CONVERT(VARBINARY(8000), HASHBYTES(@Algorithm, @Text)); 
    RETURN @HASH; 
END;