Computing the MD5 hash of a UTF-8 string

Date: 2023-10-10

Problem description

I have a SQL table that stores large string values, each of which must be unique. To enforce uniqueness, I keep a unique index on a column that stores a string representation of the MD5 hash of the large string.

The C# app that saves these records uses the following method to compute the hash:

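                  // Requires: using System.Linq; using System.Security.Cryptography;
                  public static string CreateMd5HashString(byte[] input)
                  {
                      var hashBytes = MD5.Create().ComputeHash(input);
                      // "X" does not zero-pad: a byte below 0x10 becomes one hex digit, not two
                      return string.Join("", hashBytes.Select(b => b.ToString("X")));
                  }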
                  

To call it, I first convert the string to a byte[] using UTF-8 encoding:

                  // this is what I use in my app
                  CreateMd5HashString(Encoding.UTF8.GetBytes("abc"))
                  // result: 90150983CD24FB0D6963F7D28E17F72
                  // (31 characters: the second hash byte, 0x01, is written as "1" because "X" does not zero-pad)
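
The 31-character result follows from the "X" format specifier, which does not zero-pad. The following is a minimal sketch that contrasts it with "X2"; the names HexFormatDemo, Unpadded, Padded and Md5OfUtf8 are hypothetical and not taken from the app. It illustrates why a SQL-side reimplementation would presumably have to reproduce the unpadded form to match the values already stored.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class HexFormatDemo
{
    // Same formatting as the question's CreateMd5HashString:
    // "X" drops the leading zero of any byte below 0x10.
    public static string Unpadded(byte[] bytes) =>
        string.Concat(bytes.Select(b => b.ToString("X")));

    // "X2" always emits exactly two hex digits per byte.
    public static string Padded(byte[] bytes) =>
        string.Concat(bytes.Select(b => b.ToString("X2")));

    // MD5 of the UTF-8 bytes of a string (hypothetical helper).
    public static byte[] Md5OfUtf8(string s)
    {
        using (var md5 = MD5.Create())
        {
            return md5.ComputeHash(Encoding.UTF8.GetBytes(s));
        }
    }
}
```

With these assumptions, HexFormatDemo.Unpadded(HexFormatDemo.Md5OfUtf8("abc")) gives the 31-character string from the question, while Padded gives 900150983CD24FB0D6963F7D28E17F72. The unpadded form is also ambiguous: the byte sequences 01 23 and 12 03 both format as "123".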
                  

Now I would like to implement this hashing function in SQL using the HASHBYTES function, but I get a different value:

                  print hashbytes('md5', N'abc')
                  -- result: 0xCE1473CF80C6B3FDA8E3DFC006ADC315
                  

This is because SQL Server computes the MD5 of the UTF-16 representation of the string (the N'abc' literal is NVARCHAR). I get the same result in C# if I call CreateMd5HashString(Encoding.Unicode.GetBytes("abc")).
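
To make the difference concrete, this small sketch shows the bytes each encoding produces for the same string. EncodingBytesDemo and its method names are illustrative only.

```csharp
using System;
using System.Text;

public static class EncodingBytesDemo
{
    // BitConverter.ToString renders bytes as uppercase hex separated by hyphens.
    public static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

    // UTF-8: one byte per ASCII character.
    public static string Utf8Hex(string s) => Hex(Encoding.UTF8.GetBytes(s));

    // Encoding.Unicode is UTF-16 little-endian: two bytes per ASCII character.
    public static string Utf16Hex(string s) => Hex(Encoding.Unicode.GetBytes(s));
}
```

EncodingBytesDemo.Utf8Hex("abc") returns 61-62-63, while EncodingBytesDemo.Utf16Hex("abc") returns 61-00-62-00-63-00. MD5 is computed over these bytes, so the two hashes cannot agree.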

I cannot change how the application computes the hash.

Is there a way to get SQL Server to compute the MD5 hash of the UTF-8 bytes of the string?

I looked at similar questions and tried using collations, but have had no luck so far.

Recommended answer

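You need to create a UDF that converts the NVARCHAR data to its UTF-8 byte representation. Assuming it is called dbo.NCharToUTF8Binary, you can then write:

                  hashbytes('md5', dbo.NCharToUTF8Binary(N'abc', 0))

The second argument selects modified UTF-8 when set to 1, which encodes U+0000 as two bytes. Pass 0 for standard UTF-8, which is what Encoding.UTF8 produces. Note that HASHBYTES returns a varbinary value, so it still has to be formatted the same way as the app's hex string before the two can be compared.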
                  

Here is a UDF that does this:

                  create function dbo.NCharToUTF8Binary(@txt NVARCHAR(max), @modified bit)
                  returns varbinary(max)
                  as
                  begin
                  -- Note: This is not the fastest possible routine. 
                  -- If you want a fast routine, use SQLCLR
                      set @modified = isnull(@modified, 0)
                      -- First shred into a table.
                      declare @chars table (
                      ix int identity primary key,
                      codepoint int,
                      utf8 varbinary(6)
                      )
                      declare @ix int
                      set @ix = 0
                      while @ix < datalength(@txt)/2  -- datalength rather than len(): len() ignores trailing spaces
                      begin
                          set @ix = @ix + 1
                          insert @chars(codepoint)
                          select unicode(substring(@txt, @ix, 1))
                      end
                  
                      -- Now look for surrogate pairs.
                      -- If we find a pair (lead followed by trail) we combine them.
                      -- High (lead) surrogate is U+D800 to U+DBFF
                      -- Low (trail) surrogate is U+DC00 to U+DFFF
                      -- Each half carries 10 payload bits:
                      -- codepoint = (high & 0x3ff) * 0x400 + (low & 0x3ff) + 0x10000
                      update c1 set codepoint = ((c1.codepoint & 0x03ff) * 0x0400) + (c2.codepoint & 0x03ff) + 0x10000
                      from @chars c1 inner join @chars c2 on c1.ix = c2.ix -1
                      where c1.codepoint >= 0xD800 and c1.codepoint <=0xDBFF
                      and c2.codepoint >= 0xDC00 and c2.codepoint <=0xDFFF
                      -- Get rid of the trailing half of the pair where found
                      delete c2 
                      from @chars c1 inner join @chars c2 on c1.ix = c2.ix -1
                      where c1.codepoint >= 0x10000
                  
                      -- Now we utf-8 encode each codepoint.
                      -- Lone surrogate halves will still be here
                      -- so they will be encoded as if they were not surrogate pairs.
                      update c 
                      set utf8 = 
                      case 
                      -- One-byte encodings (modified UTF8 outputs zero as a two-byte encoding)
                      when codepoint <= 0x7f and (@modified = 0 OR codepoint <> 0)
                      then cast(substring(cast(codepoint as binary(4)), 4, 1) as varbinary(6))
                      -- Two-byte encodings
                      when codepoint <= 0x07ff
                      then substring(cast((0x00C0 + ((codepoint/0x40) & 0x1f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + (codepoint & 0x3f)) as binary(4)),4,1)
                      -- Three-byte encodings
                      when codepoint <= 0x0ffff
                      then substring(cast((0x00E0 + ((codepoint/0x1000) & 0x0f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + ((codepoint/0x40) & 0x3f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + (codepoint & 0x3f)) as binary(4)),4,1)
                      -- Four-byte encodings 
                      when codepoint <= 0x1FFFFF
                      then substring(cast((0x00F0 + ((codepoint/0x00040000) & 0x07)) as binary(4)),4,1)
                      + substring(cast((0x0080 + ((codepoint/0x1000) & 0x3f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + ((codepoint/0x40) & 0x3f)) as binary(4)),4,1)
                      + substring(cast((0x0080 + (codepoint & 0x3f)) as binary(4)),4,1)
                  
                      end
                      from @chars c
                  
                      -- Finally concatenate them all and return.
                      declare @ret varbinary(max)
                      set @ret = cast('' as varbinary(max))
                      select @ret = @ret + utf8 from @chars c order by ix
                      return  @ret
                  
                  end
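
One way to sanity-check the UDF is to reproduce the same encoding steps in C# and compare the result against Encoding.UTF8. The sketch below is a reference under stated assumptions, not part of the original answer: Utf8Reference and its methods are hypothetical names, it covers standard UTF-8 only (the @modified = 0 path), and it combines surrogate pairs with the standard UTF-16 formula of 10 payload bits per half.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Utf8Reference
{
    // Combine a UTF-16 surrogate pair into a single code point.
    public static int CombineSurrogates(char high, char low) =>
        ((high & 0x3FF) * 0x400) + (low & 0x3FF) + 0x10000;

    // Encode one code point using the same thresholds as the UDF's CASE expression.
    public static byte[] EncodeCodePoint(int cp)
    {
        if (cp <= 0x7F)
            return new[] { (byte)cp };
        if (cp <= 0x7FF)
            return new[] {
                (byte)(0xC0 + ((cp / 0x40) & 0x1F)),
                (byte)(0x80 + (cp & 0x3F)) };
        if (cp <= 0xFFFF)
            return new[] {
                (byte)(0xE0 + ((cp / 0x1000) & 0x0F)),
                (byte)(0x80 + ((cp / 0x40) & 0x3F)),
                (byte)(0x80 + (cp & 0x3F)) };
        return new[] {
            (byte)(0xF0 + ((cp / 0x40000) & 0x07)),
            (byte)(0x80 + ((cp / 0x1000) & 0x3F)),
            (byte)(0x80 + ((cp / 0x40) & 0x3F)),
            (byte)(0x80 + (cp & 0x3F)) };
    }

    // Walk the UTF-16 code units, pairing surrogates where a lead is followed by a trail.
    public static byte[] Encode(string s)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < s.Length; i++)
        {
            int cp = s[i];
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                cp = CombineSurrogates(s[i], s[i + 1]);
                i++; // skip the trailing half of the pair
            }
            bytes.AddRange(EncodeCodePoint(cp));
        }
        return bytes.ToArray();
    }
}
```

For well-formed input such as "a\u00E9\u20AC\uD83D\uDE00", Utf8Reference.Encode should return the same bytes as Encoding.UTF8.GetBytes. Lone surrogates are a known difference: this sketch, like the UDF, encodes them as three bytes, whereas Encoding.UTF8 substitutes U+FFFD by default.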
                  
