對兩個字典進行排序並查找排序列表中的項目索引位置的差異

-1

我面臨一個挑戰。讓我來提出困擾我的挑戰 -對兩個字典進行排序並查找排序列表中的項目索引位置的差異

有兩個詞典 - D1和D2。這些字典大部分時間都有相同的密鑰，但不能保證它總是一樣的。這兩個字典可以表示如下 -

D1 = {[「R1」，0.7]，[「R2」，0.73]，[「R3」，1.5]，[「R4」，2.5] 「R5」，0.12]，[「R6」，1.9]，[「R7」，9.8]，[「R8」，6.5]，[「R9」，7.2] D2 = {[「R1」，0.7]，[「R2」，0.8]，[「R3」，1.5]，[「R4」，3.1]，[「R5」，0.10]，[「R6 「2.0」，[「R7」，8.0]，[「R8」，1.0]，[「R9」，0.0]，[「R10」，5.6]，[「R11」，6.23]

在這些字典中，鍵是字符串數據類型，值是浮點數據類型。物理上它們是兩個不同時間的系統快照。 D1比D2早。我需要根據升序中的值獨立排序這些字典。當完成時，這些字典改變爲：這些字典改爲：這些字典改爲：[0124] 「1.9」，[「R4」，2.5]，[「R10」，5.6]，[「R8」，6.5]，[「R9」，7.2]，[「R7」，9.8] D2 = {[「R9」，0.0]，[「R5」，0.10]，[「R1」，0.7]，[「R2」，0.8]，[「R8」，1.0] ，1.5「，[」R6「，2.0]，[」R4「，3.1]，[」R10「，5.6]，[」R11「，6.23]，[」R7「，8.0]

這裏將字典D1中元素的排序作爲參考點。 D1中的每個元素都與D1中緊接的下一個元素相連。期望在分類之後識別D2中已經破壞序列的元素，因爲它在參考字典D1中出現。雖然確定D2的這些元素（即D1中存在但不存在於D1中但不存在於D2中的密鑰）和D1中元素（即D1中存在密鑰但不存在於D2中）的忽略被忽略。即他們不應該在結果中突出顯示。

例如，在上面列出的示例繼續，從而中斷在D2的順序，參照D1的元素（忽略加入和除去）是 -

斷路器= {[「R9」，0.0]， [「R8」，1.0]}，因爲R9已將序列從D1排序字典中的第8個索引跳轉到D2排序字典中的第0個索引。同樣，R8已將序列從D1排序字典中的第7個索引跳轉到D2排序字典中的第4個索引（所有索引都從0開始）。

注 - [「R11」，6.23]預計不在斷路器列表中，因爲它是D2的附加。

請建議一個算法來達到這個最佳效果，因爲這個操作需要對從數據庫中獲取的數據進行處理，包含3,256,190條記錄。

編程語言不是一個擔心，如果用邏輯指導，我可以承擔以任何語言實現它的任務。

來源

2012-04-06 Siva Senthil

是它允許修剪D2爲D1的大小（因爲D2在這種情況下更大）在排序之前？ – 2012-04-06 07:04:38

不允許。你必須像現在一樣使用D2。 – 2012-04-06 07:05:50

在排序之前，D1中第0個元素的鍵名「R1」是否總是與D2中第0個元素的鍵名匹配？ – 2012-04-06 07:08:54

我想出了這個算法在C＃中。它適用於您的示例數據。我還做了3000000個完全隨機值的測試（因此檢測到很多斷路器），並在我的筆記本電腦（Intel Core i3 2.1GHz，64位）上以3.2秒完成測試。

我首先將您的數據放入臨時詞典中，以便在將它們放入列表之前，我可以複製粘貼您的值。當然你的應用程序會直接把它們放在列表中。

class Program 
{ 
    struct SingleValue 
    { 
     public string Key; 
     public float Value; 
     public SingleValue(string key, float value) 
     { 
      Key = key; 
      Value = value; 
     } 
     public override string ToString() 
     { 
      return string.Format("{0}={1}", Key, Value); 
     } 
    } 

    static void Main(string[] args) 
    { 
     List<SingleValue> D1 = new List<SingleValue>(); 
     HashSet<string> D1keys = new HashSet<string>(); 
     List<SingleValue> D2 = new List<SingleValue>(); 
#if !LARGETEST 
     Dictionary<string, double> D1input = new Dictionary<string, double>() { { "R1", 0.7 }, { "R2", 0.73 }, { "R3", 1.5 }, { "R4", 2.5 }, { "R5", 0.12 }, { "R6", 1.9 }, { "R7", 9.8 }, { "R8", 6.5 }, { "R9", 7.2 }, { "R10", 5.6 } }; 
     Dictionary<string, double> D2input = new Dictionary<string, double>() { { "R1", 0.7 }, { "R2", 0.8 }, { "R3", 1.5 }, { "R4", 3.1 }, { "R5", 0.10 }, { "R6", 2.0 }, { "R7", 8.0 }, { "R8", 1.0 }, { "R9", 0.0 }, { "R10", 5.6 }, { "R11", 6.23 } }; 

     // You should directly put you values into this list... I converted them from a Dictionary so I didn't have to type over your input values :) 
     foreach (KeyValuePair<string, double> kvp in D1input) 
     { 
      D1.Add(new SingleValue(kvp.Key, (float)kvp.Value)); 
      D1keys.Add(kvp.Key); 
     } 
     foreach (KeyValuePair<string, double> kvp in D2input) 
      D2.Add(new SingleValue(kvp.Key, (float)kvp.Value)); 
#else 
     Random ran = new Random(); 
     for (int i = 0; i < 3000000; i++) 
     { 
      D1.Add(new SingleValue(i.ToString(), (float)ran.NextDouble())); 
      D1keys.Add(i.ToString()); 
      D2.Add(new SingleValue(i.ToString(), (float)ran.NextDouble())); 
     } 
#endif 

     // Sort the lists 
     D1.Sort(delegate(SingleValue x, SingleValue y) 
     { 
      if (y.Value > x.Value) 
       return -1; 
      else if (y.Value < x.Value) 
       return 1; 
      return 0; 
     }); 
     D2.Sort(delegate(SingleValue x, SingleValue y) 
     { 
      if (y.Value > x.Value) 
       return -1; 
      else if (y.Value < x.Value) 
       return 1; 
      return 0; 
     }); 

     int start = Environment.TickCount; 

     Dictionary<string, float> breakers = new Dictionary<string, float>(); 
     List<SingleValue> additions = new List<SingleValue>(); 

     // Walk through D1 
     IEnumerator<SingleValue> i1 = D1.GetEnumerator(); 
     IEnumerator<SingleValue> i2 = D2.GetEnumerator(); 

     while (i1.MoveNext() && i2.MoveNext()) 
     { 
      while (breakers.ContainsKey(i1.Current.Key)) 
      { 
       if (!i1.MoveNext()) 
        break; 
      } 

      while (i1.Current.Key != i2.Current.Key) 
      { 
       if (D1keys.Contains(i2.Current.Key)) 
        breakers.Add(i2.Current.Key, i2.Current.Value); 
       else 
        additions.Add(i2.Current); 
       if (!i2.MoveNext()) 
        break; 
      } 
     } 

     int duration = Environment.TickCount - start; 
     Console.WriteLine("Lookup took {0}ms", duration); 
     Console.ReadKey(); 
    } 
}

enter image description here

來源

2012-04-06 09:31:39

在您的代碼中，while循環將i1和i2作爲迭代條件。這意味着D1和D2必須具有相同的尺寸，這是不確定的。它可能是不同的，你分享的是O（n^4）是我們可以得到的最優的？請建議.. – 2012-04-06 09:52:20

他們不必是相同的大小。如果最短列表已結束，while循環就會停止 – 2012-04-06 09:55:16

現在，如果沒有其他方法，我會將其標記爲答案..但我只想從其他人那裏聽取其他更好的方法，並使用最少的循環字典... – 2012-04-06 10:01:33

如果你可以在排序之前從D2中刪除東西很容易，對吧？你說你不能刪除數據。但是，您可以創建一個模擬此類刪除的額外數據結構（例如，將「已刪除」位添加到該項目，或者如果您無法更改其類型，則會創建一組「已刪除」項目）。然後運行簡單的算法，但確保忽略「已刪除」的項目。

來源

2012-04-06 08:39:36

我不明白你在說什麼，請問你能說得更詳細些嗎？ – 2012-04-06 09:41:36

我一直在思考這個問題。正如你所提到的Levenshtein距離，我假設你想通過將他們從D2中的位置移動到D2中的某個其他位置來獲得這些元素，您將以最少的移動次數從D2中獲得D1（忽略不存在於兩個序列）。

我寫了一個貪婪算法，它可能足以滿足您的需求，但它可能不一定會在所有情況下都給出最佳結果。我真的不確定，可能會在稍後（最早的週末）回來查看正確性。然而，如果你真的需要在300萬個元素的序列上做這件事，我相信沒有任何一種算法在這方面做得很好，因爲我看不到一個O（n）算法，好工作，即使在一些微不足道的輸入上也不會失敗。

該算法嘗試將每個元素移動到其預期位置，並計算移動後誤差的總和（每個元素距原始位置的距離）。導致錯誤總和最小的元素被宣佈爲斷路器並移動。重複此操作直到序列返回到D1。我認爲它具有O（n^3）的複雜性，儘管元素有時需要被移動多次，所以它可能是O（n^4）最壞的情況，我不確定，但是在100萬個隨機數有50個元素的例子中，外環運行的最大數量是51（n^4意味着它可以是2500，並且在我所有的百萬測試中我都很幸運）。只有鑰匙，沒有價值。這是因爲這個值在這一步是無關緊要的，所以沒有必要存儲它們。

編輯：我爲此寫了一個反例生成器，事實上它並不總是最優的。破壞者越多，獲得最佳解決方案的可能性就越小。例如，在1000個元素與50次隨機移動的人它通常會發現一組55-60斷路器，當最佳的解決方案是至多50

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 

namespace Breakers 
{ 
    class Program 
    { 
     static void Main(string[] args) 
     { 
      //test case 1 
      //List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" }; 
      //List<string> L2 = new List<string> { "R9", "R5", "R1", "R2", "R8", "R3", "R6", "R4", "R10", "R11", "R7" }; 
      //GetBreakers<string>(L1, L2); 

      //test case 2 
      //List<string> L1 = new List<string> { "R5", "R1", "R2", "R3", "R6", "R4", "R10", "R8", "R9", "R7" }; 
      //List<string> L2 = new List<string> { "R5", "R9", "R1", "R6", "R2", "R3", "R4", "R10", "R8", "R7" }; 
      //GetBreakers<string>(L1, L2); 

      //test case 3 
      List<int> L1 = new List<int>(); 
      List<int> L2 = new List<int>(); 
      Random r = new Random(); 
      int n = 100; 
      for (int i = 0; i < n; i++) 
      { 
       L1.Add(i); 
       L2.Add(i); 
      } 
      for (int i = 0; i < 5; i++) // number of random moves, this is the upper bound of the optimal solution 
      { 
       int a = r.Next() % n; 
       int b = r.Next() % n; 
       if (a == b) 
       { 
        i--; 
        continue; 
       } 
       int x = L2[a]; 
       Console.WriteLine(x); 
       L2.RemoveAt(a); 
       L2.Insert(b, x); 
      } 
      for (int i = 0; i < L2.Count; i++) Console.Write(L2[i]); 
      Console.WriteLine(); 
      GetBreakers<int>(L1, L2); 
     } 

     static void GetBreakers<T>(List<T> L1, List<T> L2) 
     { 
      Dictionary<T, int> Appearances = new Dictionary<T, int>(); 
      for (int i = 0; i < L1.Count; i++) Appearances[L1[i]] = 1; 
      for (int i = 0; i < L2.Count; i++) if (Appearances.ContainsKey(L2[i])) Appearances[L2[i]] = 2; 
      for (int i = L1.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L1[i]) && Appearances[L1[i]] == 2)) L1.RemoveAt(i); 
      for (int i = L2.Count - 1; i >= 0; i--) if (!(Appearances.ContainsKey(L2[i]) && Appearances[L2[i]] == 2)) L2.RemoveAt(i); 
      Dictionary<T, int> IndInL1 = new Dictionary<T, int>(); 
      for (int i = 0; i < L1.Count; i++) IndInL1[L1[i]] = i; 

      Dictionary<T, int> Breakers = new Dictionary<T, int>(); 

      int steps = 0; 
      int me = 0; 
      while (true) 
      { 
       steps++; 
       int minError = int.MaxValue; 
       int minErrorIndex = -1; 

       for (int from = 0; from < L2.Count; from++) 
       { 
        T x = L2[from]; 
        int to = IndInL1[x]; 
        if (from == to) continue; 

        L2.RemoveAt(from); 
        L2.Insert(to, x); 

        int error = 0; 
        for (int i = 0; i < L2.Count; i++) 
         error += Math.Abs((i - IndInL1[L2[i]])); 

        L2.RemoveAt(to); 
        L2.Insert(from, x); 

        if (error < minError) 
        { 
         minError = error; 
         minErrorIndex = from; 
        } 
       } 

       if (minErrorIndex == -1) break; 

       T breaker = L2[minErrorIndex]; 
       int breakerOriginalPosition = IndInL1[breaker]; 

       L2.RemoveAt(minErrorIndex); 
       L2.Insert(breakerOriginalPosition, breaker); 

       Breakers[breaker] = 1; 

       me = minError; 
      } 
      Console.WriteLine("Breakers: " + Breakers.Count + " Steps: " + steps); 
      foreach (KeyValuePair<T, int> p in Breakers) 
       Console.WriteLine(p.Key); 
      Console.ReadLine(); 
     } 
    } 
}

來源

2012-04-10 11:47:34 svinja

對兩個字典進行排序並查找排序列表中的項目索引位置的差異

回答

相關問題