Optimizing LINQ Count and GroupBy

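I wrote some code that counts the frequency of each byte in a binary file using LINQ. The code seems slow when it executes the LINQ expression, and this kind of logic seems hard to parallelize. Building the frequency table for a 475 MB file takes about 1 minute.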

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();

        sw.Start();
        // File Size 478.668 KB
        byte[] ltext = File.ReadAllBytes(@"D:\Setup.exe");
        sw.Stop();

        Console.WriteLine("Reading File {0}", GetTime(sw));

        // Restart (not Start) so the file-read time is not added to this measurement.
        sw.Restart();
        // Group identical bytes and count each group.
        // Note: Dictionary does not guarantee enumeration order, so the
        // orderby below is not reliably preserved by ToDictionary.
        Dictionary<byte, int> result = (from i in ltext
                                        group i by i into g
                                        orderby g.Count() descending
                                        select new { Key = g.Key, Freq = g.Count() })
                                       .ToDictionary(x => x.Key, x => x.Freq);
        sw.Stop();
        Console.WriteLine("Generating Freq Table {0}", GetTime(sw));

        foreach (var i in result)
        {
            Console.WriteLine(i);
        }
        Console.WriteLine(result.Count);
        Console.ReadLine();
    }

    // Formats the stopwatch's elapsed time as minutes, seconds and milliseconds.
    static string GetTime(Stopwatch sw)
    {
        TimeSpan ts = sw.Elapsed;
        return String.Format("{0} min {1} sec {2} ms", ts.Minutes, ts.Seconds, ts.Milliseconds);
    }
}

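I also tried a non-LINQ solution using a few loops, and its performance was about the same. Any suggestions for optimizing this would be appreciated. Sorry for my poor English.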


2010-10-24 raziel

"I wrote some code that counts the frequency of each byte in a binary file using LINQ" is... "admirable". – 2010-10-24 20:33:44

@Kirk: Why, what is your objection? – 2010-10-24 20:36:01

This took a little over a second on a 442 MB file on my otherwise sluggish Dell laptop:

    byte[] ltext = File.ReadAllBytes(@"c:\temp\bigfile.bin");
    var freq = new long[256];
    var sw = Stopwatch.StartNew();
    foreach (byte b in ltext) {
        freq[b]++;
    }
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);

It's hard to beat the raw perf of an array.


2010-10-24 20:44:21

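Ouch, good luck choosing which answer to mark :) – 2010-10-24 20:45:13

@Hans: We even picked the same variable names :-D – Vlad 2010-10-24 20:47:10

@Hans: The result isn't sorted by frequency the way his LINQ result is, so the timings aren't comparable. – 2010-10-24 20:48:45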

Why not

int[] freq = new int[256]; 
foreach (byte b in ltext) 
    freq[b]++;

?


2010-10-24 20:43:45 Vlad

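The result isn't sorted by frequency the way his LINQ result is, so the timings aren't comparable. Also, it should be 'int[] freq' rather than 'int freq[]'. – 2010-10-24 20:48:28

@Chris: Thanks for the syntax hint, my C++ background still shows :) However, the OP didn't ask for a sorted table, just for a list of frequencies, so my code meets the requirement and is faster. In any case, I'd expect that replacing the int with a 'struct' containing the byte value and its frequency, and then sorting the array by frequency, would still be faster. – Vlad 2010-10-24 20:59:46

Agreed that using a 'struct' would probably be faster than the LINQ version, but I'd say sorting by frequency is still a requirement, since that's what the original code does. Anyway, +1 to both you and Hans, since that's what I was thinking too :) – 2010-10-24 21:03:47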
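The 'struct' idea from the comments above could look roughly like the sketch below. It is only an illustration, not code from any of the posters: the names `ByteCount`, `FrequencySort` and `SortByFrequency` are hypothetical, and it assumes the histogram has already been filled as a 256-entry `long[]` like `freq` in the answers.

```csharp
using System;

// Hypothetical type for illustration: pairs a byte value with its count.
struct ByteCount
{
    public byte Value;
    public long Count;
}

static class FrequencySort
{
    // Builds a table sorted by descending count from a histogram.
    // Assumes freq has 256 entries, one per possible byte value.
    public static ByteCount[] SortByFrequency(long[] freq)
    {
        var table = new ByteCount[freq.Length];
        for (int i = 0; i < freq.Length; i++)
            table[i] = new ByteCount { Value = (byte)i, Count = freq[i] };

        // Array.Sort is not a stable sort, so ties are broken
        // explicitly by ascending byte value.
        Array.Sort(table, (a, b) =>
        {
            int byCount = b.Count.CompareTo(a.Count);
            return byCount != 0 ? byCount : a.Value.CompareTo(b.Value);
        });
        return table;
    }
}
```

Sorting only ever touches 256 entries, so this step should cost very little next to the pass over the file itself. Whether it actually beats the LINQ `OrderByDescending` on such a small array would need measuring.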

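The following lists the byte frequencies of a 465 MB file in descending order in under 9 seconds on my machine, when built in Release mode.

Note that I've made it faster by reading the file in 100000-byte blocks (you can experiment with this; 16K blocks made no appreciable difference on my machine). The point is that the inner loop is the one that supplies the bytes. Calling Stream.ReadByte() is fast, but not nearly as fast as indexing a byte in an array.

Also, reading the whole file into memory creates extreme memory pressure, which hurts performance, and it will fail entirely if the file is large enough.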

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Reading file...");
        var sw = Stopwatch.StartNew();
        var frequency = new long[256];
        using (var input = File.OpenRead(@"c:\Temp\TestFile.dat"))
        {
            var buffer = new byte[100000];
            int bytesRead;
            do
            {
                // Read may return fewer bytes than requested before the end
                // of the file; only a return value of 0 means end of stream.
                bytesRead = input.Read(buffer, 0, buffer.Length);
                for (var i = 0; i < bytesRead; i++)
                    frequency[buffer[i]]++;
            } while (bytesRead > 0);
        }
        Console.WriteLine("Read file in " + sw.ElapsedMilliseconds + "ms");

        // Pair each byte value with its count and sort by descending count.
        var result = frequency.Select((f, i) => new ByteFrequency { Byte = i, Frequency = f })
            .OrderByDescending(x => x.Frequency);
        foreach (var byteCount in result)
            Console.WriteLine(byteCount.Byte + " " + byteCount.Frequency);
    }

    public class ByteFrequency
    {
        public int Byte { get; set; }
        public long Frequency { get; set; }
    }
}
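On the question's point that this logic seems hard to parallelize: one possible approach is to give each worker its own private histogram and merge them at the end. The sketch below is a hedged illustration, not something proposed in the thread. `ParallelHistogram` and `Count` are hypothetical names, and it assumes the data is already in a `byte[]` (as in the question), so it does not address the memory-pressure concern above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ParallelHistogram
{
    // Counts byte frequencies using one private histogram per worker,
    // then merges the private histograms into a single result.
    public static long[] Count(byte[] data)
    {
        var total = new long[256];
        // Partitioner.Create(from, to) rejects an empty range.
        if (data.Length == 0)
            return total;

        var gate = new object();
        Parallel.ForEach(
            Partitioner.Create(0, data.Length),   // index ranges as Tuple<int, int>
            () => new long[256],                  // per-worker local histogram
            (range, loopState, local) =>
            {
                // No locking here: each worker writes only to its own array.
                for (int i = range.Item1; i < range.Item2; i++)
                    local[data[i]]++;
                return local;
            },
            local =>
            {
                // Merge once per worker; the lock protects the shared total.
                lock (gate)
                {
                    for (int i = 0; i < total.Length; i++)
                        total[i] += local[i];
                }
            });
        return total;
    }
}
```

Because the single-threaded array loop already finishes in about a second on the files mentioned in this thread, any gain from this is likely to be limited by memory and disk throughput, so it is worth measuring before adopting it.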


2010-10-24 21:38:32
