Fastest way to read bytes in D2

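I want to read single bytes from a file into a D2 application as fast as possible. The application needs the data byte by byte, so handing larger blocks of data to the caller is not an option for the reader interface.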

To compare, I created some trivial implementations in C++, Java, and D2: https://github.com/gizmomogwai/performance.

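As you can see, I tried plain reads, a buffer in application code, and memory-mapped files. For my use case the memory-mapped solution worked best, but strangely D2 is slower than Java. I would have expected D2 to land between C++ and Java (the C++ code is compiled with -O3 -g, the D2 code with -O -release).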

So please tell me what I am doing wrong here and how to speed up the D2 implementation.

To give you an idea of the use case, here is a C++ implementation:

#include <cassert>
#include <cstdio>
#include <string>

class StdioFileReader {
private:
    FILE* fFile;
    static const size_t BUFFER_SIZE = 1024;
    unsigned char fBuffer[BUFFER_SIZE];
    unsigned char* fBufferPtr;
    unsigned char* fBufferEnd;

public:
    StdioFileReader(std::string s) : fFile(fopen(s.c_str(), "rb")), fBufferPtr(fBuffer), fBufferEnd(fBuffer) {
        assert(fFile);
    }
    ~StdioFileReader() {
        fclose(fFile);
    }

    // Returns the next byte, or -1 at end of file.
    int read() {
        bool finished = fBufferPtr == fBufferEnd;
        if (finished) {
            finished = fillBuffer();
            if (finished) {
                return -1;
            }
        }
        return *fBufferPtr++;
    }

private:
    // Refills the buffer; returns true when no more data is available.
    bool fillBuffer() {
        size_t l = fread(fBuffer, 1, BUFFER_SIZE, fFile);
        fBufferPtr = fBuffer;
        fBufferEnd = fBufferPtr + l;
        return l == 0;
    }
};

// Reads the whole file ten times, one byte at a time, and counts the bytes.
size_t readBytes() {
    size_t res = 0;
    for (int i = 0; i < 10; i++) {
        StdioFileReader r("/tmp/shop_with_ids.pb");
        int read = r.read();
        while (read != -1) {
            ++res;
            read = r.read();
        }
    }
    return res;
}

This is much faster than the "same" solution in D:

import std.c.stdio;

struct FileReader {

    private FILE* fFile;
    private static const BUFFER_SIZE = 8192;
    private ubyte fBuffer[BUFFER_SIZE];
    private ubyte* fBufferPtr;
    private ubyte* fBufferEnd;

    public this(string fn) {
        // Note: fn is ignored here; the path is hard-coded.
        fFile = std.c.stdio.fopen("/tmp/shop_with_ids.pb", "rb");
        fBufferPtr = fBuffer.ptr;
        fBufferEnd = fBuffer.ptr;
    }

    // Stores the next byte in *targetBuffer; returns 1 on success, 0 at end of file.
    public int read(ubyte* targetBuffer) {
        auto finished = fBufferPtr == fBufferEnd;
        if (finished) {
            finished = fillBuffer();
            if (finished) {
                return 0;
            }
        }
        *targetBuffer = *fBufferPtr++;
        return 1;
    }

    // Refills the buffer; returns true when no more data is available.
    private bool fillBuffer() {
        fBufferPtr = fBuffer.ptr;
        auto l = std.c.stdio.fread(fBufferPtr, 1, BUFFER_SIZE, fFile);
        fBufferEnd = fBufferPtr + l;
        return l == 0;
    }
}

size_t readBytes() {
    size_t count = 0;
    for (int i = 0; i < 10; i++) {
        auto reader = FileReader("/tmp/shop_with_ids.pb");
        ubyte buffer[1];
        ubyte* p = buffer.ptr;
        auto c = reader.read(p);
        while (1 == c) {
            ++count;
            c = reader.read(p);
        }
    }
    return count;
}

Source

2011-08-26 Gizmomogwai

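I have done some other, unrelated coding in D and Java (math-heavy computation), and Java turned out to be slightly faster in my tests. I guess you should not expect Java to be much slower these days; JIT compilers are very good at optimizing. –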

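Yeah... you are right... I do not expect Java to be much slower than C++ (which it still is in my demo example with the default JIT), but my point is that D is even slower. I would hope D is on par with C++. – Gizmomogwai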

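Yes, I thought the same when I converted a Java algorithm to D a few months ago. I think they have some quirks in code optimization. Or maybe the GC is just bad and slow, so try changing that? –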

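It is very likely because of fread. Nobody guarantees that it does the same thing in D as in C; you are very likely using a different CRT altogether (unless you are using the Digital Mars C++ compiler?).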

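This means the library might be doing things like synchronization, which would slow it down. The only way to know is to force D to use the same library as C, by telling the linker to link against the same libraries.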

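Until you can do that, you are comparing apples to oranges. If that is not possible, call the operating system directly from both and then compare the results; that way you are guaranteed that the underlying call is the same in both.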
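As a rough illustration of calling the operating system directly, a buffered single-byte reader on top of the POSIX read() call might look like the C++ sketch below. The names PosixByteReader and countBytes and the 8192-byte buffer size are assumptions made for illustration; none of them come from the linked repository. A D version could wrap the same system call, so that both programs share the same underlying I/O path.

```cpp
#include <fcntl.h>    // open
#include <unistd.h>   // read, close
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reader: buffers bytes fetched straight from the POSIX read()
// call, bypassing the C runtime's stdio layer entirely.
class PosixByteReader {
public:
    explicit PosixByteReader(const std::string& path)
        : fd_(::open(path.c_str(), O_RDONLY)) {
        assert(fd_ >= 0);  // mirrors the assert in the question's StdioFileReader
    }
    ~PosixByteReader() { ::close(fd_); }
    PosixByteReader(const PosixByteReader&) = delete;
    PosixByteReader& operator=(const PosixByteReader&) = delete;

    // Returns the next byte (0..255), or -1 at end of file or on a read error.
    int read() {
        if (pos_ == end_ && !fill()) return -1;
        return buffer_[pos_++];
    }

private:
    // Refills the buffer; returns false when nothing more could be read.
    bool fill() {
        ssize_t n = ::read(fd_, buffer_, sizeof buffer_);
        if (n <= 0) return false;
        pos_ = 0;
        end_ = static_cast<std::size_t>(n);
        return true;
    }

    int fd_;
    unsigned char buffer_[8192];  // assumed size, same as the D example
    std::size_t pos_ = 0;
    std::size_t end_ = 0;
};

// Counts the bytes in a file by pulling them one at a time.
std::size_t countBytes(const std::string& path) {
    PosixByteReader reader(path);
    std::size_t count = 0;
    while (reader.read() != -1) ++count;
    return count;
}
```

Calling countBytes("/tmp/shop_with_ids.pb") would exercise the same byte-by-byte loop as readBytes() in the question, but with no stdio locking or buffering in between.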

Source

2011-08-26 11:44:37 Mehrdad

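You are completely right. I do not know whether the fread implementations are similar. But my question is how to implement this functionality in D2 so that it is as fast as in Java or even C++. – Gizmomogwai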

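@Gizmomogwai: Right, but the way the question is phrased implies that D is inherently slow. There is a big difference between a language being inherently slow because its design fundamentally requires a lot of overhead, and one small area of a language being slow because it has not been optimized yet. – dsimcha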

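@Gizmomogwai: To make it as fast as in Java or C++, you just need to do whatever they are doing. That probably means you should create your own buffered wrapper around the native OS call ('ReadFile' on Windows), use that, and see how it goes. That will tell you whether this is a language issue or a library issue. – Mehrdad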

What happens if you use the std.stdio module:

import std.stdio;

struct FileReader {

    private File fFile;
    private enum BUFFER_SIZE = 8192; // why not enum?
    private ubyte[BUFFER_SIZE] fBuffer = void; // avoid (costly) initialization to 0
    private ubyte[] buff; // slice of fBuffer holding the bytes not yet consumed

    public this(string fn) {
        fFile = File("/tmp/shop_with_ids.pb", "rb");
    }

    /+
    public ~this() { // you really should have been doing this if you used std.c.stdio.fopen
                     // but it's unnecessary for std.stdio's File (it's ref counted)
        fFile.close();
    }
    +/

    public int read(out ubyte targetBuffer) {
        auto finished = buff.length == 0;
        if (finished) {
            finished = fillBuffer();
            if (finished) {
                return 0;
            }
        }
        targetBuffer = buff[0];
        buff = buff[1..$];
        return 1;
    }

    private bool fillBuffer() {
        if (!fFile.isOpen()) return false;

        buff = fFile.rawRead(fBuffer[]);

        // As posted. read() treats the result as "finished", so this condition is
        // inverted; the comment below the answer corrects it to buff.length == 0.
        return buff.length > 0;
    }
}

size_t readBytes() {
    size_t count = 0;
    for (int i = 0; i < 10; i++) {
        auto reader = FileReader("/tmp/shop_with_ids.pb");
        ubyte buffer;
        auto c = reader.read(buffer);
        while (1 == c) {
            ++count;
            c = reader.read(buffer);
        }
    }
    return count;
}

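If you want a real speed comparison, you should compile with -release -O -inline. This turns off debugging aids (mainly array out-of-bounds checks), optimizes, and inlines whatever it can. Of course, do the equivalent for the C++ solution as well.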

Source

2011-08-26 13:35:17

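Thanks for your input. I actually compiled the D2 code with -O -release (in all my examples, -inline was slower). I corrected your program (fillBuffer should return buff.length == 0), and my benchmark took 600 ms on my machine (compared with 80 ms for the cpp-mmapped solution). See the table on the GitHub page. – Gizmomogwai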
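For readers who have not opened the repository, the memory-mapped approach referred to above can be sketched in C++ roughly as follows. This is only an illustration using the POSIX mmap() call; the names MmapByteReader and countMappedBytes are hypothetical, and the actual cpp-mmapped code on GitHub may be structured differently.

```cpp
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reader: maps the whole file read-only and hands out one byte
// per call, so the hot path is a bounds check plus a memory load.
class MmapByteReader {
public:
    explicit MmapByteReader(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        assert(fd >= 0);
        struct stat st;
        int rc = ::fstat(fd, &st);
        assert(rc == 0);
        (void)rc;  // silences the unused-variable warning when NDEBUG is set
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            assert(p != MAP_FAILED);
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MmapByteReader() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MmapByteReader(const MmapByteReader&) = delete;
    MmapByteReader& operator=(const MmapByteReader&) = delete;

    // Returns the next byte (0..255), or -1 at end of file.
    int read() { return pos_ < size_ ? data_[pos_++] : -1; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t pos_ = 0;
};

// Counts the bytes in a file by pulling them one at a time from the mapping.
std::size_t countMappedBytes(const std::string& path) {
    MmapByteReader reader(path);
    std::size_t count = 0;
    while (reader.read() != -1) ++count;
    return count;
}
```

Because read() here involves no library call at all, a sketch like this removes the C runtime from the comparison, which is one plausible reason the memory-mapped variants are the fastest in each language.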
