2011-05-24 127 views
11

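Improving the speed of writing to a file

I have a program that writes its output when it finishes, and one particular file takes a long time to write. I would like to know whether anything can be done to speed it up.

The file ends up being 25 MB or more. It has about 17000 lines, and each line has about 500 fields.

This is how it works: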

procedure CWaitList.WriteData(AFile : string; AReplicat : integer; AllFields : Boolean); 
var 
    fout : TextFile; 
    idx, ndx : integer; 
    MyPat : CPatientItem; 
begin 
    ndx := FList.Count - 1; 
    AssignFile(fout, AFile); 
    Append(fout); 
    for idx := 0 to ndx do 
    begin 
     MyPat := CPatientItem(FList.Objects[idx]); 
     if not Assigned(MyPat) then Continue; 
     MyPat.WriteItem(fout, AReplicat, AllFields); 
    end; 
    CloseFile(fout); 
end; 

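WriteItem is the procedure that gets all the values from MyPat and writes them to the file. It also calls 3 other functions, which write values to the file as well.

So overall, the WriteData loop runs about 1700 times, and each line ends up with about 500 fields.

I am just wondering whether there is anything I can do to improve its performance, given how much data it has to write.

Thanks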

+0

Would you consider using streams instead of Pascal file I/O? – 2011-05-24 16:27:04

+0

Or a TStringList with SaveToFile()? But first you have to test how fast it is to loop over the data without writing the file. – 2011-05-24 16:31:43

+0

Have you run a profiler? It will tell you where your program is spending its time. – 2011-05-24 16:43:32

Answers

5

I haven't done this for a while, but you should be able to set a bigger text I/O buffer like this:

var 
    fout : TextFile; 
    idx, ndx : integer; 
    MyPat : CPatientItem; 
    Buffer: array[0..65535] of char; // 64K - example 
begin 
    ndx := FList.Count - 1; 
    AssignFile(fout, AFile); 
    SetTextBuf(fout, Buffer); 
    Append(fout); 
4

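Use Process Explorer from Sysinternals to watch the output. I think you will see that you are writing thousands of small chunks. Using stream I/O, where the content is written in a single I/O operation, will improve things significantly.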

http://live.sysinternals.com/procexp.exe

+0

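TextFile output is already buffered, but the initial buffer size is only 128 bytes. Increasing the internal buffer size will reduce the time spent in the Windows file kernel. The text file implementation, even though it is an old technology, also uses a kind of stream I/O. – 2011-05-26 06:00:14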

8

The proper way to speed up a TextFile is to use SetTextBuf, and perhaps to add {$I-} .... {$I+} around all the file access.

var 
    TmpBuf: array[word] of byte; 

.. 
    {$I-} 
    AssignFile(fout, AFile); 
    Append(fout); 
    SetTextBuf(fOut,TmpBuf); 
    for idx := 0 to ndx do 
    begin 
     MyPat := CPatientItem(FList.Objects[idx]); 
     if not Assigned(MyPat) then Continue; 
     MyPat.WriteItem(fout, AReplicat, AllFields); 
    end; 
    if ioresult<>0 then 
    ShowMessage('Error writing file'); 
    CloseFile(fout); 
    {$I+} 
end; 

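In any case, the old file API should not be used nowadays...

{$I-} .... {$I+} should also be added around all the subroutines that add content to the text file.

I did some experiments with huge text files and buffering. I have written a dedicated class in the Open Source SynCommons unit, named TTextWriter, which is UTF-8 oriented. I use it in particular for JSON production or LOG writing at the highest possible speed. It avoids most temporary heap allocations (e.g. for conversion from integer values), so it even scales very well in multi-threaded use. Some high-level methods are available to format text from an open array, like the format() function, but much faster.

Here is the interface of this class: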

/// simple writer to a Stream, specialized for the TEXT format 
    // - use an internal buffer, faster than string+string 
    // - some dedicated methods is able to encode any data with JSON escape 
    TTextWriter = class 
    protected 
    B, BEnd: PUTF8Char; 
    fStream: TStream; 
    fInitialStreamPosition: integer; 
    fStreamIsOwned: boolean; 
    // internal temporary buffer 
    fTempBufSize: Integer; 
    fTempBuf: PUTF8Char; 
    // [0..4] for 'u0001' four-hex-digits template, [5..7] for one UTF-8 char 
    BufUnicode: array[0..7] of AnsiChar; 
    /// flush and go to next char 
    function FlushInc: PUTF8Char; 
    function GetLength: integer; 
    public 
    /// the data will be written to the specified Stream 
    // - aStream may be nil: in this case, it MUST be set before using any 
    // Add*() method 
    constructor Create(aStream: TStream; aBufSize: integer=1024); 
    /// the data will be written to an internal TMemoryStream 
    constructor CreateOwnedStream; 
    /// release fStream if it is owned 
    destructor Destroy; override; 
    /// retrieve the data as a string 
    // - only works if the associated Stream Inherits from TMemoryStream: return 
    // '' if it is not the case 
    function Text: RawUTF8; 
    /// write pending data to the Stream 
    procedure Flush; 
    /// append one char to the buffer 
    procedure Add(c: AnsiChar); overload; {$ifdef HASINLINE}inline;{$endif} 
    /// append two chars to the buffer 
    procedure Add(c1,c2: AnsiChar); overload; {$ifdef HASINLINE}inline;{$endif} 
    /// append an Integer Value as a String 
    procedure Add(Value: Int64); overload; 
    /// append an Integer Value as a String 
    procedure Add(Value: integer); overload; 
    /// append a Currency from its Int64 in-memory representation 
    procedure AddCurr64(Value: PInt64); overload; 
    /// append a Currency from its Int64 in-memory representation 
    procedure AddCurr64(const Value: Int64); overload; 
    /// append a TTimeLog value, expanded as Iso-8601 encoded text 
    procedure AddTimeLog(Value: PInt64); 
    /// append a TDateTime value, expanded as Iso-8601 encoded text 
    procedure AddDateTime(Value: PDateTime); overload; 
    /// append a TDateTime value, expanded as Iso-8601 encoded text 
    procedure AddDateTime(const Value: TDateTime); overload; 
    /// append an Unsigned Integer Value as a String 
    procedure AddU(Value: cardinal); 
    /// append a floating-point Value as a String 
    // - double precision with max 3 decimals is default here, to avoid rounding 
    // problems 
    procedure Add(Value: double; decimals: integer=3); overload; 
    /// append strings or integers with a specified format 
    // - % = #37 indicates a string, integer, floating-point, or class parameter 
    // to be appended as text (e.g. class name) 
    // - $ = #36 indicates an integer to be written with 2 digits and a comma 
    // - £ = #163 indicates an integer to be written with 4 digits and a comma 
    // - µ = #181 indicates an integer to be written with 3 digits without any comma 
    // - ¤ = #164 indicates CR+LF chars 
    // - CR = #13 indicates CR+LF chars 
    // - § = #167 indicates to trim last comma 
    // - since some of this characters above are > #127, they are not UTF-8 
    // ready, so we expect the input format to be WinAnsi, i.e. mostly English 
    // text (with chars < #128) with some values to be inserted inside 
    // - if StringEscape is false (by default), the text won't be escaped before 
    // adding; but if set to true text will be JSON escaped at writing 
    procedure Add(Format: PWinAnsiChar; const Values: array of const; 
     Escape: TTextWriterKind=twNone); overload; 
    /// append CR+LF chars 
    procedure AddCR; {$ifdef HASINLINE}inline;{$endif} 
    /// write the same character multiple times 
    procedure AddChars(aChar: AnsiChar; aCount: integer); 
    /// append an Integer Value as a 2 digits String with comma 
    procedure Add2(Value: integer); 
    /// append the current date and time, in a log-friendly format 
    // - e.g. append '20110325 19241502 ' 
    // - this method is very fast, and avoid most calculation or API calls 
    procedure AddCurrentLogTime; 
    /// append an Integer Value as a 4 digits String with comma 
    procedure Add4(Value: integer); 
    /// append an Integer Value as a 3 digits String without any added comma 
    procedure Add3(Value: integer); 
    /// append a line of text with CR+LF at the end 
    procedure AddLine(const Text: shortstring); 
    /// append a String 
    procedure AddString(const Text: RawUTF8); {$ifdef HASINLINE}inline;{$endif} 
    /// append a ShortString 
    procedure AddShort(const Text: ShortString); {$ifdef HASINLINE}inline;{$endif} 
    /// append a ShortString property name, as '"PropName":' 
    procedure AddPropName(const PropName: ShortString); 
    /// append an Instance name and pointer, as '"TObjectList(00425E68)"'+SepChar 
    // - Instance must be not nil 
    procedure AddInstanceName(Instance: TObject; SepChar: AnsiChar); 
    /// append an Instance name and pointer, as 'TObjectList(00425E68)'+SepChar 
    // - Instance must be not nil 
    procedure AddInstancePointer(Instance: TObject; SepChar: AnsiChar); 
    /// append an array of integers as CSV 
    procedure AddCSV(const Integers: array of Integer); overload; 
    /// append an array of doubles as CSV 
    procedure AddCSV(const Doubles: array of double; decimals: integer); overload; 
    /// append an array of RawUTF8 as CSV 
    procedure AddCSV(const Values: array of RawUTF8); overload; 
    /// write some data as hexa chars 
    procedure WrHex(P: PAnsiChar; Len: integer); 
    /// write some data Base64 encoded 
    // - if withMagic is TRUE, will write as '"\uFFF0base64encodedbinary"' 
    procedure WrBase64(P: PAnsiChar; Len: cardinal; withMagic: boolean); 
    /// write some #0 ended UTF-8 text, according to the specified format 
    procedure Add(P: PUTF8Char; Escape: TTextWriterKind); overload; 
    /// write some #0 ended UTF-8 text, according to the specified format 
    procedure Add(P: PUTF8Char; Len: PtrInt; Escape: TTextWriterKind); overload; 
    /// write some #0 ended Unicode text as UTF-8, according to the specified format 
    procedure AddW(P: PWord; Len: PtrInt; Escape: TTextWriterKind); overload; 
    /// append some chars to the buffer 
    // - if Len is 0, Len is calculated from zero-ended char 
    // - don't escapes chars according to the JSON RFC 
    procedure AddNoJSONEscape(P: Pointer; Len: integer=0); 
    /// append some binary data as hexadecimal text conversion 
    procedure AddBinToHex(P: Pointer; Len: integer); 
    /// fast conversion from binary data into hexa chars, ready to be displayed 
    // - using this function with Bin^ as an integer value will encode it 
    // in big-endian order (most-significant byte first): use it for display 
    // - up to 128 bytes may be converted 
    procedure AddBinToHexDisplay(Bin: pointer; BinBytes: integer); 
    /// add the pointer into hexa chars, ready to be displayed 
    procedure AddPointer(P: PtrUInt); 
    /// append some unicode chars to the buffer 
    // - WideCharCount is the unicode chars count, not the byte size 
    // - don't escapes chars according to the JSON RFC 
    // - will convert the Unicode chars into UTF-8 
    procedure AddNoJSONEscapeW(P: PWord; WideCharCount: integer); 
    /// append some UTF-8 encoded chars to the buffer 
    // - if Len is 0, Len is calculated from zero-ended char 
    // - escapes chars according to the JSON RFC 
    procedure AddJSONEscape(P: Pointer; Len: PtrInt=0); overload; 
    /// append some UTF-8 encoded chars to the buffer, from a generic string type 
    // - faster than AddJSONEscape(pointer(StringToUTF8(string)) 
    // - if Len is 0, Len is calculated from zero-ended char 
    // - escapes chars according to the JSON RFC 
    procedure AddJSONEscapeString(const s: string); {$ifdef UNICODE}inline;{$endif} 
    /// append some Unicode encoded chars to the buffer 
    // - if Len is 0, Len is calculated from zero-ended widechar 
    // - escapes chars according to the JSON RFC 
    procedure AddJSONEscapeW(P: PWord; Len: PtrInt=0); 
    /// append an open array constant value to the buffer 
    // - "" will be added if necessary 
    // - escapes chars according to the JSON RFC 
    // - very fast (avoid most temporary storage) 
    procedure AddJSONEscape(const V: TVarRec); overload; 
    /// append a dynamic array content as UTF-8 encoded JSON array 
    // - expect a dynamic array TDynArray wrapper as incoming parameter 
    // - TIntegerDynArray, TInt64DynArray, TCardinalDynArray, TDoubleDynArray, 
    // TCurrencyDynArray, TWordDynArray and TByteDynArray will be written as 
    // numerical JSON values 
    // - TRawUTF8DynArray, TWinAnsiDynArray, TRawByteStringDynArray, 
    // TStringDynArray, TWideStringDynArray, TSynUnicodeDynArray, TTimeLogDynArray, 
    // and TDateTimeDynArray will be written as escaped UTF-8 JSON strings 
    // (and Iso-8601 textual encoding if necessary) 
    // - any other kind of dynamic array (including array of records) will be 
    // written as Base64 encoded binary stream, with a JSON_BASE64_MAGIC prefix 
    // (UTF-8 encoded \uFFF0 special code) 
    // - examples: '[1,2,3,4]' or '["\uFFF0base64encodedbinary"]' 
    procedure AddDynArrayJSON(const DynArray: TDynArray); 
    /// append some chars to the buffer in one line 
    // - P should be ended with a #0 
    // - will write #1..#31 chars as spaces (so content will stay on the same line) 
    procedure AddOnSameLine(P: PUTF8Char); overload; 
    /// append some chars to the buffer in one line 
    // - will write #0..#31 chars as spaces (so content will stay on the same line) 
    procedure AddOnSameLine(P: PUTF8Char; Len: PtrInt); overload; 
    /// append some wide chars to the buffer in one line 
    // - will write #0..#31 chars as spaces (so content will stay on the same line) 
    procedure AddOnSameLineW(P: PWord; Len: PtrInt); 
    /// serialize as JSON the given object 
    // - this default implementation will write null, or only write the 
    // class name and pointer if FullExpand is true - use TJSONSerializer. 
    // WriteObject method for full RTTI handling 
    // - default implementation will write TList/TCollection/TStrings/TRawUTF8List 
    // as appropriate array of class name/pointer (if FullExpand=true) or string 
    procedure WriteObject(Value: TObject; HumanReadable: boolean=false; 
     DontStoreDefault: boolean=true; FullExpand: boolean=false); virtual; 
    /// the last char appended is canceled 
    procedure CancelLastChar; {$ifdef HASINLINE}inline;{$endif} 
    /// the last char appended is canceled if it was a ',' 
    procedure CancelLastComma; {$ifdef HASINLINE}inline;{$endif} 
    /// rewind the Stream to the position when Create() was called 
    procedure CancelAll; 
    /// count of add byte to the stream 
    property TextLength: integer read GetLength; 
    /// the internal TStream used for storage 
    property Stream: TStream read fStream write fStream; 
    end; 

As you can see, there is even some serialization available, and the CancelLastComma / CancelLastChar methods are very useful for producing fast JSON or CSV data from a loop.
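
As a rough illustration of that loop pattern, here is a minimal sketch that uses only methods listed in the interface above (Create, Add, CancelLastComma, AddCR, Flush). It assumes the SynCommons unit is on the search path and matches the version quoted here. The file name, buffer size, and row/field counts are made up for the example, and the integer values stand in for the real fields.

```pascal
program TextWriterCsvDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes,
  SynCommons; // third-party unit named in the answer; assumed to be available

var
  F: TFileStream;
  W: TTextWriter;
  row, col: Integer;
begin
  // Hypothetical output file, created fresh for the demo
  F := TFileStream.Create('fields.csv', fmCreate);
  try
    // 64 KB internal buffer instead of the default 1024 bytes
    W := TTextWriter.Create(F, 65536);
    try
      for row := 1 to 17000 do
      begin
        for col := 1 to 500 do
        begin
          W.Add(col);  // integer appended as text
          W.Add(',');  // field separator
        end;
        W.CancelLastComma; // drop the trailing separator of the row
        W.AddCR;           // end the line with CR+LF
      end;
      W.Flush; // push any pending buffer content to the stream
    finally
      W.Free;
    end;
  finally
    F.Free; // the stream was passed in, so it is released here
  end;
end.
```

Flush is called explicitly because the quoted interface does not say whether the destructor flushes pending data.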

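As for speed and timing, this routine is faster than my disk access, which is about 100 MB/s. I think it could reach about 500 MB/s when appending data to a TMemoryStream instead of a TFileStream.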

+0

Hi, it seems that using the buffer doesn't speed it up. I will try using TFileStream – KingKong 2011-05-24 18:18:57

+0

Naive use of TFileStream won't help. You need to buffer those writes too. Can your disk go any faster? – 2011-05-24 23:04:06

+0

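@David You are entirely right. TFileStream is just a wrapper around the Windows file API, so it will be slow if you add small pieces of content with each call to Write(). Buffering is the key. Another possibility would be to use a TMemoryStream and then SaveToFile (under Delphi 6/7, TMemoryStream uses the slow GlobalAlloc API - don't use it). That is exactly what our 'TTextWriter' class does. The text file functions are fast when used with a big buffer. The bottleneck should be in the subroutines, not in the technique used to append data to the text content. – 2011-05-25 05:12:58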

0

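When I was working on an archiving package, I noticed a performance gain when I wrote in chunks of 512 bytes, which is the default size of a disk sector. Note that the disk sector size and the file system block size are two different things! There are WinAPI functions that will get you the block size of your partition - have a look here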
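
One WinAPI call that reports both sizes is GetDiskFreeSpace. Whether it is the function this answer linked to is an assumption, since the link did not survive. A minimal sketch, with 'C:\' as an example root path:

```pascal
program SectorSizeDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  SectorsPerCluster, BytesPerSector, FreeClusters, TotalClusters: DWORD;
begin
  // 'C:\' is only an example; pass the root of the volume you write to
  if GetDiskFreeSpace('C:\', SectorsPerCluster, BytesPerSector,
    FreeClusters, TotalClusters) then
  begin
    // Sector size of the disk as reported by the volume
    Writeln('Bytes per sector : ', BytesPerSector);
    // Cluster (file system allocation unit) size
    Writeln('Bytes per cluster: ', SectorsPerCluster * BytesPerSector);
  end
  else
    Writeln('GetDiskFreeSpace failed: ', SysErrorMessage(GetLastError));
end.
```

The two printed values make the distinction above concrete: the first is the sector size, the second the file system allocation unit.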

0

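I suggest switching to TFileStream or a memory stream rather than the old-style file I/O. If you use TFileStream, you can set the size of the file up front according to your estimated needs, rather than having the program look for the next free block on every write. You can then extend or truncate it as needed. If you use a TMemoryStream - save the data into it and then use SaveToFile() - the whole thing will be written from memory to the file in one go. That should speed things up for you.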
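
A minimal sketch of the TMemoryStream variant, under stated assumptions: WriteLine is a hypothetical helper, the generated lines stand in for the real fields, and 'output.txt' is an example name. Note that SaveToFile replaces the target file, whereas the code in the question appends to it.

```pascal
program MemStreamDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: appends one line of text plus CR+LF to a stream.
procedure WriteLine(AStream: TStream; const ALine: AnsiString);
const
  CRLF: AnsiString = #13#10;
begin
  if ALine <> '' then
    AStream.WriteBuffer(ALine[1], Length(ALine));
  AStream.WriteBuffer(CRLF[1], Length(CRLF));
end;

var
  Mem: TMemoryStream;
  i: Integer;
begin
  Mem := TMemoryStream.Create;
  try
    // Build the whole content in memory first (placeholder lines)
    for i := 1 to 17000 do
      WriteLine(Mem, AnsiString('line ' + IntToStr(i)));
    // A single call then writes the buffer to disk (overwrites the file)
    Mem.SaveToFile('output.txt');
  finally
    Mem.Free;
  end;
end.
```

This trades memory for fewer disk writes: the full 25 MB or so sits in RAM until SaveToFile runs.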

0

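I suspect that the writing time is not the problem. The time-consuming part of the routine is streaming out the 500 fields. You could replace the field streaming code with a constant string of equivalent length. I guarantee that this will be much faster. So, to optimize the routine, you need to optimize the field streaming, not the actual writing!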
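
That experiment could be sketched roughly as follows. The line count comes from the question; the line length of 1500 characters is an assumption (about 25 MB divided by 17000 lines), and the file name is made up. Comparing this timing with the real WriteData run would indicate how much of the time goes into producing the fields rather than writing them.

```pascal
program ConstLineTiming;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

const
  LineCount = 17000;  // from the question
  LineLength = 1500;  // assumed: roughly 25 MB / 17000 lines

var
  fout: TextFile;
  Buffer: array[0..65535] of Byte; // 64 KB text buffer
  Line: string;
  idx: Integer;
  Start: DWORD;
begin
  // Constant line that replaces the real field streaming
  Line := StringOfChar('x', LineLength);

  Start := GetTickCount;
  AssignFile(fout, 'timing_test.txt');
  SetTextBuf(fout, Buffer);
  Rewrite(fout);
  try
    for idx := 1 to LineCount do
      Writeln(fout, Line);
  finally
    CloseFile(fout);
  end;
  Writeln('Constant lines written in ', GetTickCount - Start, ' ms');
end.
```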