If you want to write a new file chunk1.txt ... chunkN.txt for each chunk, you could do it like this:
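def chunk_file(name, lines_per_chunk, chunks_per_file):
    """Split the file `name` into chunk1.txt, chunk2.txt, ...

    A chunk ends at the first blank line seen once at least
    lines_per_chunk lines have been collected; chunks_per_file
    chunks are written to each output file.
    """

    def write_chunk(chunk_no, chunk):
        with open("chunk{}.txt".format(chunk_no), "w") as outfile:
            outfile.write("".join(chunk))

    count, chunk_no, chunk_count, chunk = 1, 1, 0, []
    with open(name, "r") as f:
        for row in f:
            # A blank line only ends the chunk after the minimum size is reached.
            if count > lines_per_chunk and row == "\n":
                chunk_count += 1
                count = 1
                chunk.append("\n")
                if chunk_count == chunks_per_file:
                    write_chunk(chunk_no, chunk)
                    chunk = []
                    chunk_count = 0
                    chunk_no += 1
            else:
                count += 1
                chunk.append(row)
    # Write whatever is left after the last blank line.
    if chunk:
        write_chunk(chunk_no, chunk)


chunk_file("test.txt", 3, 1)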
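You have to specify the number of lines that belong to a chunk, after which a blank line is expected.
Say you want to chunk this file: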
Some Data belonnging to chunk 1

Some Data belonnging to chunk 1
Some Data belonnging to chunk 1
Some Data belonnging to chunk 1
Some Data belonnging to chunk 1
Some Data belonnging to chunk 1

More Data, belonnging to chunk 2
More Data, belonnging to chunk 2
More Data, belonnging to chunk 2
The first chunk differs greatly from the second in its line count (7 lines vs. 3 lines).
The output for this example would be chunk1.txt:
Some Data belonnging to chunk 1

Some Data belonnging to chunk 1
Some Data belonnging to chunk 1
Some Data belonnging to chunk 1
Some Data belonnging to chunk 1
Some Data belonnging to chunk 1
And chunk2.txt:
More Data, belonnging to chunk 2
More Data, belonnging to chunk 2
More Data, belonnging to chunk 2
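This approach assumes that lines_per_chunk is a minimum chunk size, so it works even when the chunks have different line counts. A blank line only ends a chunk once the minimum chunk size has been reached. In the example above, the blank line on line 2 is not a problem, because the minimum chunk size has not been reached yet. If a blank line occurred on line 4 and the chunk's data continued after it, there would be a problem, because the given criteria (line count and blank lines) alone cannot identify the chunks.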
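To make the minimum-size rule and its failure case easier to see, here is a rough in-memory sketch of the same rule. The helper name `split_chunks` is made up for this illustration. It assumes the lines are already stripped of their trailing newlines, and unlike the function above it drops the blank line that ends a chunk instead of keeping it.

```python
def split_chunks(lines, lines_per_chunk):
    """Hypothetical in-memory variant of the rule described above.

    A blank line closes the current chunk only once at least
    lines_per_chunk lines have been collected; earlier blank lines
    are kept as part of the chunk.
    """
    chunks, current = [], []
    for line in lines:
        if len(current) >= lines_per_chunk and line == "":
            chunks.append(current)  # minimum reached: blank line ends the chunk
            current = []
        else:
            current.append(line)
    if current:
        chunks.append(current)  # trailing chunk without a closing blank line
    return chunks


# Blank line on line 2: ignored, the minimum of 3 lines is not reached yet.
ok = ["a1", "", "a2", "a3", "a4", "", "b1", "b2", "b3"]
print(split_chunks(ok, 3))
# [['a1', '', 'a2', 'a3', 'a4'], ['b1', 'b2', 'b3']]

# Blank line on line 4 inside chunk "a": the chunk is cut too early,
# and the real separator on line 7 is then swallowed.
bad = ["a1", "a2", "a3", "", "a4", "a5", "", "b1"]
print(split_chunks(bad, 3))
# [['a1', 'a2', 'a3'], ['a4', 'a5', '', 'b1']]
```

The second call shows the problem case: once the minimum is reached, any blank line looks like a chunk boundary, so the data needs some other marker if blank lines can appear that late inside a chunk.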
Use a counter and the modulo operator. –
This may help you: http://stackoverflow.com/a/544932/568901 – sangheestyle
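For comparison, the counter-and-modulo idea from the first comment might look roughly like the sketch below. It assumes every chunk has exactly the same number of lines, so it would not handle the variable-length chunks in the example above. The function name `chunk_by_count` is hypothetical.

```python
def chunk_by_count(lines, lines_per_chunk):
    """Group lines into fixed-size chunks using a counter and modulo."""
    chunks = []
    for counter, line in enumerate(lines):
        # Start a new chunk whenever the counter is a multiple of the size.
        if counter % lines_per_chunk == 0:
            chunks.append([])
        chunks[-1].append(line)
    return chunks


print(chunk_by_count(["a", "b", "c", "d", "e", "f", "g"], 3))
# [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

This is simpler than tracking blank lines, but it only fits input where the chunk length is fixed and known in advance.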