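Splitting a file into sub-files on a keyword in Python?

I'm trying to figure out how to use a keyword as a split indicator to break a file into sub-files. In my case, I have one large file that looks like this: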

Racecar
line2...
line3...
Racecar
line5...
line6...
line7...
line8...
Racecar
line10...

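At every occurrence of the word Racecar I want to split the file and create a sub-file. Using the example above, File_1 would have 3 lines, File_2 would have 5 lines, and File_3 would have 2 lines. The files should look like this: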

File_1:
Racecar
line2...
line3...

File_2:
Racecar
line5...
line6...
line7...
line8...

File_3:
Racecar
line10...

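I know something like awk or sed would be better suited to this, but I need to do it in Python. For some reason I'm really stuck on this. I tried writing something like this: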

with open("bigfile", mode="r") as bigfile:
    reader = bigfile.readlines()
    for i, line in enumerate(reader):
        if line.startswith("Racecar"):
            header = line
            header_num = i

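I seem to be stuck because I can't find a way to get to the next occurrence of Racecar. I keep wanting to use the next() function, but apparently that doesn't work on strings. The files I'm working with are small enough to be read into memory. Can anyone help me with this? Thanks in advance.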


2011-08-05 drbunsen

On QNX you don't need Python ;). [QNX's split utility can split a file whenever a regular expression/pattern is found](http://www.qnx.com/developers/docs/6.5.0/topic/com.qnx.doc.neutrino_utilities/s/split.html). – user712092

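with open("bigfile", mode="r") as bigfile:
    reader = bigfile.read()
    # split() returns an empty first piece when the file starts with "Racecar"; drop it
    parts = [part for part in reader.split("Racecar") if part]
    for i, part in enumerate(parts, start=1):
        # str() is needed here: "File_" + i + 1 raises a TypeError
        with open("File_" + str(i), mode="w") as newfile:
            # split() consumed the keyword, so put it back at the top of each piece
            newfile.write("Racecar" + part)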


2011-08-05 16:39:39 Vader

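`readlines` returns a list, and you're calling split on it, which a list doesn't have. –

@Chris You're right, I've updated my answer. – Vader

@Vader, reading a list of lines and then joining them seems wasteful. How about `bigfile.read()`? – senderle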

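out_array = []
with open("bigfile", mode="r") as bigfile:
    for line in bigfile:
        if line.startswith("Racecar"):
            # a keyword line starts a new chunk
            out_array.append(line)
        elif out_array:
            # any other line is appended to the current chunk
            # (lines before the first "Racecar" are skipped rather than raising IndexError)
            out_array[-1] += line

# write each chunk to its own file: File0.txt, File1.txt, ...
for i, chunk in enumerate(out_array):
    out_filename = "File%d.txt" % i
    with open(out_filename, mode="w") as out_file:
        out_file.write(chunk)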

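There are probably more efficient ways to do this, in particular ones that avoid the two loops. But if the file is as small as you say, that shouldn't matter.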
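For what it's worth, here is a rough sketch of what a one-loop version might look like. The names `split_on_keyword` and `write_chunks` are made up for illustration, and it assumes that any lines before the first Racecar should simply become a chunk of their own:

```python
def split_on_keyword(lines, keyword="Racecar"):
    """Yield lists of lines; a new list starts at each line beginning with keyword."""
    chunk = []
    for line in lines:
        # A keyword line closes the chunk collected so far (if there is one).
        if line.startswith(keyword) and chunk:
            yield chunk
            chunk = []
        chunk.append(line)
    # Emit whatever is left after the last keyword line.
    if chunk:
        yield chunk


def write_chunks(path, prefix="File_", keyword="Racecar"):
    """Write each chunk of the file at path to prefix1, prefix2, ... (hypothetical helper)."""
    with open(path, mode="r") as bigfile:
        for n, chunk in enumerate(split_on_keyword(bigfile, keyword), start=1):
            with open(prefix + str(n), mode="w") as out_file:
                out_file.writelines(chunk)
```

Calling `write_chunks("bigfile")` would then read the file once and write File_1, File_2 and so on as it goes, without building the whole list of chunks first.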


2011-08-05 16:39:30

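It seems you've already found a way to get to the next occurrence of Racecar: your for loop will eventually reach all of them. The question is what to do when you get there. I don't understand what you intend to do with header, header_num and so on each time.

It seems like the thing to do is to iterate over the lines of the big file (without the redundant readlines, though) and, whenever you hit a Racecar line, open a new output file.

For example: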

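with open("bigfile", mode="r") as bigfile:
    smallfile_prefix = "File_"
    file_count = 0
    smallfile = None  # no output file until the first Racecar line
    for line in bigfile:
        if line.startswith("Racecar"):
            # close the previous piece and start the next one
            if smallfile is not None:
                smallfile.close()
            file_count += 1
            smallfile = open(smallfile_prefix + str(file_count), 'w')
        if smallfile is not None:
            # the Racecar line itself belongs in the new file too
            smallfile.write(line)
    if smallfile is not None:
        smallfile.close()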

There are other ways to do this (some variation on Vader's answer might be better, for example), but this seems closest to your original approach.
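As one possible variation on the read-everything-then-split idea, here is a sketch that splits only where a line starts with the keyword, so a Racecar in the middle of a line is left alone. The function name `split_text` is hypothetical, and this assumes Python 3.7 or later, since splitting on a zero-width match with `re.split` is not supported before that:

```python
import re


def split_text(text, keyword="Racecar"):
    """Return the pieces of text, each beginning at a line that starts with keyword."""
    # Zero-width split point: the start of a line immediately followed by the keyword.
    pattern = r"^(?=" + re.escape(keyword) + r")"
    # Drop the empty piece produced when the text itself starts with the keyword.
    return [part for part in re.split(pattern, text, flags=re.MULTILINE) if part]
```

Because the split point is zero-width, the keyword stays at the top of each piece and doesn't have to be added back before writing, e.g. `for i, part in enumerate(split_text(bigfile.read()), start=1)`.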


2011-08-05 16:53:33 senderle
