Create an output file with multiple lines (Python)

The file looks like this:

DS User ID 1 
random garbage 
random garbage 
DS N user name 1 
random garbage 
DS User ID 2 
random garbage 
random garbage 
DS N user name 2

So far I have:

import sys 
import re 
f = open(sys.argv[1]) 

strToSearch = "" 

for line in f: 
     strToSearch += line 

patFinder1 = re.compile('DS\s+\d{4}|DS\s{2}\w\s{2}\w.*|DS\s{2}N', re.MULTILINE) 

for i in findPat1: 
    print(i)

My output to the screen looks like this:

DS user ID 1 
DS N user name 1 
DS user ID 2 
DS N user name 2

If I write it to a file using:

outfile = "test.dat" 
FILE = open(outfile,"a") 
FILE.writelines(line) 
FILE.close()

everything gets pushed onto a single line:

DS user ID 1DS N user name 1DS user ID 2DS N user name 2

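I can live with the first scenario in the output. Ideally, though, I would like to strip 'DS' and 'DS N' from the output file and separate the values with commas: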

User ID 1,user name 1
User ID 2,user name 2

Any ideas on how to accomplish this?


2011-03-01 user639302

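Hi, and welcome to StackOverflow. Please take a minute to get familiar with the editor, especially the "{}" code button, which you can use to format code. – 2011-03-01 13:13:14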

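This obviously isn't your real program. For one thing, you never use the regex. It also doesn't match the sample you provided, at least not most of it. And you never define 'findPat1'. – 2011-03-01 13:19:06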

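Please describe clearly what your input data looks like and what criteria you use for matching. From your example, looking for lines that start with DS should be enough; if not, please state the rules. You seem to be trying to match corresponding user ID/user name entries. If we know what you are trying to do, we can certainly show you a better way. – 2011-03-01 13:25:15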

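It is hard to provide a robust solution without knowing the actual format of the input data, how much flexibility is allowed, and how the parsed data will be used.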

Based only on the sample input/output given above, one can quickly cook up a working example:

# "a" appends to test.dat; both files are closed when the with block ends
with open("input.dat") as inp, open("test.dat", "a") as out:
    for line in inp:
        if not line.startswith("DS "):
            continue  # skip "random garbage"

        keys = line.split()[1:]  # split on whitespace, remove "DS"
        if keys[0] != "N":  # found ID, write it followed by a comma
            out.write(" ".join(keys) + ",")
        else:  # found name, write it and end the line
            out.write(" ".join(keys[1:]) + "\n")

The output file will be:

User ID 1,user name 1 
User ID 2,user name 2

If the format specification is known, this code can of course be made more robust with a regex. For example:

import re

pat_id = re.compile(r"DS\s+(User ID\s+\d+)")
pat_name = re.compile(r"DS\s+N\s+(.+\s+\d+)")

with open("input.dat") as inp, open("test.dat", "a") as out:
    for line in inp:
        match = pat_id.match(line)
        if match:  # found ID, write it followed by a comma
            out.write(match.group(1) + ",")
            continue
        match = pat_name.match(line)
        if match:  # found name, write it and end the line
            out.write(match.group(1) + "\n")

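Both examples assume that "User ID X" always comes before "N user name X", so the "," and the "\n" are always written after the corresponding entries.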
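Since the goal is comma-separated output, the standard-library csv module can take care of joining (and, if ever needed, quoting) the fields. This is a minimal sketch under the same ID-before-name assumption; pairs_from_lines and write_csv are hypothetical helper names, and io.StringIO stands in for the real input and output files so the snippet runs on its own:

```python
import csv
import io

# Sample input copied from the question.
SAMPLE = """DS User ID 1
random garbage
random garbage
DS N user name 1
random garbage
DS User ID 2
random garbage
random garbage
DS N user name 2
"""

def pairs_from_lines(lines):
    """Yield (user_id, user_name) tuples, assuming each ID line precedes its name line."""
    current_id = None
    for line in lines:
        parts = line.split()
        if not parts or parts[0] != "DS":
            continue  # skip "random garbage"
        if parts[1:2] == ["N"]:
            # "DS N user name 1" -> "user name 1"
            if current_id is not None:
                yield current_id, " ".join(parts[2:])
                current_id = None
        else:
            # "DS User ID 1" -> "User ID 1"
            current_id = " ".join(parts[1:])

def write_csv(lines, out):
    # lineterminator="\n" overrides csv's default "\r\n"
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(pairs_from_lines(lines))

buf = io.StringIO()
write_csv(io.StringIO(SAMPLE), buf)
print(buf.getvalue(), end="")
```

When writing to a real file instead of a StringIO, open it with newline="", as the csv documentation recommends.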

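If the order is not fixed, you can store the values in a dictionary keyed by the numeric ID, and then print the ID/name pairs once all the input has been parsed.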
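A minimal sketch of that dictionary approach, assuming the trailing number on each "DS" line is the shared key. The patterns, collect_pairs and write_pairs are hypothetical and only cover the sample format from the question:

```python
import io
import re

# Hypothetical patterns: group 2 captures the numeric ID used as the dict key.
PAT_ID = re.compile(r"DS\s+(User ID\s+(\d+))")
PAT_NAME = re.compile(r"DS\s+N\s+(.+?\s+(\d+))\s*$")

def collect_pairs(lines):
    """Return {numeric_id: [id_text, name_text]}, regardless of line order."""
    records = {}
    for line in lines:
        m = PAT_ID.match(line)
        if m:
            records.setdefault(m.group(2), [None, None])[0] = m.group(1)
            continue
        m = PAT_NAME.match(line)
        if m:
            records.setdefault(m.group(2), [None, None])[1] = m.group(1)
    return records

def write_pairs(records, out):
    # Sort numerically so the output order doesn't depend on the input order.
    for key in sorted(records, key=int):
        id_text, name_text = records[key]
        if id_text and name_text:  # skip incomplete pairs
            out.write(f"{id_text},{name_text}\n")

# A name line deliberately placed before its ID line.
sample = ["DS N user name 2", "random garbage", "DS User ID 1",
          "DS User ID 2", "DS N user name 1"]
buf = io.StringIO()
write_pairs(collect_pairs(sample), buf)
print(buf.getvalue(), end="")
```

Sorting by int(key) and silently skipping IDs without a name (or vice versa) are design choices for this sketch; a real parser might prefer to report the unmatched entries.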

If you provide more information, we may be able to help further.


2011-03-01 13:26:05

Great, exactly what I needed. Thanks. I was trying to get rid of DS and DS N; this will be perfect. – user639302 2011-03-02 03:33:05

Excellent tip, and an incredibly useful explanation. – mbb 2012-10-31 02:46:20

print adds a newline after its arguments, but writelines does not. So you have to write it like this:

file = open(outfile, "a") 
file.writelines((i + '\n' for i in findPat1)) 
file.close()

The writelines statement can also be written as:

for i in findPat1: 
    file.write(i + '\n')
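A small demonstration of the difference, using io.StringIO as a stand-in for the output file so nothing touches the disk; the two strings are taken from the question's sample output:

```python
import io

lines = ["DS User ID 1", "DS N user name 1"]

# writelines() writes each string as-is: no separator is added.
buf1 = io.StringIO()
buf1.writelines(lines)

# Appending "\n" to each item gives one entry per line.
buf2 = io.StringIO()
buf2.writelines(s + "\n" for s in lines)

# print() with file= appends its default end="\n" for you.
buf3 = io.StringIO()
for s in lines:
    print(s, file=buf3)

print(repr(buf1.getvalue()))  # 'DS User ID 1DS N user name 1'
print(repr(buf2.getvalue()))  # 'DS User ID 1\nDS N user name 1\n'
```

buf3 ends up identical to buf2, which is why print(..., file=f) is a convenient alternative when you want one record per line.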


2011-03-01 13:24:55

Nice, +1 from me. – doug 2013-03-07 20:18:13

FILE.writelines(line)

does not add line separators.

Just do:

FILE.write(line + "\n")

Or:

FILE.write("\n".join(lines))
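One detail worth knowing about the join variant: "\n".join() puts separators only between items, so there is no newline after the last one. Because the question opens the file in append mode ("a"), two runs would glue records together. A small sketch, with io.StringIO standing in for the appended file:

```python
import io

lines = ["User ID 1,user name 1", "User ID 2,user name 2"]

# Simulate two appends of "\n".join(lines) to the same file.
buf = io.StringIO()
for _ in range(2):
    buf.write("\n".join(lines))
# The second run's first record is now glued onto the first run's last record.

# Adding a final "\n" keeps records on separate lines across appends.
safe = io.StringIO()
for _ in range(2):
    safe.write("\n".join(lines) + "\n")

print(repr(buf.getvalue()))
print(repr(safe.getvalue()))
```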


2011-03-01 13:27:04 stderr

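import re

ch = '''\
DS User ID 1
random garbage
random garbage
DS N user name 1
random garbage
DS User ID 2
random garbage
random garbage
DS N user name 2'''

# Group 1: "User ID n", group 2: the number n, group 3: "user name n".
# \2 is a backreference: each name must end with the same number as its ID.
RE = r'^DS (User ID (\d+)).+?^DS N (user name \2)'

with open('outputfile.txt', 'w') as f:
    for match in re.finditer(RE, ch, re.MULTILINE | re.DOTALL):
        # groups() would also include the bare number, so write groups 1 and 3
        f.write(match.group(1) + ',' + match.group(3) + '\n')

Edit:

Replace

RE = r'^DS (User ID \d+).+?^DS N (user name \d+)'

with

RE = r'^DS (User ID (\d+)).+?^DS N (user name \2)'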
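To see what the backreference buys you, here is a small sketch with hypothetical input in which a stray "user name 9" appears before the right name line. \2 only accepts a name ending in the same number the ID line captured, so the stray line is skipped. The \b word boundaries are an addition of this sketch, not part of the answer; they stop ID 1 from pairing with "user name 12":

```python
import re

# Hypothetical input: "user name 9" has no matching "User ID 9".
text = """DS User ID 1
garbage
DS N user name 9
DS N user name 1
DS User ID 2
DS N user name 2"""

# Group 2 captures the number; \2 forces the name to repeat it.
pattern = re.compile(r"^DS (User ID (\d+))\b.+?^DS N (user name \2)\b",
                     re.MULTILINE | re.DOTALL)

pairs = [(m.group(1), m.group(3)) for m in pattern.finditer(text)]
print(pairs)
```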


2011-03-01 14:38:23 eyquem
