Parsing and modifying a file with Python

Suppose I have the following file:

H 0 -15.7284260000000  -16.4229420000000  0.364919000000000 
H 0 -16.4853770000000  -15.1118660000000  0.364919000000000 
O 0 -17.9378060000000  -14.2325190000000  0.944687000000000 
H 0 -18.7307670000000  -14.6487540000000  0.606761000000000 
H 0 -17.9738160000000  -13.3376780000000  0.606761000000000 
H 0 -17.1677320000000  -11.1468579990000  0.307511000000000 
...

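and the following list of line numbers of that file, grouped according to some criterion: index = [[1,3],[4,7],[2,5,6]].

I want to rewrite the file, adding a label to each line according to that criterion: lines 1 and 3 get the label 'H', lines 4 and 7 get the label 'M', and lines 2, 5 and 6 get the label 'L', to obtain the file: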

H 0 -15.7284260000000  -16.4229420000000  0.364919000000000 H 
H 0 -16.4853770000000  -15.1118660000000  0.364919000000000 L 
O 0 -17.9378060000000  -14.2325190000000  0.944687000000000 H 
H 0 -18.7307670000000  -14.6487540000000  0.606761000000000 M 
H 0 -17.9738160000000  -13.3376780000000  0.606761000000000 L 
H 0 -17.1677320000000  -11.1468579990000  0.307511000000000 L 
H 0 -10.3904079990000  -10.7642359990000  0.664160000000000 M 
...
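Every approach needs, for each line number, the label it should get. A minimal sketch of building that lookup directly from the question's `index`, pairing groups with labels by position, is shown below. The `labels` list and the `build_label_map` name are hypothetical: they assume the labels are listed in the same order as the groups.

```python
# Line-number groups from the question (1-based), with the commas restored.
index = [[1, 3], [4, 7], [2, 5, 6]]
# Assumption: the i-th label belongs to the i-th group.
labels = ['H', 'M', 'L']

def build_label_map(groups, group_labels):
    """Return a {line_number: label} dict by pairing groups with labels."""
    label_map = {}
    for group, label in zip(groups, group_labels):
        for line_number in group:
            label_map[line_number] = label
    return label_map

label_map = build_label_map(index, labels)
print(sorted(label_map.items()))
# [(1, 'H'), (2, 'L'), (3, 'H'), (4, 'M'), (5, 'L'), (6, 'L'), (7, 'M')]
```

Note that `zip` stops at the shorter of its inputs, so a group without a matching label is simply left unlabeled.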

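I am using the code below, but I can't manage to include the required condition in the write() call. Any help is welcome. Thanks in advance.

import sys

file = 'input.txt'  # path of the input file

try:
    input_file = open(file, 'r')
    input = input_file.readlines()
    input_file.close()
    print('Input file "' + file + '" was read')
except OSError:
    error_mssg = 'Please provide an input file'
    sys.exit(error_mssg)

ii = 0
with open('output.com', 'w') as output:
    while ii <= len(input) - 1:
        # Stop at the first empty line
        if input[ii].strip() == '':
            break
        # Every line gets ' H' here: the per-line condition is what is missing
        output.write(input[ii].strip() + ' H' + '\n')
        ii = ii + 1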

2017-03-06 Panadestein

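What is it that you can't do? –

"i.e. lines 1 and 3 will get the label 'H', lines 4,7 the label 'M' and so on, to obtain the file": how did you decide on H/M? What would [2,5,6] get? And based on what criterion? –

I can't select a given label based on the data in the list and add it to the corresponding lines. This is just an example, sorry if it was unclear; lines [2,5,6] would get another label. – Panadestein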

import sys

file = 'input.txt'

try:
    with open(file, 'r') as input_file:
        input_lines = input_file.readlines()
    print('Input file "' + file + '" was read')
except OSError:
    error_mssg = 'Please provide an input file'
    sys.exit(error_mssg)

index_mapping = {'H': [1, 3],
                 'M': [4, 7],
                 'L': [2, 5, 6]}

# Invert the mapping into {line_number: label}
index_mapping_reversed = {val: key for key in index_mapping for val in index_mapping[key]}
# index_mapping_reversed == {1: 'H', 2: 'L', 3: 'H', 4: 'M', 5: 'L', 6: 'L', 7: 'M'}

with open('output.txt', 'w') as output:
    for idx, line in enumerate(input_lines):
        suffix = ''
        # enumerate() counts from 0, while line numbers start at 1
        if idx + 1 in index_mapping_reversed:
            suffix = ' ' + index_mapping_reversed[idx + 1]
        output.write(line.strip() + suffix + '\n')

Contents of output.txt:

H 0 -15.7284260000000  -16.4229420000000  0.364919000000000 H 
H 0 -16.4853770000000  -15.1118660000000  0.364919000000000 L 
O 0 -17.9378060000000  -14.2325190000000  0.944687000000000 H 
H 0 -18.7307670000000  -14.6487540000000  0.606761000000000 M 
H 0 -17.9738160000000  -13.3376780000000  0.606761000000000 L 
H 0 -17.1677320000000  -11.1468579990000  0.307511000000000 L

2017-03-06 15:44:58

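The easiest way for you is probably to do some intermediate processing before writing the lines back out.

You want to append a character to each line in a list, given several list/character pairings: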

def append_char(text, char, lines):
    """Given a list of text lines, text, a char, and a list of line
    numbers, lines, append the char to each line identified by number.
    Note that line numbers start at 1, while text indexes start at 0.
    """
    for l in lines:
        # Remove the trailing newline first so the char stays on the same line
        text[l-1] = text[l-1].rstrip('\n') + ' ' + char + '\n'

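Then run it like this:

letters = 'HML'  # one label per group in index, in the same order

for i, ch in enumerate(letters):
    append_char(input, ch, index[i])

Be aware that if there are any collisions (a line number that appears in more than one group), you'll get 'blah H M' rather than 'blah HM', if that matters.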
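If collisions should be caught before anything is written, a minimal sketch using `collections.Counter` can report every line number that appears in more than one group. The `find_collisions` name is hypothetical; duplicates inside a single group are ignored by converting each group to a set first.

```python
from collections import Counter

def find_collisions(groups):
    """Return the sorted line numbers that appear in more than one group."""
    # set(group) ensures a number repeated inside one group is counted once
    counts = Counter(n for group in groups for n in set(group))
    return sorted(n for n, c in counts.items() if c > 1)

print(find_collisions([[1, 3], [4, 7], [2, 5, 6]]))  # []
print(find_collisions([[1, 3], [3, 7], [2, 5, 6]]))  # [3]
```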

2017-03-06 15:35:11

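# 0-based line index -> label (the question's mapping, shifted down by one)
d = {0: 'H', 2: 'H',
     3: 'M', 6: 'M',
     1: 'L', 4: 'L', 5: 'L',
    }

def ending(i):
    label = d.get(i)
    return ' ' + label + '\n' if label else '\n'

with open('input.txt') as f:
    lines = f.readlines()

with open('output.txt', 'w') as o:
    for i, line in enumerate(lines):
        # Drop the original newline; ending() supplies the label and a new one
        o.write('{}{}'.format(line.rstrip('\n'), ending(i)))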

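This is one way to do it. Here the logic that decides how each line ends is encapsulated in the function ending. If you know in advance which lines need to change, a dictionary-based solution like this one works. If it requires some computation (for example, based on the line itself), rewrite ending to reflect that, making sure it takes as arguments all the information needed to determine the line's ending.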
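A minimal sketch of that suggestion, where ending receives the line itself and the label map as arguments instead of reading a global dict. The rule that blank lines never get a label is a hypothetical example, loosely mirroring the empty-line check in the question's loop; `relabel` and the `labels` parameter are likewise illustrative names.

```python
def ending(i, line, labels):
    """Return what follows a line's content, given its 0-based index i.

    Hypothetical rule: blank lines never get a label; any other line
    gets ' <label>' when its index is in labels.
    """
    if not line.strip():
        return '\n'
    label = labels.get(i)
    return ' ' + label + '\n' if label else '\n'

def relabel(lines, labels):
    """Apply ending() to every line, replacing each original newline."""
    return [line.rstrip('\n') + ending(i, line, labels)
            for i, line in enumerate(lines)]

lines = ['H 0 1.0\n', '\n', 'O 0 2.0\n']
print(relabel(lines, {0: 'H', 1: 'M', 2: 'L'}))
# ['H 0 1.0 H\n', '\n', 'O 0 2.0 L\n']
```

Index 1 is in the map but stays unlabeled because the line is blank, showing how a line-based rule can override the index lookup.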

2017-03-06 15:39:42

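There is no reason to read everything into memory: if you process large files, it won't speed anything up and will only waste memory.

I don't see how you obtain the magic values 'H' and 'M', so I assume they are given in the index array, and I preprocess that array into a map {line_number: label}. Then I just read the input one line at a time and add the label, if there is one:

file = 'input.txt'  # input path

# Each entry pairs a group of 1-based line numbers with its label;
# a label of None leaves those lines unchanged
index = [([1, 3], 'H'), ([4, 7], 'M'), ([2, 5, 6], 'L')]

def preprocess(index):
    """Turn index into a {line_number: label} dict, skipping None labels."""
    h = {}
    for nums, label in index:
        if label is not None:
            for num in nums:
                h[num] = label
    return h

h = preprocess(index)
with open(file, 'r') as inputfile, open('output.com', 'w') as output:
    # Read and write one line at a time; line numbers start at 1
    for num, line in enumerate(inputfile, 1):
        if num in h:
            line = line.rstrip() + ' ' + h[num] + '\n'
        output.write(line)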
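The question asks to rewrite the file itself rather than produce a second file. A minimal sketch, assuming it is acceptable to overwrite the input, uses the standard library's `fileinput` module with `inplace=True`, which also works one line at a time. While the loop runs, `print()` output is redirected into the file being rewritten. `label_in_place` and `label_map` are hypothetical names; `label_map` is a {1-based line number: label} dict such as the one `preprocess(index)` returns.

```python
import fileinput

def label_in_place(path, label_map):
    """Append ' <label>' to each line of path whose 1-based number is in
    label_map, rewriting the file in place one line at a time."""
    with fileinput.input(files=(path,), inplace=True) as f:
        for line in f:
            label = label_map.get(f.filelineno())
            if label is not None:
                line = line.rstrip('\n') + ' ' + label + '\n'
            # With inplace=True, stdout is redirected into the file
            print(line, end='')
```

For example, `label_in_place('input.txt', preprocess(index))` would label the lines of `input.txt` directly. Keep a copy of the original if it may be needed again, since the input is replaced.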

2017-03-06 16:01:48

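Thanks for your answer; the question I posted is only part of a bigger problem. But maybe you're right that I don't need to load everything into memory; I'll check. – Panadestein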
