Changing the contents of a file by applying different conditions

I am trying to make some changes to the contents of an input file. The input file looks like this:

18800000 20400000 pau 
20400000 21300000 aa 
21300000 22500000 p 
22500000 23200000 l 
23200000 24000000 ay 
24000000 25000000 k 
25000000 26500000 pau

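This file is the transcription of an audio file. The first number is the start time and the second number is the end time. The letters that follow denote the sound.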

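The change I have to make is this: some sounds are made up of two different sounds, i.e. there are some diphthongs, and these diphthongs have to be split into two sounds. In the example above the diphthong is 'ay', which consists of 'ao' and 'ih'. What happens here is that the duration of 'ay', 24000000 - 23200000 = 800000, is divided equally between the two sounds. As a result,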

23200000 24000000 ay

becomes

23200000 23600000 ao 
23600000 24000000 ih

I tried to write some pseudocode, but it looks like rubbish.

def test(transcriptionFile) : 
    with open("transcriptions.txt", "r+") as tFile : 
     for line in tFile : 
      if 3rd_item = ay 
       duration = (2nd_item[1] - 1st_item[2])/2 
       delete the line 
       tFile.write(1st_item, 1st_item + d, ao) 
       tfile.write(1st_item + d, 1st_item, ih) # next line 

if __name__ == "__main__" : 
    test("transcriptions.txt")

Thanks.

Following the suggestions I was given, I changed the code to the following. It is still not correct.

def test(transcriptionFile) : 
    with open("transcriptions.txt", "r") as tFile : 
     inp = tFile.readlines() 

    outp = [] 
    for ln in inp : 
     start, end, sound = ln.strip() 
     if sound == ay : 
      duration = (end - start)/2 
      ln.delete 
      start = start 
      end = start + duration 
      sound = ao 
      outp.append(ln) 
      start = start + duration # next line 
      end = start 
      sound = ih 
      outp.append(ln) 

    with open("transcriptions.txt", "w") as tFile: 
     tFile.writelines(outp) 

__name__ == "__main__" 
test("transcriptions.txt")

Source

2011-11-21 zingy

Once you get your code running, you need to test if __name__ == "__main__", and only call test() if that is true.

The following script should do what you want:

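import sys

def main(src, dest):
    # src and dest must be different files: dest is truncated as soon as it is opened
    with open(dest, 'w') as output:
        with open(src) as source:
            for line in source:
                try:
                    start, end, sound = line.split()
                except ValueError:
                    # skip blank lines and lines without exactly three fields
                    continue
                if sound == 'ay':
                    # split the diphthong's duration into two halves
                    start = int(start)
                    end = int(end)
                    offset = (end - start) // 2
                    output.write('%s %s ao\n' % (start, start + offset))
                    output.write('%s %s ih\n' % (start + offset, end))
                else:
                    output.write(line)

if __name__ == "__main__":
    # expects two command-line arguments: the source path and the destination path
    main(*sys.argv[1:])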

Output:

18800000 20400000 pau 
20400000 21300000 aa 
21300000 22500000 p 
22500000 23200000 l 
23200000 23600000 ao 
23600000 24000000 ih 
24000000 25000000 k 
25000000 26500000 pau
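
The question mentions that there are several diphthongs, while the script above hard-codes 'ay'. One possible variation, sketched here under assumptions, is to drive the split from a lookup table. The names DIPHTHONGS and split_line are hypothetical, and only the 'ay' entry comes from the question; any other entries would have to be supplied by whoever knows the phone set. Unlike the script above, this sketch passes malformed lines through unchanged rather than dropping them.

```python
# Hypothetical lookup table: diphthong -> the two sounds it is split into.
# Only the 'ay' entry is taken from the question.
DIPHTHONGS = {"ay": ("ao", "ih")}


def split_line(line):
    """Return the list of output lines that replace one transcription line."""
    fields = line.split()
    if len(fields) != 3 or fields[2] not in DIPHTHONGS:
        return [line]  # not a diphthong entry: keep the line as it is
    start, end = int(fields[0]), int(fields[1])
    mid = start + (end - start) // 2  # integer midpoint of the interval
    first, second = DIPHTHONGS[fields[2]]
    return ["%d %d %s\n" % (start, mid, first),
            "%d %d %s\n" % (mid, end, second)]
```

Adding another diphthong would then only mean adding one more entry to the table.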

Source

2011-11-21 15:13:22 ekhumoro

Thank you very much... – zingy

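Editing a text file in place is very hard. Your best options are:

1. Write the program as a Unix filter, i.e. have it produce the new file on sys.stdout, and move that into place with an external tool.
2. Read in the whole file, then construct the new file in memory and write it out.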
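
The first option could look roughly like the sketch below. It is only an illustration under assumptions: the function name filter_lines is made up, and the 'ay' to 'ao' + 'ih' rule is the one from the question. The program never opens a file itself; it reads lines from sys.stdin and writes the result to sys.stdout.

```python
import sys


def filter_lines(lines):
    """Yield the output lines, splitting every 'ay' entry into 'ao' and 'ih'."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[2] == "ay":
            start, end = int(fields[0]), int(fields[1])
            mid = start + (end - start) // 2
            yield "%d %d ao\n" % (start, mid)
            yield "%d %d ih\n" % (mid, end)
        else:
            yield line  # every other line passes through unchanged


if __name__ == "__main__":
    # The shell decides where the input comes from and where the output goes.
    sys.stdout.writelines(filter_lines(sys.stdin))
```

Saved under a hypothetical name such as split_ay.py, it could be run as `python split_ay.py < transcriptions.txt > transcriptions.new`, with `mv transcriptions.new transcriptions.txt` as the external step that puts the result in place.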

A program following the second line of thinking would look like this:

# read transcriptions.txt into a list of lines 
with open("transcriptions.txt", "r") as tFile: 
    inp = tFile.readlines() 

# do processing and build a new list of lines 
outp = [] 
for ln in inp: 
    if not to_be_deleted(ln): 
        outp.append(transform(ln))

# now overwrite transcriptions.txt 
with open("transcriptions.txt", "w") as tFile: 
    tFile.writelines(outp)

It would be even better if you wrote the processing bit as a list comprehension:

outp = [transform(ln) for ln in inp 
         if not to_be_deleted(ln)]
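
For the task in the question, one input line can turn into two output lines, so a one-to-one transform does not quite fit. A possible way to keep the comprehension, sketched here with a hypothetical helper called expand that returns a list of lines, is to flatten the results with a nested comprehension:

```python
def expand(ln):
    """Hypothetical helper: return the list of lines that replace ln."""
    fields = ln.split()
    if len(fields) == 3 and fields[2] == "ay":
        start, end = int(fields[0]), int(fields[1])
        mid = (start + end) // 2
        return ["%d %d ao\n" % (start, mid), "%d %d ih\n" % (mid, end)]
    return [ln]


# sample input lines, taken from the question
inp = ["22500000 23200000 l\n", "23200000 24000000 ay\n", "24000000 25000000 k\n"]

# the outer loop walks the input, the inner loop flattens each helper result
outp = [new_ln for ln in inp for new_ln in expand(ln)]
```

Here outp ends up with four lines: the 'l' line, the new 'ao' and 'ih' lines, and the 'k' line.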

Source

2011-11-21 12:33:11

The filter approach is definitely much better. It is easier to debug, and you get the power of the other UNIX text manipulation tools.

How can I refer to the items in each line? – zingy

@zingy: do 'start, end, sound = ln.split()'.
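
To illustrate that suggestion with one line from the question: split() returns strings, so the two times need converting with int() before any arithmetic. This is a small sketch and the variable name half is made up.

```python
ln = "23200000 24000000 ay \n"

# split() without arguments splits on any whitespace and drops the newline
start, end, sound = ln.split()

# the fields are strings, so convert them before doing arithmetic
start, end = int(start), int(end)
half = (end - start) // 2

print(start, start + half, "ao")
print(start + half, end, "ih")
```

This prints the two replacement lines, 23200000 23600000 ao and 23600000 24000000 ih.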
