Opening and editing multiple files in a folder with Python

I want to edit multiple .fasta files in a folder with Python, changing them from this:

>YP_009208724.1 hypothetical protein ADP65_00072 [Achromobacter phage phiAxp-3] 
MSNVLLKQ... 

>YP_009220341.1 terminase large subunit [Achromobacter phage phiAxp-1] 
MRTPSKSE... 

>YP_009226430.1 DNA packaging protein [Achromobacter phage phiAxp-2] 
MMNSDAVI...

to this:

>Achromobacter phage phiAxp-3 
MSNVLLKQ... 

>Achromobacter phage phiAxp-1 
MRTPSKSE... 

>Achromobacter phage phiAxp-2 
MMNSDAVI...

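I already have a script that does this for a single file:

with open('Achromobacter.fasta', 'r') as fasta_file:
    out_file = open('./fastas3/Achromobacter.fasta', 'w')
    for line in fasta_file:
        line = line.rstrip()
        if '[' in line:
            # Header line: keep the organism name between '[' and the final ']'
            line = line.split('[')[-1]
            out_file.write('>' + line[:-1] + "\n")
        else:
            # Sequence line: copy it unchanged
            out_file.write(str(line) + "\n")
    out_file.close()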

But I can't automate the process for all 120 files in my folder.

I tried using glob.glob, but I can't seem to get it to work:

import glob 

for fasta_file in glob.glob('*.fasta'): 
    outfile = open('./fastas3/'+fasta_file, 'w') 
    with open(fasta_file, 'r'): 
     for line in fasta_file: 
      line = line.rstrip() 
      if '[' in line: 
       line2 = line.split('[')[-1] 
       outfile.write('>' + line2[:-1] + "\n") 
      else: 
       outfile.write(str(line) + "\n")

It gives me output like this:

A 
c 
i 
n 
e 
t 
o 
b 
a 
c 
t 
e 
r 
. 
f 
a 
s 
t 
a

I managed to get a list of all the files in the folder, but I can't use the items in that list to open the files.

import os 


file_list = [] 
for file in os.listdir("./fastas2/"): 
    if file.endswith(".fasta"): 
     file_list.append(file)
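The question doesn't show the failing open() call, so this is only a guess at the cause: os.listdir returns bare file names such as 'Achromobacter.fasta' without the folder, so opening them from the current directory fails unless the folder is joined back on. A minimal sketch, using a hypothetical helper name list_fasta:

```python
import os


def list_fasta(folder):
    """Hypothetical helper: full paths of the .fasta files directly in folder."""
    paths = []
    for name in os.listdir(folder):
        if name.endswith(".fasta"):
            # os.listdir gives names with no folder prefix, so join the
            # folder back on before the path is passed to open()
            paths.append(os.path.join(folder, name))
    return sorted(paths)


# Usage with the folder from the question:
# for path in list_fasta("./fastas2/"):
#     with open(path) as fasta_file:
#         first_line = fasta_file.readline()
```

Sorting is not required; it just makes the order of the 120 files predictable between runs.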

2017-08-01 tahunami

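In the second code snippet you are iterating over the file name, not the file: look at the 'for line in fasta_file' line. You need to give the file object a name in the 'with' statement.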
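The difference the comment points at can be seen in isolation. In this sketch io.StringIO stands in for an opened .fasta file, and its contents are made up for illustration:

```python
import io

# Iterating over a string yields its characters one at a time, which is
# why the loop wrote the file name out letter by letter.
name = "Acinetobacter.fasta"
chars = [c for c in name]
print(chars[:3])

# Iterating over a file object yields its lines instead. io.StringIO
# behaves like an opened text file, so no file on disk is needed here.
fake_file = io.StringIO(">YP_1 protein [Phage A]\nMSNV\n")
lines = [line for line in fake_file]
print(lines)
```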

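Since you can already change the contents of one file, what's left is to automate the process. First, we turn the single-file code into a function that takes the file name as a parameter instead of using a hard-coded file handle, and writes the result back into the same file:

def file_changer(filename):
    data_to_put = ''
    # 'r+' opens the existing file for both reading and writing
    with open(filename, 'r+') as fasta_file:
        for line in fasta_file.readlines():
            line = line.rstrip()
            if '[' in line:
                line = line.split('[')[-1]
                data_to_put += '>' + line[:-1] + "\n"
            else:
                data_to_put += line + "\n"
        # After reading, the file position is at the end: rewind before
        # writing, then cut off any leftover text from the longer original
        fasta_file.seek(0)
        fasta_file.write(data_to_put)
        fasta_file.truncate()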
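Before running this on 120 files, the per-line rule can be checked on its own. A sketch that pulls it into a hypothetical helper named shorten_header and applies it to the sample lines from the question:

```python
def shorten_header(line):
    """Hypothetical helper: apply file_changer's rule to a single line."""
    line = line.rstrip()
    if '[' in line:
        # The text after the last '[' still ends with ']', so slice it off
        return '>' + line.split('[')[-1][:-1]
    # Sequence lines pass through unchanged
    return line


print(shorten_header(">YP_009208724.1 hypothetical protein ADP65_00072 "
                     "[Achromobacter phage phiAxp-3] "))
print(shorten_header("MSNVLLKQ"))
```

Note that the [:-1] slice assumes the stripped header ends with ']'; a header with text after the bracketed name would lose its last character instead.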

Now we need to loop over all the files, so let's use the glob module for that:

import glob 
for file in glob.glob('*.fasta'): 
    file_changer(file)

2017-08-01 10:04:35

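I get this error: 'TypeError: coercing to Unicode: need string or buffer, type found' – tahunami

@tahunami On which line?

'line 20, in file_changer(file)' and 'line 5, in file_changer: with open(filename, 'w') as fasta_file:' – tahunami

You are iterating over the file name, which gives you every character of the name instead of the lines of the file. Here is a corrected version of the code: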

import glob

for fasta_file_name in glob.glob('*.fasta'):
    # Open the input and the output file in a single 'with' statement
    with open(fasta_file_name, 'r') as fasta_file, \
            open('./fastas3/' + fasta_file_name, 'w') as outfile:
        # Iterate over the file object (lines), not the name (characters)
        for line in fasta_file:
            line = line.rstrip()
            if '[' in line:
                line2 = line.split('[')[-1]
                outfile.write('>' + line2[:-1] + "\n")
            else:
                outfile.write(str(line) + "\n")
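One thing the loop above takes for granted is that ./fastas3/ already exists; open(..., 'w') does not create missing folders. A sketch that creates the output folder first and builds paths with os.path, wrapped in a hypothetical convert_folder function. It assumes Python 3.2 or later, since exist_ok is not available in Python 2:

```python
import glob
import os


def convert_folder(in_dir, out_dir):
    """Hypothetical helper: rewrite every .fasta file in in_dir into out_dir."""
    # open(..., 'w') will not create the folder, so make sure it exists
    os.makedirs(out_dir, exist_ok=True)
    for in_path in glob.glob(os.path.join(in_dir, '*.fasta')):
        # Reuse the input file's name inside the output folder
        out_path = os.path.join(out_dir, os.path.basename(in_path))
        with open(in_path, 'r') as fasta_file, \
                open(out_path, 'w') as outfile:
            for line in fasta_file:
                line = line.rstrip()
                if '[' in line:
                    outfile.write('>' + line.split('[')[-1][:-1] + "\n")
                else:
                    outfile.write(line + "\n")


# With the folders used in the answer:
# convert_folder('.', './fastas3')
```

Because glob only matches files directly inside in_dir, an output folder nested inside the input folder is not picked up as input on a second run.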

As an alternative to a Python script, you can simply use sed on the command line:

sed -i 's/^>.*\[\(.*\)\].*$/>\1/' *.fasta

This modifies all the files in place, so consider making a copy of them first.
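For staying in Python, the same substitution can be expressed with the re module. A sketch with a hypothetical function name rewrite_headers that mirrors the sed expression:

```python
import re

# Same pattern as the sed command. The greedy '.*\[' backtracks to the last
# '[', the group captures the text up to the last ']', and the whole header
# line is replaced by '>' plus that text. re.MULTILINE makes ^ and $ match at
# every line; sequence lines do not start with '>', so they never match.
HEADER = re.compile(r'^>.*\[(.*)\].*$', re.MULTILINE)


def rewrite_headers(text):
    return HEADER.sub(r'>\1', text)


sample = (">YP_009220341.1 terminase large subunit "
          "[Achromobacter phage phiAxp-1] \n"
          "MRTPSKSE\n")
print(rewrite_headers(sample))
```

Unlike sed -i, this returns a new string and leaves the file alone, so writing the result back (or to a separate folder) is up to the caller.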

2017-08-01 09:56:49

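Can you tell me more about the syntax of this line: 'open(fasta_file_name, 'r') as fasta_file, \ open('./fastas3/' + fasta_file_name, 'w') as outfile:' – tahunami

@tahunami All files should be opened in a 'with' statement to make sure they are closed properly. You can open several files in a single 'with' statement, and the backslash is only there to continue the statement on the next line.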
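A single with statement listing two context managers behaves like two nested with statements: both objects are closed when the block exits, even if an exception is raised inside it. In this sketch io.StringIO stands in for real files so it runs without touching the disk:

```python
import io

# One 'with' statement managing two objects...
with io.StringIO("line\n") as src, io.StringIO() as dst:
    dst.write(src.read().upper())
    result = dst.getvalue()  # read before dst is closed

# ...is equivalent to two nested 'with' statements.
with io.StringIO("line\n") as src:
    with io.StringIO() as dst:
        dst.write(src.read().upper())
        nested_result = dst.getvalue()

# Both objects are closed once their block has finished.
print(result == nested_result, src.closed, dst.closed)
```

Python 3.10 and later also accept the list of context managers wrapped in parentheses, which avoids the backslash.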
