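Looping through files

I have 100 files in a folder, named 1.htm through 100.htm. I run this code to extract some information from a file and append it to another file, final.txt. At the moment I have to run the program by hand for each of the 100 files. I need to build a loop that runs the program 100 times, reading each file once.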

Below is the code for the file 6.htm (please explain in detail which edits I need to make to it):

import glob 
import BeautifulSoup 
from BeautifulSoup import BeautifulSoup 


fo = open("6.htm", "r") 
bo = open("output.txt" ,"w") 
f = open("final.txt","a+") 

htmltext = fo.read() 
soup = BeautifulSoup(htmltext) 
#print len(urls) 
table = soup.findAll('table') 
rows = table[0].findAll('tr'); 
for tr in rows: 
    cols = tr.findAll('td') 
    for td in cols: 
     text = str(td.find(text=True)) + ';;;' 
     if(text!="&nbsp;;;;"): 
      bo.write(text); 
      bo.write('\n'); 
fo.close() 
bo.close() 

b= open("output.txt", "r") 

for j in range (1,5): 
str=b.readline(); 
for j in range(1, 15): 
str=b.readline(); 
c=str.split(";;;") 
#print c[1] 
if(c[0]=="APD ID:"): 
    f.write(c[1]) 
    f.write("#") 
if(c[0]=="Name/Class:"): 
    f.write(c[1]) 
    f.write("#") 
if(c[0]=="Source:"): 
    f.write(c[1]) 
    f.write("#") 
if(c[0]=="Sequence:"): 
    f.write(c[1]) 
    f.write("#") 
if(c[0]=="Length:"): 
    f.write(c[1]) 
    f.write("#") 
if(c[0]=="Net charge:"): 
    f.write(c[1]) 
    f.write("#") 
if(c[0]=="Hydrophobic residue%:"): 
    f.write(c[1]) 
    f.write("#") 
if(c[0]=="Boman Index:"): 
    f.write(c[1]) 
    f.write("#") 
f.write('\n'); 
b.close(); 
f.close(); 



f.close(); 
print "End"

2014-03-24 Whiskeyjack

This is what http://docs.python.org/3/library/fileinput.html is for. – 2014-03-24 14:00:49

Also, 'for j in range (1,5):' is never used? Or at least you never use 'j' anywhere, and the tab indentation is completely wrong in several places. – Torxed
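For readers who want to try the fileinput suggestion from the comment above, here is a minimal sketch. The helper name iter_lines is made up for illustration, and the code is written for Python 3 rather than the Python 2 used in the question. Note that fileinput works line by line, so it suits line-oriented processing better than parsing a whole HTML document at once.

```python
import fileinput


def iter_lines(paths):
    """Yield (file name, line) pairs for every line of every file in paths."""
    # fileinput.input chains the files together into one stream of lines.
    reader = fileinput.input(files=paths)
    try:
        for line in reader:
            # reader.filename() names the file the current line came from.
            yield reader.filename(), line.rstrip("\n")
    finally:
        reader.close()
```

It could be called as iter_lines([str(i) + ".htm" for i in range(1, 101)]). Be careful never to pass an empty list: fileinput then falls back to reading standard input.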

import os
f = open("final.txt","a+")
for root, folders, files in os.walk('./path/to/html_files/'):
    for fileName in files:
        fo = open(os.path.abspath(os.path.join(root, fileName)), "r")
        ...

The rest of your code then goes there.

Also consider (best practice):

with open(os.path.abspath(os.path.join(root, fileName)), "r") as fo:
    ...

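That way you can't forget to close the file handles: the operating system only allows a limited number of open files, and this makes sure you don't exhaust that limit by mistake.

That makes your code look like this: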

import os
with open("final.txt","a+") as f:
    for root, folders, files in os.walk('./path/to/html_files/'):
        for fileName in files:
            with open(os.path.abspath(os.path.join(root, fileName)), "r") as fo:
                ...

And NEVER overwrite built-in names such as str:

str=b.readline();

There's also no need for ; at the end of your lines. This is Python.. we code in a comfortable manner!

Last but not least..

if(c[0]=="APD ID:"): 
if(c[0]=="Name/Class:"): 
if(c[0]=="Source:"): 
if(c[0]=="Sequence:"): 
if(c[0]=="Length:"): 
if(c[0]=="Net charge:"): 
if(c[0]=="Hydrophobic residue%:"): 
if(c[0]=="Boman Index:"):

should be:

if(c[0]=="APD ID:"): 
elif(c[0]=="Name/Class:"): 
elif(c[0]=="Source:"): 
elif(c[0]=="Sequence:"): 
elif(c[0]=="Length:"): 
elif(c[0]=="Net charge:"): 
elif(c[0]=="Hydrophobic residue%:"): 
elif(c[0]=="Boman Index:"):

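Unless you modify c along the way of course, which you don't.. so switch!

Damn, I just keep finding more horrible things in this code (you've obviously copy-pasted from examples from all over the galaxy...):

You can condense all of the if/elif/else above into a single if block: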

if(c[0] in ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:", "Net charge:", "Hydrophobic residue%:", "Boman Index:")): 
    f.write(c[1]) 
    f.write("#")

And also skip the ( ... ) around the condition in your if block. Again.. this is Python.. we program in a comfortable manner:

if c[0] in ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:", "Net charge:", "Hydrophobic residue%:", "Boman Index:"): 
    f.write(c[1]) 
    f.write("#")
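To make the membership test above easy to try in isolation, here is a small sketch that turns already-extracted lines into one record. It assumes each line has the shape 'label;;;value;;;', which is a guess at what output.txt contains. The names WANTED_LABELS and format_record are made up for illustration.

```python
# Labels whose values should end up in final.txt (taken from the question).
WANTED_LABELS = ("APD ID:", "Name/Class:", "Source:", "Sequence:", "Length:",
                 "Net charge:", "Hydrophobic residue%:", "Boman Index:")


def format_record(lines):
    """Build one '#'-separated record from lines shaped like 'label;;;value;;;'."""
    parts = []
    for line in lines:
        fields = line.strip().split(";;;")
        # Skip lines without a value field, and labels we do not care about.
        if len(fields) > 1 and fields[0] in WANTED_LABELS:
            parts.append(fields[1] + "#")
    return "".join(parts)
```

Returning a string instead of writing straight to f keeps the formatting logic testable. The caller can then do f.write(format_record(lines) + "\n").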

2014-03-24 14:13:20 Torxed

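This will pick up every file in the ./path/to/html_files/ directory. The OP only wants to read the 1-100.htm files? –

@RishabhSagar Possibly, Calpratt beat me to it. I don't see the point of storing the 100 HTML files in a directory alongside other files, and this solution also works with other or random file-naming conventions. – Torxed

Nice code review! :) –

Maybe something structured like this: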

# declare main files 
bo = open("output.txt" ,"w") 
f = open("final.txt","a+") 

#loop over range ii = [1,100] 
for ii in range(1,101): 
    fo = open(str(ii) + ".htm", "r") 
    # Run program like normal 
    ... 
    ... 
    ... 
    fo.close() 
f.close() 
bo.close()
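A slightly fuller sketch of the same loop, combined with the with statement recommended earlier in the thread, might look like the following. The function name process_all and its parameters are hypothetical, and the line written to final.txt is only a placeholder for the real extraction step from the question.

```python
import os


def process_all(folder, first=1, last=100, out_name="final.txt"):
    """Append one line per existing <n>.htm file to out_name; return the files read."""
    done = []
    with open(os.path.join(folder, out_name), "a") as final:
        for number in range(first, last + 1):
            path = os.path.join(folder, "%d.htm" % number)
            if not os.path.isfile(path):
                continue  # tolerate gaps in the numbering
            with open(path, "r") as source:
                html_text = source.read()
            # Placeholder: the real code would parse html_text here.
            final.write("%s#%d\n" % (os.path.basename(path), len(html_text)))
            done.append(path)
    return done
```

Each input file is closed as soon as its with block ends, so only two handles are open at any time no matter how many files are processed.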

2014-03-24 14:16:09 flakes

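Traceback (most recent call last): File "C:\Users\Manish\Dropbox\karabi\work files\new1.py", line 12, in <module> fo = open(str(ii) + ".htm", "r") TypeError: 'str' object is not callable – Whiskeyjack

If you declare a variable named 'str', you can no longer call the 'str()' function. That's because your own variable with that name now hides the built-in, and Python finds yours first. Try renaming your string variable 'str' to something more meaningful. – flakes

Thank you very much. I did that. – Whiskeyjack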
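The shadowing problem discussed in these comments can be reproduced in a few lines. This is only an illustration; the function names shadowing_demo and file_name are made up.

```python
def shadowing_demo():
    """Return the error message raised once a variable named str hides the built-in."""
    str = "some line"      # hides the built-in str inside this function
    try:
        str(6)             # tries to call a string object, not the str type
    except TypeError as error:
        return error.args[0]


def file_name(number):
    """Build '<number>.htm'; works because nothing here shadows str."""
    line = "some line"     # a descriptive name leaves the built-in alone
    return str(number) + ".htm"
```

Calling shadowing_demo() returns the same "not callable" message as the traceback above, while file_name(6) builds the name as intended.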

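os.listdir lists all the files in a given directory.

As @Torxed pointed out, it is best practice to use the with statement (so that the file handle gets closed).

You can find the .htm files like this: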

import os

# Creates the set of file names 1.htm through 100.htm
filenames = set(str(x) + ".htm" for x in range(1, 101))

for file in os.listdir("/mydir"):
    if file in filenames:
        # Do your logic here.
        pass
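One thing to keep in mind is that os.listdir returns names in arbitrary order, and a plain sort would put 10.htm before 2.htm. If the order of the records in final.txt matters, a sketch like the following could be used. The names HTM_NUMBER and numbered_htm_files are hypothetical.

```python
import os
import re

# Matches names made of digits followed by ".htm", e.g. "42.htm".
HTM_NUMBER = re.compile(r"^(\d+)\.htm$")


def numbered_htm_files(directory, first=1, last=100):
    """Return names <n>.htm in directory with first <= n <= last, in numeric order."""
    found = []
    for name in os.listdir(directory):
        match = HTM_NUMBER.match(name)
        if match and first <= int(match.group(1)) <= last:
            found.append((int(match.group(1)), name))
    # Sorting the (number, name) pairs orders the files numerically.
    return [name for number, name in sorted(found)]
```

Files outside the numeric range, and files with other extensions, are ignored.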

2014-03-24 14:56:31
