2016-01-13

Reading and appending to a file with a context manager: it seems to only write, never read

I am trying to read from and append to a file, but when I use a context manager it does not seem to work.

In this code I am trying to fetch all links on a website that contain one of the items in my 'serien' list. Before appending a link, I first check whether it is already in the file. If the link is found, it should not be appended again. But it is.

My guess is that I am either not using the right file mode, or that I have messed up my context manager somehow. Or maybe, since this is my first time using a context manager, I am using it completely wrong.

import requests
from bs4 import BeautifulSoup


serien = ['izombie', 'grandfathered', 'new-girl']
serien_links = []


# Gets chapter links
def episode_links(index_url):
    r = requests.get(index_url)
    soup = BeautifulSoup(r.content, 'lxml')
    links = soup.find_all('a')
    url_list = []
    for url in links:
        url_list.append(url.get('href'))
    return url_list


urls_unfiltered = episode_links('http://watchseriesus.tv/last-350-posts/')
with open('link.txt', 'a+') as f:
    for serie in serien:
        for x in urls_unfiltered:
            # check whether link is already in file. If not, write link to file
            if serie in x and serie not in f.read():
                f.write('{}\n'.format(x))

Any hints would be appreciated.

Edit: a similar project, without a context manager. I also tried using a context manager here, but gave up after running into the same problem.

file2_out = open('url_list.txt', 'a')  # local url list for chapter check
for x in link_list:
    # Checking chapter existence in folder and downloading chapter
    if x not in open('url_list.txt').read():  # Is url of chapter in local url list?
        #push = pb.push_note(get_title(x), x)
        file2_out.write('{}\n'.format(x))  # adding downloaded chapter to local url list
        print('{} saved.'.format(x))


file2_out.close()

And with the context manager:

with open('url_list.txt', 'a+') as f:
    for x in link_list:
        # Checking chapter existence in folder and downloading chapter
        if x not in f.read():  # Is url of chapter in local url list?
            #push = pb.push_note(get_title(x), x)
            f.write('{}\n'.format(x))  # adding downloaded chapter to local url list
            print('{} saved.'.format(x))

You say this fails when you use the context manager. Did you have a working version of this before? If so, please show that code. (The similar project without a context manager was added in response.) –


`f.read()` reads the entire file, and every call after that returns an empty string. Try reading the whole file once, storing it in a variable, and then checking against that variable's contents inside the loop. – alpenmilch411


The first `f.read()` consumes the entire file; after that it returns an empty string. – martineau
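The behavior the comments describe can be seen in a minimal sketch (the file name `demo.txt` is just a throwaway name for illustration):

```python
# Minimal sketch: the first f.read() consumes the whole file, so every
# later call returns '' because the file position is already at EOF.
with open('demo.txt', 'w') as f:
    f.write('izombie-episode-1\n')

with open('demo.txt', 'r') as f:
    first = f.read()   # reads the entire file
    second = f.read()  # position is now at EOF, so this is ''

print(repr(first))   # → 'izombie-episode-1\n'
print(repr(second))  # → ''
```

This is why a membership test like `serie not in f.read()` inside a loop only works on the first iteration at best.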

Answer


As @martineau mentioned, `f.read()` reads the whole file, and after that you get an empty string. Try the code below: it reads the file contents into a list once and does the subsequent comparisons against that list.

import requests
from bs4 import BeautifulSoup

serien = ['izombie', 'grandfathered', 'new-girl']
serien_links = []


# Gets chapter links
def episode_links(index_url):
    r = requests.get(index_url)
    soup = BeautifulSoup(r.content, 'lxml')
    links = soup.find_all('a')
    url_list = []
    for url in links:
        url_list.append(url.get('href'))
    return url_list


urls_unfiltered = episode_links('http://watchseriesus.tv/last-350-posts/')
with open('link.txt', 'a+') as f:
    f.seek(0)  # 'a+' positions the file pointer at the end; rewind before reading
    cont = f.read().splitlines()
    for serie in serien:
        for x in urls_unfiltered:
            # check whether link is already in file. If not, write link to file
            if (serie in x) and (x not in cont):
                f.write('{}\n'.format(x))
                cont.append(x)  # also skip duplicates seen later in this run
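A detail that is easy to miss with mode `'a+'`: the file position starts at the end of the file, so a read issued immediately after opening returns an empty string, and you have to `seek(0)` first to see the existing content. A minimal sketch (the file name `links_demo.txt` is illustrative):

```python
# 'a+' opens for appending and reading, but the position starts at EOF,
# so reading existing content requires an explicit seek(0) first.
with open('links_demo.txt', 'w') as f:
    f.write('old-link\n')

with open('links_demo.txt', 'a+') as f:
    at_eof = f.read()              # '' : position is at end of file
    f.seek(0)                      # rewind to the start
    existing = f.read().splitlines()

print(repr(at_eof))   # → ''
print(existing)       # → ['old-link']
```

Writes in append mode always go to the end of the file regardless of the current position, so seeking before the read does not affect where new lines land.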