2017-07-24

I am writing some text into a csv file using Python, and I attached a screenshot showing how the written data appears in the file.

You can see that in the Channel Social Media Links column all of the links after the first are written correctly into the following row cells, but the first link does not appear in the Channel Social Media Links column. How can I write it there?

My Python script is here:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

myUrl='https://www.youtube.com/user/HolaSoyGerman/about' 


uClient = uReq(myUrl) 
page_html = uClient.read() 
uClient.close() 

page_soup = soup(page_html, "html.parser") 

containers = page_soup.findAll("h1",{"class":"branded-page-header-title"}) 

filename="Products2.csv" 
f = open(filename,"w") 

headers = "Channel Name,Channel Description,Channel Social Media Links\n" 

f.write(headers) 

channel_name = containers[0].a.text 
print("Channel Name :" + channel_name) 

# For About Section Info 
aboutUrl='https://www.youtube.com/user/HolaSoyGerman/about' 


uClient1 = uReq(aboutUrl) 
page_html1 = uClient1.read() 
uClient1.close() 

page_soup1 = soup(page_html1, "html.parser") 

description_div = page_soup.findAll("div",{"class":"about-description branded-page-box-padding"}) 
channel_description = description_div[0].pre.text 
print("Channel Description :" + channel_description) 
f.write(channel_name+ "," +channel_description) 
links = page_soup.findAll("li",{"class":"channel-links-item"}) 
for link in links: 
    social_media = link.a.get("href") 
    f.write(","+","+social_media+"\n") 
f.close() 

You don't include a newline after `f.write(channel_name + "," + channel_description)`, so of course the first link ends up further along that same row. Also note that `","+","` == `",,"`, and the csv module supports writing rows from sequences rather than adding the commas yourself. – jonrsharpe
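A minimal sketch of jonrsharpe's point, using made-up sample values and an in-memory buffer rather than a real file:

```python
import csv
import io

# Build one CSV row in memory to show what csv.writer emits.
buf = io.StringIO()
writer = csv.writer(buf)
# One list per row; the writer inserts the commas and the trailing newline,
# and quotes any field that itself contains a comma.
writer.writerow(["HolaSoyGerman.", "A description, with a comma", "http://example.com"])
row_text = buf.getvalue()
print(row_text)
```

Note that the field containing a comma comes out quoted, which hand-built `f.write(... + "," + ...)` strings would get wrong.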


Then how can I achieve this? Please give me an example. I don't want the social media links separated by commas; I want every link in the Social Media Links column written into its own next row cell. –

Answer


It would help if you made use of Python's csv library when writing your file. It can turn a list of items into a correctly comma-separated line for you.

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 
import csv 

myUrl = 'https://www.youtube.com/user/HolaSoyGerman/about' 

uClient = uReq(myUrl) 
page_html = uClient.read() 
uClient.close() 

page_soup = soup(page_html, "html.parser") 
containers = page_soup.findAll("h1",{"class":"branded-page-header-title"}) 
filename = "Products2.csv" 

with open(filename, "w", newline='') as f: 
    csv_output = csv.writer(f) 
    headers = ["Channel Name", "Channel Description", "Channel Social Media Links"] 
    csv_output.writerow(headers) 

    channel_name = containers[0].a.text 
    print("Channel Name :" + channel_name) 

    # For About Section Info 
    aboutUrl = 'https://www.youtube.com/user/HolaSoyGerman/about' 

    uClient1 = uReq(aboutUrl) 
    page_html1 = uClient1.read() 
    uClient1.close() 

    page_soup1 = soup(page_html1, "html.parser") 

    description_div = page_soup.findAll("div",{"class":"about-description branded-page-box-padding"}) 
    channel_description = description_div[0].pre.text 
    print("Channel Description :" + channel_description) 

    links = [link.a.get('href') for link in page_soup.findAll("li",{"class":"channel-links-item"})] 
    csv_output.writerow([channel_name, channel_description, links[0]]) 

    for link in links[1:]: 
        csv_output.writerow(['', '', link]) 

This will give you a single row for each of the hrefs in the last column, for example:

Channel Name,Channel Description,Channel Social Media Links 
HolaSoyGerman.,Los Hombres De Verdad Usan Pantuflas De Perrito,http://www.twitter.com/germangarmendia 
,,http://instagram.com/germanchelo 
,,http://www.youtube.com/juegagerman 
,,http://www.youtube.com/juegagerman 
,,http://www.twitter.com/germangarmendia 
,,http://instagram.com/germanchelo 
,,https://plus.google.com/108460714456031131326 

Each writerow() call writes a list of values to the file as comma-separated values and automatically adds the newline for you at the end. All that is needed is to build a list of values for each row. First, take the first of your links and make it the last value in the list, after your channel description. Secondly, write a row for each of the remaining links with empty values for the first two columns.
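The row-building pattern described above can be sketched in isolation (the sample values below are made up, and an in-memory buffer stands in for the real file):

```python
import csv
import io

# Made-up sample values standing in for the scraped data.
links = ["http://twitter.com/a", "http://instagram.com/b", "http://youtube.com/c"]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Channel Name", "Channel Description", "Channel Social Media Links"])
# The first link shares a row with the name and description...
writer.writerow(["Name", "Description", links[0]])
# ...and each remaining link gets its own row with the first two columns blank.
for link in links[1:]:
    writer.writerow(["", "", link])

csv_text = buf.getvalue()
print(csv_text)
```

The header row plus one row per link reproduces the shape of the sample output shown earlier.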


To answer your comment, the following should get you started:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 
import csv 

def get_data(url, csv_output): 

    if not url.endswith('/about'): 
        url += '/about' 

    print("URL: {}".format(url)) 
    uClient = uReq(url) 
    page_html = uClient.read() 
    uClient.close() 

    page_soup = soup(page_html, "html.parser") 
    containers = page_soup.findAll("h1", {"class":"branded-page-header-title"}) 

    channel_name = containers[0].a.text 
    print("Channel Name :" + channel_name) 

    description_div = page_soup.findAll("div", {"class":"about-description branded-page-box-padding"}) 
    channel_description = description_div[0].pre.text 
    print("Channel Description :" + channel_description) 

    links = [link.a.get('href') for link in page_soup.findAll("li", {"class":"channel-links-item"})] 
    csv_output.writerow([channel_name, channel_description, links[0]]) 

    for link in links[1:]: 
        csv_output.writerow(['', '', link]) 

    #TODO - get list of links for the related channels 

    return related_links 


my_url = 'https://www.youtube.com/user/HolaSoyGerman' 
filename = "Products2.csv" 

with open(filename, "w", newline='') as f: 
    csv_output = csv.writer(f) 
    headers = ["Channel Name", "Channel Description", "Channel Social Media Links"] 
    csv_output.writerow(headers) 

    for _ in range(5): 
        next_links = get_data(my_url, csv_output) 
        my_url = next_links[0]  # e.g. follow the first of the related links 
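The `#TODO` in `get_data` is left open above; one possible sketch of it is shown below. Note the container class name `branded-page-related-channels-item-container` is an assumption about the page markup, not something confirmed by this thread, so you would need to inspect the actual HTML and adjust the selector:

```python
from bs4 import BeautifulSoup

def get_related_links(page_html):
    # Hypothetical selector: assumes each related channel is wrapped in an
    # element with class "branded-page-related-channels-item-container"
    # that contains an <a> tag. Adjust to match the real page markup.
    page_soup = BeautifulSoup(page_html, "html.parser")
    items = page_soup.findAll("span",
                              {"class": "branded-page-related-channels-item-container"})
    return [item.a.get("href") for item in items if item.a is not None]
```

`get_data` could then end with `return get_related_links(page_html)` instead of returning the undefined `related_links`.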

I am using web scraping to get information from YouTube channels and save it in a csv file. Now I want that, once any YouTube channel's information has been fetched into the csv file, the URL of the first channel in the "Related channels" section is automatically fetched into a variable, and then the whole process repeats, continuing 5 times. How can I do that? –


I would say that is an entirely different question, and you would need to explain it better with an example. If my answer solves your first problem, I suggest you accept the solution (click the grey tick below the up/down buttons) and then start a second, more detailed question with more information. –


I want my script to follow the flow below –
