Python: Splitting parsed items for a CSV file

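I took the advice of Jamie Bull and PM 2Ring and used the csv module to write the output of my web scrape. I'm almost done, but some of the parsed items are separated by a colon or a hyphen. I would like each of these items split into two items in the current list.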

Current output:

GB,16,19,255,1,26:40,19,13,4,2,6-12,0-1,255,57,4.5,80,21,3.8,175,23-33,4.9,3,14,1,4,38.3,8,65,1,0
Sea,36,25,398,1,33:20,25,8,13,4,4-11,1-1,398,66,6.0,207,37,5.6,191,19-28,6.6,1,0,0,2,33.0,4,69,2,1

Desired output (the differences were marked in bold):

GB,16,19,255,1,26,40,19,13,4,2,6,12,0,1,255,57,4.5,80,21,3.8,175,23,33,4.9,3,14,1,4,38,3,8,65,1,0
Sea,36,25,398,1,33,20,25,8,13,4,4,11,1,1,398,66,6,207,37,5.6,191,19,28,6.6,1,0,0,2,33,4,69,2,1

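I'm not sure where or how to make these changes, and I don't know whether a regular expression is needed. Obviously I could fix this up in Notepad or Excel, but my goal is to handle all of it in Python.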

If you run the program, the results above are from the 2014 season, week 1:

import requests 
import re 
from bs4 import BeautifulSoup 
import csv 

year_entry = raw_input("Enter year: ") 

week_entry = raw_input("Enter week number: ") 

week_link = requests.get("http://sports.yahoo.com/nfl/scoreboard/?week=" + week_entry + "&phase=2&season=" + year_entry) 

page_content = BeautifulSoup(week_link.content) 

a_links = page_content.find_all('tr', {'class': 'game link'}) 

csvfile = open('NFL_2014.csv', 'a') 

writer = csv.writer(csvfile) 

for link in a_links: 
     r = 'http://www.sports.yahoo.com' + str(link.attrs['data-url']) 
     r_get = requests.get(r) 
     soup = BeautifulSoup(r_get.content) 
     stats = soup.find_all("td", {'class':'stat-value'}) 
     teams = soup.find_all("th", {'class':'stat-value'}) 
     scores = soup.find_all('dd', {"class": 'score'}) 

     try: 
       away_game_stats = [] 
       home_game_stats = [] 
       statistic = [] 
       game_score = scores[-1] 
       game_score = game_score.text 
       x = game_score.split(" ") 
       away_score = x[1] 
       home_score = x[4] 
       home_team = teams[1] 
       away_team = teams[0] 
       away_team_stats = stats[0::2] 
       home_team_stats = stats[1::2] 
       away_game_stats.append(away_team.text) 
       away_game_stats.append(away_score) 
       home_game_stats.append(home_team.text) 
       home_game_stats.append(home_score) 
       for stats in away_team_stats: 
         text = stats.text.strip("").encode('utf-8') 
         away_game_stats.append(text) 


       writer.writerow(away_game_stats) 

       for stats in home_team_stats: 
         text = stats.text.strip("").encode('utf-8') 
         home_game_stats.append(text) 

       writer.writerow(home_game_stats) 

     except: 
       pass 


csvfile.close()

Any help is greatly appreciated. This is my first program, and searching this board has been a great resource.

Thanks,


2014-12-07 J.T.

Side note: except: pass is dangerous because it hides every kind of error. See http://stackoverflow.com/questions/21553327/why-is-except-pass-a-bad-programming-practice – user2314737 2014-12-11 11:24:49
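To illustrate the side note, here is a minimal sketch of catching only the error the score parsing is likely to raise, instead of a bare except: pass. It is written for Python 3. The function name extract_scores and the sample score text are hypothetical; the real text returned by the page may look different.

```python
import logging


def extract_scores(score_text):
    """Return (away_score, home_score), or None if the text has too few parts.

    Hypothetical helper: it mirrors the x[1] / x[4] indexing in the question.
    """
    parts = score_text.split(" ")
    try:
        return parts[1], parts[4]
    except IndexError:
        # Only the expected failure is handled, and it is reported, not hidden.
        logging.warning("Unexpected score text: %r", score_text)
        return None


# Assumed sample text, chosen only so that indexes 1 and 4 hold the scores.
good_result = extract_scores("GB 16 @ Sea 36")
bad_result = extract_scores("")
print(good_result, bad_result)
```

Any other exception (a network error, a typo in a variable name) still surfaces, which makes a first program much easier to debug.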

You can split the strings with a regular expression and then "flatten" the resulting list, which avoids the items being grouped in quotes:

Replace

writer.writerow(away_game_stats)

with

away_game_stats = [re.split(r"-|:",x) for x in away_game_stats] 
writer.writerow([x for y in away_game_stats for x in y])

(and the same for home_game_stats)
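As a self-contained illustration of the split-and-flatten idea, the sketch below runs it on a shortened sample row and writes the result to an in-memory buffer rather than to NFL_2014.csv. It is written for Python 3 and assumes the items are str values (the question's Python 2 code encodes them to UTF-8 first). The name split_and_flatten and the sample data are made up for the example.

```python
import csv
import io
import re


def split_and_flatten(row):
    """Split every item on '-' or ':' and return a single flat list."""
    return [part for item in row for part in re.split(r"[-:]", item)]


# Shortened sample in the shape of the question's current output.
sample_row = ["GB", "16", "26:40", "6-12", "0-1", "4.5"]
flat_row = split_and_flatten(sample_row)

# Write to an in-memory buffer so the example does not touch the disk.
buffer = io.StringIO()
csv.writer(buffer).writerow(flat_row)
csv_line = buffer.getvalue().strip()
print(csv_line)  # GB,16,26,40,6,12,0,1,4.5
```

Note that this splits every hyphen, so a negative number or a hyphenated team name would also be broken into two items.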


2014-12-07 18:20:33 user2314737

import re 
print re.sub(r"-|:",",",test_string)

See the demo:

https://regex101.com/r/aQ3zJ3/2


2014-12-07 14:18:46 vks

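I used writer.writerow([re.sub(r'-|:', ',', s) for s in home_game_stats]), which eliminated the colons and hyphens, but now each item is wrapped in quotes, so it is still a single item in the CSV file rather than two separate items. – 2014-12-07 16:24:32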

@J.T. On the whole string, like 'GB,16,19,255,1,26:40,19,13,4,2,6-12,0-1,255,57,4.5,80,21,3.8,175,23-33,4.9,3,14,1,4,38.3,8,65,1,0', not on single items. – vks 2014-12-07 16:28:44
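The exchange above comes down to how csv.writer treats a field that contains the delimiter: it quotes the field so that it stays one column. A minimal Python 3 sketch of the difference follows; to_csv_line is a hypothetical helper and the row is a shortened sample.

```python
import csv
import io
import re


def to_csv_line(row):
    """Render one row with csv.writer and return it without the line ending."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    return buffer.getvalue().strip()


row = ["GB", "26:40", "6-12"]

# Substituting inside each item leaves a comma inside the field,
# so csv.writer quotes it to keep it a single column.
substituted = [re.sub(r"[-:]", ",", item) for item in row]
quoted_line = to_csv_line(substituted)
print(quoted_line)  # GB,"26,40","6,12"

# Substituting on the finished line of text gives separate columns.
whole_line = re.sub(r"[-:]", ",", to_csv_line(row))
print(whole_line)  # GB,26,40,6,12
```

Working on the finished line means writing it as plain text instead of passing it back through csv.writer, which is why splitting the items before writerow is usually the tidier route.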
