2011-02-10

Import JSONP data from an HTML page, then output to CSV

I have some JSON data; here is a snippet:

{"sweater":"15", "localtime":"7:14 PM", "xcoord":-61, 
    "desc":"John Smith SHOT on Jack Jones", "teamid":10,"strength":701, 
    "pid":8465200,"formalEventId":"TOR8", "period":1, "type":"Shot", "p3name":"", 
    "eventid":8, "p2name":"Jack Jones", "ycoord":21, "pid3":"", "time":"00:38", 
    "playername":"John Smith", "p1name":"John Smith", 
    "video":"2_26_ott_tor_0910_TOR8_save_800K_16x9.flv", "pid2":8469461, "pid1":8465200} 

I want to grab this information from a URL of this form:

http://foo.com/data/20092010/20090xxxxx/PxP.jsonp

where xxxxx is a 5-digit game code that I want to fill in (via a loop) from a list.

The fields I need most are sweater, xcoord, teamid, strength, period, type, ycoord, time and playername, plus the game code (xxxxx) inserted as a column.

So the columns would be:

Gamecode,sweater,xcoord,teamid,strength,period,type,ycoord,time,playername

Then export all of this information into one (1) CSV file.

Can anyone point me in the right direction?

Edit:

I tried importing the JSON data from a local file, using the following code:

#libraries 
library(RCurl) 
library(rjson) 
library(bitops) 

#fetch data 
j <- getURL("file:///Desktop/test.jsonp") 

#grab JSON 
j.list <- fromJSON(j) 

#get each data item 
j.df <- data.frame(playername = sapply(j.list, function(x) x$sweater)) 
j.df <- data.frame(xcoord = sapply(j.list, function(x) x$xcoord)) 
j.df <- data.frame(ycoord = sapply(j.list, function(x) x$ycoord)) 
j.df <- data.frame(type = sapply(j.list, function(x) x$type)) 

write.csv(j.df, file="fooPxP.csv") 

and got an empty CSV file. Any idea what I'm doing wrong?

Here is the beginning of an actual data file:

loadPlayByPlay({"data":{"refreshInterval":0,"game":{"awayteamid":9,"awayteamname":"Ottawa Senators","hometeamname":"Toronto Maple Leafs","plays":{"play":[{"sweater":"11","localtime":"7:14 PM","xcoord":76,"desc":"Daniel Alfredsson HIT on Tomas Kaberle","teamid":9,"strength":701,"pid":8460621,"formalEventId":"TOR51","period":1,"type":"Hit","p3name":"","eventid":51,"p2name":"Tomas Kaberle","ycoord":-40,"pid3":"","time":"00:16","playername":"Daniel Alfredsson","p1name":"Daniel Alfredsson","pid2":8465200,"pid1":8460621},{"sweater":"15","localtime":"7:14 PM","xcoord":-61,"desc":"Tomas Kaberle SHOT on Pascal Leclaire","teamid":10,"strength":701,"pid":8465200,"formalEventId":"TOR8","period":1,"type":"Shot","p3name":"","eventid":8,"p2name":"Pascal Leclaire","ycoord":21,"pid3":"","time":"00:38","playername":"Tomas Kaberle","p1name":"Tomas Kaberle","video":"2_26_ott_tor_0910_TOR8_save_800K_16x9.flv","pid2":8469461,"pid1":8465200}}}) 

Thanks in advance!
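The code in the edit has a few independent problems. The file is JSONP (JSON wrapped in a loadPlayByPlay(...) call), so the wrapper has to be removed before fromJSON() can parse it. The plays are nested under data$game$plays$play, so sapply(j.list, ...) only visits the single top-level "data" element, whose $sweater is NULL. And each j.df <- data.frame(...) line replaces the previous data frame, so at most the last column would survive. The sketch below skips fetching and unwrapping and instead hand-builds a list shaped like what rjson::fromJSON() should return for the sample (only a few fields kept), to show the navigation and a single data.frame() call.

```r
# Hand-built stand-in for what rjson::fromJSON() returns for the sample above:
# JSON objects become named lists, the "play" array an unnamed list of plays.
# Only a few fields are kept to keep the example short.
j.list <- list(data = list(game = list(plays = list(play = list(
  list(sweater = "11", xcoord = 76, ycoord = -40, type = "Hit",
       playername = "Daniel Alfredsson"),
  list(sweater = "15", xcoord = -61, ycoord = 21, type = "Shot",
       playername = "Tomas Kaberle")
)))))

# The top level holds only "data", so sapply(j.list, ...) sees one element.
length(j.list)  # 1

# The individual plays are nested several levels down.
plays <- j.list$data$game$plays$play

# Build every column in ONE data.frame() call; reassigning j.df per column
# would keep only the last one.
j.df <- data.frame(
  sweater    = sapply(plays, function(x) x$sweater),
  xcoord     = sapply(plays, function(x) x$xcoord),
  ycoord     = sapply(plays, function(x) x$ycoord),
  type       = sapply(plays, function(x) x$type),
  playername = sapply(plays, function(x) x$playername),
  stringsAsFactors = FALSE
)

# Write to a temporary file here; use "fooPxP.csv" for the real thing.
out <- tempfile(fileext = ".csv")
write.csv(j.df, file = out, row.names = FALSE)
```

With real data, j.list would come from fromJSON() applied to the unwrapped text rather than being typed in.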

Answers


I wrote an article on fetching JSON from a URL and converting it to a data frame, which might help you get started.

You can fetch the data with getURL() from the RCurl library, like this:

library(RCurl) 
j <- getURL("http://foo.com/data/20092010/20090xxxxx/PxP.jsonp") 

Next, fromJSON() in the rjson package should convert it to a list:

library(rjson) 
j.list <- fromJSON(j) 
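Because the file is JSONP (JSON wrapped in a callback call such as loadPlayByPlay(...)), fromJSON() will most likely reject the raw text. A minimal sketch of unwrapping it first with base-R regular expressions; strip_jsonp() is a hypothetical helper, and the pattern assumes the callback name is a plain JavaScript identifier.

```r
# Hypothetical helper: remove a JSONP wrapper such as loadPlayByPlay( ... )
# so that what remains is plain JSON. Plain JSON is returned unchanged.
strip_jsonp <- function(txt) {
  txt <- paste(txt, collapse = "\n")   # also accept readLines()-style input
  wrapper <- "^[[:space:]]*[A-Za-z_$][A-Za-z0-9_$.]*[[:space:]]*\\("
  if (!grepl(wrapper, txt)) return(txt)
  txt <- sub(wrapper, "", txt)                        # drop "callbackName("
  sub("\\)[[:space:]]*;?[[:space:]]*$", "", txt)      # drop final ")" / ");"
}

strip_jsonp('loadPlayByPlay({"a":1})')  # '{"a":1}'
```

Usage with the code above would then be j.list <- fromJSON(strip_jsonp(j)).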

You can then build a data frame from the list. For example, to get a column named "sweater", try:

j.df <- data.frame(sweater = sapply(j.list, function(x) x$sweater)) 

Add more columns by passing further arguments to data.frame(), one per JSON key.
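Instead of writing one data.frame() argument per key, a small helper can loop over the wanted keys. A minimal sketch, assuming plays is the unnamed list of per-play named lists that rjson::fromJSON() yields for the "play" array; get_field() and plays_to_df() are hypothetical names. Missing keys become NA, because sapply() returns a list rather than a vector when some plays lack a key (in the sample, "video" appears on only one play).

```r
# Hypothetical helper: one value per play for `key`, NA where the key is absent.
get_field <- function(plays, key) {
  unlist(lapply(plays, function(p) if (is.null(p[[key]])) NA else p[[key]]))
}

# Hypothetical helper: build all requested columns in one go.
plays_to_df <- function(plays, keys) {
  cols <- lapply(keys, function(k) get_field(plays, k))
  names(cols) <- keys
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Example with two plays shaped like the sample data ("video" only on the second).
plays <- list(
  list(sweater = "11", xcoord = 76, type = "Hit"),
  list(sweater = "15", xcoord = -61, type = "Shot",
       video = "2_26_ott_tor_0910_TOR8_save_800K_16x9.flv")
)
df <- plays_to_df(plays, c("sweater", "xcoord", "type", "video"))
```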

To add the "xxxxx" game code, you'll need to parse the URL with something like grep().
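grep() only reports which strings match; to pull out the digits themselves, regexec() with regmatches() returns the captured group. A sketch assuming the URL pattern from the question, where the game code is the five digits after 20090; game_code_from_url() is a hypothetical name. If the URLs are built from a list of codes in the first place, the code is already at hand and can be added as a column directly.

```r
# Hypothetical helper: extract the 5-digit game code ("xxxxx") from URLs shaped
# like http://foo.com/data/20092010/20090xxxxx/PxP.jsonp; NA if no match.
game_code_from_url <- function(urls) {
  m <- regmatches(urls, regexec("/20090([0-9]{5})/PxP\\.jsonp$", urls))
  vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_,
         character(1))
}

game_code_from_url("http://foo.com/data/20092010/2009020123/PxP.jsonp")  # "20123"
```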

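Once you have your data frame, you can write it to CSV with write.table() or write.csv(). For many URLs, you'll have to work out how to combine the lists produced by fromJSON() into a single data frame.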
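For many game codes, one approach is to turn each game into its own data frame, add the game code as a column, stack them with do.call(rbind, ...), and write a single CSV. This is a sketch under stated assumptions: the URL pattern and the data$game$plays$play nesting are taken from the question; game_url(), game_to_df() and write_games_csv() are hypothetical names; fetch and parse default to RCurl::getURL() and rjson::fromJSON() but can be swapped out, which also lets the code be tried offline. Since each URL is built from its code, no URL parsing is needed to fill the gamecode column.

```r
# Columns wanted from each play (from the question).
play_keys <- c("sweater", "xcoord", "teamid", "strength", "period",
               "type", "ycoord", "time", "playername")

# Hypothetical: build the URL for one 5-digit game code.
game_url <- function(code) {
  sprintf("http://foo.com/data/20092010/20090%s/PxP.jsonp", code)
}

# Hypothetical: fetch one game and return its plays as a data frame.
# The fetch/parse defaults are only evaluated if you don't pass your own.
game_to_df <- function(code,
                       fetch = function(u) RCurl::getURL(u),
                       parse = function(txt) rjson::fromJSON(txt)) {
  txt <- fetch(game_url(code))
  # Remove the JSONP wrapper: keep from the first "{" to the last "}".
  txt <- sub("^[^{]*", "", txt)
  txt <- sub("[^}]*$", "", txt)
  plays <- parse(txt)$data$game$plays$play
  cols <- lapply(play_keys, function(k)
    unlist(lapply(plays, function(p) if (is.null(p[[k]])) NA else p[[k]])))
  names(cols) <- play_keys
  df <- as.data.frame(cols, stringsAsFactors = FALSE)
  # Put the game code first, recycled to every row of this game.
  data.frame(gamecode = code, df, stringsAsFactors = FALSE)
}

# Hypothetical: stack all games and write ONE CSV file.
write_games_csv <- function(codes, file, ...) {
  all_games <- do.call(rbind, lapply(codes, game_to_df, ...))
  write.csv(all_games, file = file, row.names = FALSE)
  invisible(all_games)
}
```

A call would look like write_games_csv(c("20123", "20124"), "fooPxP.csv"), with your real codes in place of these made-up ones. A game with no plays would need extra handling before the rbind().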


Awesome, thanks! So for "strength", it would be: j.df <- data.frame(strength = sapply(j.list, function(x) x$strength))? – NeilG 2011-02-11 01:52:07


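There are R functions for reading just about anything from a URL (see help(download.file)), and the rjson package on CRAN handles JSON data. It may need some tweaking if the data really is JSONP.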

For a similar example, look at my geonames package: it reads JSON data from geonames.org and builds data frames.

If it isn't on CRAN then it's on R-Forge; I forget which.
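As a base-R alternative to RCurl for getting the raw text, a minimal sketch follows; read_text() and download_text() are hypothetical names. It assumes that readLines(), which hands a character argument to file(), accepts URLs as well as local paths, as current R versions do for http:// and file:// URLs.

```r
# Read a whole (small) file or URL into one string using base R only.
read_text <- function(src) {
  paste(readLines(src, warn = FALSE), collapse = "\n")
}

# Hypothetical helper in the spirit of help(download.file): save the
# response to a temporary file first, then read it back.
download_text <- function(url, dest = tempfile(fileext = ".jsonp")) {
  download.file(url, destfile = dest, quiet = TRUE)
  read_text(dest)
}
```

The resulting string still carries the JSONP wrapper, which has to be removed before it is passed to fromJSON().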