2016-09-25

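Loop over JSON files in R inside a loop

I want to aggregate a bunch of JSON files into a single one, covering three sources and three years. So far I have only managed to do it in this tedious way, but I am sure there is a smarter, more elegant way to do it.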

json1 <- lapply(readLines("NYT_1989.json"), fromJSON) 
json2 <- lapply(readLines("NYT_1990.json"), fromJSON) 
json3 <- lapply(readLines("NYT_1991.json"), fromJSON) 
json4 <- lapply(readLines("WP_1989.json"), fromJSON) 
json5 <- lapply(readLines("WP_1990.json"), fromJSON) 
json6 <- lapply(readLines("WP_1991.json"), fromJSON) 
json7 <- lapply(readLines("USAT_1989.json"), fromJSON) 
json8 <- lapply(readLines("USAT_1990.json"), fromJSON) 
json9 <- lapply(readLines("USAT_1991.json"), fromJSON) 

jsonl <- list(json1, json2, json3, json4, json5, json6, json7, json8, json9) 

Note that the years, 1989 to 1991, are the same for all three sources. Any ideas? Thanks!

PS: A sample of the data in each file:

{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. ", "title": "Prospects;"} 
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' ", "title": "Upheaval in the East: Espionage;"} 
{"date": "December 31, 1989, Sunday, Late Edition - Final", "body": "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. ", "title": "Coping With the Economic Prospects of 1990"} 
Use `list.files` to get the list of file names, then do `lapply(FILELIST, function(x) fromJSON(readLines(x)))`? – Shape
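A minimal sketch of that suggestion, assuming every file holds one JSON object per line (as in the sample above). The helper `read_json_lines`, the temporary `data_dir` and the demo records are hypothetical and exist only so the snippet runs on its own; each line is parsed separately, as in the question's original code.

```r
library(jsonlite)

# Hypothetical helper: parse one line-delimited file into a list of records.
read_json_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]  # skip blank lines
  lapply(lines, fromJSON)
}

# Demo data (assumption): in practice, point data_dir at the real folder.
data_dir <- tempfile("json_demo_")
dir.create(data_dir)
writeLines(c('{"title": "A", "body": "first"}',
             '{"title": "B", "body": "second"}'),
           file.path(data_dir, "NYT_1989.json"))
writeLines('{"title": "C", "body": "third"}',
           file.path(data_dir, "WP_1989.json"))

# Pick up every <source>_<year>.json file instead of naming each one by hand.
files <- list.files(data_dir,
                    pattern = "^(NYT|WP|USAT)_[0-9]{4}\\.json$",
                    full.names = TRUE)

jsonl <- lapply(files, read_json_lines)
names(jsonl) <- basename(files)
```

Naming the list by file keeps track of which source and year each element came from, e.g. `jsonl[["NYT_1989.json"]][[1]]$body`.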

Answers

Here you go:

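library(jsonlite) 

# All nine files: three sources, three years each
filelist <- c("NYT_1989.json","NYT_1990.json","NYT_1991.json", 
       "WP_1989.json", "WP_1990.json","WP_1991.json", 
       "USAT_1989.json","USAT_1990.json","USAT_1991.json") 

# Note: fromJSON(readLines(x)) assumes each file is a single valid JSON document
newJSON <- sapply(filelist, function(x) fromJSON(readLines(x))) 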
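Since the years are the same for all three sources, the file names can also be generated rather than typed out. A small sketch, assuming the `<source>_<year>.json` naming pattern from the question; the variable names `sources`, `years` and `grid` are only illustrative.

```r
sources <- c("NYT", "WP", "USAT")
years <- 1989:1991

# expand.grid varies its first argument fastest, so putting year first
# keeps the three years of each source next to each other.
grid <- expand.grid(year = years, source = sources, stringsAsFactors = FALSE)

filelist <- sprintf("%s_%d.json", grid$source, grid$year)
filelist
```

This yields the nine names in the same order as the hand-written vector, and adding a source or a year only means extending one of the two short vectors.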

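Reading only the body entry from each line of the input files.

You asked how to read only a subset of the JSON file. The file data you quoted is not actually valid JSON as a whole. It is JSON-like: each line is a separate JSON object, so we have to modify the input to fromJSON() for the data to be read correctly. Here we join the lines with commas and wrap them in square brackets, which turns the file content into a single JSON array. We then dereference the result of fromJSON() with $body to extract only the body variable.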

filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json") 
newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body) 
newJSON 

Result

> filelist <- c("./data/NYT_1989.json", "./data/NYT_1990.json") 
> newJSON <- sapply(filelist, function(x) fromJSON(sprintf("[%s]", paste(readLines(x), collapse = ",")), flatten = FALSE)$body) 
> newJSON 
    ./data/NYT_1989.json                                                                                                                     
[1,] "Frigid temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. " 
[2,] "DATELINE: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "                                     
[3,] "SURVIVING the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. "                                                     
    ./data/NYT_1990.json                                                                                                                     
[1,] "Blue temperatures across much of the United States this month sent demand for heating oil soaring, providing a final upward jolt to crude oil prices. Some spot crude traded at prices up 40 percent or more from a year ago. Will these prices hold? Five experts on oil offer their views. That's assuming the economy performs as expected - about 1 percent growth in G.N.P. The other big uncertainty is the U.S.S.R. If their production drops more than 4 percent, prices could stengthen. " 
[2,] "BLUE1: WASHINGTON, Dec. 30 For years, experts have dubbed Czechoslovakia's spy agency the ''two Czech'' service. But he cautioned against euphoria. ''The Soviets wouldn't have relied on just official cooperation,'' he said. ''It would be surprising if they haven't unilaterally penetrated friendly services with their own agents, too.'' "                                     
[3,] "GREEN4 the decline in the economy will be the overriding issue for 1990, say leaders of the county's business community. Successful Westchester business owners will face and overcome these risks and obstacles. Westchester is a land of opportunity for the business owner. " 
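As an alternative sketch for this one-object-per-line layout, jsonlite's `stream_in()` reads line-delimited JSON directly from a connection and returns a data frame, so no string wrapping is needed. The helper name `read_bodies`, the temporary folder and the demo records below are hypothetical, included only to make the snippet self-contained.

```r
library(jsonlite)

# Hypothetical helper: return only the "body" field of every record in a file.
read_bodies <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con))
  stream_in(con, verbose = FALSE)$body
}

# Demo data (assumption): replace with the real file paths.
demo_dir <- tempfile("ndjson_demo_")
dir.create(demo_dir)
writeLines(c('{"title": "A", "body": "first"}',
             '{"title": "B", "body": "second"}'),
           file.path(demo_dir, "NYT_1989.json"))
writeLines('{"title": "C", "body": "third"}',
           file.path(demo_dir, "NYT_1990.json"))

filelist <- file.path(demo_dir, c("NYT_1989.json", "NYT_1990.json"))

# lapply keeps one character vector per file, even when the files
# contain different numbers of articles.
bodies <- lapply(filelist, read_bodies)
names(bodies) <- basename(filelist)

# Flatten to a single character vector if the per-file grouping is not needed.
all_bodies <- unlist(bodies, use.names = FALSE)
```

Using `lapply` rather than `sapply` is a deliberate choice here: `sapply` only simplifies to a matrix when every file has the same number of articles, which real yearly files are unlikely to have.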

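You may find the following tutorial on the apply functions useful:

I also recommend reading:

Trust me when I say this free online book helped me a lot. It also confirmed, on multiple occasions, that I am an idiot :-)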

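Thanks a lot for the tips and the answer, @Technophobe01! One more thing: each article in each file consists of a "title" and a "body". How can I aggregate only the bodies? Any ideas? Cheers! –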

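Andrea, do you have an example of the data? Do you want to load and then filter, or filter as part of the load? Take a look at: http://zevross.com/blog/2015/02/12/using-r-to-download-and-parse-json-an-example-using-data-from-an-open-data-portal/. If you have a data file, I can update the answer this afternoon. – Technophobe01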

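I want to filter as part of the load: aggregate each file, but add only the body of each article to the list. I have edited the question to include a data example. –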