2015-10-16 44 views
0

我想刮掉並使用R.提取所有相關鏈接的列表。 例如,考慮:List of Cuisines on wikipedia這裏的美食分爲區域,種族等,這些鏈接本身和更進一步細分爲更多的鏈接和層次結構。我想提取R. 這整個層次使用正則表達式的一般定義鏈接將在返回網頁上的所有鏈接,但我想有,所有的依賴關係列出一個表,如:使用R的鏈接的鏈接清單

  1. 名單美食:
    • 目錄亞洲美食
    • 列表歐洲美食的中歐美食的
    • 列表
    • 奧地利美食
    • 保加利亞美食
    • 捷克美食
    • 德國菜......等等。
    • 列表海洋美食 的...

我知道怎麼湊數據從使用R.我是相當新的,並想知道我如何去提取依賴一個網頁鏈接之間。

+0

你想要的是真的不CL耳。上述「正則表達式」和所需輸出_in'dput()'風格的R objects_的一個例子可以幫助人們幫助你。你也沒有提供任何代碼來表明你真的知道如何去做這些刮擦(這不是代碼寫入服務)。 – hrbrmstr

回答

1

你可以做,例如下面的:如果這是你正在尋找

require(rvest) 
require(magrittr) 
session <- html_session("https://en.wikipedia.org/wiki/List_of_cuisines") 
session %>% html_nodes("ul:nth-child(13) a") %>% html_text() 
[1] "Ainu"    "Akan"    "Arab"    "Assyrian"   "Balochi"   
[6] "Berber"    "Buddhist"   "Bulgarian"   "Cajun"    "Chinese Islamic" 
[11] "Circassian"   "Crimean Tatar"  "Inuit"    "Italian American" "Jewish"    
[16] "Sephardic"   "Mizrahi"   "Bukharan"   "Syrian Jewish"  "Kurdish"   
[21] "Malayali Food"  "Louisiana Creole" "Maharashtrian"  "Mordovian"   "Native American" 
[26] "Parsi"    "Pashtun"   "Pennsylvania Dutch" "Peranakan"   "Persian cuisine" 
[31] "Punjabi"   "Rajasthani"   "Romani"    "Sami"    "Sindhi"    
[36] "Tatar"    "Yamal"    "Zanzibari"   "South Indian"  

什麼?如果您想更深入和刮的所有環節,你可以去上進行如下:

cousin_links <- session %>% html_nodes("ul:nth-child(13) a") %>% html_attr("href") 
articles <- lapply(cousin_links, jump_to, x = session) 
explainaition <- lapply(articles, function(a){ 
    a %>% html %>% html_node("p") %>% html_text 
}) 

,讓你與第一維基百科釋(在一個以上的列表內容盒

> head(explainaition) 
[[1]] 
[1] "Ainu cuisine is the cuisine of the ethnic Ainu in Japan. The cuisine differs markedly from that of the majority Yamato people of Japan. Raw meat like sashimi, for example, is not served in Ainu cuisine, which instead uses methods such as boiling, roasting and curing to prepare meat. The island of Hokkaidō in northern Japan is where most Ainu live today; however, they once inhabited most of the Kuril islands, the southern half of Sakhalin island, and parts of northern Honshū Island." 

[[2]] 
[1] "Akan cuisine, the cuisine of the Akan people, includes meat and fish (seafood) grilled over hot coals, wide and varied range of soups, stews, several kinds of starch foods, groundnut, palm, patties (or empanadas), ground corn (maize), sadza, ugali." 

[[3]] 
[1] "Arab cuisine (Arabic: مطبخ عربي‎) is defined as the various regional cuisines spanning the Arab world, from Mesopotamia to North-Africa. Arab cuisine often incorporates the Levantine and Egyptian culinary traditions." 

[[4]] 
[1] "The cuisine of the indigenous Assyrian people from northern Iraq, north eastern Syria, north western Iran and south eastern Turkey is similar to other Middle Eastern cuisines. It is rich in grains, meat, tomato, and potato. Rice is usually served with every meal accompanied by a stew which is typically poured over the rice. Tea is typically consumed at all times of the day with or without meals, alone or as a social drink. Cheese, crackers, biscuits, baklawa, or other snacks are often served alongside the tea as appetizers. Dietary restrictions may apply during Lent in which certain types of foods may not be consumed; often meaning animal-derived. Alcohol is rather popular specifically in the form of Arak and Wheat Beer. Unlike in Jewish cuisine and Islamic cuisines in the region, pork is allowed, but it is not widely consumed because of restrictions upon availability imposed by the Muslim majority." 

[[5]] 
[1] "Balochi cuisine refers to the food and cuisine of the Baloch people from the Balochistan region, comprising the Pakistani Balochistan province as well as Sistan and Baluchestan in Iran and Balochistan, Afghanistan. Baloch food has a regional variance in contrast to many other cuisines of Pakistan[1][2][3][4] and Iran." 

[[6]] 
[1] "The Amazigh (Berber) cuisine is considered as a traditional cuisine which evolved little in the course of time." 
+0

嗨@FlooO這不正是我一直在尋找我想從維基百科link..in精華提取層次樹會像 類型美食 1.美國 - 北美 *加拿大 *加州..等 - 中美洲 * Mexican..etc - 南美 *巴西 * Chilean..etc 2.亞洲 - 東亞 *日本 * Korean..etc 等.. 非常感謝。 –

+0

在哪個網站上您可以找到您想要搜索的信息? – Rentrop

+0

我正在尋找從[鏈接](http://www.bbc.co.uk/food/ingredients)的所有食品成分清單,並列在「A」下的一個單獨的列,在「 B「在一個單獨的列中,等等。 –