我想rvest網站(汽車銷售平臺)的特定部分。
這個CSS對我來說太困惑了,以至於我無法自己弄清楚什麼是錯的。
#### scraping the website www.otomoto.pl with used cars #####
baseURL_otomoto = "https://www.otomoto.pl/osobowe/?page="
i <- 1
for (i in 1:7000)
{
link = paste0(baseURL_otomoto,i)
out = read_html(link)
print(i)
print(link)
### building year
build_year = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[1]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
mileage = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[2]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
volume = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[3]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
fuel_type = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul/li[4]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
price = html_nodes(out, xpath = '//div[@class="offer-item__price"]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
link = html_nodes(out, xpath = '//div[@class="offer-item__title"]') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
offer_details = html_nodes(out, xpath = '//*[@id="body-container"]/div[2]/div[1]/div/div[6]/div[2]/article[1]/div[2]/div[3]/ul') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
任何猜測可能是什麼原因造成這種行爲?
PS#1。
如何將所分析的網站上提供的所有build_type,mileage和fuel_type數據一次作爲data.frame?使用類(xpath ='// div [@class = ...)在我的情況下不起作用
PS#2。
我想通過f.i獲得實際報價的詳細信息。
gear_type = html_nodes(out, xpath = '//*[@id="parameters"]/ul[1]/li[10]/div') %>%
html_text() %>%
str_replace_all("\n","") %>%
str_replace_all("\r","") %>%
str_trim()
參數
- 在UL並[a]是對於在(1:2)& 在利
- 並[b]是對於b中(1:12)
不幸的是,雖然這個概念因結果數據框爲空而失敗。任何猜測爲什麼?
謝謝米羅斯瓦夫。您的意見和建議絕對具有巨大的附加價值。儘快回覆最終結果(可行)。 –