r
  • url
  • 2017-06-05
    0

    How to remove "/url?q=" from text data in R

    I want to remove "/url?q=" from text data in RStudio. Here is my Google search code:

    ## Code for Google Search
    # getURL() comes from RCurl; htmlTreeParse(), getNodeSet() and xmlAttrs() come from XML
    library(RCurl)
    library(XML)
    # Enter Search Term Here
    search.term <- "r-project"
    # Creating Function
    getGoogleURL <- function(search.term, domain = '.co.in', quotes = TRUE)
    {
        # Getting Search Term
        search.term <- gsub(' ', '%20', search.term)
        if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
        # Putting Search Term in Google Search
        paste('http://www.google', domain, '/search?q=', search.term, sep = '')
    }

    ## Get Links from Google Search
    # Creating Function to Get URLs From Search Results
    getGoogleLinks <- function(google.url) {
        # Download the search results page
        doc <- getURL(google.url, httpheader = c("User-Agent" = "R(3.4.0)"))
        # Parse the HTML and select the result link nodes
        html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
        nodes <- getNodeSet(html, "//h3[@class='r']//a")
        # Return the href attribute of each link
        sapply(nodes, function(x) xmlAttrs(x)[["href"]])
    }

    ## Search without quoting the term, Create URL List
    quotes <- FALSE
    search.url <- getGoogleURL(search.term = search.term, quotes = quotes)
    links <- getGoogleLinks(search.url)

    ## Print URL List
    links
    

    And my result is:

    [1] "/url?q=https://www.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQFggUMAA&usg=AFQjCNEqtiOAIA7OOTa3meWC8zaTjjTy8A"
    [2] "/url?q=http://www.cran.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQjBAIGzAB&usg=AFQjCNF8QmYbLzG0c66QZM2wsXF1n1-9tQ"

    How do I remove "/url?q=" from the links above?

    Answers

    1

    You can use gsub.

    ## Code for Google Search
    # getURL() comes from RCurl; htmlTreeParse(), getNodeSet() and xmlAttrs() come from XML
    library(RCurl)
    library(XML)
    # Enter Search Term Here
    search.term <- "r-project"
    # Creating Function
    getGoogleURL <- function(search.term, domain = '.co.in', quotes = TRUE)
    {
        # Getting Search Term
        search.term <- gsub(' ', '%20', search.term)
        if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
        # Putting Search Term in Google Search
        paste('http://www.google', domain, '/search?q=', search.term, sep = '')
    }

    ## Get Links from Google Search
    # Creating Function to Get URLs From Search Results
    getGoogleLinks <- function(google.url) {
        # Download the search results page
        doc <- getURL(google.url, httpheader = c("User-Agent" = "R(3.4.0)"))
        # Parse the HTML and select the result link nodes
        html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
        nodes <- getNodeSet(html, "//h3[@class='r']//a")
        # Return the href attribute of each link
        sapply(nodes, function(x) xmlAttrs(x)[["href"]])
    }

    ## Search without quoting the term, Create URL List
    quotes <- FALSE
    search.url <- getGoogleURL(search.term = search.term, quotes = quotes)
    links <- getGoogleLinks(search.url)

    ## Print URL List
    # fixed = TRUE matches "/url?q=" literally; read as a regular expression,
    # the "?" would make the "l" optional and the pattern would never match
    gsub("/url?q=", "", links, fixed = TRUE)
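    To illustrate why the pattern needs care, here is a minimal sketch that uses the two links from the question as sample data. It compares the plain regex call with a literal match and with an escaped, anchored regex. The last step, which drops the trailing "&sa=..." parameters, goes beyond what the question asked and is only an assumption about what a reader might want next.

```r
# Sample data: the two links printed in the question.
links <- c(
  "/url?q=https://www.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQFggUMAA&usg=AFQjCNEqtiOAIA7OOTa3meWC8zaTjjTy8A",
  "/url?q=http://www.cran.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQjBAIGzAB&usg=AFQjCNF8QmYbLzG0c66QZM2wsXF1n1-9tQ"
)

# As a regular expression, "?" makes the preceding "l" optional, so the
# pattern "/url?q=" matches "/urq=" or "/urlq=" but not the literal "/url?q=".
unchanged <- gsub("/url?q=", "", links)
print(identical(unchanged, links))

# Option 1: treat the pattern as a literal string.
clean_fixed <- sub("/url?q=", "", links, fixed = TRUE)

# Option 2: escape the "?" and anchor the match at the start of the string.
clean_regex <- sub("^/url\\?q=", "", links)

# Assumed extra step (not part of the question): also drop the trailing
# "&sa=...&ved=...&usg=..." parameters that follow the target URL.
bare_urls <- sub("&sa=.*$", "", clean_fixed)
print(bare_urls)
```

    sub is enough here because the prefix occurs once per link; gsub would behave the same on this data.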
    
    +0

    Thanks, but I solved my problem with 'substring' instead.

    0

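    Or, as an alternative to @JTeam's answer, you can try this (given that the links always start with /url?q=):

    # Split on "=", drop the first piece ("/url?q"), and rejoin with "=" so the
    # equals signs in the rest of the link are kept
    lapply(links, function(x) paste0(strsplit(x, '=')[[1]][-1], collapse = '='))

    This gives you a nice list of clean links (if you prefer a vector, try sapply).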
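    The split-and-rejoin idea can be sketched as a self-contained example on the two links from the question. The helper name strip_prefix is hypothetical, and vapply is used in place of sapply only to guarantee a character vector. The pieces are rejoined with "=" so that the equals signs later in each link survive.

```r
# Sample data: the two links printed in the question.
links <- c(
  "/url?q=https://www.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQFggUMAA&usg=AFQjCNEqtiOAIA7OOTa3meWC8zaTjjTy8A",
  "/url?q=http://www.cran.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQjBAIGzAB&usg=AFQjCNF8QmYbLzG0c66QZM2wsXF1n1-9tQ"
)

# Hypothetical helper: drop everything up to and including the first "=".
strip_prefix <- function(x) {
  parts <- strsplit(x, "=", fixed = TRUE)[[1]]
  # Rejoin with "=" so the equals signs later in the link are preserved.
  paste(parts[-1], collapse = "=")
}

# vapply returns a plain character vector rather than a list.
clean <- vapply(links, strip_prefix, character(1), USE.NAMES = FALSE)
print(clean)

# For these links, a single regex substitution gives the same result:
# remove the leading run of non-"=" characters plus the first "=".
clean2 <- sub("^[^=]*=", "", links)
```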

    +0

    Thanks, but I solved my problem with 'substring' instead.

    1

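    I solved it this way, because the prefix always has the same number of characters ("/url?q=" is 7 characters long, so the URL starts at character 8):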

    links <- substring(links,8) 
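    A slightly more defensive variant of the same idea, as a sketch: compute the start position from the prefix instead of hard-coding 8, and strip only the entries that really begin with the prefix. The third sample entry is hypothetical and is there only to show an input without the prefix.

```r
# Sample data: the two links from the question plus one hypothetical entry
# that does not carry the prefix.
links <- c(
  "/url?q=https://www.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQFggUMAA&usg=AFQjCNEqtiOAIA7OOTa3meWC8zaTjjTy8A",
  "/url?q=http://www.cran.r-project.org/&sa=U&ved=0ahUKEwj78ZWXoabUAhUcTI8KHaTEDTIQjBAIGzAB&usg=AFQjCNF8QmYbLzG0c66QZM2wsXF1n1-9tQ",
  "https://example.com/"
)

prefix <- "/url?q="
# nchar(prefix) is 7, so the URL itself starts at character 8.
start <- nchar(prefix) + 1

# Strip only the entries that begin with the prefix; leave the rest untouched.
has_prefix <- startsWith(links, prefix)
clean <- ifelse(has_prefix, substring(links, start), links)
print(clean)
```

    startsWith is available in base R from version 3.3.0 onward.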
    