2017-07-26

How can I extract the user_id values from the retweets collected with this code? How do I get the user IDs out of each element?


I have 12 elements, each a list that contains user IDs. How can I extract only the user IDs?

Here is the full code:

library(rtweet) 
library(plyr)    # load plyr before dplyr so that plyr does not mask dplyr's verbs
library(dplyr) 
require(reshape2) 

## search for #Newsnight tweets, excluding retweets via -filter:retweets 
dor <- search_tweets("#Newsnight -filter:retweets", n = 10000) 

## merge tweets data with unique (non duplicated) users data 
## exclude retweets 
## select status_id, retweet count, followers count, and text columns 
dat <- dor %>% 
    users_data() %>% 
    unique() %>% 
    right_join(dor) %>% 
    filter(!is_retweet) %>% 
    dplyr::select(user_id, screen_name, retweet_count, followers_count, text) %>% 
    filter(retweet_count >=50 & retweet_count <100 & followers_count < 10000 & followers_count > 500) 
dat 

## get only first 8 words from each tweet 
x <- lapply(strsplit(dat$text, " "), "[", 1:8) 
x <- lapply(x, na.omit) 
x <- vapply(x, paste, collapse = " ", character(1)) 
## get rid of hyperlinks 
x <- gsub("http[\\S]{1,}", "", x, perl = TRUE) 
## encode for search query (handles the non ascii chars) 
x <- sapply(x, URLencode, USE.NAMES = FALSE) 
## get up to first 100 retweets for each tweet 
data <- lapply(x, search_tweets, verbose = FALSE) 

There are 11 more elements like this (12 elements in total).
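To see what the preprocessing steps in the code above do to a tweet's text, here is a minimal sketch that runs them on two made-up tweets. The text vector is hypothetical example data standing in for dat$text, so no Twitter API access is needed.

```r
## two made-up tweets standing in for dat$text (hypothetical example data)
text <- c("Tonight on Newsnight we discuss the budget and what it means https://t.co/abc123",
          "Short tweet https://t.co/xyz")

## keep at most the first 8 words; indexing past the end yields NA, which na.omit drops
x <- lapply(strsplit(text, " "), "[", 1:8)
x <- lapply(x, na.omit)
x <- vapply(x, paste, character(1), collapse = " ")
## strip anything starting with "http" up to the next whitespace
x <- gsub("http[\\S]{1,}", "", x, perl = TRUE)
## percent-encode each string so it can be used as a search query
q <- sapply(x, utils::URLencode, USE.NAMES = FALSE)
print(x)
print(q)
```

Stripping a trailing link leaves a trailing space, which URLencode turns into %20; wrapping the gsub() result in trimws() would remove it, if that matters for the search.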


Do you mean 'data' is a list of 12 elements? Can you show what 'data' looks like?


@AlexP, I edited the question and added an image showing the data.


Hmm... it says it is 79x39. Where are the 12 elements you mentioned?

Answer


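OK, so you have a list of 12 data frames, each with a column named user_id. If the list is named, you can uncomment the df_name = names(data)[x], line to also record each data frame's name; if it is not named, leave that line commented out.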

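library(dplyr) 

## stack the user_id column of every data frame in 'data' into one data frame,
## with df_number recording which list element each ID came from
lapply(seq_along(data), function(x) { 
    df <- data[[x]]   # x is an index, so pull out the x-th data frame first
    data.frame(user_id = df$user_id, 
      # df_name = names(data)[x],   # only meaningful if 'data' is a named list
      df_number = x, stringsAsFactors = FALSE) }) %>% 
  dplyr::bind_rows() 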

This should give you a new data frame containing all of the user IDs together with the data frame each one came from.
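For comparison, here is a minimal base-R sketch of the same idea that does not depend on dplyr. It assumes, as the answer does, that every element of data is a data frame with a user_id column; the two small data frames below are hypothetical stand-ins for the search_tweets() results. all_ids is just the IDs as one vector, and ids adds the position of the source data frame.

```r
## toy stand-in for the list returned by lapply(x, search_tweets, ...) (hypothetical data)
data <- list(
  data.frame(user_id = c("111", "222"), screen_name = c("a", "b"),
             stringsAsFactors = FALSE),
  data.frame(user_id = "333", screen_name = "c", stringsAsFactors = FALSE)
)

## just the IDs, as one character vector
all_ids <- unlist(lapply(data, "[[", "user_id"), use.names = FALSE)

## one row per user_id, tagged with the position of the data frame it came from
ids <- do.call(rbind, lapply(seq_along(data), function(i) {
  data.frame(user_id = data[[i]]$user_id, df_number = i,
             stringsAsFactors = FALSE)
}))
print(all_ids)
print(ids)
```

Using seq_along(data) instead of a hard-coded 1:12 keeps the code working if the number of searched tweets changes.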


How can I add a second column showing which data frame each user ID belongs to?


I have modified my answer so that it addresses the question you are asking now. If it works, please accept the answer.


It gives this error: x$user_id : $ operator is invalid for atomic vectors
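That message means $ was applied to an atomic vector rather than to a data frame. One likely cause, assuming the failing code used x$user_id directly: inside lapply(1:12, function(x) ...), x is only the integer index, so the data frame has to be taken out of the list with data[[x]] first. The same error would also appear if some element of data were not a data frame. A minimal reproduction with made-up data:

```r
## toy list standing in for data (hypothetical)
data <- list(data.frame(user_id = c("111", "222"), stringsAsFactors = FALSE))

## inside lapply(1:12, function(x) ...), x is an integer index, not a data frame
x <- 1L
msg <- tryCatch(x$user_id, error = function(e) conditionMessage(e))
print(msg)

## pull the data frame out of the list first, then take its column
ids <- data[[x]]$user_id
print(ids)
```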
