How do I extract word frequencies for a subset of words in R?

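I have a data frame with about 10,000 words in one column and their corresponding frequencies in another column. I also have a vector of about 600 words. Each of the 600 words also appears in the data frame. How can I look up the frequencies of the 600 words in the vector from the 10,000-word data frame?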

2017-08-10 Namenlos

Either 'match' or 'merge' would be the standard approach. – Gregor

Suggested R-FAQ duplicate: [How to join data](https://stackoverflow.com/q/1299871/903061) – Gregor
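For reference, a minimal base R sketch of the two approaches named in the comments. The data here is made up for illustration: `freq_df`, `words_vec`, and the column names `word` and `freq` are assumptions standing in for the real 10,000-row data frame and 600-word vector.

```r
# Toy stand-ins for the real data (hypothetical names and values)
freq_df <- data.frame(
  word = c("apple", "banana", "cherry", "date", "elder"),
  freq = c(10, 7, 3, 12, 1),
  stringsAsFactors = FALSE
)
words_vec <- c("date", "apple", "cherry")

# match(): position of each wanted word in freq_df$word, in words_vec order
idx <- match(words_vec, freq_df$word)
freq_by_match <- freq_df$freq[idx]
names(freq_by_match) <- words_vec

# merge(): inner join on the shared "word" column
# (with the default sort = TRUE the result is ordered by word)
freq_by_merge <- merge(
  data.frame(word = words_vec, stringsAsFactors = FALSE),
  freq_df,
  by = "word"
)
```

`match()` keeps the order of the lookup vector and returns NA for any word it cannot find, while `merge()` returns a data frame and, by default, drops words that have no match.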

Use dplyr's join functions.

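library(dplyr)

# turn the 600-word vector into a data frame with a "word" column
# (R names cannot start with a digit, so 600_df becomes df_600)
df_600 <- data.frame(word = vec_600, stringsAsFactors = FALSE)

# left join: keep every one of the 600 words and attach its frequency
df <- left_join(x = df_600, y = df_10000, by = "word")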

where "word" is the name of the word column in both data frames.

2017-08-11 01:18:41 sweetmusicality

One of many possible solutions, where df$words is the column of words in your data.frame and wordsvector is your vector:

library(plyr)
# count how many rows each word occupies in the data.frame
freqwords <- ddply(df, .(words), summarize, n = length(words))
# keep only the words that appear in your vector
freqwords[freqwords$words %in% wordsvector, ]

Next time, it would help us help you better if you provided some dummy data.
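As an illustration with dummy data, here is a small sketch of the `%in%` filter on its own. It assumes the data frame already carries a precomputed frequency column, as the question describes, so no counting step is needed. The column name `freq` and all values are invented for the example.

```r
# Dummy data (hypothetical values; "freq" is an assumed column name)
df <- data.frame(
  words = c("apple", "banana", "cherry", "date", "elder"),
  freq = c(10, 7, 3, 12, 1),
  stringsAsFactors = FALSE
)
wordsvector <- c("date", "apple", "cherry")

# Keep only rows whose word is in the vector; rows stay in df's order
subset_df <- df[df$words %in% wordsvector, ]

# Words from the vector that were not found in df (none in this toy case)
missing_words <- setdiff(wordsvector, df$words)
```

Checking `missing_words` is a cheap way to confirm that every word in the vector really does occur in the data frame.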

2017-08-10 19:33:32 user3640617
