Check whether words are in a dictionary and get a value from another column

I have these two datasets:

stemmed <- data.frame(
    stem = c('super puper', 'only for you') 
) 


super <- data.frame(
    word = c('super', 'puper', 'you'), 
    weight = c(0.5, 0.1, 0.3) 
)

I check whether each word appears in a dictionary (of positive or negative words) and count how many do. I have this loop:

for (i in 1:nrow(stemmed)){ 
    words = strsplit(as.character(stemmed$stem)," ") 
    stemmed$super[i] <- sum(words[[i]] %in% super$word)/length(words[[i]]) 
}

(By the way, if you know how to improve this code, please let me know.)
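One possible clean-up of the loop above, offered as a sketch rather than a definitive fix: split the strings once, outside any loop, and let vapply compute the matched fraction per row. The helper name frac_in_dict is made up for illustration, and stringsAsFactors = FALSE is an added assumption so the columns are plain character vectors.

```r
stemmed <- data.frame(
    stem = c('super puper', 'only for you'),
    stringsAsFactors = FALSE
)
super <- data.frame(
    word = c('super', 'puper', 'you'),
    weight = c(0.5, 0.1, 0.3),
    stringsAsFactors = FALSE
)

# Split every string once instead of re-splitting on each iteration
words <- strsplit(as.character(stemmed$stem), " ")

# frac_in_dict is a hypothetical helper: share of words found in the dictionary
frac_in_dict <- function(w) mean(w %in% super$word)

# vapply guarantees one numeric value per row
stemmed$super <- vapply(words, frac_in_dict, numeric(1))
```

Taking the mean of a logical vector is equivalent to sum(...)/length(...) in the original loop, so the result should be the same fraction per row.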

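Now I want to compute not just the number of matching words but their weight (the sum of the weights in super$weight for the words that match).

So I tried something like this inside the loop: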

if (words[[i]] %in% super$word) { 
stemmed$super[i] = sum(with super[super$word==words[[i]],], 
         sum(super$weight))}

I would like to get a data frame like this:

stem    super 
super puper  0.6 
only for you  0.3

I don't know how to solve this problem...

Source

2016-11-19 Dennix

'colSums(t(sapply(super$word, grepl, stemmed$stem)) * super$weight)' – user20650
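The one-liner in this comment can be unpacked step by step, as sketched below with the question's data. The intermediate names hits, weighted and totals are only for illustration. Note that grepl matches substrings, so a dictionary word such as 'you' would also match 'your'.

```r
stemmed <- data.frame(
    stem = c('super puper', 'only for you'),
    stringsAsFactors = FALSE
)
super <- data.frame(
    word = c('super', 'puper', 'you'),
    weight = c(0.5, 0.1, 0.3),
    stringsAsFactors = FALSE
)

# One row per stem, one column per dictionary word: TRUE where the word occurs
hits <- sapply(super$word, grepl, stemmed$stem)

# Transpose so rows are words, then scale each row by that word's weight
weighted <- t(hits) * super$weight

# Sum down each column to get one total per stem
totals <- colSums(weighted)
```

With more than one stem, as here, hits is a matrix; with a single stem sapply would return a plain vector, so this sketch assumes at least two rows in stemmed.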

Following your line of thought, 'match' may be the function you need –

There are many ways to do this. Following your approach, I would wrap it in an sapply:

> final <- stemmed 
> final$super <- sapply(stemmed$stem, function(x) { 
    sum(super$weight[super$word %in% unlist(strsplit(as.character(x), " "))]) 
}) 
> final 
      stem super 
1 super puper 0.6 
2 only for you 0.3

Source

2016-11-19 19:55:10 Sonny

> data.frame(stem=stemmed$stem, 
     super=sapply(lapply(strsplit(as.character(stemmed$stem), " ") , 
          function(txt) super$word %in% txt), 
        function(idx) sum(super$weight[idx]))) 
      stem super 
1 super puper 0.6 
2 only for you 0.3

Source

2016-11-19 19:55:56 p2004r

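I think I have found a suitable solution, although I use data.tables instead of data.frames. The advantage of this solution is that it avoids the apply family and loops.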

library("data.table") 
library("reshape2") 
stemmed <- data.frame(
    stem = c('super puper', 'only for you') 
) 

super <- data.table(
    word = c('super', 'puper', 'you'), 
    weight = c(0.5, 0.1, 0.3) 
) 


# Step 1: Split the words 
split_words <- strsplit(as.character(stemmed$stem), " ") 
names(split_words) <- stemmed$stem 
# Step 2: melt it to a data.table 
result <- data.table(melt(split_words)) 
setnames(result, names(result), c("word", "stem")) 
# Step 3: Find the weight by merging it with super 
setkey(super, word) 
setkey(result, word) 
word_weights <- super[result] 
# Step 4: Filter the NA weights 
word_weights <- word_weights[!is.na(weight)] 
# Step 5: Now aggregate by stem to find the weight per stem 
final_result <- word_weights[, list(super = sum(weight)), by = stem] 
> final_result 
      stem super 
1: super puper 0.6 
2: only for you 0.3

Source

2016-11-19 19:58:19

You may want to use match.

stemmed <- data.frame(
    stem = c('super puper', 'only for you') 
) 

super <- data.frame(
    word = c('super', 'puper', 'you'), 
    weight = c(0.5, 0.1, 0.3) 
) 

# this line can stay outside the loop: the split only needs to happen once 
words <- strsplit(as.character(stemmed$stem)," ") 

for (i in 1:nrow(stemmed)){ 
    stemmed$super[i] <- sum(words[[i]] %in% super$word)/length(words[[i]]) 
    # get weights for super words 
    w.index <- na.exclude(match(words[[i]],super$word)) 
    if (length(w.index) > 0) stemmed$super[i] <- sum(super$weight[w.index]) 

} 

#~ > stemmed 
#~   stem super 
#~ 1 super puper 0.6 
#~ 2 only for you 0.3
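The same match-based lookup can also be written without an explicit loop. This is a minimal sketch, assuming the same two data frames as above (with stringsAsFactors = FALSE added so the columns are character vectors); it is an alternative phrasing, not the answer's own code.

```r
stemmed <- data.frame(
    stem = c('super puper', 'only for you'),
    stringsAsFactors = FALSE
)
super <- data.frame(
    word = c('super', 'puper', 'you'),
    weight = c(0.5, 0.1, 0.3),
    stringsAsFactors = FALSE
)

words <- strsplit(as.character(stemmed$stem), " ")

# match() returns NA for words missing from the dictionary;
# indexing weight by NA yields NA, which na.rm = TRUE drops from the sum
stemmed$super <- vapply(
    words,
    function(w) sum(super$weight[match(w, super$word)], na.rm = TRUE),
    numeric(1)
)
```

One design point: match looks up every word in the text, so a dictionary word that appears twice in a stem contributes its weight twice. The variants that test super$word %in% the split text count each dictionary word at most once.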

Source

2016-11-19 20:20:39
