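Code problem

I have a dataset (questions) with 4 columns and more than 600K observations. One of its columns is named 'V3' and contains questions such as 'What is the day today?'. I have a second dataset (voc) with 2 columns, one named 'word' and the other named 'synonyms'. If a word from the 'synonyms' column of the second dataset (voc) appears in my first dataset (questions), I want to replace it with the corresponding word from the 'word' column.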
questions <- data.frame(V3 = c("What is the day today?", "Tom has brown eyes"),
                        stringsAsFactors = FALSE)
V3
1 What is the day today?
2 Tom has brown eyes
voc <- data.frame(word = c("weather", "a", "blue"),
                  synonyms = c("day", "the", "brown"),
                  stringsAsFactors = FALSE)
word synonyms
1 weather day
2 a the
3 blue brown
Desired output:
  V3                      V5
1 What is the day today?  What is a weather today?
2 Tom has brown eyes      Tom has blue eyes
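The desired output amounts to a word-for-word lookup: split each sentence into words, swap any word found in `synonyms` for its `word`, and join the words back together. A minimal sketch in base R, assuming words are separated by single spaces, matching is case-sensitive, and punctuation attached to a word (e.g. "today?") is treated as part of that word. The names `lookup` and `replace_synonyms` are hypothetical helpers, not part of the original code.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps the text as character)
questions <- data.frame(V3 = c("What is the day today?", "Tom has brown eyes"),
                        stringsAsFactors = FALSE)
voc <- data.frame(word = c("weather", "a", "blue"),
                  synonyms = c("day", "the", "brown"),
                  stringsAsFactors = FALSE)

# Hypothetical named lookup vector: names are the synonyms, values are the replacements
lookup <- setNames(voc$word, voc$synonyms)

# Hypothetical helper: replace every space-separated token found in the lookup,
# leaving all other tokens unchanged
replace_synonyms <- function(sentence, lookup) {
  tokens <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  hit <- tokens %in% names(lookup)
  tokens[hit] <- lookup[tokens[hit]]
  paste(tokens, collapse = " ")
}

# Apply the helper to every row and store the result in a new column V5
questions$V5 <- vapply(questions$V3, replace_synonyms, character(1),
                       lookup = lookup, USE.NAMES = FALSE)
print(questions)
```

Because each word is looked up exactly once, a replacement word can never be replaced a second time. On 600K rows the per-row `vapply()` call may be slow; the point of the sketch is the lookup logic rather than speed.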
I wrote some simple code, but it doesn't work:
for (k in 1:nrow(questions))
{
  for (i in 1:nrow(voc))
  {
    questions$V5 <- gsub(do.call(rbind,strsplit(questions$V3[k]," "))[which (do.call(rbind,strsplit(questions$V3[k]," "))== voc[i,2])], voc[i,1], questions$V3)
  }
}
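The idea behind this first attempt, calling `gsub()` once per vocabulary entry, can work without the outer loop over rows, because `gsub()` is already vectorised over its `x` argument. A minimal sketch, assuming the synonyms contain no regex metacharacters (otherwise they would need escaping) and using `\\b` word boundaries so that, for example, "day" is not replaced inside "today". The column is copied to V5 first and each substitution is applied to that copy.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps the text as character)
questions <- data.frame(V3 = c("What is the day today?", "Tom has brown eyes"),
                        stringsAsFactors = FALSE)
voc <- data.frame(word = c("weather", "a", "blue"),
                  synonyms = c("day", "the", "brown"),
                  stringsAsFactors = FALSE)

# Start from a copy of V3; each pass rewrites the whole column at once
questions$V5 <- questions$V3
for (i in seq_len(nrow(voc))) {
  # \\b anchors the match to whole words, so "day" does not match inside "today"
  pattern <- paste0("\\b", voc$synonyms[i], "\\b")
  questions$V5 <- gsub(pattern, voc$word[i], questions$V5, perl = TRUE)
}
print(questions)
```

This loops only over the (short) vocabulary, not over the 600K rows. One caveat: substitutions are applied in sequence, so if a replacement word also appears in the `synonyms` column, it can be replaced again by a later pass, and the order of the rows in `voc` then matters.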
Could someone help me? :)
I wrote a second version, but it doesn't work much better either:
for(i in 1:nrow(questions))
{
  for(j in 1:nrow(voc))
  {
    if (grepl(voc[j,k],do.call(rbind,strsplit(questions[i,]," "))) == TRUE)
    {
      new=matrix(gsub(do.call(rbind,strsplit(questions[i,]," "))[which(do.call(rbind,strsplit(questions[i,]," "))== voc[j,2])], voc[j,1], questions[i,]))
      questions[i,]=new
    }
  }
  questions = cbind(questions,c(new))
}
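Your question is unlikely to attract answers as it stands. Please provide some sample data (the first few rows of the data frames involved); an example of the desired output would also help.
OK! :) Thanks for the advice.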