2017-01-10 67 views

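I have two data frames: one contains product names and the other contains categories. I need to assign each product the category whose name appears in the product name, if the string matches. How can I get the matched string?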

The first data frame, containing the product names (Product_Name.csv), is:

  **Product.Name** 
     Black Printed Blouse 
Silver Embellished Crop Top 
    Maroon Solid Strappy Top 

The other data frame, containing the categories (Category.csv), is:

**Category** 
    Strappy 
     Blouse 
     Crop 

The final output should be:

 Black Printed Blouse  Blouse 
Silver Embellished Crop Top   Crop 
    Maroon Solid Strappy Top  Strappy 

Right now I am using grepl, which only returns TRUE or FALSE:

product <- read.csv("Product_Name.csv", header = TRUE, sep = ",")
category <- read.csv("Category.csv", header = TRUE, sep = ",")

for (i in 1:nrow(product)) {
  # Each grepl call yields only TRUE/FALSE, stored in one new column per category
  product[i, 2] <- grepl(category$Category[1], product$Product.Name[i], ignore.case = TRUE)
  product[i, 3] <- grepl(category$Category[2], product$Product.Name[i], ignore.case = TRUE)
  product[i, 4] <- grepl(category$Category[3], product$Product.Name[i], ignore.case = TRUE)
}
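
For illustration, the TRUE/FALSE values that grepl returns can themselves be turned into category names. The following is a minimal sketch, not the asker's code: it replaces the CSV files with inline data frames that are assumed to hold exactly the rows shown above, and the names `hits` and `first_hit` are made up for this example. It also assumes more than one product row, so that sapply returns a matrix.

```r
# Inline stand-ins for Product_Name.csv and Category.csv (assumed contents)
product <- data.frame(
  Product.Name = c("Black Printed Blouse",
                   "Silver Embellished Crop Top",
                   "Maroon Solid Strappy Top"),
  stringsAsFactors = FALSE
)
category <- data.frame(
  Category = c("Strappy", "Blouse", "Crop"),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per product, one column per category
hits <- sapply(category$Category, grepl,
               x = product$Product.Name, ignore.case = TRUE)

# For each product, index of the first matching category (NA if none match)
first_hit <- apply(hits, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)

# Translate the column index back into the category name
product$Category <- colnames(hits)[first_hit]
product
```

If a product name contains more than one category, this sketch keeps the one that comes first in Category.csv.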

Possible duplicate of [String matching to data frames of different sizes](http://stackoverflow.com/questions/39677987/string-matching-to-data-frames-of-different-sizes) – Aramis7d

Answers


We can use str_extract:

library(stringr) 
product$Category <- str_extract(product$Product.Name, paste(category$Category, collapse="|")) 
product 
#     Product.Name Category 
#1  Black Printed Blouse Blouse 
#2 Silver Embellished Crop Top  Crop 
#3 Maroon Solid Strappy Top Strappy 
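
If stringr is not available, a similar result can be obtained in base R with regexpr and regmatches. This is only a sketch under a few assumptions: the vectors are typed inline instead of read from the CSV files, the row "Plain White Shirt" is a hypothetical product added to show the no-match case, and the category names are assumed to contain no regex metacharacters.

```r
product_names <- c("Black Printed Blouse",
                   "Silver Embellished Crop Top",
                   "Maroon Solid Strappy Top",
                   "Plain White Shirt")   # hypothetical row with no category
categories <- c("Strappy", "Blouse", "Crop")

# Same alternation pattern as in the str_extract call: "Strappy|Blouse|Crop"
pattern <- paste(categories, collapse = "|")

# regexpr gives the position of the first match per element, or -1 for none
m <- regexpr(pattern, product_names, ignore.case = TRUE)

# regmatches returns text only for elements that matched, so fill the rest with NA
matched <- rep(NA_character_, length(product_names))
matched[m != -1] <- regmatches(product_names, m)
matched
```

Note that the extracted text is taken from the product name, so with `ignore.case = TRUE` its capitalisation follows the product name rather than the category file.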

Using base R:

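# Assumes every category matches exactly one product name.
# as.character() ensures the result is named even if Category was read as a factor.
indices <- sapply(as.character(category$Category), function(x) which(grepl(x, product$Product.Name)))

product$new_col <- NA_character_            # placeholder until a category is assigned
product$new_col[indices] <- names(indices)
#> product
#                  Product.Name new_col
#1         Black Printed Blouse  Blouse
#2  Silver Embellished Crop Top    Crop
#3     Maroon Solid Strappy Top Strappy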

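# In case some category matches no product (which we need to handle well),
# the generalised version below manages both the match and the no-match case

category$Category <- as.character(category$Category)  # needed if the column is a factor
category$Category[2] <- "Bloiuse"                     # deliberately misspelled: matches nothing
product$new_col <- NA_character_                      # start from an empty result column

# [1] turns "no match" (integer(0)) into NA, so sapply still returns a named vector
indices <- sapply(category$Category, function(x) which(grepl(x, product$Product.Name))[1])
indices.loc <- as.numeric(indices)
indices.name <- names(indices)

product$new_col[indices.loc[!is.na(indices.loc)]] <- indices.name[!is.na(indices.loc)]

#> product
#                  Product.Name new_col
#1         Black Printed Blouse    <NA>
#2  Silver Embellished Crop Top    Crop
#3     Maroon Solid Strappy Top Strappy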
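
The sapply/which approach works on one row index per category, so it does not cover a category that appears in several product names. A simple loop over the categories is one way to handle that case together with the no-match case. The sketch below uses inline data rather than the CSV files; the rows "Red Crop Jacket" and "Plain White Shirt" are hypothetical additions for illustration, and matching is done with `fixed = TRUE`, i.e. as literal, case-sensitive substrings.

```r
product <- data.frame(
  Product.Name = c("Black Printed Blouse",
                   "Silver Embellished Crop Top",
                   "Maroon Solid Strappy Top",
                   "Red Crop Jacket",      # hypothetical: second "Crop" product
                   "Plain White Shirt"),   # hypothetical: matches no category
  stringsAsFactors = FALSE
)
category <- data.frame(Category = c("Strappy", "Blouse", "Crop"),
                       stringsAsFactors = FALSE)

product$new_col <- NA_character_
for (cat_name in category$Category) {
  # Rows whose name contains this category as a literal substring
  rows <- grepl(cat_name, product$Product.Name, fixed = TRUE)
  # Only fill rows that have no category yet, so earlier categories win
  product$new_col[rows & is.na(product$new_col)] <- cat_name
}
product
```

Rows without any matching category keep NA in `new_col`.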

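Could you share your feedback on this answer? Please appreciate the effort that went into writing it rather than ignoring it. If it does not work for you, please help me improve it so that it answers your question well. Also please visit http://stackoverflow.com/help/someone-answers and respond constructively. Thanks! :) –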