Optimisation: value replacement in a data frame with multiple conditions

I have a data frame similar to this sample, and I want to classify the items by size and colour based on the information in its two columns:

df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L))

The output should look like this:

structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100), Class = c("small red ball", "small red ball", "small blue ball", "medium red ball", "medium blue ball", "big red ball")), row.names = c(NA, -6L), .Names = c("Ball", "size", "Class"), class = "data.frame")

I already have working code, but it is long and messy, and I am sure there is a more concise way to get the output I need.

So what did I do?

I started by selecting the items of the first class and setting the df$Class value of the selected rows:

df["Class"] <- NA #add new column 

df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"

Because my grepl selection is sometimes empty, I added an if (length() > 0) condition:

if (length(df[grepl("red", df$Ball) & df$size <10, ]$Class) > 0) {df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"}
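A side note on the guard above: assigning through a row subset, as in df[cond, ]$Class <- value, fails when cond selects no rows, which is presumably why the length() check was needed. A minimal sketch, assuming a toy data frame named toy (hypothetical, not the asker's data), shows that indexing the column vector instead is simply a no-op for an empty selection:

```r
# Hypothetical toy data: no row contains "green", so that selection is empty.
toy <- data.frame(Ball = c("red ball", "blue"), size = c(2, 12),
                  stringsAsFactors = FALSE)
toy$Class <- NA_character_

# Index the Class vector itself rather than a row subset of the data frame.
# With an all-FALSE condition nothing is assigned and no error is raised.
is_small_green <- grepl("green", toy$Ball) & toy$size < 10
toy$Class[is_small_green] <- "small green ball"

# A non-empty selection is written the same way, without any guard.
is_small_red <- grepl("red", toy$Ball) & toy$size < 10
toy$Class[is_small_red] <- "small red ball"
```

Written this way, each assignment in the loop could drop its if (length(...) > 0) wrapper.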

Finally, I combined all my selections in a loop:

df["Class"] <- NA #add new column 
z <- c("red", "blue") 

for (i in z){ 
    if (length(df[grepl(i, df$Ball) & df$size <10, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size <10, ]$Class <- paste("small", i, "ball", sep=" ")} 
    if (length(df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=10 & df$size <100, ]$Class <- paste("medium", i, "ball", sep=" ")} 
    if (length(df[grepl(i, df$Ball) & df$size >=100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >=100, ]$Class <- paste("big", i, "ball", sep=" ")} 
}

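It works for two colours and three size categories, but my original data frame is much larger. So (because it looks very messy) my question is: how can I simplify my code?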


2017-12-27 Iris

We can use cut to create the size groups and str_extract to pull out the colour, then paste the pieces together:

library(stringr) 
df$Class <- with(df, paste(as.character(cut(size, breaks = c(1, 9, 99, Inf), 
    labels = c('small', 'medium', 'big'))), str_extract(Ball, 'red|blue'), 'ball')) 
df$Class 
#[1] "small red ball" "small red ball" "small blue ball" 
#[4] "medium red ball" "medium blue ball" "big red ball"


2017-12-27 19:15:33 akrun

I don't see the need for the 'stringr' package. I guess base R works: paste(as.character(cut(df$size, c(1, 10, 100, Inf), c("small", "medium", "large"))), sub("[^(red|blue)].*", "", df$Ball), 'Ball') – Onyambu

@Onyambu Sure, 'sub' works, but if there is no match it can return the whole string, whereas 'str_extract' returns NA. One workaround is 'regexpr/regmatches' – akrun

It should be 'c(1, 9, 99, Inf)' for 'small: x < 10', 'medium: 10 <= x < 100', 'large: x >= 100', right? – Iris
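To illustrate the point about 'sub' versus 'regexpr/regmatches' made in the comments, here is a minimal base R sketch. The input vector ball and the variable names are hypothetical; the last element deliberately contains neither colour:

```r
# Hypothetical input: "green" matches neither "red" nor "blue".
ball <- c("red ball", "blue is my favourite", "green")

# sub() returns a non-matching string unchanged, so "green" leaks through.
with_sub <- sub(".*(red|blue).*", "\\1", ball)

# regexpr() marks non-matches with -1; regmatches() returns only the matches,
# so they are written back into a vector pre-filled with NA.
m <- regexpr("red|blue", ball)
colour <- rep(NA_character_, length(ball))
colour[m != -1] <- regmatches(ball, m)
```

Here with_sub keeps "green" as its third element, while colour holds NA in that position, which mirrors what str_extract would give.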

Creating the groups based on size and pasting them with the values extracted from Ball seems like a great use case for the dplyr and stringr packages:

library(stringr) 
library(dplyr) 

df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L)) 


df %>%
  mutate(
    color = str_extract(Ball, "(red)|(blue)"),
    size_category = case_when(
      size < 10 ~ "small",
      size >= 10 & size < 100 ~ "medium",
      size >= 100 ~ "large"
    ),
    category = str_c(size_category, color, "ball", sep = " ")
  )
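For readers who prefer to stay in base R, the same size bucketing can be sketched with findInterval, which scales to more categories by extending two vectors. The names size_breaks and size_labels are assumptions made for this illustration; the sizes are the ones from the question's sample:

```r
# Sizes taken from the sample data frame in the question.
size <- c(1.2, 2, 3, 10, 12, 100)

# Lower bounds of every category except the first; extend both vectors together.
size_breaks <- c(10, 100)
size_labels <- c("small", "medium", "large")

# findInterval() gives 0 below the first break, 1 from 10 up to (not including)
# 100, and 2 from 100 upwards, so adding 1 turns it into an index into the labels.
size_category <- size_labels[findInterval(size, size_breaks) + 1]
```

Because the intervals are closed on the left by default, a size of exactly 10 is "medium" and exactly 100 is "large", the same as in the case_when version.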


2017-12-27 19:21:58

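This answer is very similar to @akrun's, but it lets you include more colours (here I use the colors() palette, but you could use another one). I also slightly changed the arguments of the cut function.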

library(stringr)
size <- cut(df$size, c(0, 10, 100, Inf), labels = c("small", "medium", "big"), right = FALSE)
colors <- str_extract(df$Ball, paste(colors(), collapse = "|"))
df$Class <- paste(size, colors, "ball", sep = " ")

> df 
                  Ball  size            Class
1             red ball   1.2   small red ball
2                  red   2.0   small red ball
3 blue is my favourite   3.0  small blue ball
4                 red   10.0  medium red ball
5                 blue  12.0 medium blue ball
6                  red 100.0     big red ball
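To see why the changed cut arguments matter, here is a small sketch comparing the two sets of breaks used in this thread. The size values are hypothetical and chosen to sit on or near the category boundaries:

```r
# Hypothetical sizes on or near the boundaries 10 and 100.
size <- c(0.5, 9.5, 10, 99.5, 100)
labels <- c("small", "medium", "big")

# Default right = TRUE: the intervals are (1,9], (9,99], (99,Inf].
closed_right <- cut(size, breaks = c(1, 9, 99, Inf), labels = labels)

# right = FALSE: the intervals are [0,10), [10,100), [100,Inf).
closed_left <- cut(size, breaks = c(0, 10, 100, Inf), labels = labels,
                   right = FALSE)

# Side-by-side comparison of the two classifications.
comparison <- data.frame(size, closed_right, closed_left)
```

With the first set of breaks, 0.5 falls outside every interval and becomes NA, and 9.5 and 99.5 land one category too high. The left-closed version matches the rules small: x < 10, medium: 10 <= x < 100, big: x >= 100.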

In addition, to make it a bit more general, you can allow capital letters by using:

colors<- str_extract(df$Ball, regex(paste(colors(), collapse="|"), ignore_case=T))

So, if df$Ball[1] = "Red ball", using the line above you would get:

colors 
#[1] "Red" "red" "blue" "red" "blue" "red" 
df$Class<- paste(size, tolower(colors), "ball", sep = " ") 
df$Class 
#[1] "small red ball" "small red ball" "small blue ball" "medium red ball" "medium blue ball" 
#[6] "big red ball"


2017-12-27 22:04:00
