2017-08-16 57 views
1

Spread a string into multiple columns in R

I am trying to one-hot encode the character data frame below in R.

x1 <- c('') 
x2 <- c('A1,A2') 
x3 <- c('A2,A3,A4') 
test <- as.data.frame(rbind(x1,x2,x3)) 

I am trying to get the data into this format:

x1 <- c(0,0,0,0) 
x2 <- c(1,1,0,0) 
x3 <- c(0,1,1,1) 
result <- as.data.frame(rbind(x1,x2,x3)) 
names(result) = c('A1','A2','A3','A4') 

The delimiter is a comma, and I can split on it like this:

test$V1 = as.character(test$V1) 
split_list = strsplit(test$V1, ",") 

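This gives me a list of character vectors that cannot be converted directly into a data frame. Is there a better way to do this? I have been trying "https://www.rdocumentation.org/packages/CatEncoders/versions/0.1.0/topics/OneHotEncoder.fit", but that package appears to be designed to spread a single column rather than multiple columns.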


'test %>% tibble::rownames_to_column() %>% tidyr::separate_rows(V1) %>% table()' gets you almost there, and is perhaps simpler than the answers here. – Axeman

Answers

1

A custom function that spreads the unique string values into columns:

x1 <- c('') 
x2 <- c('A1,A2') 
x3 <- c('A2,A3,A4') 
test <- data.frame(col1=rbind(x1,x2,x3), stringsAsFactors = FALSE) # test$col1 is a character column 

cast_variables <- function(df, variable){ 
    df[df==""] <- "missing" # handle missing values 
    # split each row on commas; dropping spaces handles both "A1,A2" and "A1, A2" 
    tokens <- strsplit(gsub(" ", "", as.character(df[[variable]])), ",") 
    # setdiff() also works when "missing" is absent, where x[-grep(...)] would return nothing 
    new_columns <- setdiff(sort(unique(unlist(tokens))), "missing") 
    for (col in new_columns){ 
        # compare whole tokens, so "A1" does not also match "A10" as grepl() would 
        df[[col]] <- ifelse(vapply(tokens, function(tk) col %in% tk, logical(1)), 1, 0) 
    } 
    return(df) 
} 

test <- cast_variables(test, "col1") 
print(test) 
#        col1 A1 A2 A3 A4 
# x1  missing  0  0  0  0 
# x2    A1,A2  1  1  0  0 
# x3 A2,A3,A4  0  1  1  1 
0

Here is an approach using pipes:

library(dplyr) 
library(tidyr) 
library(reshape2) 
library(data.table) 

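# test is the data frame from the question, with V1 already converted to character 
test$V1 %>% 
    strsplit(., ",") %>% 
    setNames(row.names(test)) %>% 
    # explicit namespace, because data.table also exports a melt() 
    reshape2::melt(value.name = 'variable') %>% 
    mutate(dummy = 1) %>% 
    spread(key = variable, value = dummy) %>% 
    # rows with no values (x1) do not survive melt(), so add them back 
    list(data.frame(L1 = rownames(test)[!rownames(test) %in% .[['L1']]]), .) %>% 
    rbindlist(., use.names = TRUE, fill = TRUE) %>% 
    # formula lambda instead of funs(), which is deprecated in dplyr 
    mutate_all(~ replace(., is.na(.), 0)) 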

#   L1 A1 A2 A3 A4 
# 1 x1  0  0  0  0 
# 2 x2  1  1  0  0 
# 3 x3  0  1  1  1 
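
For comparison, the list that strsplit() returns in the question can also be turned into the requested 0/1 table with base R alone. This is only a minimal sketch: it assumes the input is held as a named character vector (the names x, tokens, onehot and parts are introduced here for illustration and appear in neither the question nor the answers), and that tokens should be compared as whole strings.

```r
# Input from the question, held as a named character vector (an assumption
# made for this sketch; the question stores it in a one-column data frame)
x <- c(x1 = "", x2 = "A1,A2", x3 = "A2,A3,A4")

# strsplit() keeps the names of x; the empty string becomes character(0)
split_list <- strsplit(x, ",")

# Every distinct token becomes one output column
tokens <- sort(unique(unlist(split_list)))

# One 0/1 row per input element: 1 where the token occurs in that element
onehot <- do.call(rbind, lapply(split_list, function(parts) {
    as.integer(tokens %in% parts)
}))
colnames(onehot) <- tokens

result <- as.data.frame(onehot)
print(result)
#    A1 A2 A3 A4
# x1  0  0  0  0
# x2  1  1  0  0
# x3  0  1  1  1
```

Because each row is built with %in% rather than a pattern match, a value such as "A1" is not counted for a row that only contains "A10", and the empty row x1 is kept without any special handling.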