2014-09-30

Create a factor variable with dplyr?

Suppose I have a data frame that looks like this:

df1 = structure(list(
    Name = structure(1:6, .Label = c("N1", "N2", "N3", "N4", "N5", "N6", "N7"),
                     class = "factor"),
    sector = structure(c(4L, 4L, 4L, 3L, 3L, 2L),
                       .Label = c("other stuff",
                                  "Private for-profit, 4-year or above",
                                  "Private not-for-profit, 4-year or above",
                                  "Public, 4-year or above"),
                       class = "factor"),
    flagship = c(1, 0, 0, 0, 0, 0)),
  .Names = c("Name", "sector", "flagship"),
  row.names = c(NA, 6L), class = "data.frame")

I want to create a new factor variable, "Sector". I can do it the long way with many lines of code, but I'm sure there is a more efficient way.

Here is what I'm doing right now:

df1$PublicFlag=0 
df1$PublicFlag[df1$sector=="Public, 4-year or above" & df1$flagship==1]=1 
df1$Public=0 
df1$Public[df1$sector=="Public, 4-year or above" & df1$flagship==0]=1 
df1$PrivateNP=0 
df1$PrivateNP[df1$sector=="Private not-for-profit"]=1 
df1$Private4P=0 
df1$Private4P[df1$sector=="Private for-profit, 4-year or above"]=1 

library(reshape) 
df2 = melt(df1, id=c("Name", "sector", "flagship")) 
df2 = df2[df2$value==1,c("Name", "sector", "flagship", "variable")] 
library(plyr) 
df2 = rename(df2, c("variable"="Sector")) 

Thanks for your help!

Answers


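This is an old post, but I keep stumbling over it, so I'd like to give an up-to-date answer. Version 0.5.0 of dplyr introduced many useful vector functions that solve exactly this problem.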

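Avoid nesting ifelse() (and so keep many, many kittens alive) by using case_when(), which builds the new variable from a character (or numeric) variable: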

library(dplyr)

df1 %>% 
    mutate(Sector = case_when(
        sector == "Public, 4-year or above" & flagship == 1 ~ "PublicFlag", 
        sector == "Public, 4-year or above" & flagship == 0 ~ "Public", 
        # use the full level name; "Private not-for-profit" alone matches no row
        sector == "Private not-for-profit, 4-year or above" ~ "PrivateNP", 
        sector == "Private for-profit, 4-year or above" ~ "Private4P"), 
      Sector = factor(Sector, levels = c("Public", "PublicFlag", "PrivateNP", "Private4P"))
    ) 

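Generate the factor directly with recode_factor(). Note that it maps each sector level to exactly one new label, so on its own it cannot split the public rows by flagship: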

df1 %>% 
    mutate(Sector = recode_factor(sector, 
           "Public, 4-year or above" = "Public", 
           "Private not-for-profit, 4-year or above" = "PrivateNP", 
           "Private for-profit, 4-year or above" = "Private4P")) 
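Because recode_factor() only relabels levels one to one, the flagship split needs a second step. A minimal base-R sketch of the same lookup idea, assuming a hypothetical named vector sector_labels and rebuilding a trimmed df1 (Name column dropped) so the block runs on its own:

```r
# Rebuild the relevant columns of the question's df1
df1 <- data.frame(
  sector = factor(c(rep("Public, 4-year or above", 3),
                    rep("Private not-for-profit, 4-year or above", 2),
                    "Private for-profit, 4-year or above")),
  flagship = c(1, 0, 0, 0, 0, 0)
)

# Hypothetical lookup table: names are the original sector levels,
# values are the short labels we want
sector_labels <- c(
  "Public, 4-year or above"                 = "Public",
  "Private not-for-profit, 4-year or above" = "PrivateNP",
  "Private for-profit, 4-year or above"     = "Private4P"
)

# Index the lookup by the sector text; unname() drops the long names
short <- unname(sector_labels[as.character(df1$sector)])

# Flagship public institutions get their own label
short[which(short == "Public" & df1$flagship == 1)] <- "PublicFlag"

df1$Sector <- factor(short,
                     levels = c("Public", "PublicFlag", "PrivateNP", "Private4P"))
df1$Sector
```

Indexing a named vector by the sector text is a plain base-R equivalent of a one-to-one recode; any sector missing from the lookup comes back as NA, the same way an unmatched case falls through in the nested ifelse() version.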

You don't even really need dplyr:

df1$Sector <- factor(ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==1, "PublicFlag", 
         ifelse(df1$sector=="Public, 4-year or above" & df1$flagship==0, "Public", 
         ifelse(df1$sector=="Private not-for-profit", "PrivateNP", 
          ifelse(df1$sector=="Private for-profit, 4-year or above", "Private4P", NA))))) 


df1 

##   Name                                  sector flagship     Sector
## 1   N1                 Public, 4-year or above        1 PublicFlag
## 2   N2                 Public, 4-year or above        0     Public
## 3   N3                 Public, 4-year or above        0     Public
## 4   N4 Private not-for-profit, 4-year or above        0       <NA>
## 5   N5 Private not-for-profit, 4-year or above        0       <NA>
## 6   N6     Private for-profit, 4-year or above        0  Private4P

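Rows 4 and 5 come out as NA because "Private not-for-profit" (the string used in the question) does not exactly match the level "Private not-for-profit, 4-year or above". Use the full level name if you want those rows labelled, or replace the NA with a final catch-all factor level if you need one.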
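If you would rather keep the unmatched rows and give them a visible level, one base-R approach is to add the level to the factor first and then fill the missing entries. A minimal sketch, using a stand-in vector with the same values as the Sector column printed above and a hypothetical fallback label "Other":

```r
# Stand-in for the Sector column produced by the nested ifelse(), with two NA rows
sector_f <- factor(c("PublicFlag", "Public", "Public", NA, NA, "Private4P"))

# Add the hypothetical fallback level first; assigning a value that is not
# already a level would otherwise give NA with a warning
levels(sector_f) <- c(levels(sector_f), "Other")
sector_f[is.na(sector_f)] <- "Other"

table(sector_f)
```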


Every time you write a nested if-else that deep, I kill a kitten. – hadley 2014-09-30 22:31:10


Try:

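# Encode each sector/flagship combination as one number, rank the numbers that
# occur, and use the rank to pick a label (Private not-for-profit maps to NA)
df1$Sector <- with(df1, c("Private4P", NA, "Public", 
       "PublicFlag")[as.numeric(factor(1+2*as.numeric(sector)+4*flagship))]) 

subset(df1, !is.na(Sector)) 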
#   Name                              sector flagship     Sector
# 1   N1             Public, 4-year or above        1 PublicFlag
# 2   N2             Public, 4-year or above        0     Public
# 3   N3             Public, 4-year or above        0     Public
# 6   N6 Private for-profit, 4-year or above        0  Private4P
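The one-liner relies on the sector levels being coded 1 to 4 (other stuff, Private for-profit, Private not-for-profit, Public), so 1 + 2*code + 4*flagship yields a distinct number per sector/flagship combination. A sketch that exposes each intermediate step, rebuilding only the two columns it needs (the variable names code, key, idx and lab are illustrative):

```r
# Sector levels in the same order as the question's df1
lev <- c("other stuff", "Private for-profit, 4-year or above",
         "Private not-for-profit, 4-year or above", "Public, 4-year or above")
sector <- factor(c(lev[4], lev[4], lev[4], lev[3], lev[3], lev[2]), levels = lev)
flagship <- c(1, 0, 0, 0, 0, 0)

# Step 1: integer code of each level (Public = 4, Private not-for-profit = 3, ...)
code <- as.numeric(sector)                 # 4 4 4 3 3 2

# Step 2: combine sector and flagship into one number per row
key <- 1 + 2 * code + 4 * flagship         # 13 9 9 7 7 5

# Step 3: factor() sorts the distinct keys (5, 7, 9, 13); as.numeric() gives the rank
idx <- as.numeric(factor(key))             # 4 3 3 2 2 1

# Step 4: the rank picks a label; rank 2 (Private not-for-profit) maps to NA
lab <- c("Private4P", NA, "Public", "PublicFlag")
lab[idx]

# More robust variant: look the keys up explicitly instead of ranking them
lab[match(key, c(5, 7, 9, 13))]
```

Because factor() ranks only the keys that actually occur, the ranking version shifts if a combination is missing: with no not-for-profit rows, the plain Public rows would get rank 2 and map to NA. Matching against the full list of keys avoids that.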