R data frame: combine rows based on a column

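I have the following data frame with names and cities that I want to process, but I don't know how to describe what I want, so I have included the input and output tables below.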

Input:

+---------+-----------+------------+---------------+
| Varname | Component | names      | cities        |
+---------+-----------+------------+---------------+
| A       | B         | Jack,Bruce | New york      |
| B       |           | Cathy      | Boston,Miami  |
| C       |           | Bob        | New york      |
| D       | C         | Dick,Nancy | Austin,Dallas |
| E       | A,C       |            |               |
| F       |           | Mandy      | Manchester    |
+---------+-----------+------------+---------------+

Output:

+---------+-----------+----------------------+------------------------+
| Varname | Component | names                | cities                 |
+---------+-----------+----------------------+------------------------+
| A       |           | Jack,Bruce,Cathy     | New york,Boston,Miami  |
| B       |           | Cathy                | Boston,Miami           |
| C       |           | Bob                  | New york               |
| D       |           | Dick,Nancy,Bob       | Austin,Dallas,New york |
| E       |           | Jack,Bruce,Cathy,Bob | New york,Boston,Miami  |
| F       |           | Mandy                | Manchester             |
+---------+-----------+----------------------+------------------------+

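As you can hopefully see, I want to take the Component column and, for each Varname listed in it, look up the names and cities (in reality I have many more columns) and combine them, so that I end up with a complete table. Is this possible? I don't know where to start. My table is not very large, so a for () {} loop is fine.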
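To make the lookup concrete, here is a minimal sketch for a single row, E, using the input table above typed in by hand as a character data frame (the variable names `df`, `comps`, `idx` and `one_level` are illustrative only). It also shows why a one-level lookup is not enough: merging the original values of A and C gives only Jack,Bruce,Bob, while the desired output for E also contains Cathy, which reaches E only through A's own component B. The combination therefore has to be applied transitively.

```r
# Input table from the question, as plain character columns (typed in by hand)
df <- data.frame(
  Varname   = c("A", "B", "C", "D", "E", "F"),
  Component = c("B", "", "", "C", "A,C", ""),
  names     = c("Jack,Bruce", "Cathy", "Bob", "Dick,Nancy", "", "Mandy"),
  cities    = c("New york", "Boston,Miami", "New york", "Austin,Dallas", "", "Manchester"),
  stringsAsFactors = FALSE
)

# Split row E's Component field and find the referenced rows by exact Varname
comps <- strsplit(df$Component[df$Varname == "E"], ",", fixed = TRUE)[[1]]
idx   <- match(comps, df$Varname)

# Combine the ORIGINAL names of the referenced rows (one level deep only)
one_level <- paste(unique(unlist(strsplit(df$names[idx], ",", fixed = TRUE))), collapse = ",")
print(one_level)  # [1] "Jack,Bruce,Bob"  (Cathy is missing: A's component B is not followed)
```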

-> EDIT: I may not have given a proper example, so I have replaced the input with something more consistent with my data.

dput() of the input:

structure(list(Varname = structure(1:6, .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), Component = structure(c(3L, 1L, 1L, 4L, 2L, 1L), .Label = c("", "A,C", "B", "C"), class = "factor"), names = structure(c(5L, 3L, 2L, 4L, 1L, 6L), .Label = c("", "Bob", "Cathy", "Dick,Nancy", "Jack,Bruce", "Mandy"), class = "factor"), cities = structure(c(5L, 3L, 5L, 2L, 1L, 4L), .Label = c("", "Austin,Dallas", "Boston,Miami", "Manchester", "New york"), class = "factor")), .Names = c("Varname", "Component", "names", "cities"), class = "data.frame", row.names = c(NA, -6L))


2016-08-03 tafelplankje

Provide an example of your data with 'dput' so that it is reproducible. I have included a dput(). – Warner

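That is correct, but it doesn't really help me solve my real-life problem. I have included a more complex example above. I will play around with your code to see if it can help me further. – tafelplankje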

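Not the most attractive piece of R code (and definitely not the most efficient), but it gets the job done. Hopefully someone else can improve on it.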

starting_df <- read.table(text="Varname|Component|names|cities
A||Jack,Bruce|New york
B||Cathy|Boston,Miami
C|A|Bob|New york
D|C|Dick,Nancy|Austin,Dallas", header = TRUE, sep = "|", stringsAsFactors = FALSE, strip.white = TRUE)

##Grab the rows whose Varname appears in the Component column, or whose Component appears in the Varname column
intermediate_df <- starting_df[(starting_df$Varname %in% starting_df$Component | starting_df$Component %in% starting_df$Varname),]

##Change the rows in the names and cities columns to match your desired output (sorry about the for loop)
##The first row is kept as-is; every later row absorbs the (already updated) values of the row above it
for (x in seq_len(nrow(intermediate_df))[-1]) {
    intermediate_df[x,'names'] <- paste0(unique(unlist(strsplit(paste(intermediate_df$names[x-1],intermediate_df$names[x],sep = ","),split=","))),collapse=",")
    intermediate_df[x,'cities'] <- paste0(unique(unlist(strsplit(paste(intermediate_df$cities[x-1],intermediate_df$cities[x],sep = ","),split=","))),collapse=",")
}

##Bind the new dataset with the rows of the starting dataset whose Varname is not already in it
final_df <- rbind(intermediate_df,starting_df[!(starting_df$Varname %in% intermediate_df$Varname),])

##Order by the Varname column to get the desired output
final_df <- final_df[order(final_df$Varname),]

Output:

Varname Component names                     cities
A                 Jack,Bruce                New york
B                 Cathy                     Boston,Miami
C       A         Jack,Bruce,Bob            New york
D       C         Jack,Bruce,Bob,Dick,Nancy New york,Austin,Dallas
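The code above repeats the same paste/strsplit/unique/paste0 idiom for every column. A small helper could factor it out; the name `merge_csv` below is hypothetical, not part of base R. This sketch also drops empty fields, which the inline version keeps when one side is blank (for example, `paste("", "Bob", sep = ",")` gives ",Bob").

```r
# Hypothetical helper: union of several comma-separated strings,
# keeping first-seen order and dropping empty entries.
merge_csv <- function(...) {
  parts <- unlist(strsplit(c(...), ",", fixed = TRUE))
  parts <- parts[nzchar(parts)]
  paste(unique(parts), collapse = ",")
}

merge_csv("Jack,Bruce", "Bob")             # [1] "Jack,Bruce,Bob"
merge_csv("Jack,Bruce,Bob", "Dick,Nancy")  # [1] "Jack,Bruce,Bob,Dick,Nancy"
merge_csv("", "Bob")                       # [1] "Bob"
```

Each column update in the loop would then reduce to something like `intermediate_df[x, 'names'] <- merge_csv(intermediate_df$names[x-1], intermediate_df$names[x])`.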

Edit for the new dataset:

This one uses fairly extensive for loops (something I don't like doing in R), but it seems to work:

##Setting up the new dataset 
starting_df1 <- structure(list(Varname = structure(1:6, .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), 
           Component = structure(c(3L, 1L, 1L, 4L, 2L, 1L), .Label = c("", "A,C", "B", "C"), class = "factor"), 
           names = structure(c(5L, 3L, 2L, 4L, 1L, 6L), .Label = c("", "Bob", "Cathy", "Dick,Nancy", "Jack,Bruce", "Mandy"), class = "factor"), 
           cities = structure(c(5L, 3L, 5L, 2L, 1L, 4L), .Label = c("", "Austin,Dallas", "Boston,Miami", "Manchester", "New york"), class = "factor")), 
         .Names = c("Varname", "Component", "names", "cities"), class = "data.frame", row.names = c(NA, -6L)) 

##Convert the columns from factors to characters (so that they can be used for concatenating)
starting_df1[] <- lapply(starting_df1, as.character)

##Nested for loops: for every row that has a value in the Component column, find its matches (and their indices) in the Varname column
##%in% matches whole Varname values; a grepl() regex built from the components could also match partial names (e.g. "A" inside "AB")
##Then, for each matching index, concatenate the names and cities values, keeping only unique entries
for (i in which(nchar(starting_df1$Component) > 0)) {
    holder <- which(starting_df1$Varname %in% unlist(strsplit(starting_df1$Component[i], split = ",")))
    for (j in holder) {
        if (nchar(starting_df1$names[i]) != 0) {
            starting_df1[i, "names"] <- paste0(unique(unlist(strsplit(paste(starting_df1$names[i],starting_df1$names[j],sep = ","),split=","))),collapse=",")
            starting_df1[i, "cities"] <- paste0(unique(unlist(strsplit(paste(starting_df1$cities[i],starting_df1$cities[j],sep = ","),split=","))),collapse=",")
        } else {
            starting_df1[i, "names"] <- starting_df1$names[j]
            starting_df1[i, "cities"] <- starting_df1$cities[j]
        }
    }
}

print(starting_df1, row.names = F, right = F)

Output:

Varname Component names                cities
A       B         Jack,Bruce,Cathy     New york,Boston,Miami
B                 Cathy                Boston,Miami
C                 Bob                  New york
D       C         Dick,Nancy,Bob       Austin,Dallas,New york
E       A,C       Jack,Bruce,Cathy,Bob New york,Boston,Miami
F                 Mandy                Manchester
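The nested loop gets row E right only because rows are processed top to bottom: A has already absorbed B's values by the time E copies from A. If a row referenced a component listed further down the table, or the chain were deeper, the result would depend on row order. A different technique that avoids this is to resolve each Varname recursively with memoisation. The sketch below assumes character columns (convert factors first, as above), assumes components never form a cycle (it stops with an error if they do), and treats every column other than Varname and Component as a comma-separated value list, which covers the "many more columns" case. The names `combine_components`, `resolve_row` and `merge_csv` are hypothetical.

```r
# Input table from the question, as plain character columns
df <- data.frame(
  Varname   = c("A", "B", "C", "D", "E", "F"),
  Component = c("B", "", "", "C", "A,C", ""),
  names     = c("Jack,Bruce", "Cathy", "Bob", "Dick,Nancy", "", "Mandy"),
  cities    = c("New york", "Boston,Miami", "New york", "Austin,Dallas", "", "Manchester"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fold each row's components into it, recursively.
combine_components <- function(df, key = "Varname", comp = "Component") {
  value_cols <- setdiff(names(df), c(key, comp))

  # Union of comma-separated strings, first-seen order, blanks dropped
  merge_csv <- function(x) {
    parts <- unlist(strsplit(x, ",", fixed = TRUE))
    paste(unique(parts[nzchar(parts)]), collapse = ",")
  }

  cache <- list()  # fully merged values, keyed by Varname

  resolve_row <- function(v, seen = character(0)) {
    if (v %in% seen) stop("cycle in components at ", v)
    if (!is.null(cache[[v]])) return(cache[[v]])
    row <- df[df[[key]] == v, , drop = FALSE]
    if (nrow(row) != 1) stop("expected exactly one row for ", v)
    out <- vapply(value_cols, function(col) row[[col]], character(1))
    comps <- strsplit(row[[comp]], ",", fixed = TRUE)[[1]]
    for (cmp in comps[nzchar(comps)]) {
      sub <- resolve_row(cmp, c(seen, v))  # the component's own fully merged values
      out <- mapply(function(a, b) merge_csv(c(a, b)), out, sub)
    }
    cache[[v]] <<- out
    out
  }

  resolved <- lapply(df[[key]], resolve_row)
  for (col in value_cols) {
    df[[col]] <- vapply(resolved, function(r) r[[col]], character(1))
  }
  df[[comp]] <- ""  # components are folded in, as in the desired output table
  df
}

res <- combine_components(df)
print(res, row.names = FALSE, right = FALSE)
```

Because each component is resolved before it is merged, E picks up Cathy through A regardless of where A sits in the table.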


2016-08-03 18:47:59 Abdou

Thanks for replicating this. – tafelplankje
