2016-12-16 51 views
1

R: Join tables where t1.key1 = t2.key1 and t1.key2 partially matches t2.key2

I have an interesting join that I need to do in R. Here are my two tables.

Table 1:

Date       Name 
2016-01-02 10:18:00   CARDOSO, RAMON 
2016-01-02 15:02:00   HARRISON, KATHYANNE M 
2016-01-02 15:02:00   PALEO, SHERI 
2016-01-03 02:09:00   PHANOR, RENALDY 
2016-01-03 09:42:00   GUAMAN, ANGEL 
2016-01-03 18:47:00   AIME, MADELINE 
2016-01-03 18:47:00   CADET, GARDY 
2016-01-03 19:31:00   REID, ARTHUR D 
2016-01-03 22:11:00   HERNANDEZ-JONES, FREDRICK JOSHUA 
2016-01-04 12:32:00   AGUERO, RAUL 

Table 2:

Date      ID     Name 
2016-01-02 10:18:00  16-22-AR   CARDOSO, RAMON 
2016-01-02 15:02:00  16-24-AR   HARRISON, KATHYANNE M", " PALEO, SHERI" 
2016-01-02 15:02:00  16-25-AR   HARRISON, KATHYANNE M", " PALEO, SHERI" 
2016-01-03 02:09:00  16-31-AR   PHANOR, RENALDY 
2016-01-03 09:42:00  16-32-AR   GUAMAN, ANGEL 
2016-01-03 18:47:00  16-39-AR   AIME, MADELINE", " CADET, GARDY" 
2016-01-03 18:47:00  16-40-AR   AIME, MADELINE", " CADET, GARDY" 
2016-01-03 19:31:00  16-42-AR   REID, ARTHUR D 
2016-01-03 22:11:00  16-44-AR   HERNANDEZ-JONES, FREDRICK JOSHUA 
2016-01-04 12:32:00  16-49-AR   AGUERO, RAUL 

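My goal is to have ID as its own column in Table 1. To do that, I need to join on Date and somehow match the names between Table 1 and Table 2.

UPDATE regarding the names:

The original dataset looks like this: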

2016-01-02 10:18:00 16-22-AR    CARDOSO, RAMON 
2016-01-02 15:02:00 16-24-AR, 16-25-AR  HARRISON, KATHYANNE M", " PALEO, SHERI" 
2016-01-03 02:09:00 16-31-AR    PHANOR, RENALDY 
2016-01-03 09:42:00 16-32-AR    GUAMAN, ANGEL 
2016-01-03 18:47:00 16-39-AR, 16-40-AR  AIME, MADELINE", " CADET, GARDY" 
2016-01-03 19:31:00 16-42-AR    REID, ARTHUR D 
2016-01-03 22:11:00 16-44-AR    HERNANDEZ-JONES, FREDRICK JOSHUA 
2016-01-04 12:32:00 16-49-AR    AGUERO, RAUL 

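The goal is for each name to have its own row with its corresponding ID. The IDs are listed in the same order as the names, so the first ID goes with the first name.

I hope this clarification helps.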
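The rule described in the update (the first ID goes with the first name) can be sketched directly on the original dataset. This is a minimal illustration, not the asker's code: the data frame `raw` and its column layout are assumptions reconstructed from the rows shown above, and the name separator is assumed to be the literal `", "` sequence visible in the printed names.

```r
# Hypothetical reconstruction of a few rows of the original dataset.
raw <- data.frame(
  Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00", "2016-01-03 18:47:00"),
  ID   = c("16-22-AR", "16-24-AR, 16-25-AR", "16-39-AR, 16-40-AR"),
  Name = c("CARDOSO, RAMON",
           "HARRISON, KATHYANNE M\", \" PALEO, SHERI\"",
           "AIME, MADELINE\", \" CADET, GARDY\""),
  stringsAsFactors = FALSE
)

# Split the IDs on ", " and the names on the literal '", "' separator.
ids <- strsplit(raw$ID, ", ", fixed = TRUE)
nms <- strsplit(raw$Name, "\", \"", fixed = TRUE)
# Remove the leading space and stray quote characters left over from the split.
nms <- lapply(nms, function(x) gsub("^\\s+|\"", "", x))

# Pairing by position only makes sense if each row has as many IDs as names.
stopifnot(lengths(ids) == lengths(nms))

# One output row per (ID, name) pair, repeating the date as needed.
long <- data.frame(
  Date = rep(raw$Date, lengths(ids)),
  ID   = unlist(ids),
  Name = unlist(nms),
  stringsAsFactors = FALSE
)
long
```

Because the pairing is purely positional, the `stopifnot()` check is there to fail loudly if a row ever has a different number of IDs and names.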

+1

Format the tables as code blocks using '{}'. Do not use snippets. – MYGz

+0

You can [split the Name column](http://stackoverflow.com/q/7069076/5977215) into multiple columns, then [melt the table into long format](http://stackoverflow.com/q/2185252/5977215) and join on 'Date' and 'Name' – SymbolixAU

+0

@joel.wilson, I'm not sure what you mean by confusing? – Jomisilfe

Answers

2

I assume that when you have rows like

2016-01-02 15:02:00  16-24-AR   HARRISON, KATHYANNE M", " PALEO, SHERI" 
2016-01-02 15:02:00  16-25-AR   HARRISON, KATHYANNE M", " PALEO, SHERI" 

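the first ID corresponds to the first name and the second ID to the second name. One approach is then to create a new column holding the correct name for each row.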

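# Position of each row within its run of identical Date + Name values (1, 2, ...)
d$order <- unlist(lapply(rle(paste0(d$Date, d$Name))$lengths, seq_len))

split_names <- function(name, order = 1) {
    names <- strsplit(name, '", "', fixed = TRUE)[[1]] # Split on the literal '", "' separator
    names <- gsub('^\\s|"', "", names) # Remove the leading space and any stray "
    names[order] # Keep the name at this row's position
}

# Apply row by row; USE.NAMES = FALSE avoids naming the result after d$Name
d$Newname <- mapply(split_names, d$Name, d$order, USE.NAMES = FALSE)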
d[, c("Date", "ID", "Newname")] 
#     Date  ID       Newname 
# 1 2016-01-02 10:18:00 16-22-AR     CARDOSO, RAMON 
# 2 2016-01-02 15:02:00 16-24-AR   HARRISON, KATHYANNE M 
# 3 2016-01-02 15:02:00 16-25-AR      PALEO, SHERI 
# 4 2016-01-03 02:09:00 16-31-AR     PHANOR, RENALDY 
# 5 2016-01-03 09:42:00 16-32-AR     GUAMAN, ANGEL 
# 6 2016-01-03 18:47:00 16-39-AR     AIME, MADELINE 
# 7 2016-01-03 18:47:00 16-40-AR      CADET, GARDY 
# 8 2016-01-03 19:31:00 16-42-AR     REID, ARTHUR D 
# 9 2016-01-03 22:11:00 16-44-AR HERNANDEZ-JONES, FREDRICK JOSHUA 
# 10 2016-01-04 12:32:00 16-49-AR      AGUERO, RAUL 
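Once each ID has a single clean name, getting ID into Table 1 is an exact join on Date and Name. The following is a sketch under assumptions: `t1` and `lookup` are hypothetical stand-ins (first rows only) for Table 1 and for `d[, c("Date", "ID", "Newname")]`, and base R's `merge()` is used for the join.

```r
# Hypothetical stand-in for Table 1 from the question (first rows only).
t1 <- data.frame(
  Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00", "2016-01-02 15:02:00"),
  Name = c("CARDOSO, RAMON", "HARRISON, KATHYANNE M", "PALEO, SHERI"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for d[, c("Date", "ID", "Newname")].
lookup <- data.frame(
  Date    = c("2016-01-02 10:18:00", "2016-01-02 15:02:00", "2016-01-02 15:02:00"),
  ID      = c("16-22-AR", "16-24-AR", "16-25-AR"),
  Newname = c("CARDOSO, RAMON", "HARRISON, KATHYANNE M", "PALEO, SHERI"),
  stringsAsFactors = FALSE
)

# Left join: keep every Table 1 row, matching Date to Date and Name to Newname.
result <- merge(t1, lookup,
                by.x = c("Date", "Name"),
                by.y = c("Date", "Newname"),
                all.x = TRUE)
result
```

With `all.x = TRUE`, any Table 1 row without a match keeps an `NA` ID instead of being dropped, which makes unmatched names easy to spot.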

Data:

d <- structure(list(Date = c("2016-01-02 10:18:00", "2016-01-02 15:02:00", 
"2016-01-02 15:02:00", "2016-01-03 02:09:00", "2016-01-03 09:42:00", 
"2016-01-03 18:47:00", "2016-01-03 18:47:00", "2016-01-03 19:31:00", 
"2016-01-03 22:11:00", "2016-01-04 12:32:00"), ID = c("16-22-AR", 
"16-24-AR", "16-25-AR", "16-31-AR", "16-32-AR", "16-39-AR", "16-40-AR", 
"16-42-AR", "16-44-AR", "16-49-AR"), Name = c("CARDOSO, RAMON", 
"HARRISON, KATHYANNE M\", \" PALEO, SHERI\"", "HARRISON, KATHYANNE M\", \" PALEO, SHERI\"", 
"PHANOR, RENALDY", "GUAMAN, ANGEL", "AIME, MADELINE\", \" CADET, GARDY\"", 
"AIME, MADELINE\", \" CADET, GARDY\"", "REID, ARTHUR D", "HERNANDEZ-JONES, FREDRICK JOSHUA", 
"AGUERO, RAUL")), .Names = c("Date", "ID", "Name"), row.names = c(NA, 
-10L), class = "data.frame") 
+0

I'm having some difficulty understanding how the ordering works and implementing this solution. – Jomisilfe
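For the ordering step, it may help to look at `rle()` and `seq_len()` on a small vector. In this sketch `key` is a hypothetical stand-in for `paste0(d$Date, d$Name)`: rows that share the same date and name string produce equal, adjacent keys.

```r
# Hypothetical keys: equal neighbouring values form a "run".
key <- c("a", "b", "b", "c", "d", "d", "d")

# rle() reports the length of each run of consecutive equal values.
runs <- rle(key)
runs$lengths
# 1 2 1 3

# seq_len() numbers the rows inside each run; unlist() flattens the
# result back to one value per row.
ord <- unlist(lapply(runs$lengths, seq_len))
ord
# 1 1 2 1 1 2 3
```

The resulting number is what `split_names()` uses to pick the first, second, and so on name from the combined string. Note that `rle()` only groups consecutive values, so this relies on duplicated Date + Name rows sitting next to each other, as they do in the data shown.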
