2017-10-13

Comparing two data frames based on common columns

I have two CSV files:

File 1:

SN CY Year Month Day Hour Lat Lon 
196101 1 1961 1 14 12 8.3 134.7 
196101 1 1961 1 14 18 8.8 133.4 
196101 1 1961 1 15 0 9.1 132.5 
196101 1 1961 1 15 6 9.3 132.2 
196101 1 1961 1 15 12 9.5 132 
196101 1 1961 1 15 18 9.9 131.8 

File 2:

Year Month Day RR Hour Lat Lon 
1961 1 14 0 0 14.0917 121.055 
1961 1 14 0 6 14.0917 121.055 
1961 1 14 0 12 14.0917 121.055 
1961 1 14 0 18 14.0917 121.055 
1961 1 15 0 0 14.0917 121.055 
1961 1 15 0 6 14.0917 121.055 

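I want to add another column to file2 that contains "TRUE" if the row in file2 also exists in file1 (that is, a file1 row has the same Year, Month, Day and Hour), and "FALSE" otherwise. Then I want to save the result as a CSV file.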

Desired output:

Year Month Day RR Hour Lat Lon  com 
1961 1 14 0 0 14.0917 121.055 FALSE 
1961 1 14 0 6 14.0917 121.055 FALSE 
1961 1 14 0 12 14.0917 121.055 TRUE 
1961 1 14 0 18 14.0917 121.055 TRUE 
1961 1 15 0 0 14.0917 121.055 TRUE 
1961 1 15 0 6 14.0917 121.055 TRUE 

Here is my script:

jtwc <- read.csv("file1.csv",header=T,sep=",") 
stn <- read.csv("file2.csv",header=T,sep=",") 

if ((jtwc$Year == "stn$YY") & (jtwc$Month == "stn$MM") & (jtwc$Day == "stn$DD") &(jtwc$Hour == "stn$HH")){ 
stn$com <- "TRUE" 
} else { 
stn$com <- "FALSE" 
} 
write.csv(stn,file="test.csv",row.names=T) 

This gives an error:

In if ((jtwc$Year == "stn$YY") & (jtwc$Month == "stn$MM") & (jtwc$Day == :the condition has length > 1 and only the first element will be used 
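The message comes from `if()`, which expects a single TRUE/FALSE, whereas `jtwc$Year == ...` returns one value per row. The script also compares against the literal strings "stn$YY" and so on, and file2 has no YY, MM, DD or HH columns in any case. A minimal base-R sketch of the intended row-wise check follows. It assumes the sample data above, read inline with `read.table(text = ...)` instead of from the CSV files, and `key()` is a hypothetical helper name. It builds one key string per row and tests membership with the vectorized `%in%`:

```r
# Sample data from the question, read inline instead of from file1.csv / file2.csv
jtwc <- read.table(header = TRUE, text = "
SN CY Year Month Day Hour Lat Lon
196101 1 1961 1 14 12 8.3 134.7
196101 1 1961 1 14 18 8.8 133.4
196101 1 1961 1 15 0 9.1 132.5
196101 1 1961 1 15 6 9.3 132.2
196101 1 1961 1 15 12 9.5 132
196101 1 1961 1 15 18 9.9 131.8")

stn <- read.table(header = TRUE, text = "
Year Month Day RR Hour Lat Lon
1961 1 14 0 0 14.0917 121.055
1961 1 14 0 6 14.0917 121.055
1961 1 14 0 12 14.0917 121.055
1961 1 14 0 18 14.0917 121.055
1961 1 15 0 0 14.0917 121.055
1961 1 15 0 6 14.0917 121.055")

# Hypothetical helper: one "Year-Month-Day-Hour" string per row
key <- function(d) paste(d$Year, d$Month, d$Day, d$Hour, sep = "-")

# %in% is vectorized: one TRUE/FALSE per row of stn, so no if() is needed
stn$com <- key(stn) %in% key(jtwc)

write.csv(stn, file = "test.csv", row.names = FALSE)
```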

Please make this a reproducible example, e.g. post the result of dput(head(YOURDATA)). –

Answers


A quick and dirty solution using data.table:

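  1. Use fread to read the files.
  2. Extract only the needed columns from file1 (since you are only interested in file2).
  3. Use merge.
  4. After merging, set com to FALSE for the rows of file2 that have no match in file1.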

Code:

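library(data.table) 
# keep only the join columns from file1 and mark each of its rows com = TRUE;
# merge() joins on the columns both tables share (Year, Month, Day, Hour)
result <- merge(fread("file2.csv"), 
       fread("file1.csv")[, .(Year, Month, Day, Hour, com = TRUE)], 
       all.x = TRUE)[is.na(com), com := FALSE]   # file2 rows without a match get FALSE
fwrite(result, "test.csv")   # save as CSV, as asked in the question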

result 
   Year Month Day Hour RR     Lat     Lon   com
1: 1961     1  14    0  0 14.0917 121.055 FALSE
2: 1961     1  14    6  0 14.0917 121.055 FALSE
3: 1961     1  14   12  0 14.0917 121.055  TRUE
4: 1961     1  14   18  0 14.0917 121.055  TRUE
5: 1961     1  15    0  0 14.0917 121.055  TRUE
6: 1961     1  15    6  0 14.0917 121.055  TRUE
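Another data.table option, offered as a sketch and not part of the answer above, is an update join. It initialises com to FALSE and then sets it to TRUE on the rows of file2 that match file1 on Year, Month, Day and Hour. Unlike merge(), this modifies file2 in place, so it keeps the original row and column order and never duplicates rows. The sample data are read inline with `fread(text = ...)` in place of the two CSV files:

```r
library(data.table)

# Sample data from the question; in practice use fread("file1.csv") / fread("file2.csv")
jtwc <- fread(text = c(
  "SN CY Year Month Day Hour Lat Lon",
  "196101 1 1961 1 14 12 8.3 134.7",
  "196101 1 1961 1 14 18 8.8 133.4",
  "196101 1 1961 1 15 0 9.1 132.5",
  "196101 1 1961 1 15 6 9.3 132.2",
  "196101 1 1961 1 15 12 9.5 132",
  "196101 1 1961 1 15 18 9.9 131.8"))

stn <- fread(text = c(
  "Year Month Day RR Hour Lat Lon",
  "1961 1 14 0 0 14.0917 121.055",
  "1961 1 14 0 6 14.0917 121.055",
  "1961 1 14 0 12 14.0917 121.055",
  "1961 1 14 0 18 14.0917 121.055",
  "1961 1 15 0 0 14.0917 121.055",
  "1961 1 15 0 6 14.0917 121.055"))

stn[, com := FALSE]                                     # default: no match
stn[jtwc, on = .(Year, Month, Day, Hour), com := TRUE]  # matched rows become TRUE

fwrite(stn, "test.csv")
```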

Thank you very much! – Lyndz


You can also use dplyr/tidyverse:

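library(tidyverse) 
d1 <- read.csv("file1.csv")   # file1
d2 <- read.csv("file2.csv")   # file2
d2 %>% 
    # borrow Lon from file1 as a marker: it is non-NA only where file1 has a matching row
    left_join(select(d1, Year, Month, Day, Hour, Com=Lon)) %>% 
    mutate(Com = !is.na(Com)) 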

Joining, by = c("Year", "Month", "Day", "Hour") 
  Year Month Day RR Hour     Lat     Lon   Com
1 1961     1  14  0    0 14.0917 121.055 FALSE
2 1961     1  14  0    6 14.0917 121.055 FALSE
3 1961     1  14  0   12 14.0917 121.055  TRUE
4 1961     1  14  0   18 14.0917 121.055  TRUE
5 1961     1  15  0    0 14.0917 121.055  TRUE
6 1961     1  15  0    6 14.0917 121.055  TRUE
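A variant of the dplyr answer, again only a sketch. Borrowing Lon as the marker would also give FALSE if a matching file1 row had a missing Lon, so this version flags file1's distinct time steps explicitly with Com = TRUE. `distinct()` keeps repeated time steps in file1 (for example two cyclones at the same hour) from duplicating file2 rows, `by` states the join keys instead of letting dplyr guess them, and the result is written to CSV as the question asks. The inline `read.table(text = ...)` data stand in for the CSV files, and `flags` is a hypothetical name:

```r
library(dplyr)

# Sample data from the question, read inline instead of from the CSV files
d1 <- read.table(header = TRUE, text = "
SN CY Year Month Day Hour Lat Lon
196101 1 1961 1 14 12 8.3 134.7
196101 1 1961 1 14 18 8.8 133.4
196101 1 1961 1 15 0 9.1 132.5
196101 1 1961 1 15 6 9.3 132.2
196101 1 1961 1 15 12 9.5 132
196101 1 1961 1 15 18 9.9 131.8")

d2 <- read.table(header = TRUE, text = "
Year Month Day RR Hour Lat Lon
1961 1 14 0 0 14.0917 121.055
1961 1 14 0 6 14.0917 121.055
1961 1 14 0 12 14.0917 121.055
1961 1 14 0 18 14.0917 121.055
1961 1 15 0 0 14.0917 121.055
1961 1 15 0 6 14.0917 121.055")

# One row per time step present in file1, flagged TRUE
flags <- d1 %>%
  distinct(Year, Month, Day, Hour) %>%
  mutate(Com = TRUE)

result <- d2 %>%
  left_join(flags, by = c("Year", "Month", "Day", "Hour")) %>%
  mutate(Com = !is.na(Com))   # unmatched rows come back NA, i.e. FALSE

write.csv(result, "test.csv", row.names = FALSE)
```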

Thank you very much for your help! This works too. – Lyndz