Merging two datasets based on more than one column

I am trying to merge two datasets by year and country. The first dataset (df = GNIPC) contains the GNI per capita of each country for the years 1980-2008.

  Country Year GNIpc 
     (chr) (dbl) (dbl) 
1 Afghanistan 1990 NA 
2 Afghanistan 1991 NA 
3 Afghanistan 1992 2010 
4 Afghanistan 1993 NA 
5 Afghanistan 1994 12550 
6 Afghanistan 1995 NA

The second dataset (df = sanctions) contains economic sanctions imposed from 1946 to the present.

 country imposition sanctiontype sanctions_period 
     (chr)  (dbl)  (chr)   (chr) 
1 Afghanistan  1  1 6 8   1997-2001 
2 Afghanistan  1  7    1979-1979 
3 Afghanistan  1  4 7    1995-2002 
4 Albania   1  2 8    2005-2005 
5 Albania   1  7    2005-2006 
6 Albania   1  8    2004-2005

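I want to merge the two datasets so that every GNI year carries an indicator of whether sanctions were in place in that country. For GNI years that fall outside any sanctions_period the value should be 0, and for years inside one it should be 1. This is what I want the result to look like: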

  Country Year GNIpc Imposition sanctiontype 
      (chr) (dbl) (dbl) (dbl)  (chr) 
1 Afghanistan 1990 NA 0   NA 
2 Afghanistan 1991 NA 0   NA 
3 Afghanistan 1992 2010 0   NA 
4 Afghanistan 1993 NA 0   NA 
5 Afghanistan 1994 12550 0   NA 
6 Afghanistan 1995 NA 1   4 7

Source

2016-08-17 MB92

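I would not work with the second dataset in that format. If someone handed me that data, I would (1) cringe, and (2) set about reshaping it so that there is one row for each combination of 'sanctiontype' and each year in 'sanctions_period'. So 'Afghanistan' would have five rows with 'sanctiontype = 1', one for each year from 1997 to 2001. – joran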

What should Afghanistan 1998 look like? One row for each sanctions period (2), or a single row with "1 4 6 7 8"? – Chris

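I have already built a separate dataset in which each sanction type has its own row. Here, I am looking for a way to determine, for each GNI year, whether sanctions were in place that year. Setting the sanction type aside, how would I do that? – MB92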

Some sample data:

df1 <- data.frame(country = c('Afghanistan', 'Turkey'), 
        imposition = c(1, 0), 
        sanctiontype = c('1 6 8', '4'), 
        sanctions_period = c('1997-2001', '2012-ongoing') 
) 

     country imposition sanctiontype sanctions_period 
1 Afghanistan   1  1 6 8  1997-2001 
2  Turkey   0   4  2012-ongoing

The 'sanctions_period' column can be transformed with dplyr and tidyr:

library(tidyr) 
library(dplyr) 

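# Split e.g. '1997-2001' into 'start' and 'end', keeping the original column
df.new <- separate(df1, sanctions_period, c('start', 'end'), remove = FALSE) %>% 
    # Treat open-ended periods as running through 2016
    mutate(end = ifelse(end == 'ongoing', '2016', end)) %>% 
    mutate(start = as.numeric(start), end = as.numeric(end)) %>% 
    group_by(country, sanctions_period) %>% 
    # Expand each period into one row per year
    do(data.frame(country = .$country, imposition = .$imposition, sanctiontype = .$sanctiontype, year = .$start:.$end)) 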

    sanctions_period  country imposition sanctiontype year 
      <fctr>  <fctr>  <dbl>  <fctr> <int> 
1   1997-2001 Afghanistan   1  1 6 8 1997 
2   1997-2001 Afghanistan   1  1 6 8 1998 
3   1997-2001 Afghanistan   1  1 6 8 1999 
4   1997-2001 Afghanistan   1  1 6 8 2000 
5   1997-2001 Afghanistan   1  1 6 8 2001 
6  2012-ongoing  Turkey   0   4 2012 
7  2012-ongoing  Turkey   0   4 2013 
8  2012-ongoing  Turkey   0   4 2014 
9  2012-ongoing  Turkey   0   4 2015 
10  2012-ongoing  Turkey   0   4 2016

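From there, it should be easy to merge with your first data frame. Note that your first data frame capitalizes 'Country' and 'Year', while the second does not.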

df.merged <- merge(df.first, df.new, by.x = c('Country', 'Year'), by.y = c('country', 'year'))
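Note that merge() performs an inner join by default, so GNI years with no matching sanction row would be dropped instead of appearing with a 0. A minimal sketch of one way to keep every GNI row and fill in the indicator, assuming dplyr is available. The data frames `gni` and `sanction_years` are hypothetical stand-ins built from the rows shown in the question, not the asker's real data:

```r
library(dplyr)

# Hypothetical stand-in for GNIPC, using the six rows shown in the question
gni <- data.frame(
  Country = "Afghanistan",
  Year    = 1990:1995,
  GNIpc   = c(NA, NA, 2010, NA, 12550, NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the expanded sanctions data (one row per year),
# here the 1995-2002 period with sanction types "4 7"
sanction_years <- data.frame(
  country      = "Afghanistan",
  year         = 1995:2002,
  imposition   = 1,
  sanctiontype = "4 7",
  stringsAsFactors = FALSE
)

merged <- gni %>%
  # left_join keeps every GNI row, matched or not
  left_join(sanction_years, by = c("Country" = "country", "Year" = "year")) %>%
  # unmatched years have no sanction row, so set their indicator to 0
  mutate(imposition = ifelse(is.na(imposition), 0, imposition))

print(merged)
```

If several sanction periods overlap in the same year, a join like this returns one row per matching period, so the sanctions data may need to be collapsed to one row per country and year first.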

Source

2016-08-17 20:35:19 jdobres

I ran the following on my dataset, but got an error:

df.new <- separate(sanctions4, sanctions_period, c('start', 'end'), remove = F) %>% mutate(start = as.numeric(start), end = as.numeric(end)) %>% group_by(country, sanctions_period) %>% do(data.frame(country = .$country, imposition = .$imposition, sanctiontype = .$sanctiontype, year = .$start:.$end))

Error in .$start:.$end : NA/NaN argument

– MB92

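Could it be because, for some observations, 'sanctions_period' is something like "1990-ongoing", so that when I separate the column and convert the end year to numeric, I get NA for those observations? That is, some observations have no end year, and R needs one in order to run the rest of the command? – MB92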

Yes, that is correct. I have revised the sample data and the solution to handle rows whose 'sanctions_period' ends in "ongoing". – jdobres
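To find which rows cause the NA/NaN error before expanding the periods, one option is to check which end values fail to convert to a number. This is only a sketch, assuming tidyr is available. The data frame `periods` is a hypothetical two-row example, not the asker's `sanctions4`:

```r
library(tidyr)

# Hypothetical example: one closed period and one open-ended period
periods <- data.frame(
  sanctions_period = c("1997-2001", "1990-ongoing"),
  stringsAsFactors = FALSE
)

# Split on the hyphen, keeping the original column for reference
parts <- separate(periods, sanctions_period, c("start", "end"), remove = FALSE)

# as.numeric() yields NA (with a warning) for non-numeric text such as "ongoing"
end_num <- suppressWarnings(as.numeric(parts$end))

# Rows whose end year cannot be parsed; these would break start:end
bad <- parts[is.na(end_num), ]
print(bad)
```

Any rows listed in `bad` need an explicit end year (as the answer does for "ongoing") before the `start:end` sequence can be built.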

Using dplyr:

left_join(GNIPC, sanctions, by=c("Country"="country", "Year"="Year")) %>% 
    select(Country,Year, GNIpc, Imposition, sanctiontype)

Source

2016-08-17 19:41:17

Thanks. However, in the second data frame I do not have a year variable, only a range, 'sanctions_period'. – MB92

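As 'joran' pointed out in the comments, you need to tidy your data. That is: Afghanistan 1 1 6 8 1997-2001 –

Sorry: as 'joran' pointed out in the comments, you need to tidy your data. That is, "Afghanistan 1 1 6 8 1997-2001" needs to become 15 rows, one for each 'sanctiontype' and each 'Year' in the range. –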
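The tidying described in that comment could be sketched roughly as follows, assuming dplyr and tidyr (version 1.0 or later for `unnest(cols = ...)`). The one-row data frame `sanctions_raw` is a hypothetical example built from the row quoted in the comment, and the intermediate column names `start`, `end`, and `year` are chosen here for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical one-row example taken from the comment above
sanctions_raw <- data.frame(
  country          = "Afghanistan",
  imposition       = 1,
  sanctiontype     = "1 6 8",
  sanctions_period = "1997-2001",
  stringsAsFactors = FALSE
)

tidy_sanctions <- sanctions_raw %>%
  # "1997-2001" becomes integer columns start = 1997 and end = 2001
  separate(sanctions_period, c("start", "end"), sep = "-", convert = TRUE) %>%
  # "1 6 8" becomes three rows, one per sanction type
  separate_rows(sanctiontype, sep = " ") %>%
  # build the sequence of years for each row as a list column
  mutate(year = Map(seq, start, end)) %>%
  # one row per sanction type and year
  unnest(cols = year)

print(nrow(tidy_sanctions))
```

This sketch assumes every period has a numeric end year. Open-ended values such as "ongoing" would need to be replaced with a concrete year first.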
