2017-04-27 70 views
3

Reshape a data.frame, adding two columns

I have a data.frame that looks like this:

timestamp value.x station value.y parameter.x value parameter.y 
1 1/1/2010 0.6  abc  188,000 AREA PLANTED 22 PROGRESS 
2 1/1/2010 0.6  abc  156.3  YIELD   NA NA 
3 1/1/2010 -10  def  188,000 AREA PLANTED 22 PROGRESS 
4 1/1/2010 -10  def  156.3  YIELD   NA NA 

And I want to use reshape to make it look like this:

timestamp value.x station AREA PLANTED YIELD PROGRESS 
1 1/1/2010 0.6  abc  188,000   156.3 22  
3 1/1/2010 -10  def  188,000   156.3 22 

I tried

reshape(data = b, varying = list(c('value.y', 'parameter.x', 'value', 'parameter.y')), 
     v.names = c('AREA PLANTED', 'YIELD', 'PROGRESS'), 
     timevar = row.names(b), 
     times = b$timestamp, direction = 'wide', idvar = b$station) 

but it says

Error in [.data.frame(data, , idvar) : undefined columns selected 

I tried changing it a bit, but no matter what I did, it kept throwing this error.

+0

Your reshape has `b$station` (lowercase 's'), but the data frame's column name is `Station` (capital 'S')? – neilfws

+0

Typo, fixed.... –

+1

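This is a bit all over the place. You don't specify `idvar = b$station`, since you've already said `data = b`; you want `idvar = "station"`, I think. Same with `timevar =`. You also have multiple values for each station and timestamp combination, which won't work. You can get around this with `reshape(transform(b, time = ave(as.character(Station), Station, FUN = seq_along)), direction = "wide", idvar = c("timestamp", "Station", "value.x"))` – thelatemail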
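
The point in the comment above, that `idvar` and `timevar` take column names as strings rather than column vectors, can be shown with a minimal sketch. The data frame `long` and its columns are made up for illustration and are not the asker's data:

```r
# Hypothetical long-format data: one row per station and parameter
long <- data.frame(
  station   = c("abc", "abc", "def", "def"),
  parameter = c("AREA", "YIELD", "AREA", "YIELD"),
  value     = c(188000, 156.3, 188000, 156.3),
  stringsAsFactors = FALSE
)

# idvar and timevar are given as quoted column names, not as long$station etc.
wide <- reshape(long,
                idvar     = "station",
                timevar   = "parameter",
                v.names   = "value",
                direction = "wide")

names(wide)  # station, value.AREA, value.YIELD
```

Each distinct value of the `timevar` column becomes its own `value.<name>` column, so there is one row per station.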

Answers

2

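This uses reshape2. I don't think it is possible to cast the data frame in one step. Note that the input looks like the result of some other join operation (because some of the names have .x and .y suffixes). I think the join could be improved to avoid this complication.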

df <- read.table(header=TRUE, stringsAsFactors = FALSE, text = 
"timestamp value.x station value.y parameter.x value parameter.y 
1/1/2010 0.6  abc  188,000 AREAPLANTED 22 PROGRESS 
1/1/2010 0.6  abc  156.3  YIELD   NA NA 
1/1/2010 -10  def  188,000 AREAPLANTED 22 PROGRESS 
1/1/2010 -10  def  156.3  YIELD   NA NA 
") 

library(reshape2) 

# extract the last two columns into a variable/value and make unique 
df1 <- unique(df[!is.na(df$value),c("timestamp", "value.x", "station", "parameter.y", "value")]) 
names(df1) <- c("timestamp", "value.x", "station", "variable", "value") 

# extract columns 4,5 into a variable value 
df2 <- df[,c("timestamp", "value.x", "station", "parameter.x", "value.y")] 
names(df2) <- c("timestamp", "value.x", "station", "variable", "value") 

# cast 
dcast(rbind(df1, df2), timestamp + value.x + station ~ variable, value.var = "value") 

# timestamp value.x station AREAPLANTED PROGRESS YIELD 
# 1 1/1/2010 -10.0  def  188,000  22 156.3 
# 2 1/1/2010  0.6  abc  188,000  22 156.3 
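
Note that the cast values above are character strings: `188,000` contains a thousands separator, so `value.y` is read as text, and `rbind()` then coerces the combined `value` column to character. A minimal sketch of converting such strings to numbers afterwards, assuming a comma is only ever used as a thousands separator (the vector `raw` is typed in by hand for illustration):

```r
# Example strings as they appear in the cast result
raw <- c("188,000", "156.3", "22")

# Drop the thousands separator, then convert to numeric
num <- as.numeric(gsub(",", "", raw, fixed = TRUE))
num
```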
0

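I agree with @epi99 that the task needs to be broken into steps and reassembled. Here is a tidyverse way of doing it, assuming your data frame is called b as in your example code: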

library(tidyverse) 
b = read.csv("C:\\Temp\\stack_overflow_sample_data_which_I_hacked_together_in_Excel.csv") 
df1 = b %>% select(timestamp, value.x, station, value.y, parameter.x) %>% spread(key = parameter.x, value = value.y) 
df2 = b %>% select(timestamp, value.x, station, value, parameter.y) %>% filter(!is.na(value)) %>% spread(key = parameter.y, value = value) 
df.answer = merge(df1, df2, by = c("timestamp", "value.x", "station")) 
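
In current versions of tidyr, `spread()` is superseded by `pivot_wider()`. The same two-step approach could be sketched as follows, assuming tidyr 1.0.0 or later; the inline `read.table()` data (with `AREAPLANTED` written as one word) and the names `ids`, `wide_x`, `wide_y` and `answer` are stand-ins for illustration, not part of the original answer:

```r
library(dplyr)
library(tidyr)

# Sample data typed inline instead of read from a CSV file
b <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
timestamp value.x station value.y parameter.x value parameter.y
1/1/2010 0.6 abc 188,000 AREAPLANTED 22 PROGRESS
1/1/2010 0.6 abc 156.3 YIELD NA NA
1/1/2010 -10 def 188,000 AREAPLANTED 22 PROGRESS
1/1/2010 -10 def 156.3 YIELD NA NA
")

ids <- c("timestamp", "value.x", "station")

# Widen the parameter.x / value.y pair
wide_x <- b %>%
  select(all_of(ids), parameter.x, value.y) %>%
  pivot_wider(names_from = parameter.x, values_from = value.y)

# Widen the parameter.y / value pair, dropping the rows that have no value
wide_y <- b %>%
  filter(!is.na(value)) %>%
  select(all_of(ids), parameter.y, value) %>%
  pivot_wider(names_from = parameter.y, values_from = value)

# Join the two wide tables on the shared id columns
answer <- inner_join(wide_x, wide_y, by = ids)
```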
2

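Still in base R, consider a merge between two reshaped data frames. Your current setup uses arguments intended for a wide-to-long reshape, whereas what you need is the reverse, long-to-wide.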

mdf <- merge(
    reshape(b, timevar="parameter.x", 
     v.names = c("value.y"), 
     idvar = c("timestamp", "value.x", "station"), 
     direction = "wide", 
     drop = c("value", "parameter.y")), 

    reshape(b[!is.na(b$value),], timevar="parameter.y", 
     v.names = c("value"), 
     idvar = c("timestamp", "value.x", "station"), 
     direction = "wide", 
     drop = c("value.y", "parameter.x")), 
    by=c("timestamp", "value.x", "station") 
) 

names(mdf) <- gsub("(value\\.y\\.|value\\.)", "", names(mdf)) 

mdf  
# timestamp  x station AREA PLANTED YIELD PROGRESS 
# 1 1/1/2010 -10.0  def  188,000 156.3  22 
# 2 1/1/2010 0.6  abc  188,000 156.3  22
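
The `gsub()` call above also strips the prefix from `value.x`, which is why the printed header shows `x`. A minimal sketch of one way to rename only the reshaped columns, assuming the column names are those produced by the merge above (the vector `nms` is typed in by hand for illustration, and `id_cols` is a hypothetical helper name):

```r
# Column names as they would look right after the merge
nms <- c("timestamp", "value.x", "station",
         "value.y.AREA PLANTED", "value.y.YIELD", "value.PROGRESS")

# Leave the id columns alone; strip the prefix only from the others
id_cols <- c("timestamp", "value.x", "station")
is_id <- nms %in% id_cols
nms[!is_id] <- sub("^value(\\.y)?\\.", "", nms[!is_id])

nms
```

With a real result, the same idea would apply to `names(mdf)` in place of `nms`.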