2017-02-09

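Reshaping EPA wind speed and direction data with dcast in R

I am trying to convert long format wind data into wide format. Both wind speed and wind direction are listed in the Parameter.Name column. These values need to be cast by both the Local.Site.Name and Date.Local variables.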

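If there are multiple observations for a unique Local.Site.Name + Date.Local row, then I want the mean of those observations. The built-in argument 'fun.aggregate = mean' works fine for wind speed, but mean wind direction cannot be computed this way because the values are in degrees. For example, the arithmetic mean of two wind directions near north (350, 10) comes out as south: (350 + 10) / 2 = 180, even though the polar mean is 360 or 0.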
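To make the arithmetic above concrete, here is a minimal base R sketch comparing the ordinary mean with a vector (circular) mean. The helper name circ_mean_deg is hypothetical and does not come from any package; it averages the sine and cosine components and converts back to an angle with atan2.

```r
# Hypothetical helper (not from any package): circular mean of angles in degrees.
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  # Average the sine and cosine components, then convert back to an angle.
  ang <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
  # Round away floating-point noise, then wrap into [0, 360).
  round(ang, 6) %% 360
}

dirs <- c(350, 10)
mean(dirs)           # arithmetic mean: 180 (points south)
circ_mean_deg(dirs)  # circular mean: 0 (points north)
```

The rounding step is a design choice for this sketch: without it, a tiny negative result such as -1e-15 would wrap to just under 360 instead of 0.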

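The 'circular' package will let us compute the mean wind direction without having to do any trigonometry ourselves, but I am having trouble nesting this additional function inside the 'fun.aggregate' argument. I thought a simple else if statement would do the trick, but I am running into the following error: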

Error in vaggregate(.value = value, .group = overall, .fun = fun.aggregate, : could not find function ".fun" 
In addition: Warning messages: 
1: In if (wind$Parameter.Name == "Wind Direction - Resultant") { : 
    the condition has length > 1 and only the first element will be used 
2: In if (wind$Parameter.Name == "Wind Speed - Resultant") { : 
    the condition has length > 1 and only the first element will be used  
3: In mean.default(wind$"Wind Speed - Resultant") : 
    argument is not numeric or logical: returning NA 
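The two "condition has length > 1" warnings arise because if() expects a single TRUE or FALSE, while comparing a whole column to a string yields one logical value per row (recent versions of R treat this as an error rather than a warning). A small base R sketch, using a made-up two-element vector in place of the real column:

```r
# Made-up stand-in for wind$Parameter.Name.
param <- c("Wind Speed - Resultant", "Wind Direction - Resultant")

# The comparison is vectorised: one logical value per element.
is_dir <- param == "Wind Direction - Resultant"
length(is_dir)   # 2, so it cannot be used directly as an if() condition

# Reduce to a single value, or use the vectorised ifelse() instead.
any(is_dir)
labels <- ifelse(is_dir, "circular mean", "arithmetic mean")
```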

The goal is to be able to use fun.aggregate = mean for wind speed, but mean(circular(Wind Direction, units = 'degrees')) for wind direction.
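As far as I can tell, fun.aggregate has to be a function that receives the values of a single cell, not an expression evaluated against the whole data frame. The following base R sketch shows the intended per-parameter aggregation without reshape2. The toy data values, the aggregators list, and the circ_mean_deg helper are all made up for illustration; only the column names follow the question.

```r
# Toy long-format data (made-up values; column names follow the question).
wind <- data.frame(
  Local.Site.Name = "Site A",
  Date.Local      = "2016-01-01",
  Parameter.Name  = c("Wind Speed - Resultant", "Wind Speed - Resultant",
                      "Wind Direction - Resultant", "Wind Direction - Resultant"),
  Arithmetic.Mean = c(2, 4, 350, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: circular mean in degrees, wrapped into [0, 360).
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  round(atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi, 6) %% 360
}

# One aggregator per parameter; each is a function of the values in one cell.
aggregators <- list(
  "Wind Speed - Resultant"     = function(x) mean(x, na.rm = TRUE),
  "Wind Direction - Resultant" = circ_mean_deg
)

# Aggregate each parameter on its own subset, then merge into one wide frame.
pieces <- lapply(names(aggregators), function(p) {
  sub <- wind[wind$Parameter.Name == p, ]
  out <- aggregate(Arithmetic.Mean ~ Local.Site.Name + Date.Local,
                   data = sub, FUN = aggregators[[p]])
  names(out)[names(out) == "Arithmetic.Mean"] <- p
  out
})
wind.w <- Reduce(merge, pieces)
```

Splitting by parameter first sidesteps the need for a single aggregation function that knows which parameter it is looking at.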

Here is the original data (> 100MB): https://drive.google.com/open?id=0By6o_bZ8CGwuUUhGdk9ONTgtT0E

Here is a subset of the data (first 100 rows): https://drive.google.com/open?id=0By6o_bZ8CGwucVZGT0pBQlFzT2M

Here is my script:

library(reshape2) 
library(dplyr) 
library(circular) 

#read in the long format data: 
wind <- read.csv("<INSERT_FILE_PATH_HERE>", header = TRUE) 

#cast into wide format: 
wind.w <- dcast(wind, 
      Local.Site.Name + Date.Local ~ Parameter.Name, 
      value.var = "Arithmetic.Mean", 
      fun.aggregate = (
       if (wind$Parameter.Name == "Wind Direction - Resultant") { 
       mean(circular(wind$"Wind Direction - Resultant", units = 'degrees')) 
       } 
       else if (wind$Parameter.Name == "Wind Speed - Resultant") { 
       mean(wind$"Wind Speed - Resultant") 
       }), 
      na.rm = TRUE) 

Any help would be greatly appreciated!

-spacedSparking

Edit: here is the solution:

library(plyr)  # provides .(), which the subset arguments below rely on
library(reshape2) 
library(SDMTools) 
library(dplyr) 
#read in the EPA wind data: 
#This data is publicly accessible, and can be found here: https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html  
wind <- read.csv("daily_WIND_2016.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE) 

#convert long format wind speed data by date and site id: 
wind_speed <- dcast(wind, 
        Local.Site.Name + Date.Local ~ Parameter.Name, 
        value.var = "Arithmetic.Mean", 
        fun.aggregate = function(x) { 
         mean(x, na.rm=TRUE) 
        }, 
        subset = .(Parameter.Name == "Wind Speed - Resultant") 
) 

#convert long format wind direction data into wide format by date and local site id: 
wind_direction <- dcast(wind, 
         Local.Site.Name + Date.Local ~ Parameter.Name, 
         value.var = "Arithmetic.Mean", 
         fun.aggregate = function(x) { 
          if(length(x) > 0) 
          circular.averaging(x, deg = TRUE) 
          else 
          -1 
         }, 
         subset= .(Parameter.Name == "Wind Direction - Resultant") 
) 

#join the wide format split wind_speed and wind_direction dataframes 
wind.w <- merge(wind_speed, wind_direction) 

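You should trim the data file down to the first 100 rows or so and post that here. Making everyone who wants to answer your question download 106MB will likely reduce the number of people willing to help. – Richard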

I have trimmed the data down to 100 rows. Thanks for the suggestion, I'm new to Stack! – spacedSparking

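Thanks, that is much easier to work with, but have you verified that this small data set still exhibits the problem you are trying to solve? Your goal on SO is to provide as many usable resources as possible for understanding and answering your question. – Richard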

Answers

0

You could use the subset argument of dcast to apply the two functions separately, which gives two data frames that you can then merge:

library(plyr)  # provides .(), which the subset arguments below rely on
library(reshape2) 
library(dplyr) 
library(circular) 

#cast into wide format: 
wind_speed <- dcast(wind, 
       Local.Site.Name + Date.Local ~ Parameter.Name, 
       value.var = "Arithmetic.Mean", 
       fun.aggregate = function(x) { 
        mean(x, na.rm=TRUE) 
       }, 
       subset=.(Parameter.Name == "Wind Speed - Resultant") 
) 

wind_direction <- dcast(wind, 
        Local.Site.Name + Date.Local ~ Parameter.Name, 
        value.var = "Arithmetic.Mean", 
        fun.aggregate = function(x) { 
         if(length(x) > 0) 
         mean(circular(c(x), units="degrees"), na.rm=TRUE) 
         else 
         -1 
        }, 
        subset=.(Parameter.Name == "Wind Direction - Resultant") 
) 


wind.w <- merge(wind_speed, wind_direction) 

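This is a very elegant way to subset the long format data. That said, I have realized that the 'circular()' function does not aggregate wind direction the way I had hoped. Ultimately I want my mean wind direction for each site and date to fall within the range 0 to 360 degrees. I hope to use some functions from the 'openair' package to solve this. Thank you for your response! – spacedSparking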
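If the problem is that an averaged angle falls outside 0 to 360, one possible base R fix is the modulo operator. This is an assumption about what the comment above describes: an atan2-based mean lies in (-180, 180], so directions in the western half come back negative. A minimal sketch with made-up directions:

```r
# Two made-up directions on either side of south-west.
deg <- c(200, 260)
rad <- deg * pi / 180

# atan2-based vector mean: lands in (-180, 180], here about -130.
raw <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi

# Wrap into [0, 360): about 230, the compass bearing one would expect.
wrapped <- raw %% 360
```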

0

You are using wind.w inside the code that defines wind.w. That won't work!

You are also using slanted quotes (`) instead of straight quotes ('). Straight quotes should be used to delimit a string.
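The distinction between the two quote styles can be illustrated with a short base R sketch. The data frame below is made up; only the column name follows the question.

```r
# Made-up data frame with a non-syntactic column name (spaces and a hyphen).
df <- data.frame(`Wind Speed - Resultant` = c(2, 4), check.names = FALSE)

# Backticks refer to an object or column by name.
speed_by_name <- df$`Wind Speed - Resultant`   # the numeric column

# Straight quotes create a character string, not a reference to the column.
speed_string <- "Wind Speed - Resultant"

mean(speed_by_name)   # 3
df[[speed_string]]    # a string can still be used as an index into df
```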

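Thanks for pointing out the 'wind.w' issue. Autocomplete led me to the slanted quotes, thank you. After making these changes, I am left with the following error: – spacedSparking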

'Error in vaggregate(.value = value, .group = overall, .fun = fun.aggregate, : could not find function ".fun"' – spacedSparking

0

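Okay, thanks to everyone's combined help, I managed to solve this pesky wind direction problem. Sometimes solving a problem is just a matter of knowing the right question to ask. In my case, learning the term 'vector averaging' was all I needed! The SDMTools package has a built-in vector averaging function, circular.averaging(), which averages wind directions and produces output that still lies between 0 and 359 degrees! What I ended up doing was adapting tjjjohnson's script: I changed the fun.aggregate argument from mean(circular(c(x), units = "degrees"), na.rm = TRUE) to circular.averaging(x, deg = TRUE). Here is a histogram of the raw and aggregated data! Everything looks good, thank you all!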