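Reshaping EPA wind speed and direction data with dcast in R
I want to convert long-format wind data to wide format. Both wind speed and wind direction are listed in the Parameter.Name column. The values need to be cast by the Local.Site.Name and Date.Local variables.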
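If a unique Local.Site.Name + Date.Local row has multiple observations, I want the mean of those observations. The built-in argument 'fun.aggregate = mean' works for wind speed, but mean wind direction cannot be computed this way because the values are in degrees. For example, the arithmetic mean of two wind directions near north (350 and 10) comes out as south: (350 + 10) / 2 = 180, even though the circular mean is 360, or equivalently 0.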
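To make the wrap-around problem concrete, here is a minimal base-R sketch that compares the arithmetic mean with a circular mean computed from unit vectors. The helper name circ_mean_deg is hypothetical and not part of any package; it is only meant to illustrate the idea.

```r
# Circular mean of directions given in degrees, using base R only.
# circ_mean_deg is a hypothetical helper name, not a function from any package.
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  # average the unit vectors, then convert the resultant back to an angle
  ang <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
  # map from (-180, 180] onto [0, 360)
  (ang + 360) %% 360
}

dirs <- c(350, 10)
mean(dirs)           # arithmetic mean: 180, which points south
circ_mean_deg(dirs)  # circular mean: 0 (or 360) up to rounding error, which points north
```

Because of floating-point rounding the circular result may land a hair below 360 rather than exactly on 0; both describe the same direction.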
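The 'circular' package lets us compute the mean wind direction without doing any trigonometry ourselves, but I have not managed to nest this additional function inside the 'fun.aggregate' argument. I thought a simple if / else if statement would do the trick, but I ran into the following error: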
Error in vaggregate(.value = value, .group = overall, .fun = fun.aggregate, : could not find function ".fun"
In addition: Warning messages:
1: In if (wind$Parameter.Name == "Wind Direction - Resultant") { :
the condition has length > 1 and only the first element will be used
2: In if (wind$Parameter.Name == "Wind Speed - Resultant") { :
the condition has length > 1 and only the first element will be used
3: In mean.default(wind$"Wind Speed - Resultant") :
argument is not numeric or logical: returning NA
The goal is to use fun.aggregate = mean for wind speed, but mean(circular(Wind Direction, units = 'degrees')) for wind direction.
Here is the raw data (> 100MB): https://drive.google.com/open?id=0By6o_bZ8CGwuUUhGdk9ONTgtT0E
Here is a subset of the data (first 100 rows): https://drive.google.com/open?id=0By6o_bZ8CGwucVZGT0pBQlFzT2M
Here is my script:
library(reshape2)
library(dplyr)
library(circular)
#read in the long format data:
wind <- read.csv("<INSERT_FILE_PATH_HERE>", header = TRUE)
#cast into wide format:
wind.w <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = (
if (wind$Parameter.Name == "Wind Direction - Resultant") {
mean(circular(wind$"Wind Direction - Resultant", units = 'degrees'))
}
else if (wind$Parameter.Name == "Wind Speed - Resultant") {
mean(wind$"Wind Speed - Resultant")
}),
na.rm = TRUE)
Any help would be greatly appreciated!
-spacedSparking
Edit: here is the solution:
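library(plyr)      # provides .(), used in dcast's subset argument; load before dplyr
library(reshape2)
library(SDMTools)  # provides circular.averaging()
library(dplyr)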
#read in the EPA wind data:
#This data is publicly accessible, and can be found here: https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html
wind <- read.csv("daily_WIND_2016.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE)
#convert long format wind speed data by date and site id:
wind_speed <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = function(x) {
mean(x, na.rm=TRUE)
},
subset = .(Parameter.Name == "Wind Speed - Resultant")
)
#convert long format wind direction data into wide format by date and local site id:
wind_direction <- dcast(wind,
Local.Site.Name + Date.Local ~ Parameter.Name,
value.var = "Arithmetic.Mean",
fun.aggregate = function(x) {
if(length(x) > 0)
circular.averaging(x, deg = TRUE)
else
-1
},
subset= .(Parameter.Name == "Wind Direction - Resultant")
)
#join the wide format split wind_speed and wind_direction dataframes
wind.w <- merge(wind_speed, wind_direction)
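
For anyone who cannot install SDMTools or reshape2, the same split / aggregate / merge idea can be sketched with base R alone. This is only an illustration under stated assumptions: wind_toy is made-up data that borrows the column names of the EPA file, and circ_mean_deg and agg_param are hypothetical helper names. Unlike the dcast version above, it omits site/date groups with no direction readings from the direction table instead of filling them with -1, so the final merge uses all = TRUE to keep them as NA.

```r
# Toy long-format data with the EPA column names (the values are made up)
wind_toy <- data.frame(
  Local.Site.Name = c("A", "A", "A", "A", "B", "B"),
  Date.Local      = rep("2016-01-01", 6),
  Parameter.Name  = c("Wind Direction - Resultant", "Wind Direction - Resultant",
                      "Wind Speed - Resultant", "Wind Speed - Resultant",
                      "Wind Direction - Resultant", "Wind Speed - Resultant"),
  Arithmetic.Mean = c(350, 10, 4, 6, 90, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: circular mean of angles in degrees, mapped onto [0, 360)
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  ang <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
  (ang + 360) %% 360
}

# Hypothetical helper: aggregate one parameter with its own summary function
agg_param <- function(data, param, fun, out_name) {
  sub <- data[data$Parameter.Name == param, ]
  res <- aggregate(Arithmetic.Mean ~ Local.Site.Name + Date.Local,
                   data = sub, FUN = fun)
  names(res)[names(res) == "Arithmetic.Mean"] <- out_name
  res
}

speed     <- agg_param(wind_toy, "Wind Speed - Resultant", mean, "wind_speed")
direction <- agg_param(wind_toy, "Wind Direction - Resultant", circ_mean_deg,
                       "wind_direction")

# all = TRUE keeps site/date pairs that have only one of the two parameters
wind_wide <- merge(speed, direction,
                   by = c("Local.Site.Name", "Date.Local"), all = TRUE)
print(wind_wide)
```

The formula interface of aggregate drops rows with a missing Arithmetic.Mean by default, which plays roughly the role of na.rm = TRUE in the dcast call.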
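You should trim the data file down to the first 100 rows or so and post that here. Making everyone who wants to answer your question download 106MB will probably reduce the number of people willing to help. – Richard
I have trimmed the data to 100 rows. Thanks for the suggestion, I'm new to Stack! – spacedSparking
Thanks, that is much easier to work with, but have you verified that this small dataset still shows the problem you are trying to solve? Your goal on SO is to provide as many usable resources as possible for understanding and answering your question. – Richard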