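R: How do I remove outliers from a smoother in ggplot2?
I have the following data set that I am trying to plot with ggplot2. It is a time series of three experiments, A1, B1 and C1, and each experiment has three replicates.
I would like to add a stat that detects and removes outliers before returning a smoother (mean and variance?). I have written my own outlier function (not shown), but I expect a function for this already exists and I just have not found it.
I have looked at stat_sum_df("median_hilow", geom = "smooth") from some examples in the ggplot2 book, but I could not tell from the Hmisc help documentation whether it removes outliers.
Is there a function in ggplot2 that removes outliers like this, or where would I amend my code below to add my own function?
EDIT: I just saw this (How to use Outlier Tests in R Code) and noticed that Hadley recommends using a robust method such as rlm. I am plotting bacterial growth curves, so I do not think a linear model is the best choice, but any advice on other models, or on using robust models in this situation, would be appreciated.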
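To make the rlm suggestion concrete, here is a minimal sketch of what a robust linear fit does to a single odd reading. It uses only the A1 values from the sample data in this question; the names a1, fit_lm, fit_rlm and day3 are made up for illustration, and a straight line is not claimed to be the right model for growth curves.

```r
library(ggplot2)
library(MASS)  # provides rlm()

# Series A1 from the sample data; the 1.0 at day 3 is the suspicious reading.
a1 <- data.frame(
  day = rep(c(1, 3, 5, 7), times = 3),
  od  = c(0.1, 1.0, 0.5, 0.7,
          0.13, 0.33, 0.54, 0.76,
          0.1, 0.35, 0.54, 0.73)
)

fit_lm  <- lm(od ~ day, data = a1)   # ordinary least squares
fit_rlm <- rlm(od ~ day, data = a1)  # robust M-estimation (rlm defaults)

# Compare the two fitted values at day 3, where the odd reading sits.
day3 <- data.frame(day = 3)
pred_lm  <- unname(predict(fit_lm, day3))
pred_rlm <- unname(predict(fit_rlm, day3))

# The same comparison as a plot: dashed = lm, solid = rlm.
p <- ggplot(a1, aes(x = day, y = od)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed") +
  geom_smooth(method = MASS::rlm, formula = y ~ x, se = FALSE)
print(p)
```

The robust fit downweights the day 3 reading instead of deleting it, so no separate outlier-removal step is needed before smoothing.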
library(ggplot2)

# Three series (A1, B1, C1), three replicates each, measured on days 1, 3, 5, 7
data <- data.frame(
  day = rep(c(1, 3, 5, 7), times = 9),
  od = c(
    0.1, 1.0, 0.5, 0.7,
    0.13, 0.33, 0.54, 0.76,
    0.1, 0.35, 0.54, 0.73,
    1.3, 1.5, 1.75, 1.7,
    1.3, 1.3, 1.0, 1.6,
    1.7, 1.6, 1.75, 1.7,
    2.1, 2.3, 2.5, 2.7,
    2.5, 2.6, 2.6, 2.8,
    2.3, 2.5, 2.8, 3.8
  ),
  series_id = rep(c("A1", "B1", "C1"), each = 12),
  replicate = rep(
    c("A1.1", "A1.2", "A1.3",
      "B1.1", "B1.2", "B1.3",
      "C1.1", "C1.2", "C1.3"),
    each = 4
  )
)
> data
   day   od series_id replicate
1    1 0.10        A1      A1.1
2    3 1.00        A1      A1.1
3    5 0.50        A1      A1.1
4    7 0.70        A1      A1.1
5    1 0.13        A1      A1.2
6    3 0.33        A1      A1.2
7    5 0.54        A1      A1.2
8    7 0.76        A1      A1.2
9    1 0.10        A1      A1.3
10   3 0.35        A1      A1.3
11   5 0.54        A1      A1.3
12   7 0.73        A1      A1.3
13   1 1.30        B1      B1.1
... etc...
This is what I have so far. It works well, but outliers are not removed:
r <- ggplot(data = data, aes(x = day, y = od))
r + geom_point(aes(group = replicate, color = series_id)) +  # add points
  geom_line(aes(group = replicate, color = series_id)) +     # add lines
  geom_smooth(aes(group = series_id))                        # add smoother, average of each replicate
EDIT: I have added two charts below as examples of my outlier problem, using real data rather than the example data above.
The first chart shows series p26s4: around day 32 something really weird went on in two of the replicates, showing 2 outliers. http://img696.imageshack.us/img696/8743/p26s4loess.png
The second chart shows series p22s5: on day 18 something weird went on with the reading that day, likely machine error I think. http://img521.imageshack.us/img521/8083/p22s5loess.png
At the moment I am eyeballing the data to check that the growth curves look OK. After taking Hadley's advice and setting family = "symmetric", I believe the loess smoother does a decent job of ignoring the outliers.
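@Peter / @Hadley, the next thing I would like to do is fit a logistic, Gompertz or Richards growth curve to this data instead of a loess, and calculate the growth rate in the exponential phase. Eventually I plan to use the grofit package in R (http://cran.r-project.org/web/packages/grofit/index.html), but for now I would like to plot these manually with ggplot2 if possible. Any pointers would be much appreciated.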
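As one possible starting point for the logistic case, here is a rough sketch using nls() with the self-starting SSlogis() model from base R. The data frame growth is simulated purely for illustration (it is not the p26s4 or p22s5 data), and the true parameter values, noise level and variable names are all assumptions. Gompertz and Richards curves would need a different model formula.

```r
library(ggplot2)

# Simulated, hypothetical growth curve (NOT real data):
# logistic with asymptote 2, midpoint at day 15, scale 3 days, plus small noise.
set.seed(1)
growth <- data.frame(day = seq(0, 40, by = 2))
growth$od <- 2 / (1 + exp((15 - growth$day) / 3)) +
  rnorm(nrow(growth), sd = 0.03)

# SSlogis() is self-starting, so nls() needs no hand-picked start values.
# Model: od = Asym / (1 + exp((xmid - day) / scal))
fit <- nls(od ~ SSlogis(day, Asym, xmid, scal), data = growth)
est <- coef(fit)

# In this parameterisation the steepest slope (at day = xmid) is Asym / (4 * scal),
# and 1 / scal is the per-day rate constant of the early, near-exponential phase.
max_slope     <- unname(est["Asym"] / (4 * est["scal"]))
rate_constant <- unname(1 / est["scal"])

# Overlay the fitted curve; se = FALSE because predict() for nls fits
# does not return standard errors.
p <- ggplot(growth, aes(x = day, y = od)) +
  geom_point() +
  geom_smooth(method = "nls",
              formula = y ~ SSlogis(x, Asym, xmid, scal),
              se = FALSE)
print(p)
```

With real replicates, the same geom_smooth() call could be grouped by series so that each series gets its own fitted curve. Note that nls() is a least-squares fit and is not robust to outliers by itself.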
I get 'Error: Unknown parameters: family' when I try this. – JayCo 2016-06-21 23:55:38
Figured it out! The correct syntax is 'geom_smooth(method = loess, method.args = list(family = "symmetric"))' – JayCo 2016-06-22 00:06:19
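For reference, the syntax from the comment above can be combined with the plotting code from the question roughly as follows. This is a sketch that assumes a ggplot2 version in which geom_smooth() accepts method.args. The sample data is rebuilt here with rep() and paste0() only so that the snippet runs independently of the rest of the post.

```r
library(ggplot2)

# Sample data from the question, rebuilt compactly.
data <- data.frame(
  day = rep(c(1, 3, 5, 7), times = 9),
  od = c(0.1, 1.0, 0.5, 0.7,   0.13, 0.33, 0.54, 0.76,  0.1, 0.35, 0.54, 0.73,
         1.3, 1.5, 1.75, 1.7,  1.3, 1.3, 1.0, 1.6,      1.7, 1.6, 1.75, 1.7,
         2.1, 2.3, 2.5, 2.7,   2.5, 2.6, 2.6, 2.8,      2.3, 2.5, 2.8, 3.8),
  series_id = rep(c("A1", "B1", "C1"), each = 12),
  replicate = rep(paste0(rep(c("A1", "B1", "C1"), each = 3), ".", 1:3), each = 4)
)

# Same layers as in the question, but the smoother is a robust loess:
# family = "symmetric" is passed through to loess() via method.args.
p <- ggplot(data, aes(x = day, y = od)) +
  geom_point(aes(group = replicate, color = series_id)) +
  geom_line(aes(group = replicate, color = series_id)) +
  geom_smooth(aes(group = series_id),
              method = "loess",
              method.args = list(family = "symmetric"))
print(p)
```

With only four distinct days in the sample data, loess may emit warnings when the plot is drawn. The real series, with many more time points, should be a better test of how well the robust fit ignores the odd readings.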