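Subsetting data from a vector

I am using R to analyse data on antibiotic usage from several hospitals.

Following tidy data principles, I have imported the data into a data frame.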

> head(data)
        date antibiotic    usage  hospital
1 2006-01-01   amikacin 0.000000 hospital1
2 2006-02-01   amikacin 0.000000 hospital1
3 2006-03-01   amikacin 0.000000 hospital1
4 2006-04-01   amikacin 0.000000 hospital1
5 2006-05-01   amikacin 0.937119 hospital1
6 2006-06-01   amikacin 1.002961 hospital1

(The dataset is monthly data × 5 hospitals × 40 antibiotics.)

The first thing I want to do is aggregate the antibiotics into classes.

> head(distinct(select(data, antibiotic))) 
       antibiotic 
1     amikacin 
2 amoxicillin-clavulanate 
3    amoxycillin 
4    ampicillin 
5    azithromycin 
6   benzylpenicillin 
7    cefalotin 
8    cefazolin 

> penicillins <- c("amoxicillin-clavulanate", "amoxycillin", "ampicillin", "benzylpenicillin") 
> ceph1 <- c("cefalotin", "cefazolin")
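As an alternative to keeping one vector per class, the class can be attached to each row as a column. The following is a minimal sketch, assuming dplyr is installed; `classes`, `class_lookup`, `toy` and `toy_classed` are hypothetical names, and the toy data uses character columns (the real `antibiotic` column is a factor, so it may need `as.character()` before joining).

```r
library(dplyr)

# Hypothetical named list: one entry per antibiotic class
classes <- list(
  penicillins = c("amoxicillin-clavulanate", "amoxycillin",
                  "ampicillin", "benzylpenicillin"),
  ceph1       = c("cefalotin", "cefazolin")
)

# Turn the list into a two-column lookup table (antibiotic, class)
class_lookup <- stack(classes)
names(class_lookup) <- c("antibiotic", "class")
class_lookup$class <- as.character(class_lookup$class)

# Toy rows standing in for the real data frame
toy <- data.frame(
  date       = as.Date("2006-01-01"),
  antibiotic = c("amoxycillin", "cefazolin", "amikacin"),
  usage      = c(10, 5, 1),
  hospital   = "hospital1",
  stringsAsFactors = FALSE
)

# Antibiotics without an entry in the lookup get class NA
toy_classed <- left_join(toy, class_lookup, by = "antibiotic")
```

With a `class` column in place, grouping by class becomes an ordinary `group_by(class)` rather than a separate filter per vector.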

What I would then like to do is subset the data according to these antibiotic class vectors:

filter(data, antibiotic =(any one of the values in the vector "penicillins")

Thanks to thelatemail for pointing out that the way to do this is:

d <- filter(data, antibiotic %in% penicillins)
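To see why `%in%` is the right operator here rather than `==`, a small base R sketch may help. The vector `x` is made up for illustration.

```r
penicillins <- c("amoxicillin-clavulanate", "amoxycillin",
                 "ampicillin", "benzylpenicillin")
x <- c("amikacin", "ampicillin", "amoxycillin", "cefazolin")

# %in% asks, for each element of x, whether it appears anywhere in penicillins
in_class <- x %in% penicillins   # FALSE TRUE TRUE FALSE

# == compares element by element (recycling if needed), so it only matches
# when the same drug happens to sit at the same position in both vectors
eq_class <- x == penicillins     # all FALSE for these vectors
```

Inside `filter()`, the `%in%` test yields one TRUE/FALSE per row, which is what a membership filter needs.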

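There are many ways I want to analyse the data.

The key analysis (and ggplot output) is:

x = date

y = usage of antibiotic(s), stratified by (drug | class), filtered by hospital

I am not clear how to aggregate the data for this sort of thing.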
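One way the plot described above could look, as a sketch only: filter to one hospital and one class, then map the drug to colour so each antibiotic gets its own line. It assumes dplyr and ggplot2 are installed; `toy`, `plot_data` and `p` are hypothetical names and the usage values are invented.

```r
library(dplyr)
library(ggplot2)

ceph1 <- c("cefalotin", "cefazolin")
dates <- seq(as.Date("2006-01-01"), by = "month", length.out = 3)

# Toy data: 3 months x 2 drugs x 2 hospitals
toy <- data.frame(
  date       = rep(dates, times = 4),
  antibiotic = rep(c("cefalotin", "cefazolin"), each = 3, times = 2),
  hospital   = rep(c("hospital1", "hospital2"), each = 6),
  usage      = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
  stringsAsFactors = FALSE
)

# Filter by hospital and class, keeping one row per drug per month
plot_data <- filter(toy, hospital == "hospital1", antibiotic %in% ceph1)

# x = date, y = usage, stratified by drug via the colour aesthetic
p <- ggplot(plot_data, aes(x = date, y = usage, colour = antibiotic)) +
  geom_line()
```

Printing `p` draws the chart. Stratifying by class instead of drug would need a class column in the data to map to `colour`.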

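Example:
I want to analyse the use of class "ceph1" across all hospitals in the region, resulting in (apologies - I know this is not proper code)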

x   y 
Jan-2006 for all in hospitals(usage of cephazolin + usage of cephalotin) 
Feb-2006 for all in hospitals(usage of cephazolin + usage of cephalotin) 
etc
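The table sketched above corresponds to a filter followed by a grouped sum. Here is a minimal version on invented data, assuming dplyr is installed; `toy` and `ceph1_monthly` are hypothetical names.

```r
library(dplyr)

ceph1 <- c("cefalotin", "cefazolin")

# Toy data: 2 months x 2 hospitals x 3 drugs (amikacin is not in ceph1)
toy <- data.frame(
  date       = as.Date(rep(c("2006-01-01", "2006-02-01"), each = 6)),
  antibiotic = rep(c("cefalotin", "cefazolin", "amikacin"), times = 4),
  hospital   = rep(rep(c("hospital1", "hospital2"), each = 3), times = 2),
  usage      = c(1, 2, 100, 3, 4, 100, 10, 20, 100, 30, 40, 100),
  stringsAsFactors = FALSE
)

# Keep only ceph1 drugs, then sum usage over all hospitals for each month
ceph1_monthly <- toy %>%
  filter(antibiotic %in% ceph1) %>%
  group_by(date) %>%
  summarise(total = sum(usage))
```

Because `hospital` is not in `group_by()`, the sum runs over every hospital; adding it there would give one total per hospital per month instead.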

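And, in the long run, I would like to be able to pass arguments to a function that lets me choose which hospitals and which antibiotic or class of antibiotics.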
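A function along those lines might look like the sketch below. `monthly_usage` and its arguments are hypothetical, not part of dplyr, and it assumes the data frame has no columns named `drugs` or `hospitals` that would shadow the arguments inside `filter()`.

```r
library(dplyr)

# Hypothetical helper: monthly total usage for the chosen drugs and hospitals.
# `drugs` can be a single antibiotic or a class vector such as ceph1.
# `hospitals = NULL` means "use all hospitals".
monthly_usage <- function(data, drugs, hospitals = NULL) {
  out <- filter(data, antibiotic %in% drugs)
  if (!is.null(hospitals)) {
    out <- filter(out, hospital %in% hospitals)
  }
  out %>%
    group_by(date) %>%
    summarise(total = sum(usage))
}

# Toy data: 2 months x 2 hospitals x 3 drugs
toy <- data.frame(
  date       = as.Date(rep(c("2006-01-01", "2006-02-01"), each = 6)),
  antibiotic = rep(c("cefalotin", "cefazolin", "amikacin"), times = 4),
  hospital   = rep(rep(c("hospital1", "hospital2"), each = 3), times = 2),
  usage      = c(1, 2, 100, 3, 4, 100, 10, 20, 100, 30, 40, 100),
  stringsAsFactors = FALSE
)

ceph1 <- c("cefalotin", "cefazolin")
all_hospitals <- monthly_usage(toy, ceph1)
one_hospital  <- monthly_usage(toy, ceph1, hospitals = "hospital1")
```

The result is a small data frame with `date` and `total` columns, so it can be handed straight to a plotting call.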

Thanks again - I know this is much more complex than the original question!

Source

2016-03-15 Trent

'%in%' is probably what you are looking for, e.g. 'antibiotic %in% penicillins'. – thelatemail

Indeed - that's fantastic, thank you! – Trent

So, after a lot of trial and error and reading Stack Overflow, I managed to sort it out.

>str(data) 
'data.frame': 23360 obs. of 4 variables: 
$ date  : Date, format: "2007-09-01" "2012-06-01" ... 
$ antibiotic: Factor w/ 41 levels "amikacin","amoxicillin-clavulanate",..: 17 3 19 30 38 20 20 20 7 25 ... 
$ usage  : num 21.368 36.458 7.226 3.671 0.917 ... 
$ hospital : Factor w/ 5 levels "hospital1","hospital2",..: 1 3 2 1 4 1 4 3 5 1 ...

So I can first subset the data:

>library(dplyr) 
>penicillins <- c("amoxicillin-clavulanate", "amoxycillin", "ampicillin", "benzylpenicillin") 
>d <- filter(data, antibiotic %in% penicillins)

and then use more dplyr to summarise it:

>d1 <- summarise(group_by(d, date), total = sum(usage)) 
>d1  
Source: local data frame [122 x 2] 

     date total 
     (date) (dbl) 
1 2006-01-01 1669.177 
2 2006-02-01 1901.749 
3 2006-03-01 2311.008 
4 2006-04-01 1921.436 
5 2006-05-01 1594.781 
6 2006-06-01 2150.997 
7 2006-07-01 2052.517 
8 2006-08-01 2132.501 
9 2006-09-01 1959.916 
10 2006-10-01 1751.667 
..  ...  ... 
> 
> library(ggplot2) 
> qplot(date, total, data = d1) + geom_smooth() 
> [scatterplot as desired!]

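The next step will be to try (thanks, Hadley!) to build this all into a function, and/or to do the subsetting inline, along the lines of what I have worked out here.

Source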

2016-03-18 22:37:44 Trent
