2013-02-27

I'm trying to run normality (normal distribution) tests in batch.

My data looks like this:

"Date","Department","Discipline","Employee ID","SumOfBillable Hrs" 
"10/09/2012","D","B",50084.00,8.00 
"10/09/2012","D","C",51870.00,10.00 
"10/09/2012","D","E",50216.00,10.00 
"10/09/2012","D","E",53422.00,9.00 
"10/09/2012","D","E",53765.00,10.00 
"14/01/2013","E","Y",53146.00,9.00 
"14/01/2013","E","Y",53202.00,9.00 
"14/01/2013","E","Y",54470.00,9.00 
"14/01/2013","SITE","0",54525.00,9.00 
"14/02/2013","D","C",51870.00,10.00 
"14/02/2013","D","E",50029.00,8.50 
"14/02/2013","D","E",50216.00,9.00 
"14/02/2013","D","E",53422.00,4.00 

I want to check the distribution of hours for each Employee_ID.

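Is there a way to do this in batch? I have more than 80 IDs, so taking each ID separately and plotting it or computing its descriptive statistics by hand would be quite tedious.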

Thanks

Add a sample of your data to help us understand and answer your question – Pop 2013-02-27 08:24:54

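You can easily split your "hours" variable by your "Employee_ID" variable, then use 'lapply' on the result to calculate descriptive statistics and produce a list of plots. Show some sample data and you will probably get a more specific answer. – A5C1D2H2I1M1N2O1R2T1 2013-02-27 08:30:30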
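A minimal sketch of the split-and-lapply approach described in this comment, assuming the sample rows from the question: split() turns the hours column into one numeric vector per employee, and lapply() applies the same summary function to each vector. The names hours_by_id, stats and stats_df are illustrative only.

```r
# Sample rows from the question. read.csv() applies check.names = TRUE by default,
# so "Employee ID" becomes Employee.ID and "SumOfBillable Hrs" becomes SumOfBillable.Hrs.
data <- read.csv(text = '"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"
"10/09/2012","D","B",50084.00,8.00
"10/09/2012","D","C",51870.00,10.00
"10/09/2012","D","E",50216.00,10.00
"10/09/2012","D","E",53422.00,9.00
"10/09/2012","D","E",53765.00,10.00
"14/01/2013","E","Y",53146.00,9.00
"14/01/2013","E","Y",53202.00,9.00
"14/01/2013","E","Y",54470.00,9.00
"14/01/2013","SITE","0",54525.00,9.00
"14/02/2013","D","C",51870.00,10.00
"14/02/2013","D","E",50029.00,8.50
"14/02/2013","D","E",50216.00,9.00
"14/02/2013","D","E",53422.00,4.00')

# One numeric vector of billable hours per employee, named by Employee.ID.
hours_by_id <- split(data$SumOfBillable.Hrs, data$Employee.ID)

# Apply the same descriptive statistics to every employee in one pass
# (sd() returns NA for an employee with a single record).
stats <- lapply(hours_by_id, function(x) c(n = length(x), mean = mean(x), sd = sd(x)))

# Stack the per-employee vectors into a single data frame.
stats_df <- data.frame(Employee.ID = names(stats), do.call(rbind, stats), row.names = NULL)
print(stats_df)
```

The same hours_by_id list can be passed to lapply() again with a plotting function, so adding more employees does not require any extra code.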

These are related: http://stackoverflow.com/questions/7781798/seeing-if-data-is-normally-distributed-in-r, http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless – Ben 2013-02-27 09:36:40

Answer

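You could start with something like this. If you want something different, you will have to provide more information about what you want to do with the data.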

data <- read.table(header=TRUE, sep=",", 
text='"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"
"10/09/2012","D","B",50084.00,8.00 
"10/09/2012","D","C",51870.00,10.00 
"10/09/2012","D","E",50216.00,10.00 
"10/09/2012","D","E",53422.00,9.00 
"10/09/2012","D","E",53765.00,10.00 
"14/01/2013","E","Y",53146.00,9.00 
"14/01/2013","E","Y",53202.00,9.00 
"14/01/2013","E","Y",54470.00,9.00 
"14/01/2013","SITE","0",54525.00,9.00 
"14/02/2013","D","C",51870.00,10.00 
"14/02/2013","D","E",50029.00,8.50 
"14/02/2013","D","E",50216.00,9.00 
"14/02/2013","D","E",53422.00,4.00') 



# Means: 
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=mean) 

# Standard Deviations: 
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data, FUN=sd) 

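# Or Shapiro-Wilk normality test p-values. shapiro.test() needs 3 to 5000 values per
# Employee.ID that are not all identical, so other groups return NA instead of an error:
aggregate(SumOfBillable.Hrs ~ Employee.ID, data=data,
          FUN=function(x) if (length(x) >= 3 && length(x) <= 5000 && length(unique(x)) > 1) shapiro.test(x)$p.value else NA)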

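The data is stored in an MS Access database, which I have dumped to a CSV file. For each unique employee ID, I want a normal distribution plot, the standard deviation, the mean, and a normality test. – KillerSnail 2013-02-27 08:52:51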
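A minimal sketch covering all four items requested here, assuming the sample rows from the question (in practice they would come from read.csv() on the exported CSV file). It builds one summary table (count, mean, standard deviation, Shapiro-Wilk p-value) and writes one PDF page per employee with a density histogram, a fitted normal curve and a normal Q-Q plot. The helper shapiro_p() and the file name hours_by_employee.pdf are hypothetical. Because shapiro.test() needs 3 to 5000 values that are not all identical, every p-value is NA for the handful of rows shown here.

```r
# Sample rows from the question; read.csv() renames the columns to
# Employee.ID and SumOfBillable.Hrs.
data <- read.csv(text = '"Date","Department","Discipline","Employee ID","SumOfBillable Hrs"
"10/09/2012","D","B",50084.00,8.00
"10/09/2012","D","C",51870.00,10.00
"10/09/2012","D","E",50216.00,10.00
"10/09/2012","D","E",53422.00,9.00
"10/09/2012","D","E",53765.00,10.00
"14/01/2013","E","Y",53146.00,9.00
"14/01/2013","E","Y",53202.00,9.00
"14/01/2013","E","Y",54470.00,9.00
"14/01/2013","SITE","0",54525.00,9.00
"14/02/2013","D","C",51870.00,10.00
"14/02/2013","D","E",50029.00,8.50
"14/02/2013","D","E",50216.00,9.00
"14/02/2013","D","E",53422.00,4.00')

# One numeric vector of hours per employee.
hours_by_id <- split(data$SumOfBillable.Hrs, data$Employee.ID)

# Hypothetical helper: Shapiro-Wilk p-value, or NA when shapiro.test()
# cannot run (it needs 3 to 5000 values that are not all identical).
shapiro_p <- function(x) {
  if (length(x) >= 3 && length(x) <= 5000 && length(unique(x)) > 1) {
    shapiro.test(x)$p.value
  } else {
    NA_real_
  }
}

# Per-employee summary table.
summary_df <- data.frame(
  Employee.ID = names(hours_by_id),
  n = vapply(hours_by_id, length, integer(1)),
  mean = vapply(hours_by_id, mean, numeric(1)),
  sd = vapply(hours_by_id, sd, numeric(1)),
  shapiro.p = vapply(hours_by_id, shapiro_p, numeric(1)),
  row.names = NULL
)
print(summary_df)

# One PDF page per employee: histogram plus fitted normal curve, then a Q-Q plot.
pdf_file <- file.path(tempdir(), "hours_by_employee.pdf")  # assumed output path
pdf(pdf_file, width = 8, height = 4)
par(mfrow = c(1, 2))  # two panels per page
for (id in names(hours_by_id)) {
  hrs <- hours_by_id[[id]]
  hist(hrs, freq = FALSE, main = paste("Employee", id), xlab = "Billable hours")
  # Only draw the normal density when a positive standard deviation exists.
  if (length(hrs) >= 2 && sd(hrs) > 0) {
    m <- mean(hrs)
    s <- sd(hrs)
    curve(dnorm(x, mean = m, sd = s), add = TRUE)
  }
  qqnorm(hrs, main = paste("Normal Q-Q, employee", id))
  qqline(hrs)
}
dev.off()
```

With 80+ employees and more rows per employee, the same loop simply produces more pages, and the shapiro.p column fills in wherever an employee has enough varied records.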