Efficient way to calculate averages and standard deviations from a txt file

Here is a copy of what one of many txt files looks like.

Class 1: 
Subject A: 
posX posY posZ x(%) y(%) 
    0 2 0 81 72 
    0 2 180 63 38 
-1 -2 0 79 84 
-1 -2 180 85 95 
    . . . .  . 
Subject B: 
posX posY posZ x(%) y(%) 
    0 2  0 71  73 
-1 -2  0 69  88 
    . .  . .  . 
Subject C: 
posX posY posZ x(%) y(%) 
    0 2 0 86  71 
-1 -2 0 81  55 
    . . .  .  . 
Class 2: 
Subject A: 
posX posY posZ x(%) y(%) 
    0 2 0 81 72 
-1 -2 0 79 84 
    . . . .  .

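The number of Classes, Subjects, and row entries all vary.
Class 1, Subject A always has posZ entries that alternate between 0 and 180.
Calculate the average of x(%), y(%) by class and by subject.
Calculate the standard deviation of x(%), y(%) by class and by subject.
Also ignore the rows with posZ of 180 when calculating the averages and standard deviations.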

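I have developed an unwieldy solution in Excel (using macros and VBA), but I would rather go for a more optimal solution in Python.

numpy is very helpful, but the .mean() and .std() functions only work with arrays. I am still researching it further, as well as the pandas groupby function.

I would like the final output to look as follows (1. By Class, 2. By Subject)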

1. By Class     
      X  Y      
Average       
std_dev  

2. By Subject 
      X  Y 
Average 
std_dev

Source

2012-07-05 user1504774

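If you are already using `numpy`, have a look at the [`pandas`](http://pandas.pydata.org/) grouping functionality. – jfs 2012-07-05 19:08:38

Is your problem reading the data file into something you can work with? Or getting the output from a structure that has already been read in? – Amyunimus 2012-07-05 23:55:26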

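I think working with dictionaries (and lists of dictionaries) is a good way to get familiar with handling data in Python. To get the data into that form, you'll want to read in your text file and define variables line by line.

To start: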

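outputList = []  # one dict per data row, filled in further down
for line in infile:  # infile is an already-open file object
    if line.startswith("Class"):
        # "Class 1: " -> "1"; split() copes with the trailing space and newline
        class_var = line.split()[1].rstrip(':')
    elif line.startswith("Subject"):
        # "Subject A: " -> "A"
        subject = line.split()[1].rstrip(':')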

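This creates variables holding the current class and the current subject. Next, you want to read in your numeric values. A good way to read only those rows is a try statement that attempts to convert every field to an integer; header rows and filler lines fail the conversion and are skipped.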

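    else:
        keys = ['posX', 'posY', 'posZ', 'x_perc', 'y_perc']
        try:
            # split() with no argument ignores leading and repeated spaces
            values = [int(item) for item in line.split()]
        except ValueError:
            continue  # header row or ". . ." filler line
        if len(values) != len(keys):
            continue  # blank or malformed line
        entry = dict(zip(keys, values))
        entry['class'] = class_var
        entry['subject'] = subject
        outputList.append(entry)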

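This puts each row into dictionary form, including the class and subject variables defined earlier, and appends it to outputList. You'll end up with something like this: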

[{'posX': 0, 'x_perc': 81, 'posZ': 0, 'y_perc': 72, 'posY': 2, 'class': '1', 'subject': 'A'}, 
{'posX': 0, 'x_perc': 63, 'posZ': 180, 'y_perc': 38, 'posY': 2, 'class': '1', 'subject': 'A'}, ...]

etc.
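For reference, the two fragments above can be combined into one function and tried on a few lines copied from the question. This is only a sketch: the name `parse_records` and the inline `sample` string are my own for illustration, and it assumes every data row has exactly five integer fields.

```python
def parse_records(lines):
    """Turn the Class/Subject/row layout into a list of dicts.

    parse_records is a hypothetical helper name, not part of any library.
    """
    keys = ['posX', 'posY', 'posZ', 'x_perc', 'y_perc']
    records = []
    class_var = subject = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # blank line
        if parts[0] == "Class":
            class_var = parts[1].rstrip(':')
        elif parts[0] == "Subject":
            subject = parts[1].rstrip(':')
        else:
            try:
                values = [int(p) for p in parts]
            except ValueError:
                continue  # header row or ". . ." filler line
            if len(values) != len(keys):
                continue  # assumption: valid rows have exactly five fields
            entry = dict(zip(keys, values))
            entry['class'] = class_var
            entry['subject'] = subject
            records.append(entry)
    return records


# A few lines copied from the sample file in the question.
sample = """\
Class 1:
Subject A:
posX posY posZ x(%) y(%)
    0 2 0 81 72
    0 2 180 63 38
-1 -2 0 79 84
-1 -2 180 85 95
    . . . .  .
Subject B:
posX posY posZ x(%) y(%)
    0 2  0 71  73
-1 -2  0 69  88
"""

records = parse_records(sample.splitlines())
for record in records:
    print(record)
```

For a real file you would pass the open file object instead of `sample.splitlines()`, since iterating over a file also yields one line at a time.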

You can then take the average and SD by subsetting the list of dictionaries (applying rules such as excluding posZ = 180). Here is how to do it by Class:

import numpy as np

# collect the class labels actually present instead of hard-coding them
classes = sorted(set(item['class'] for item in outputList))
print("By Class:")
print("Class", "Avg X", "Avg Y", "X SD", "Y SD")
for class_var in classes:
    # rows of this class, ignoring the posZ == 180 entries
    rows = [item for item in outputList
            if item['class'] == class_var and item['posZ'] != 180]
    x = [item['x_perc'] for item in rows]
    y = [item['y_perc'] for item in rows]
    # np.std defaults to the population SD (ddof=0); pass ddof=1 for the sample SD
    print(class_var, np.mean(x), np.mean(y), np.std(x), np.std(y))

You'll have to play with the print output to get exactly the layout you want, but this should get you started.
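The same subsetting works for the "By Subject" table if the grouping key is made a parameter. Below is a minimal sketch; `summarize` is a name I made up, and I've swapped in the standard library's `statistics.mean` and `statistics.pstdev` (population SD, the same convention as `np.std` with its default `ddof=0`) so it runs without numpy. It assumes "by subject" means pooling a subject label across classes, which may not be what you want.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize(records, key):
    """Group rows by records[i][key], skipping posZ == 180.

    Returns {group: (mean_x, mean_y, sd_x, sd_y)}.
    summarize is a hypothetical helper name for this sketch.
    """
    groups = defaultdict(list)
    for r in records:
        if r['posZ'] != 180:
            groups[r[key]].append(r)
    summary = {}
    for name, rows in sorted(groups.items()):
        xs = [r['x_perc'] for r in rows]
        ys = [r['y_perc'] for r in rows]
        summary[name] = (mean(xs), mean(ys), pstdev(xs), pstdev(ys))
    return summary


# A few rows taken from the sample in the question (posX/posY left out for brevity).
records = [
    {'class': '1', 'subject': 'A', 'posZ': 0, 'x_perc': 81, 'y_perc': 72},
    {'class': '1', 'subject': 'A', 'posZ': 180, 'x_perc': 63, 'y_perc': 38},
    {'class': '1', 'subject': 'A', 'posZ': 0, 'x_perc': 79, 'y_perc': 84},
    {'class': '1', 'subject': 'B', 'posZ': 0, 'x_perc': 71, 'y_perc': 73},
    {'class': '1', 'subject': 'B', 'posZ': 0, 'x_perc': 69, 'y_perc': 88},
    {'class': '2', 'subject': 'A', 'posZ': 0, 'x_perc': 81, 'y_perc': 72},
    {'class': '2', 'subject': 'A', 'posZ': 0, 'x_perc': 79, 'y_perc': 84},
]

for key in ('class', 'subject'):
    print("By", key)
    print(key, "Avg X", "Avg Y", "X SD", "Y SD")
    for name, (x_m, y_m, x_sd, y_sd) in summarize(records, key).items():
        print(name, x_m, y_m, x_sd, y_sd)
```

If you need one row per class and subject combination instead, group on the pair `(r['class'], r['subject'])` rather than on a single key.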

Source

2012-07-06 00:21:14 Amyunimus
