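I am reading several data files into DataFrames and computing the mean of each. After concatenating the DataFrames I compute the mean again, but pandas gives me the wrong mean.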
temp = pd.read_csv(appDelayFile, delimiter='\t')
temp = temp.groupby(['Type', 'Node']).mean()
temp = temp.ix['FullDelay']
d = pd.concat([d, temp])
print d # separate parsed data frames
d = d.groupby(d.index).mean()
print d # after calculating the mean
The first print gives me '0.574193', '0.441335', and '2.71299', whose mean is '1.2428393333'. But the second print gives me '1.610377'.
Is there something wrong with the code, or is this a bug?
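A quick arithmetic check on the three values quoted above, as a sketch only: the post does not show the loop around the snippet, so the assumption here is that the `groupby(...).mean()` line runs once per file, right after each `pd.concat`. Under that assumption the mean is re-averaged every time a new value arrives, which reproduces the second figure:

```python
# The three per-file values reported by the first print.
values = [0.574193, 0.441335, 2.71299]

# Plain mean of all three values (what the asker expects).
overall = sum(values) / len(values)

# Assumed loop behaviour: the previous mean is averaged with each new value.
running = values[0]
for v in values[1:]:
    running = (running + v) / 2

print(round(overall, 6))  # 1.242839
print(round(running, 6))  # 1.610377
```

If that assumption holds, the difference comes from averaging an already-averaged value with each new one, rather than from pandas itself.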
**Edit**
Sample data file 1:
Time Node AppId SeqNo Type DelayS RetxCount HopCount
0.054701 25 1 0 LastDelay 0.054701 1 8
0.054701 25 1 0 FullDelay 0.054701 1 8
0.00708243 26 1 0 LastDelay 0.00708243 1 2
0.00708243 26 1 0 FullDelay 0.00708243 1 2
0.036943 25 1 0 LastDelay 0.036943 1 6
0.036943 25 1 0 FullDelay 0.036943 1 6
0.0582151 26 1 0 LastDelay 0.0582151 1 12
0.0582151 26 1 0 FullDelay 0.0582151 1 12
Sample data file 2:
Time Node AppId SeqNo Type DelayS RetxCount HopCount
0.0252673 25 1 0 LastDelay 0.0252673 1 6
0.0252673 25 1 0 FullDelay 0.0252673 1 6
0.00655327 26 1 0 LastDelay 0.00655327 1 2
0.00655327 26 1 0 FullDelay 0.00655327 1 2
0.023523 25 1 0 LastDelay 0.023523 1 8
0.023523 25 1 0 FullDelay 0.023523 1 8
0.0380394 26 1 0 LastDelay 0.0380394 1 4
0.0380394 26 1 0 FullDelay 0.0380394 1 4
Sample data file 3:
Time Node AppId SeqNo Type DelayS RetxCount HopCount
0.0276086 25 1 0 LastDelay 0.0276086 1 8
0.0276086 25 1 0 FullDelay 0.0276086 1 8
0.0197642 26 1 0 LastDelay 0.0197642 1 4
0.0197642 26 1 0 FullDelay 0.0197642 1 4
0.00708267 25 1 0 LastDelay 0.00708267 1 2
0.00708267 25 1 0 FullDelay 0.00708267 1 2
0.00708268 26 1 0 LastDelay 0.00708268 1 2
0.00708268 26 1 0 FullDelay 0.00708268 1 2
Parsed data files:
          Time  AppId  SeqNo    DelayS    DelayUS  RetxCount  HopCount
25    0.045822      1      0  0.045822  45822.000          1         7
26    0.032649      1      0  0.032649  32648.765          1         7
          Time  AppId  SeqNo    DelayS    DelayUS  RetxCount  HopCount
Node
25    0.024395      1      0  0.024395  24395.150          1         7
26    0.022296      1      0  0.022296  22296.335          1         3
          Time  AppId  SeqNo    DelayS    DelayUS  RetxCount  HopCount
Node
25    0.017346      1      0  0.017346  17345.635          1         5
26    0.013423      1      0  0.013423  13423.440          1         3
The second print shows the mean of the data frames (which is wrong):
          Time  AppId  SeqNo    DelayS    DelayUS  RetxCount  HopCount
25    0.026227      1      0  0.026227  26227.105          1         6
26    0.020448      1      0  0.020448  20447.995          1         4
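The numbers in this output can be checked against the per-file frames. This is a minimal sketch, not the original script: `per_file` and the `frame` helper are hypothetical stand-ins built from the DelayS column of the parsed output above, and the loop shape (mean taken after every `pd.concat`) is an assumption, since the post does not show the loop. It compares that loop with concatenating all frames first and taking the mean once:

```python
import pandas as pd


def frame(delays):
    # Hypothetical helper: one per-file result, indexed by Node like the parsed output.
    return pd.DataFrame({"DelayS": delays}, index=pd.Index([25, 26], name="Node"))


# Per-file FullDelay means of DelayS, copied from the parsed output.
per_file = [
    frame([0.045822, 0.032649]),
    frame([0.024395, 0.022296]),
    frame([0.017346, 0.013423]),
]

# Assumed shape of the original loop: the mean is re-taken after every concat.
running = per_file[0]
for f in per_file[1:]:
    running = pd.concat([running, f])
    running = running.groupby(level="Node").mean()

# Alternative: concatenate everything first, then take the mean once.
once = pd.concat(per_file).groupby(level="Node").mean()

print(running["DelayS"].round(6).tolist())  # [0.026227, 0.020448]
print(once["DelayS"].round(6).tolist())     # [0.029188, 0.022789]
```

The first result matches the DelayS column of the second print, and the second is the plain average of the three per-file values for each node. Averaging per-file means equals the mean over all raw rows only when every file contributes the same number of rows per node.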
This is the printed result of temp = temp.groupby(['Type', 'Node']).count():
      Time  AppId  SeqNo  DelayS  DelayUS  RetxCount  HopCount
Node
25       2      2      2       2        2          2         2
26       2      2      2       2        2          2         2
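Can you post the data you are using? What is 'd'? – darthbith
For debugging questions you need to provide a [mcve]. Please include a dataset that shows the problem (as short as possible and, ideally, copy-and-pasteable). :) – MSeifert
Can you show the _actual_ output of the 'print' statements? The second print statement should show a DataFrame (or possibly a Series), not a single value. –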