2015-10-13 45 views
2

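Inserting rows into a DataFrame by reindexing

I have a dataset with temperature as one column. Because of the way the heater works, there are some gaps in the data. To compare different datasets directly, I want to fill in these missing temperatures and put a corresponding NaN in the other column.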

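I tried the answer given here, which seems to be exactly what I want: link. However, it does not work. I get a DataFrame with the new temperature values I wanted, but the corresponding data is gone: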

import pandas as pd 
import numpy as np   
A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header = 2) 
A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19] 
A1truncated = A1[A1.temp >= 25]; A1truncated=A1truncated[A1truncated.temp <= 350.1] 
A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean() 
A1averaged = np.around(A1averaged, decimals=1) 

A1averaged.set_index('temp') 
new_index = pd.Index(np.arange(25, 350, 0.1), name='temp') 
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index() 

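This turns my 19 columns into one column indexed by temperature (A1averaged), and then into 2 columns: the new list of temperatures and a column of empty data (A1indexed). Any idea why this does not work? Or is there another way to do the same thing?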

Answer

1

A float index can cause problems with reindex; the mismatch is probably due to floating-point precision. So I use a small hack: an Int64Index instead of a Float64Index.
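The precision issue can be checked on the grid from the question. This is a minimal sketch, not the asker's data: `float_grid`, `measured`, `n_off` and `hits` are names made up for illustration, and `measured` is a stand-in Series whose labels are rounded to one decimal the way `np.around(..., decimals=1)` leaves them in the question.

```python
import numpy as np
import pandas as pd

# Grid built directly in floats, as in the question.
float_grid = np.arange(25, 350, 0.1)

# Count grid points that are not exactly equal to their one-decimal rounding.
n_off = int((float_grid != np.round(float_grid, 1)).sum())

# Stand-in data (an assumption): one value per temperature, with labels
# rounded to one decimal, like the averaged data in the question.
measured = pd.Series(1.0, index=pd.Index(np.round(float_grid, 1), name="temp"))

# Reindexing on the raw float grid only finds labels that match exactly;
# every other grid point comes back as NaN.
hits = int(measured.reindex(float_grid).notna().sum())

print(n_off, "grid points differ from their rounded value")
print(hits, "of", len(float_grid), "rows survive the reindex")
```

Every grid point that is off by even one bit fails the exact label match, which would explain the column of empty data.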

The subset can be selected in a simpler way:

A1truncated = A1[(A1.temp >= 25) & (A1.temp <= 350.1)] 
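As an aside, the same selection can also be written with `Series.between`, which is inclusive at both ends by default. The sketch below uses a small made-up frame (`df` and its values are assumptions; only the column names `temp` and `freq` come from the question) to show that the two masks agree.

```python
import pandas as pd

# Made-up sample; column names follow the question.
df = pd.DataFrame({"temp": [24.9, 25.0, 100.0, 350.1, 350.2],
                   "freq": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Combined boolean mask, as in the answer.
mask_explicit = (df.temp >= 25) & (df.temp <= 350.1)

# Equivalent mask; between() includes both endpoints by default.
mask_between = df.temp.between(25, 350.1)

subset = df[mask_between]
print(subset)
```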

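Then drop the first set_index call, because the index is set twice (and the result of this first call is never assigned, so it has no effect):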

A1averaged.set_index('temp') 

Make new_index an Int64Index:

new_index = pd.Index(np.arange(250, 3500), name='temp') 

To use the Int64Index, multiply the temp column by 10 beforehand and divide it by 10 at the end:

A1averaged['temp'] = A1averaged['temp'] * 10 
A1indexed['temp'] = A1indexed['temp']/10 
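A slightly more defensive variant of the same trick, sketched on made-up data: multiplying a float column by 10 still yields floats, so this version rounds and casts to integers before reindexing. The rounding step, the `averaged` frame and the name `filled` are additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up averaged data with a gap between 25.1 and 25.4.
averaged = pd.DataFrame({"temp": [25.0, 25.1, 25.4, 25.5],
                         "freq": [10.0, 11.0, 14.0, 15.0]})

# Scale to tenths of a degree; round before casting so that a product
# sitting just below or above a whole number still maps to the right integer.
averaged["temp"] = (averaged["temp"] * 10).round().astype("int64")

# Integer grid covering 25.0 .. 25.5 in steps of 0.1.
new_index = pd.Index(np.arange(250, 256), name="temp")

# Missing temperatures are inserted as rows with NaN in 'freq'.
filled = averaged.set_index("temp").reindex(new_index).reset_index()

# Convert back to degrees.
filled["temp"] = filled["temp"] / 10
print(filled)
```

Integer labels compare exactly, so the reindex step no longer depends on how the float grid happened to be computed.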

All together:

import pandas as pd
import numpy as np

A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header=2)

A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19]

# keep only rows with 25 <= temp <= 350.1
A1truncated = A1[(A1.temp >= 25) & (A1.temp <= 350.1)]

# average freq per temperature and round to one decimal
A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean()
A1averaged = np.around(A1averaged, decimals=1)

# integer grid in tenths of a degree: 250 .. 3499 stands for 25.0 .. 349.9
new_index = pd.Index(np.arange(250, 3500), name='temp')

# scale temp by 10, reindex on the integer grid, then scale back
A1averaged['temp'] = A1averaged['temp'] * 10
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index()
A1indexed['temp'] = A1indexed['temp']/10
print(A1indexed.tail())
#  temp  freq 
#3245 349.5 5830065.6 
#3246 349.6 5830043.5 
#3247 349.7 5830046.3 
#3248 349.8 5830025.3 
#3249 349.9 5830015.6 
+0

Perfect, thank you very much! I would never have noticed the float problem – Yobmod
