2015-08-15

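How to use `Series.interpolate` in pandas and modify the old values

The `interpolate` method in pandas uses the valid data to interpolate the NaN values. However, it leaves the existing valid data unchanged, as in the code below.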

Is there any way to use the `interpolate` method so that it also changes the old values and the series becomes smooth?

In [1]: %matplotlib inline 
In [2]: from scipy.interpolate import UnivariateSpline as spl 
In [3]: import numpy as np 
In [4]: import pandas as pd 
In [5]: samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 } 
In [6]: x, y = zip(*sorted(samples.items())) 

In [7]: df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float) 

In [8]: df1.loc[x] = np.array(y)[:, None] 
In [9]: df1['itp'] = df1['itp'].interpolate('spline', order=3)  # assign back; chained inplace=True may not update df1 on recent pandas
In [10]: df1.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6)) 

[Figure: `raw` samples (red squares) and the `itp` column (blue line) filled by `Series.interpolate`]

In [11]: df2 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float) 
In [12]: df2.loc[x, 'raw'] = y 
In [13]: f = spl(x, y, k=3) 
In [14]: df2['itp'] = f(df2.index) 
In [15]: df2.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6)) 

[Figure: `raw` samples (red squares) and the `itp` column (blue line) computed with `UnivariateSpline`]

Answer

When you use `Series.interpolate` with `method='spline'`, pandas uses `interpolate.UnivariateSpline` under the hood.

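The spline returned by `UnivariateSpline` is not guaranteed to pass through the data points given as input unless `s=0`. By default, however, `s=None`, which selects a different smoothing factor and therefore gives a different result.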
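To make the difference concrete, here is a minimal sketch that fits both splines to the sample points from the question and measures how far each one lands from the inputs. The variable names (`smoothing`, `interpolating`, and the two `*_gap` values) are only illustrative.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Sample points taken from the question.
x = np.array([0.0, 0.4, 0.5, 0.6, 0.8, 1.0])
y = np.array([0.0, 0.5, 0.9, 0.7, 0.3, 1.0])

smoothing = UnivariateSpline(x, y, k=3)           # default s=None: smoothing spline
interpolating = UnivariateSpline(x, y, k=3, s=0)  # s=0: forced through every point

# Largest vertical distance between each spline and the input points.
smoothing_gap = np.abs(smoothing(x) - y).max()
interpolating_gap = np.abs(interpolating(x) - y).max()

print(f"s=None: max deviation at the data points = {smoothing_gap:.3g}")
print(f"s=0:    max deviation at the data points = {interpolating_gap:.3g}")
```

With `s=0` the deviation should be at floating-point noise level, while the default spline misses the points by a visible margin.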

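The `Series.interpolate` method always fills in NaN values without changing the non-NaN values. There is no way to make `Series.interpolate` modify the non-NaN values. So when `s != 0`, the original points stay where they were while the filled-in points follow the smoothing spline, and the result shows jagged jumps.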
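The fill-only behaviour can be checked directly. The sketch below rebuilds the question's series on the same 31-point grid, but places the known values by position (`known_pos` is an assumption introduced here to avoid floating-point label lookups), then compares the series before and after interpolation.

```python
import numpy as np
import pandas as pd

grid = np.linspace(0, 1, 31)
# Grid positions corresponding to x = 0, 0.4, 0.5, 0.6, 0.8, 1 (i / 30).
known_pos = [0, 12, 15, 18, 24, 30]
known_val = [0.0, 0.5, 0.9, 0.7, 0.3, 1.0]

raw = pd.Series(np.nan, index=grid)
raw.iloc[known_pos] = known_val

# Default smoothing factor, as in the question.
filled = raw.interpolate('spline', order=3)

known = raw.notna()
print("non-NaN values unchanged:", filled[known].equals(raw[known]))
print("NaN values remaining:", int(filled.isna().sum()))
```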

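So if you want spline interpolation with `s=None` (the default) but without the jagged jumps, then, as you have already found, you have to call `UnivariateSpline` directly and overwrite all the values in `df['itp']`: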

df['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df.index) 

If you want a cubic spline that passes through all the non-NaN data points, then use `s=0`:

df['itp'] = df['itp'].interpolate('spline', order=3, s=0)
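As a sanity check on the `s=0` route, the following sketch compares the curve produced by `Series.interpolate` with one produced by calling `UnivariateSpline` directly on the same points. It assumes the same grid and sample values as the question; `known_pos`, `via_pandas`, `via_scipy` and `max_diff` are names made up for this illustration.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline

grid = np.linspace(0, 1, 31)
# Grid positions corresponding to x = 0, 0.4, 0.5, 0.6, 0.8, 1 (i / 30).
known_pos = [0, 12, 15, 18, 24, 30]
known_val = np.array([0.0, 0.5, 0.9, 0.7, 0.3, 1.0])

raw = pd.Series(np.nan, index=grid)
raw.iloc[known_pos] = known_val

# Route 1: let pandas fill only the NaN slots with an interpolating spline.
via_pandas = raw.interpolate('spline', order=3, s=0)
# Route 2: evaluate the same kind of spline on the whole grid.
via_scipy = UnivariateSpline(grid[known_pos], known_val, k=3, s=0)(grid)

max_diff = np.abs(via_pandas.to_numpy() - via_scipy).max()
print(f"max difference between the two curves: {max_diff:.3g}")
```

Because an `s=0` spline already passes through the known points, keeping those points fixed costs nothing, which is why this route shows no jumps.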

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import scipy.interpolate as interpolate 

samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 } 
x, y = zip(*sorted(samples.items())) 

fig, ax = plt.subplots(nrows=3, sharex=True) 
df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float) 
df1.loc[x] = np.array(y)[:, None] 

df2 = df1.copy() 
df3 = df1.copy() 

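# Assign the results back instead of using a chained inplace=True,
# which may not update the parent DataFrame on recent pandas versions.
# df1: default smoothing factor, only NaN values are filled (jagged jumps)
df1['itp'] = df1['itp'].interpolate('spline', order=3)
# df2: default smoothing factor, every value overwritten (smooth curve)
df2['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df2.index)
# df3: s=0, the spline passes through all non-NaN points
df3['itp'] = df3['itp'].interpolate('spline', order=3, s=0)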
for i, df in enumerate((df1, df2, df3)): 
    df.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6), ax=ax[i]) 
plt.show() 

[Figure: three stacked panels for df1, df2 and df3, each showing `raw` (red squares) and `itp` (blue line)]