Initializing pandas DataFrames with and without index and columns produces different results

If I construct a pandas.DataFrame in the following way, I get output that (I think) is peculiar:

import pandas, numpy 

df = pandas.DataFrame(
    numpy.random.rand(100,2), index = numpy.arange(100), columns = ['s1','s2']) 
smoothed = pandas.DataFrame(
    pandas.ewma(df, span = 21), index = df.index, columns = ['smooth1','smooth2'])

When I look at the smoothed values, I get:

>>> smoothed.tail() 
smooth1 smooth2 
95  NaN  NaN 
96  NaN  NaN 
97  NaN  NaN 
98  NaN  NaN 
99  NaN  NaN

Yet the following, which looks like the same operation split into separate calls, produces a different result:

smoothed2 = pandas.DataFrame(pandas.ewma(df, span = 21)) 
smoothed2.index = df.index 
smoothed2.columns = ['smooth1','smooth2']

Calling DataFrame.tail() again, I get:

>>> smoothed2.tail() 
smooth1 smooth2 
95 0.496021 0.501153 
96 0.506118 0.507541 
97 0.516655 0.544621 
98 0.520212 0.543751 
99 0.518170 0.572429

Can anyone explain why these two ways of constructing the DataFrame should give different results?

Source

2012-02-23 benjaminmgross

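The result of ewma(df, span=21) is already a DataFrame, so when you pass it to the DataFrame constructor along with a list of columns, the constructor "selects" the columns you passed by label. The result has no columns named 'smooth1' or 'smooth2' (its columns are still 's1' and 's2'), so the selected columns come back filled with NaN. It's hard to break the link between labels and data in this particular case. If you do this instead: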

In [23]: smoothed = DataFrame(ewma(df, span = 21).values, index=df.index, columns = ['smooth1','smooth2']) 
In [24]: smoothed.head() 
Out[24]: 
    smooth1 smooth2 
0 0.218350 0.877693 
1 0.400214 0.813499 
2 0.308564 0.739426 
3 0.433341 0.641891 
4 0.525260 0.620541

it works as expected, because .values is a bare array with no labels to match against. Of course,

smoothed = ewma(df, span=21) 
smoothed.columns = ['smooth1', 'smooth2']

is perfectly fine too.
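For anyone reading this on a newer pandas, here is a minimal sketch of the same three approaches side by side. It assumes a pandas release where the top-level pandas.ewma function is no longer available, so it uses df.ewm(span=21).mean() as the replacement. The variable names by_label, by_position and renamed are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.rand(100, 2), index=np.arange(100), columns=["s1", "s2"]
)

# Assumption: newer pandas, where df.ewm(...).mean() replaces pandas.ewma(df, ...).
ewm = df.ewm(span=21).mean()

# 1) Passing a DataFrame plus new column labels: the constructor selects
#    columns by label, and "smooth1"/"smooth2" do not exist in `ewm`.
by_label = pd.DataFrame(ewm, index=df.index, columns=["smooth1", "smooth2"])

# 2) Passing the underlying array: there are no labels to match, so the
#    new column names are applied by position.
by_position = pd.DataFrame(
    ewm.to_numpy(), index=df.index, columns=["smooth1", "smooth2"]
)

# 3) Relabeling after the fact: the data is untouched, only the labels change.
renamed = ewm.copy()
renamed.columns = ["smooth1", "smooth2"]

print(by_label.tail())     # all NaN
print(by_position.tail())  # smoothed values
print(renamed.tail())      # same smoothed values
```

The second and third forms give the same numbers. The first differs only because the constructor treats the column list as a selection when the input already carries labels.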

Source

2012-02-23 21:25:24

Wes, you're amazing. Thank you for building such an awesome abstraction, and thanks for such a quick response! – benjaminmgross 2012-02-23 21:32:49
