2015-03-31 57 views
3

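Strange results with groupby, transform, and NaN

Edit (19-May-2015): I have just verified that this was fixed in version 0.16.1, so it should not be a problem in recent versions.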

These should all give the same result, right?

df.groupby(level=0).transform('mean') 
df.groupby(level=0)['x'].transform(np.nanmean) 
df.groupby(level=0)['x'].transform('mean') 

The first two work, but the third does not. Could this be a bug?

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])

df 
Out[686]: 
     x
1    1
1  NaN
2    3
2    4

df.groupby(level=0).transform('mean') 
Out[687]: 
     x
1  1.0
1  1.0
2  3.5
2  3.5

df.groupby(level=0)['x'].transform(np.nanmean) 
Out[688]: 
1    1.0
1    1.0
2    3.5
2    3.5
Name: x, dtype: float64 

So far so good, but this one fails:

df.groupby(level=0)['x'].transform('mean') 
--------------------------------------------------------------------------- 
ValueError        Traceback (most recent call last) 
<ipython-input-691-24761ee742fd> in <module>() 
----> 1 df.groupby(level=0)['x'].transform('mean') 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.pyc in transform(self, func, *args, **kwargs) 
    2411   # if string function 
    2412   if isinstance(func, compat.string_types): 
-> 2413    return self._transform_fast(lambda : getattr(self, func)(*args, **kwargs)) 
    2414 
    2415   # do we have a cython function 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.pyc in _transform_fast(self, func) 
    2457   values = np.repeat(values, com._ensure_platform_int(counts)) 
    2458 
-> 2459   return self._set_result_index_ordered(Series(values)) 
    2460 
    2461  def filter(self, func, dropna=True, *args, **kwargs): 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.pyc in _set_result_index_ordered(self, result) 
    495    result = result.sort_index() 
    496 
--> 497   result.index = self.obj.index 
    498   return result 
    499 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.pyc in __setattr__(self, name, value) 
    1978   try: 
    1979    object.__getattribute__(self, name) 
-> 1980    return object.__setattr__(self, name, value) 
    1981   except AttributeError: 
    1982    pass 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\lib.pyd in pandas.lib.AxisProperty.__set__ (pandas\lib.c:38795)() 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\series.pyc in _set_axis(self, axis, labels, fastpath) 
    266   object.__setattr__(self, '_index', labels) 
    267   if not fastpath: 
--> 268    self._data.set_axis(axis, labels) 
    269 
    270  def _set_subtyp(self, is_all_dates): 

C:\Users\eilerj\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\internals.pyc in set_axis(self, axis, new_labels) 
    2209   if new_len != old_len: 
    2210    raise ValueError('Length mismatch: Expected axis has %d elements, ' 
-> 2211        'new values have %d elements' % (old_len, new_len)) 
    2212 
    2213   self.axes[axis] = new_labels 

ValueError: Length mismatch: Expected axis has 3 elements, new values have 4 elements 
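
One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: `_transform_fast` broadcasts one mean per group back to the rows with `np.repeat(values, counts)`. If `counts` held non-null counts (which skip the NaN in group 1) instead of group sizes, the repeated array would have 3 elements while the index has 4, which matches the error message. The sketch below reproduces only that arithmetic with NumPy; the variable names are illustrative and are not taken from the pandas source.

```python
import numpy as np

# One mean per group, as an aggregation would return them.
group_means = np.array([1.0, 3.5])

# Assumed per-group counts (illustrative, not read from pandas internals):
# non-null counts skip the NaN in group 1, group sizes include it.
non_null_counts = np.array([1, 2])
group_sizes = np.array([2, 2])

# Broadcasting with non-null counts loses one row.
broadcast_short = np.repeat(group_means, non_null_counts)

# Broadcasting with group sizes matches the original index length.
broadcast_full = np.repeat(group_means, group_sizes)

print(len(broadcast_short))  # 3, one short of the 4-row index
print(len(broadcast_full))   # 4
```

If that reading is right, assigning the 4-element index to the 3-element result is what raises the length mismatch.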
+3

I think this is a bug, and one I should have fixed here (https://github.com/pydata/pandas/pull/9699). Can you check against trunk pandas to confirm? – DSM 2015-03-31 20:52:53

+0

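@DSM Sorry, I don't know how to check trunk pandas. This is the latest pandas release (0.16.0), but it looks like you may have fixed it only a few days ago. I'll leave the question up for now, but let me know if I should delete it. – JohnE 2015-03-31 21:12:18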

+1

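I'm voting to close this question as off-topic because it is a bug report and would be better filed as a GitHub issue (although this one happens to be fixed already!). :) – 2015-03-31 22:20:52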

Answers

0

I have verified that this was indeed fixed in version 0.16.1. See the comments from @DSM and @AndyHayden.
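
To confirm the fix on whatever version is installed, a short check along these lines should be enough. It is a sketch that assumes a reasonably recent pandas (it relies on `pd.testing.assert_series_equal`, which is not available in the 0.16 series), and it wraps `np.nanmean` in a lambda so that pandas calls it as a plain function.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: group 1 contains a NaN.
df = pd.DataFrame({'x': [1, np.nan, 3, 4]}, index=[1, 1, 2, 2])
grouped = df.groupby(level=0)

# The three variants from the question.
frame_mean = grouped.transform('mean')['x']
series_nanmean = grouped['x'].transform(lambda s: np.nanmean(s.to_numpy()))
series_mean = grouped['x'].transform('mean')  # raised ValueError in 0.16.0

# All three should agree once the bug is fixed.
pd.testing.assert_series_equal(series_mean, series_nanmean)
pd.testing.assert_series_equal(series_mean, frame_mean)

print(pd.__version__)
print(series_mean.tolist())
```

On a version with the fix, both assertions pass silently and the printed values are the per-group means broadcast to every row.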