my_dataframe.new_column = value?

I just ran into a strange pandas behavior. Say I do this:

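import random
import string

import numpy as np
import pandas as pd

m_size = (4, 3)
# randint's upper bound is exclusive, so 11 yields integers from 0 to 10
num_mat = np.random.randint(0, 11, m_size)
# Random column labels; not used below, where the labels are fixed to A, B, C
my_cols = [random.choice(string.ascii_uppercase) for x in range(num_mat.shape[1])]
mydf = pd.DataFrame(num_mat, columns=['A', 'B', 'C'])

print(mydf)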

   A   B   C
0  6   6   7
1  9  10   4
2  0  10   7
3  1   3  10

If I now do:

mydf.D = 4

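I would expect this to create a column D filled with the value 4, but mydf does not change.

Why? I did not get any warning or error, so what does mydf.D = 4 actually do?

This is all with the latest stable version of pandas (0.11.0).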

Source

2013-05-06 Amelio Vazquez-Reina

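Although pandas lets you read a column with df.Col, this is apparently just shorthand for df['Col'], and the shorthand does not work for creating new columns. You need to do mydf['D'] = 4.

I find this unfortunate, because I often try to do the same thing you did. The insidious part is that it actually creates a plain Python attribute named D on the DataFrame object; it just does not add it as a column. So you have to make sure to delete the attribute, or it will shadow the column even if you later add the column properly: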
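As a minimal sketch of the item-assignment form, assuming a small frame built with np.arange rather than the random data from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame in the question (fixed values, not random).
mydf = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["A", "B", "C"])

# Item assignment creates a real column; the scalar is broadcast to every row.
mydf["D"] = 4

print(list(mydf.columns))
print(mydf["D"].tolist())

# Once the column exists, attribute-style *reading* works as shorthand for it.
same = mydf.D.equals(mydf["D"])
print(same)
```

Reading through the attribute works here only because nothing else named D is already set on the object.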

>>> import numpy as np
>>> import pandas
>>> d = pandas.DataFrame(np.random.randn(3, 2), columns=["A", "B"])
>>> d
          A         B
0 -0.931675  1.029137
1 -0.363033 -0.227672
2  0.058903 -0.362436
>>> d.Col = 8
>>> d.Col  # The attribute is there
8
>>> d['Col']  # But it is not a column, just a plain attribute
Traceback (most recent call last): 
    File "<pyshell#8>", line 1, in <module> 
    d['Col'] 
    File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\frame.py", line 1906, in __getitem__ 
    return self._get_item_cache(key) 
    File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\generic.py", line 570, in _get_item_cache 
    values = self._data.get(item) 
    File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1383, in get 
    _, block = self._find_block(item) 
    File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1525, in _find_block 
    self._check_have(item) 
    File "c:\users\brenbarn\documents\python\extensions\pandas\pandas\core\internals.py", line 1532, in _check_have 
    raise KeyError('no item named %s' % com.pprint_thing(item)) 
KeyError: u'no item named Col' 
>>> d['Col'] = 100  # Create a real column
>>> d.Col  # The attribute blocks attribute access to the column
8
>>> d['Col']  # The column is available via item access
0    100
1    100
2    100
Name: Col, dtype: int64
>>> del d.Col  # Delete the attribute
>>> d.Col  # The column is now available as an attribute (!)
0    100
1    100
2    100
Name: Col, dtype: int64
>>> d['Col']  # And still as an item
0    100
1    100
2    100
Name: Col, dtype: int64

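It can be a bit surprising to see that d.Col "works only after deletion": that is, after you do del d.Col, reading d.Col afterwards actually gives you the column. This is simply a consequence of how Python's __getattr__ works (it is called only when normal attribute lookup fails), but it is still somewhat unintuitive in this case.

Source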
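The lookup order can be reproduced without pandas. The following is a toy sketch, not pandas' actual implementation: Table is a hypothetical class that keeps its "columns" in a dict and exposes them through __getattr__, which Python calls only after normal attribute lookup has failed.

```python
class Table:
    """Hypothetical toy stand-in for a DataFrame (not pandas code)."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        return self._columns[name]

    def __setitem__(self, name, values):
        self._columns[name] = values

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has already failed.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name)


t = Table({"A": [1, 2, 3]})
t.Col = 8                    # plain instance attribute, stored in t.__dict__
t["Col"] = [100, 100, 100]   # a real "column"

shadowed = t.Col             # found in t.__dict__, so __getattr__ never runs
del t.Col                    # removes only the instance attribute
after_delete = t.Col         # normal lookup fails, so __getattr__ returns the column

print(shadowed, after_delete)
```

Because the instance attribute is found first, the column stays hidden behind it until the attribute is deleted, which mirrors the session above.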

2013-05-06 20:07:12 BrenBarn

I see. I assume/hope there are plans to eventually implement this feature. I guess I should report it on GitHub. – 2013-05-06 20:08:58

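@user815423426: You can try. However, I think there is reason to be cautious. The problem is that dotted-name attribute access shares a namespace with methods, so this approach could not add a column with the same name as a DataFrame method (e.g., "sum"), and it might actually overwrite methods with your data, which would definitely be bad. – BrenBarn 2013-05-06 20:16:18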
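A small sketch of the namespace clash described in this comment, assuming a made-up frame with one column deliberately named "sum":

```python
import pandas as pd

# Hypothetical data; the column name "sum" collides with DataFrame.sum.
df = pd.DataFrame({"sum": [1, 2, 3], "x": [10, 20, 30]})

# Attribute access finds the DataFrame.sum method, not the column.
is_method = callable(df.sum)
print(is_method)

# Item access always reaches the column.
print(df["sum"].tolist())

# A name that does not collide with a method can be read either way.
print(df.x.tolist())
```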

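Also, if new columns could be created via df.Col, it would become impossible to monkey-patch the DataFrame class. Many people add their own methods to the DataFrame class. Consider df.Col a convenience for interactive use; production code should use the data-access methods (see the note at the top of http://pandas.pydata.org/pandas-docs/stable/indexing.html) and also the interesting https://github.com/pydata/pandas/issues/3056 – 2013-05-07 06:47:46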
