Subsetting a DataFrame based on a field

mukey   cokey     hzdept_r  hzdepb_r
422927  11090397  0         20
422927  11090397  20        71
422927  11090397  71        152
422927  11090398  0         18
422927  11090398  18        117
422927  11090398  117       152

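I want to subset the dataframe above so that only the rows belonging to the first cokey group are selected (11090397 in this example). Since this is only a sample dataset, the solution needs to scale to larger versions of this kind of dataframe.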

In this case, the resulting dataset should be:

mukey   cokey     hzdept_r  hzdepb_r
422927  11090397  0         20
422927  11090397  20        71
422927  11090397  71        152

I have tried using groupby, but I don't know how to select only the first cokey value from it.

Source

2015-03-31 user308827

Another approach is to take just the first unique value:

In [97]: df[df['cokey'] == df['cokey'].unique()[0]]
Out[97]:
    mukey     cokey  hzdept_r  hzdepb_r
0  422927  11090397         0        20
1  422927  11090397        20        71
2  422927  11090397        71       152

You can also use integer-based indexing to get the first value to filter on:

In [99]: df[df['cokey'] == df['cokey'].iloc[0]]
Out[99]:
    mukey     cokey  hzdept_r  hzdepb_r
0  422927  11090397         0        20
1  422927  11090397        20        71
2  422927  11090397        71       152

Source

2015-03-31 21:04:21 EdChum
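The two filters in this answer can be tried end to end with a small script. This is only a sketch: the frame is rebuilt by hand from the sample in the question, and the variable names `by_unique` and `by_position` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "mukey": [422927] * 6,
    "cokey": [11090397] * 3 + [11090398] * 3,
    "hzdept_r": [0, 20, 71, 0, 18, 117],
    "hzdepb_r": [20, 71, 152, 18, 117, 152],
})

# Filter on the first unique cokey.
by_unique = df[df["cokey"] == df["cokey"].unique()[0]]

# Filter on the cokey stored in the first row (by position).
by_position = df[df["cokey"] == df["cokey"].iloc[0]]

print(by_position)
```

Both masks compare the whole column against a single scalar, so neither approach depends on how many distinct cokey values the larger frame contains.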

Does the first unique value have any advantage over the first value? – cphlewis 2015-03-31 22:35:28

Not really, it's just another way of getting the value from the df. – EdChum 2015-03-31 22:37:01

The docs for unique don't specify that it maintains order, tho. – cphlewis 2015-04-01 00:06:16
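On the ordering question, the current pandas documentation for `Series.unique` does describe the result as being in order of appearance (it is hash-based and does not sort), though that may not have been spelled out when the comment was written. A quick check, using a made-up series that is deliberately neither sorted nor grouped:

```python
import pandas as pd

# Hypothetical series: values are neither sorted nor contiguous.
s = pd.Series([11090398, 11090397, 11090398, 11090399, 11090397])

# unique() does not sort, so values come back in the order
# in which they first appear in the series.
uniques = s.unique()
print(uniques)
```

Under that behaviour, `unique()[0]` and `iloc[0]` give the same value, and `iloc[0]` avoids scanning the whole column.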

If you are looking for all rows whose cokey equals the first cokey in the df, filter on the df's first cokey:

test[test['cokey'] == test.cokey[0]]

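Edit: @dsm is right. The code above gives you the cokey at index label zero, so if your df does not have an auto-incrementing index that starts at zero, you may not get the result you actually want. Use this instead: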

test[test['cokey'] == test.iloc[0]['cokey']]

Source

2015-03-31 20:53:48
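The label-versus-position point from the edit is easiest to see on a frame whose index does not start at zero. The sketch below uses a hypothetical three-row frame with a reversed integer index. With an integer index, `test.cokey[0]` is a label lookup, which is written here as the equivalent `.loc[0]` to make that explicit.

```python
import pandas as pd

# Hypothetical frame with a reversed integer index,
# so the row labelled 0 is the last row, not the first.
test = pd.DataFrame(
    {"cokey": [11090397, 11090397, 11090398]},
    index=[2, 1, 0],
)

by_label = test["cokey"].loc[0]      # label lookup: the row labelled 0
by_position = test["cokey"].iloc[0]  # positional lookup: the first row

print(by_label, by_position)

# Filtering on the positional value selects the first group as intended.
first_group = test[test["cokey"] == by_position]
print(first_group)
```

If the index did not contain the label 0 at all, the label lookup would raise a `KeyError` rather than silently return the wrong row.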

Thanks Liam, but hardcoding 11090397 won't work, because the larger dataframe can have other cokey values. – user308827 2015-03-31 20:54:35

@user308827 Ah, so you just want the first set, whatever it is? – 2015-03-31 20:55:29

Exactly! I'll update the question to reflect this. – user308827 2015-03-31 20:55:50

If df is the sample dataframe:

cokeys = set(df.cokey)  # unique keys (a set has no guaranteed order)
for k in cokeys:
    print(df[df.cokey == k])  # one sub-dataframe per cokey

Result:

    mukey     cokey  hzdept_r  hzdepb_r
0  422927  11090397         0        20
1  422927  11090397        20        71
2  422927  11090397        71       152
    mukey     cokey  hzdept_r  hzdepb_r
3  422927  11090398         0        18
4  422927  11090398        18       117
5  422927  11090398       117       152

If you literally want only the first dataframe, set k = df.iloc[0].cokey.

Source

2015-03-31 20:58:17 cphlewis
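Since the question mentions groupby, here is one way the first group could be pulled from a groupby object directly. It is a sketch rather than something proposed in the thread: the frame is rebuilt from the question's sample, and `grouped`, `first_key` and `first_group` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "mukey": [422927] * 6,
    "cokey": [11090397] * 3 + [11090398] * 3,
    "hzdept_r": [0, 20, 71, 0, 18, 117],
    "hzdepb_r": [20, 71, 152, 18, 117, 152],
})

# sort=False keeps groups in order of first appearance
# instead of sorting them by key.
grouped = df.groupby("cokey", sort=False)

# Iterating a groupby yields (key, sub-frame) pairs; take only the first pair.
first_key, first_group = next(iter(grouped))

print(first_key)
print(first_group)
```

Without `sort=False`, the first group would be the one with the smallest cokey, which is not necessarily the one that appears first in the frame.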
