2016-11-22 81 views
0

How to sort an indexed DataFrame

Unlike in "sort pandas dataframe based on list", I have an indexed DataFrame like this:

$ echo -e 'abc\txyz\t0.9\nefg\txyz\t0.3\nlmn\topq\t0.23\nabc\tjkl\t0.5\n' > test.txt 
$ cat test.txt 
abc xyz 0.9 
efg xyz 0.3 
lmn opq 0.23 
abc jkl 0.5 
$ python 

>>> import pandas as pd 
>>> df = pd.read_csv('test.txt', delimiter='\t', header=None, dtype={0:unicode, 1:unicode, 2:float}) 
>>> df = df.pivot(index=0, columns=1, values=2) 
>>> df = df.fillna(0) 
>>> df 
1 jkl opq xyz 
0     
abc 0.5 0.00 0.9 
efg 0.0 0.00 0.3 
lmn 0.0 0.23 0.0 

I can't figure out how to use Categorical in this case:

# Desired row order. 
>>> row_order = ['efg', 'abc', 'lmn'] 
# Desired column order. 
>>> col_order = ['xyz', 'jkl', 'opq'] 
>>> pd.Categorical(df[0], categories=row_order, ordered=True) 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2059, in __getitem__ 
    return self._getitem_column(key) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2066, in _getitem_column 
    return self._get_item_cache(key) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 1386, in _get_item_cache 
    values = self._data.get(item) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 3541, in get 
    loc = self.items.get_loc(item) 
    File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 2136, in get_loc 
    return self._engine.get_loc(self._maybe_cast_indexer(key)) 
    File "pandas/index.pyx", line 139, in pandas.index.IndexEngine.get_loc (pandas/index.c:4443) 
    File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4289) 
    File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13733) 
    File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13687) 
KeyError: 0 

I need the indexed DataFrame with its rows and columns in the following order:

1 xyz jkl opq 
0     
efg 0.3 0.00 0.0 
abc 0.9 0.50 0.0 
lmn 0.0 0.00 0.23 

Answer

1

df.reindex can reorder both the rows and the columns:

In [261]: df.reindex(index=row_order, columns=col_order) 
Out[261]: 
1 xyz jkl opq 
0     
efg 0.3 0.0 0.00 
abc 0.9 0.5 0.00 
lmn 0.0 0.0 0.23 
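
For anyone who wants to try this without the text file, here is a minimal self-contained sketch. It assumes a recent pandas on Python 3 and builds the frame in memory rather than with read_csv. It also shows why the attempt in the question raised KeyError: 0. After pivot, 0 is the name of the index, not a column label, so df[0] has nothing to look up. The .loc selection and the CategoricalIndex route are alternatives added here for illustration, not part of the original answer.

```python
import pandas as pd

# Same data as test.txt in the question, built in memory (columns are 0, 1, 2).
raw = pd.DataFrame([
    ("abc", "xyz", 0.9),
    ("efg", "xyz", 0.3),
    ("lmn", "opq", 0.23),
    ("abc", "jkl", 0.5),
])
df = raw.pivot(index=0, columns=1, values=2).fillna(0)

row_order = ["efg", "abc", "lmn"]
col_order = ["xyz", "jkl", "opq"]

# After pivot, 0 is the *name* of the index, not a column,
# which is why df[0] raised KeyError in the question.
print(df.index.name)    # name of the row index
print(0 in df.columns)  # whether 0 is a column label

# Approach from the answer: reorder rows and columns in one call.
reordered = df.reindex(index=row_order, columns=col_order)

# Alternative: label-based selection. Unlike reindex, this raises
# if a label in row_order or col_order is missing from the frame.
selected = df.loc[row_order, col_order]

# Categorical route: make the index an ordered categorical, then sort it.
cat = df.copy()
cat.index = pd.CategoricalIndex(
    cat.index, categories=row_order, ordered=True, name=cat.index.name
)
cat = cat.sort_index()[col_order]

print(reordered)
```

One difference to keep in mind: reindex fills labels that are not in the frame with NaN (or with fill_value if given), while .loc raises a KeyError for them.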
+0

Arghhh... that was easy! Hahaha... Thanks @unutbu – alvas

+0

When 'row_order * col_order' (the number of rows times the number of columns) is really large (~1 billion), 'df.reindex()' takes quite a long time to finish. – alvas
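
A rough way to see the scale involved: reindex returns a new DataFrame, so the reordered result needs its own block of memory next to the original. The sketch below only estimates that size for a dense float64 frame. The helper name dense_float64_bytes and the 40,000 x 25,000 shape are hypothetical, picked to give about 1 billion cells. It does not measure or explain the actual runtime.

```python
import numpy as np

def dense_float64_bytes(n_rows, n_cols):
    """Bytes needed to hold a dense n_rows x n_cols block of float64 values."""
    return n_rows * n_cols * np.dtype("float64").itemsize

# Hypothetical shape with 1e9 cells, matching the "~1 billion" in the comment.
n_bytes = dense_float64_bytes(40_000, 25_000)
print(n_bytes / 1e9, "GB for the reordered copy alone")
```

At that size, copying the data is probably a noticeable part of the cost, whichever reordering call is used.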