2017-10-13

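Pandas: conditional column values based on a list in a cell

The dataframe looks like this (blank cells are '', and field and extra_dimensions are columns):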

field | extra_dimensions 
------------------------ 
a  | 
b  | [abc, def] 
c  | [ghi] 

I have lists of the required dimensions and the extra dimensions:

required_dimensions = ['123', '456'] 
extra_dimensions = ['abc', 'def', 'ghi'] 

Desired output:

field | 123 | 456 | abc | def | ghi 
----------------------------------- 
a  | 1 | 1 | 0 | 0 | 0 
b  | 1 | 1 | 1 | 1 | 0 
c  | 1 | 1 | 0 | 0 | 1 

Attempt:

columns = ['field', 'extra_dimensions'] + required_dimensions + extra_dimensions 
df = df.reindex(columns=columns) 
for i in required_dimensions: 
    df[i].fillna('1', inplace=True) 
for i in extra_dimensions: 
    df[i][df['extra_dimensions'].str.contains(i)] = '1' 

But I get:

ValueError: cannot index with vector containing NA/NaN values 

I'd appreciate any input on my attempt, or ideas for a better approach. Thanks in advance!
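A likely cause of the ValueError, assuming the non-empty cells hold Python lists rather than strings: `.str.contains` typically returns NaN for cells that are not strings, and a boolean mask containing NaN cannot be used to index. The sketch below keeps the shape of the attempt but tests list membership directly and assigns whole columns instead of using chained indexing (`df[i][mask] = ...`). The sample DataFrame is reconstructed from the table above, with '' standing in for the blank cell; that representation is an assumption.

```python
import pandas as pd

# Assumption: non-empty cells hold Python lists, the blank cell holds ''.
df = pd.DataFrame({'field': ['a', 'b', 'c'],
                   'extra_dimensions': ['', ['abc', 'def'], ['ghi']]})
required_dimensions = ['123', '456']
extra_dimensions = ['abc', 'def', 'ghi']

# Every field gets all of the required dimensions.
for dim in required_dimensions:
    df[dim] = 1

# Test membership directly instead of .str.contains; the isinstance check
# treats '' (or NaN) as "no extra dimensions" and yields 0.
for dim in extra_dimensions:
    df[dim] = df['extra_dimensions'].apply(
        lambda xs: int(isinstance(xs, list) and dim in xs))

result = df.drop(columns='extra_dimensions').set_index('field')
print(result)
```

The lambda is applied immediately inside each loop iteration, so it always sees the current value of `dim`.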

Answer


Using get_dummies again...

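import pandas as pd
required_dimensions = ['123', '456'] 
df = pd.DataFrame({'field': list('abc'), 'extra_dimensions': [[], ['abc', 'def'], ['ghi']]}) 
# Split each list into rows, one-hot encode, then sum back per field
# (.groupby(level=0).sum() replaces .sum(level=0), which was removed in pandas 2.0)
df = (pd.get_dummies(df.set_index('field')['extra_dimensions'].apply(pd.Series).stack())
      .groupby(level=0).sum()
      .reindex(df.field).fillna(0))
d = dict.fromkeys(required_dimensions, 1) 
df.assign(**d) 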

Out[283]: 
     abc def ghi 123 456 
field       
a  0.0 0.0 0.0 1 1 
b  1.0 1.0 0.0 1 1 
c  0.0 0.0 1.0 1 1 
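On pandas 0.25 or later, `Series.explode` can do the list-to-rows step that `apply(pd.Series).stack()` does above, and reindexing with `fill_value=0` keeps the 0/1 columns as integers instead of floats. This is a sketch of that variant on the same sample data, not the answerer's code; it drops the NaN that `explode` produces for the empty list of field a and then restores that row as zeros.

```python
import pandas as pd

required_dimensions = ['123', '456']
extra_dimensions = ['abc', 'def', 'ghi']
df = pd.DataFrame({'field': list('abc'),
                   'extra_dimensions': [[], ['abc', 'def'], ['ghi']]})

# One row per list element; the empty list for 'a' becomes NaN, so drop it.
exploded = df.set_index('field')['extra_dimensions'].explode().dropna()

# One-hot encode, sum back per field, and restore missing fields/dimensions as 0.
dummies = (pd.get_dummies(exploded)
           .groupby(level=0).sum()
           .reindex(index=df['field'], columns=extra_dimensions, fill_value=0))

# Every field has all of the required dimensions; fix the column order.
out = dummies.assign(**dict.fromkeys(required_dimensions, 1))[required_dimensions + extra_dimensions]
print(out)
```

Passing `columns=extra_dimensions` to `reindex` also guarantees a column for every extra dimension, even one that never appears in the data.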

Thanks a lot, this worked for me. – user8766186


@user8766186 Yw~ – Wen