Notice:
Don't use built-in names such as list, type, dict ... as variable names, because this masks the built-in functions.
So if you use:
# dict is used as a variable name here
dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
# creating a dictionary with dict() is no longer possible, because dict now refers to the dictionary above
print (dict(a=1, b=2))
# expected without the shadowing: {'a': 1, 'b': 2}
you get the error:
TypeError: 'dict' object is not callable
and it is very complicated to debug. (After testing this, restart your IDE.)
So use another variable name, such as d or categories:
d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
print (dict(a=1, b=2))
{'a': 1, 'b': 2}
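The masking described above can be reproduced in isolation. The following is a minimal sketch, not part of the original answer: the function name shadowed_call and the variables message and recovered are hypothetical, chosen only for illustration. It shows that the built-in stays reachable through the standard builtins module even while the name dict is shadowed.

```python
import builtins


def shadowed_call():
    # Inside this function, the local name `dict` shadows the built-in type.
    dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
    try:
        # This attempts to call the dictionary object itself, which fails.
        return dict(a=1, b=2)
    except TypeError as exc:
        return str(exc)


message = shadowed_call()
print(message)

# The built-in is still available through the builtins module.
recovered = builtins.dict(a=1, b=2)
print(recovered)
```

In an interactive session where dict was reassigned at the top level, running del dict removes the shadowing name and makes the built-in visible again, which may save an IDE restart.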
I think you need DataFrame.from_dict with join:
import pandas as pd

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})
print (df)
   product
0        1
1        2
2        3
df1 = pd.DataFrame.from_dict(d, orient='index')
df1.columns = ['cat' + (str(i+1)) for i in df1.columns]
print(df1)
  cat1 cat2 cat3 cat4
1    a    b    c    d
2    w    x    y    z
df2 = df.join(df1, on='product')
print (df2)
   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN
Then you can use melt or stack:
df3 = df2.melt('product', value_name='category').drop('variable', axis=1)
print (df3)
    product category
0         1        a
1         2        w
2         3      NaN
3         1        b
4         2        x
5         3      NaN
6         1        c
7         2        y
8         3      NaN
9         1        d
10        2        z
11        3      NaN
df2 = (df.set_index('product')
         .join(df1)
         .stack(dropna=False)
         .reset_index(level=1, drop=True)
         .rename('category')
         .reset_index())
print (df2)
    product category
0         1        a
1         1        b
2         1        c
3         1        d
4         2        w
5         2        x
6         2        y
7         2        z
8         3      NaN
9         3      NaN
10        3      NaN
11        3      NaN
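The melt and stack results hold the same rows, only in a different order. As a minimal sketch, assuming the same d and df as in this answer (the names wide, melted and melted_sorted are hypothetical), the melt output can be brought into the per-product order with a stable sort:

```python
import pandas as pd

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

# Wide table: one column per list position, NaN where the product has no entry in d.
df1 = pd.DataFrame.from_dict(d, orient='index')
wide = df.join(df1, on='product')

# Long table: melt goes column by column, so products are interleaved.
melted = wide.melt('product', value_name='category').drop(columns='variable')

# mergesort is stable, so the categories keep their original order within each product.
melted_sorted = (melted.sort_values('product', kind='mergesort')
                       .reset_index(drop=True))
print(melted_sorted)
```

A stable sort is used here so that rows with the same product are not reshuffled relative to each other.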
If the category column is already in df, the solution is similar; it is only necessary to first remove the rows with NaN using DataFrame.dropna:
d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})
df['category'] = df['product'].map(d)
print(df)
df1 = df.dropna(subset=['category'])
df1 = pd.DataFrame(df1['category'].values.tolist(), index=df1['product'])
df1.columns = ['cat' + (str(i+1)) for i in df1.columns]
print(df1)
        cat1 cat2 cat3 cat4
product
1          a    b    c    d
2          w    x    y    z
df2 = df[['product']].join(df1, on='product')
print (df2)
   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN
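When the category column already holds lists, a shorter route to the long format may be DataFrame.explode (available in pandas 0.25 and later). This is an alternative sketch, not the method used in this answer, and the name long_df is hypothetical. Note that it yields a single NaN row for product 3 instead of four.

```python
import pandas as pd

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

# Products missing from d get NaN instead of a list.
df['category'] = df['product'].map(d)

# explode turns each list element into its own row; a NaN cell stays one NaN row.
long_df = df.explode('category').reset_index(drop=True)
print(long_df)
```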
This may help: https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows/32470490#32470490 – Alexander