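You can use the DataFrame constructor on the column's values (a numpy array) converted to a list of lists with tolist. Then replace None with NaN, rename the columns, and finally call add_prefix: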
# the outer parentheses let the method chain span several lines
df_new = (pd.DataFrame(df.Col1.values.tolist())
            .fillna(np.nan)
            .rename(columns=lambda x: x + 1)
            .add_prefix('Col'))
print(df_new)
  Col1 Col2  Col3  Col4
0   SF  NYG   123   NaN
1   SF  NYG  test  test
2   SF  NYG   foo   NaN
3   SF  NYG   NaN   NaN
4   SF  NYG    45   NaN
5   SF  NYG   NaN   NaN
6   SF  NYG    32   NaN
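For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption reconstructed from the printed result (a single column Col1 holding lists of unequal length, all strings); the real data may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the printed result above:
# one column whose cells are lists of unequal length.
df = pd.DataFrame({
    "Col1": [
        ["SF", "NYG", "123"],
        ["SF", "NYG", "test", "test"],
        ["SF", "NYG", "foo"],
        ["SF", "NYG"],
        ["SF", "NYG", "45"],
        ["SF", "NYG"],
        ["SF", "NYG", "32"],
    ]
})

df_new = (
    pd.DataFrame(df.Col1.values.tolist())  # one column per list position
    .fillna(np.nan)                        # turn None padding into NaN
    .rename(columns=lambda x: x + 1)       # integer labels 0..3 -> 1..4
    .add_prefix("Col")                     # 1..4 -> Col1..Col4
)
print(df_new)
```

Rows shorter than the longest list are padded with missing values, which is why the fillna step is there.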
Timings:
#700
df = pd.concat([df]*100).reset_index(drop=True)
#Jez
In [10]: %timeit (pd.DataFrame(df.Col1.values.tolist()))
1000 loops, best of 3: 694 µs per loop
#cᴏʟᴅsᴘᴇᴇᴅ
In [11]: %timeit (pd.DataFrame(df.Col1.tolist()))
1000 loops, best of 3: 705 µs per loop
#Wen
In [12]: %timeit (df.Col1.apply(lambda x: ','.join(str(y) for y in x)).str.split(',', expand=True))
100 loops, best of 3: 3.51 ms per loop
#slower
In [13]: %timeit (df.Col1.apply(pd.Series))
10 loops, best of 3: 159 ms per loop
#7k
df = pd.concat([df]*1000).reset_index(drop=True)
#jez
In [30]: %timeit (pd.DataFrame(df.Col1.values.tolist()))
1000 loops, best of 3: 1.26 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ
In [31]: %timeit (pd.DataFrame(df.Col1.tolist()))
1000 loops, best of 3: 1.37 ms per loop
#Wen
In [32]: %timeit (df.Col1.apply(lambda x: ','.join(str(y) for y in x)).str.split(',', expand=True))
10 loops, best of 3: 29 ms per loop
#very slow, best used only on small DataFrames
In [33]: %timeit (df.Col1.apply(pd.Series))
1 loop, best of 3: 1.58 s per loop
#700k
df = pd.concat([df]*100000).reset_index(drop=True)
#jez
In [40]: %timeit (pd.DataFrame(df.Col1.values.tolist()))
10 loops, best of 3: 80.3 ms per loop
#cᴏʟᴅsᴘᴇᴇᴅ
In [41]: %timeit (pd.DataFrame(df.Col1.tolist()))
10 loops, best of 3: 90.5 ms per loop
#Wen
In [42]: %timeit (df.Col1.apply(lambda x: ','.join(str(y) for y in x)).str.split(',', expand=True))
1 loop, best of 3: 2.91 s per loop
#extremely slow
In [3]: %timeit (df.Col1.apply(pd.Series))
1 loop, best of 3: 3min 58s per loop
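If you want to repeat the comparison outside IPython, something like the following sketch with the standard timeit module should work. The sample data, the row count, and the names base, candidates, and timings are assumptions made for illustration; absolute numbers will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample data: lists of unequal length in one column.
base = pd.DataFrame({
    "Col1": [["SF", "NYG", "123"], ["SF", "NYG", "test", "test"], ["SF", "NYG"]]
})
df = pd.concat([base] * 100).reset_index(drop=True)  # 300 rows

# The four approaches timed above, wrapped as zero-argument callables.
candidates = {
    "values.tolist": lambda: pd.DataFrame(df.Col1.values.tolist()),
    "tolist": lambda: pd.DataFrame(df.Col1.tolist()),
    "join + str.split": lambda: df.Col1.apply(
        lambda x: ",".join(str(y) for y in x)
    ).str.split(",", expand=True),
    "apply(pd.Series)": lambda: df.Col1.apply(pd.Series),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats of 5 calls each, stored as seconds per call.
    timings[name] = min(timeit.repeat(func, number=5, repeat=3)) / 5

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>18}: {seconds * 1e3:.3f} ms per call")
```

The row count is kept small so that the apply(pd.Series) variant finishes quickly; raise the multiplier in pd.concat to see how the gap grows.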
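Nice. I didn't know 'fillna' also works on 'None'. –
Can you think of any other approaches? –
This is the best one because it is the fastest. You can also use '.apply(pd.Series)', but it is slow. – jezrael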