How to replace spaces in strings in a pandas DataFrame?

Suppose I have a pandas DataFrame like this:

      Person_1    Person_2     Person_3
0   John Smith  Jane Smith   Mark Smith
1  Harry Jones  Mary Jones  Susan Jones

In reproducible form:

import pandas as pd

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'],
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

What is the best way to replace the space between the first and last name in each name with an underscore (_), to get:

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones

Thanks in advance!

Source

2017-02-25 Katie Truong

pandas

stack/unstack with str.replace

df.stack().str.replace(' ', '_').unstack() 

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones

numpy

import numpy as np
import pandas as pd

pd.DataFrame(
    np.core.defchararray.replace(df.values.astype(str), ' ', '_'),
    df.index, df.columns)

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones
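The same NumPy idea written against the public np.char module, which exposes the defchararray string functions without going through np.core (deprecated in NumPy 2.0). This is a sketch assuming the question's df; to_numpy() stands in for .values, which the pandas documentation recommends.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'],
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

# astype(str) turns the object array into a fixed-width unicode array,
# which np.char.replace processes element by element.
arr = np.char.replace(df.to_numpy().astype(str), ' ', '_')

# Rewrap the array, keeping the original row and column labels.
result = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(result)
```

Note that astype(str) converts every cell to a string, including missing values (None becomes 'None' and NaN becomes 'nan').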

Timing tests

Source

2017-02-25 22:54:16 piRSquared

I think you could also just use DataFrame.replace.

df.replace(' ', '_', regex=True)

Output

      Person_1    Person_2     Person_3
0   John_Smith  Jane_Smith   Mark_Smith
1  Harry_Jones  Mary_Jones  Susan_Jones
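The regex=True argument is what makes this work. A minimal sketch, assuming the question's df: without it, DataFrame.replace only replaces cells whose entire value equals the search string, so no name changes.

```python
import pandas as pd

df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'],
                   ['Harry Jones', 'Mary Jones', 'Susan Jones']],
                  columns=['Person_1', 'Person_2', 'Person_3'])

# regex=False (the default): only cells exactly equal to ' ' would match,
# so this returns an unchanged copy.
unchanged = df.replace(' ', '_')

# regex=True: ' ' is treated as a pattern and every match inside each
# string cell is replaced.
result = df.replace(' ', '_', regex=True)
print(result)
```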

From some rough benchmarking, it looks, predictably, like piRSquared's NumPy solution is indeed the fastest, at least for this small sample, followed by DataFrame.replace.

%timeit df.values[:] = np.core.defchararray.replace(df.values.astype(str), ' ', '_') 
10000 loops, best of 3: 78.4 µs per loop 

%timeit df.replace(' ', '_', regex=True) 
1000 loops, best of 3: 932 µs per loop 

%timeit df.stack().str.replace(' ', '_').unstack() 
100 loops, best of 3: 2.29 ms per loop

Interestingly, though, piRSquared's pandas solution appears to scale much better than DataFrame.replace on larger DataFrames, and it even outperforms the NumPy solution:

>>> df = pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith']*10000, 
         ['Harry Jones', 'Mary Jones', 'Susan Jones']*10000])

%timeit df.values[:] = np.core.defchararray.replace(df.values.astype(str), ' ', '_') 
10 loops, best of 3: 181 ms per loop 

%timeit df.replace(' ', '_', regex=True) 
1 loop, best of 3: 4.14 s per loop 

%timeit df.stack().str.replace(' ', '_').unstack() 
10 loops, best of 3: 99.2 ms per loop
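To rerun this comparison on current pandas and NumPy versions, a small harness along these lines may help. It is a sketch: make_frame, via_numpy, via_replace, via_stack and benchmark are hypothetical helper names, the NumPy variant builds a new frame with np.char.replace rather than assigning into df.values[:] as timed above, and the timings it prints will differ from the 2017 figures.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(repeat):
    # Same shape as the larger test above: two rows, each name list repeated.
    return pd.DataFrame([['John Smith', 'Jane Smith', 'Mark Smith'] * repeat,
                         ['Harry Jones', 'Mary Jones', 'Susan Jones'] * repeat])


def via_numpy(df):
    # Vectorized replace on a fixed-width unicode array, then rewrap it.
    arr = np.char.replace(df.to_numpy().astype(str), ' ', '_')
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


def via_replace(df):
    # Regex search inside every cell.
    return df.replace(' ', '_', regex=True)


def via_stack(df):
    # Flatten to a Series, use the .str accessor, pivot back.
    return df.stack().str.replace(' ', '_', regex=False).unstack()


METHODS = {'numpy': via_numpy,
           'DataFrame.replace': via_replace,
           'stack/unstack': via_stack}


def benchmark(repeat, number=3):
    """Return the best per-call time in seconds for each method."""
    df = make_frame(repeat)
    # Check that all three approaches agree before timing them.
    reference = via_numpy(df).values.tolist()
    for fn in METHODS.values():
        assert fn(df).values.tolist() == reference
    return {name: min(timeit.repeat(lambda fn=fn: fn(df),
                                    number=number, repeat=3)) / number
            for name, fn in METHODS.items()}


if __name__ == '__main__':
    # The larger test above used * 10000; kept small here so it runs quickly.
    for repeat in (1, 100):
        timings = benchmark(repeat)
        print(repeat, {name: f'{t * 1e3:.2f} ms' for name, t in timings.items()})
```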

Source

2017-02-25 22:59:14 miradulo

Use the DataFrame's replace method:

df.replace(r'\s+', '_', regex=True, inplace=True)
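The \s+ pattern behaves differently from a single literal space when names contain irregular whitespace. A sketch on a hypothetical frame (messy is not from the question): \s+ collapses any run of spaces, tabs or newlines into one underscore, and inplace=True modifies the frame and returns None.

```python
import pandas as pd

# Hypothetical data with a double space and a tab.
messy = pd.DataFrame({'Person_1': ['John  Smith', 'Harry\tJones']})

# A literal space replaces each space separately and ignores the tab.
literal = messy.replace(' ', '_', regex=True)

# \s+ treats any run of whitespace as one match. The raw string r'\s+'
# avoids Python's invalid escape sequence warning for '\s'.
collapsed = messy.replace(r'\s+', '_', regex=True)

# inplace=True mutates messy and returns None instead of a new DataFrame.
ret = messy.replace(r'\s+', '_', regex=True, inplace=True)
print(literal, collapsed, messy, sep='\n')
```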

Source

2017-02-25 22:59:45 Serenity
