Python pandas: melting selected variables in a wide DataFrame

Data-melting question: how can I selectively melt certain variables out of an overly wide DataFrame?

For example, I want to turn:

import pandas as pd

df1 = pd.DataFrame(
    [[1, 'a', 'b', .1, -1, 10],
     [2, 'a', 'b', .2, -3, 12],
     [3, 'c', 'd', .3, -5, 14]],
    columns=['sample', 'id1', 'id2', 'x', 'y1', 'y2'])
print(df1)
# sample id1 id2 x y1 y2 
#0  1 a b 0.1 -1 10 
#1  2 a b 0.2 -3 12 
#2  3 c d 0.3 -5 14

into:

# sample id position x y 
#0  1 a   1 0.1 -1 
#1  1 b   2 0.1 10 
#2  2 a   1 0.2 -3 
#3  2 b   2 0.2 12 
#4  3 c   1 0.3 -5 
#5  3 d   2 0.3 14

Note that x is duplicated, and each y value is aligned with its position.

A plain pd.melt() mixes variables and data types in a single column, which makes it hard to selectively pivot the result back to wide form.

print(pd.melt(df1, id_vars='sample'))
# sample variable value 
#0  1  id1  a 
#1  2  id1  a 
#2  3  id1  c 
#3  1  id2  b 
#4  2  id2  b 
#5  3  id2  d 
#6  1  x 0.1 
#7  2  x 0.2 
#8  3  x 0.3 
#9  1  y1 -1 
#10  2  y1 -3 
#11  3  y1 -5 
#12  1  y2 10 
#13  2  y2 12 
#14  3  y2 14
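To make the dtype problem concrete, here is a minimal sketch of one workable approach (an assumption of mine, not something proposed in the thread): the plain melt leaves `value` as object dtype, while melting the `id` columns and the `y` columns separately and merging on `sample` plus a derived `position` keeps each column typed. It assumes the position is always the single trailing digit of the column name; the names `ids`, `ys`, and `result` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'a', 'b', .1, -1, 10],
     [2, 'a', 'b', .2, -3, 12],
     [3, 'c', 'd', .3, -5, 14]],
    columns=['sample', 'id1', 'id2', 'x', 'y1', 'y2'])

# A plain melt piles strings, floats and ints into one 'value' column,
# so that column falls back to the generic object dtype.
long_all = pd.melt(df1, id_vars='sample')
print(long_all['value'].dtype)  # object

# Melt each group of related columns on its own so every value column stays homogeneous.
ids = pd.melt(df1, id_vars=['sample', 'x'], value_vars=['id1', 'id2'],
              var_name='position', value_name='id')
ys = pd.melt(df1, id_vars='sample', value_vars=['y1', 'y2'],
             var_name='position', value_name='y')

# Assumption: the position is the single trailing digit of the column name ('id1' -> 1).
ids['position'] = ids['position'].str[-1].astype(int)
ys['position'] = ys['position'].str[-1].astype(int)

# Align id and y on (sample, position); x rides along with the id rows, so it is duplicated.
result = (ids.merge(ys, on=['sample', 'position'])
             .sort_values(['sample', 'position'])
             .reset_index(drop=True)[['sample', 'id', 'position', 'x', 'y']])
print(result)
```

This costs one melt per variable group, but each resulting column keeps a proper dtype (`y` stays integer) instead of collapsing to object.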

Any suggestions? Thanks!


2017-03-16 Matthew Davis

You can try this:

# set columns that don't change as index 
df1.set_index(['sample', 'x'], inplace=True) 

# create multi-index columns based on the names pattern 
df1.columns = pd.MultiIndex.from_arrays(df1.columns.str.extract(r"(\D+)(\d+)", expand=True).T.values) 

# transform the multi-index data frames to long format with stack 
df1.stack(level=1).rename_axis(('sample', 'x', 'position')).reset_index()
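As an alternative that is not part of the answer above: pandas also ships `pd.wide_to_long`, which is aimed at exactly this "stub name plus numeric suffix" layout. A sketch, assuming the stubs are `id` and `y` and the suffixes are plain integers (the function's default `suffix=r'\d+'`); columns that match no stub, such as `x`, are carried along and repeated for each position.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'a', 'b', .1, -1, 10],
     [2, 'a', 'b', .2, -3, 12],
     [3, 'c', 'd', .3, -5, 14]],
    columns=['sample', 'id1', 'id2', 'x', 'y1', 'y2'])

# 'id' and 'y' are the stubs; the digits after each stub become the new 'position'
# level of the index, and 'sample' must uniquely identify each original row.
long_df = pd.wide_to_long(df1, stubnames=['id', 'y'], i='sample', j='position')

# Move (sample, position) back into columns and put rows/columns in the requested order.
result = (long_df.reset_index()
                 .sort_values(['sample', 'position'])
                 .reset_index(drop=True)[['sample', 'id', 'position', 'x', 'y']])
print(result)
```

Unlike the regex/`stack` route, this needs no `MultiIndex` construction, but it only works when every melted column follows the same stub-plus-suffix naming.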


2017-03-16 19:18:07 Psidom

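First of all, awesome answer. Since df.columns.str.extract() is new to me, a question: what if the column names are more complex, e.g. ['id1,f22', 'id2,f22', 'var50_a1', 'var50_a2']? Would you just need some regex to extract the correct variable names/positions? –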

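I don't think a regex can easily handle columns with mixed patterns; there has to be one clear pattern for splitting the names into a MultiIndex. For example, 'a1, a2, b1, b2, c1, c2' or 'var1_a1, var1_a2, var2_a1, var2_a2' should both be fine, but for the latter the regex should be '([^_]+)_([^_]+)'. So it helps to make sure your column names don't get too wild. – Psidom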
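A small sketch of the '([^_]+)_([^_]+)' case from the reply: each capture group becomes one column of the frame returned by `str.extract`, and the two columns become the two levels of the MultiIndex. The level names `variable` and `suffix` are hypothetical labels chosen for illustration.

```python
import pandas as pd

# Column names following the 'var<k>_<suffix>' pattern mentioned in the reply.
cols = pd.Index(['var1_a1', 'var1_a2', 'var2_a1', 'var2_a2'])

# Group 1: everything before the underscore; group 2: everything after it.
parts = cols.str.extract(r'([^_]+)_([^_]+)', expand=True)

# Turn the two extracted columns into the two levels of a MultiIndex.
mi = pd.MultiIndex.from_arrays([parts[0], parts[1]], names=['variable', 'suffix'])
print(mi.tolist())
# [('var1', 'a1'), ('var1', 'a2'), ('var2', 'a1'), ('var2', 'a2')]
```

Because `[^_]+` cannot cross an underscore, a name with more than one underscore would only partly match, which is why a single consistent naming pattern matters here.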

Cool, it's easy enough to rename the columns before extracting. –
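The rename-first idea could look like the sketch below. The mapping is hypothetical: it assumes `,f22` carries no position information and that the trailing digit of `a1`/`a2` is the position, so every column ends up in one `<name>_<position>` scheme that a single regex can split.

```python
import pandas as pd

# Mixed-style column names from the question in the comments (values are dummies).
df = pd.DataFrame([['a', 'b', 1.0, 2.0]],
                  columns=['id1,f22', 'id2,f22', 'var50_a1', 'var50_a2'])

# Hypothetical clean-up: normalise every name to '<name>_<position>' first.
df = df.rename(columns={'id1,f22': 'id_1', 'id2,f22': 'id_2',
                        'var50_a1': 'var50_1', 'var50_a2': 'var50_2'})

# Now one regex splits all columns into (variable, position).
parts = df.columns.str.extract(r'([^_]+)_([^_]+)', expand=True)
df.columns = pd.MultiIndex.from_arrays([parts[0], parts[1]],
                                       names=['variable', 'position'])
print(df.columns.tolist())
# [('id', '1'), ('id', '2'), ('var50', '1'), ('var50', '2')]
```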


