2016-11-30 139 views

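I have a pandas DataFrame whose first column holds lists. I want to loop over every string in each list so that each string gets its own row, with the values of the other columns repeated alongside it. How can I loop over the list values of a specific column in pandas?

For example: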

import pandas as pd

tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3)})
tm

                              author       date    journal
0  [author_a1, author_a2, author_a3] 2015-02-03  journal01
1             [author_b1, author_b2] 2015-02-04  journal02
2             [author_c1, author_c2] 2015-02-05  journal03

I want this:

      author       date    journal
0  author_a1 2015-02-03  journal01
1  author_a2 2015-02-03  journal01
2  author_a3 2015-02-03  journal01
3  author_b1 2015-02-04  journal02
4  author_b2 2015-02-04  journal02
5  author_c1 2015-02-05  journal03
6  author_c2 2015-02-05  journal03

I have solved this with the convoluted approach below. Is there a simple and efficient way to do it with pandas?

author_use = [] 
date_use = [] 
journal_use = [] 

# Walk every row, then every author in that row's list,
# copying the row's date and journal once per author.
for i in range(len(tm['author'])):
    for m in range(len(tm['author'][i])):
        author_use.append(tm['author'][i][m])
        date_use.append(tm['date'][i])
        journal_use.append(tm['journal'][i])

df_author = pd.DataFrame({'author': author_use,
                          'date': date_use,
                          'journal': journal_use})

df_author 

Answers

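I think you can get the length of each nested list with str.len, flatten the list values with chain, and use numpy.repeat to repeat the values of the other columns by those lengths: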

from itertools import chain

import numpy as np
import pandas as pd

# Number of authors in each row's list
lens = tm.author.str.len()

df = pd.DataFrame({
    "date": np.repeat(tm.date.values, lens),
    "journal": np.repeat(tm.journal.values, lens),
    "author": list(chain.from_iterable(tm.author))})

print(df)

      author       date    journal
0  author_a1 2015-02-03  journal01
1  author_a2 2015-02-03  journal01
2  author_a3 2015-02-03  journal01
3  author_b1 2015-02-04  journal02
4  author_b2 2015-02-04  journal02
5  author_c1 2015-02-05  journal03
6  author_c2 2015-02-05  journal03
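The same idea can be wrapped so that it does not need one np.repeat call per column. The sketch below is only an illustration: `explode_list_column` is a hypothetical helper name, not a pandas function, and it assumes every cell in the list column really is a list (no missing values). It repeats row positions instead of individual columns, so every other column keeps its dtype.

```python
from itertools import chain

import numpy as np
import pandas as pd


def explode_list_column(df, column):
    """Hypothetical helper: one output row per element of the lists in `column`."""
    # Length of each list; assumes no missing values in `column`.
    lens = df[column].str.len().values
    # Row positions repeated once per list element, e.g. [0, 0, 0, 1, 1, 2, 2].
    positions = np.repeat(np.arange(len(df)), lens)
    # Take the remaining columns at those positions, then attach the flat values.
    out = df.drop(columns=[column]).iloc[positions].reset_index(drop=True)
    out[column] = list(chain.from_iterable(df[column]))
    return out


tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3)})

result = explode_list_column(tm, 'author')
print(result)
```

Because the rows are selected with iloc, the date column stays datetime64 and any further columns are carried along without extra code.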

Another NumPy solution:

df = pd.DataFrame(
    np.column_stack((
        tm[['date', 'journal']].values.repeat(list(map(len, tm.author)), axis=0),
        np.hstack(tm.author),
    )),
    columns=['date', 'journal', 'author'])

print(df)
                  date    journal     author
0  2015-02-03 00:00:00  journal01  author_a1
1  2015-02-03 00:00:00  journal01  author_a2
2  2015-02-03 00:00:00  journal01  author_a3
3  2015-02-04 00:00:00  journal02  author_b1
4  2015-02-04 00:00:00  journal02  author_b2
5  2015-02-05 00:00:00  journal03  author_c1
6  2015-02-05 00:00:00  journal03  author_c2
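Note that the column_stack approach goes through a single object array, which is why the dates print with a time part: the date column is no longer datetime64. On newer pandas versions there is also a built-in alternative. The sketch below assumes pandas 0.25 or later, where DataFrame.explode is available; it is not part of the answer above.

```python
import pandas as pd

tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3)})

# explode() emits one row per list element and repeats the other columns.
# It also repeats the original index labels, so reset them afterwards.
df = tm.explode('author').reset_index(drop=True)
print(df)
```

The journal and date columns keep their original dtypes here, since only the exploded column is rebuilt.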

I get "TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'". What is the problem? @jezrael –

Does the problem occur with the sample or with your real data? – jezrael

The problem occurs with the sample. –
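The thread never identifies the cause, so the following is only a guess. np.repeat needs its repeat counts to be safely castable to NumPy's platform index type, np.intp, which is 32-bit on a 32-bit Python build, while str.len() returns int64 counts. If that is what is happening, casting the counts explicitly may avoid the error. A minimal sketch under that assumption:

```python
import numpy as np
import pandas as pd

tm = pd.DataFrame({
    'author': [['author_a1', 'author_a2', 'author_a3'],
               ['author_b1', 'author_b2'],
               ['author_c1', 'author_c2']],
    'journal': ['journal01', 'journal02', 'journal03'],
    'date': pd.date_range('2015-02-03', periods=3)})

# Assumption: the TypeError comes from int64 counts on a build where
# np.intp is int32. Casting to np.intp matches whatever the platform uses.
lens = tm['author'].str.len().values.astype(np.intp)

dates = np.repeat(tm['date'].values, lens)
journals = np.repeat(tm['journal'].values, lens)
print(lens.dtype, len(dates), len(journals))
```

On a 64-bit build the cast changes nothing, so it should be harmless to leave in.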