How to add a duplicate column to a CSV using pandas

I have a CSV with just one column of domains, like this:

google.com 
yahoo.com 
cnn.com 
toast.net

I want to add a duplicate column and add the headers domain and matching, so that my CSV looks like this:

domain      matching
google.com  google.com
yahoo.com   yahoo.com
cnn.com     cnn.com
toast.net   toast.net

I tried the following with pandas in my Python script:

df = read_csv('temp.csv') 
df.columns = ['domain', 'matching'] 
df['matching'] = df['domain'] 
df.to_csv('temp.csv', index=False)

But I get the following error:

"ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements".

I guess I need to add the other column first? Can I do this with pandas?


2016-08-15 P.J.

You can pass the names parameter to read_csv:

import pandas as pd
import io

temp = u"""google.com
yahoo.com
cnn.com
toast.net"""

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), names=['domain'])
# real data
# df = pd.read_csv('temp.csv', names=['domain'])

print(df)
#        domain
# 0  google.com
# 1   yahoo.com
# 2     cnn.com
# 3   toast.net

# copy the existing column into a new one
df['matching'] = df['domain']

print(df.to_csv(index=False))
# domain,matching
# google.com,google.com
# yahoo.com,yahoo.com
# cnn.com,cnn.com
# toast.net,toast.net

# real data
# df.to_csv('temp.csv', index=False)

You can also adapt your own solution, but then you lose the first row, because it is read as the column name:

df = pd.read_csv(io.StringIO(temp))
# real data
# df = pd.read_csv('temp.csv')

print(df)
#    google.com
# 0   yahoo.com
# 1     cnn.com
# 2   toast.net

# only one name, because the DataFrame has only one column
df.columns = ['domain']
df['matching'] = df['domain']

df.to_csv('temp.csv', index=False)

But you can add the parameter header=None to read_csv and remove the second value from df.columns = ['domain', 'matching'], because the DataFrame initially has only one column:

import pandas as pd
import io

temp = u"""google.com
yahoo.com
cnn.com
toast.net"""

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), header=None)
# real data
# df = pd.read_csv('temp.csv', header=None)

print(df)
#             0
# 0  google.com
# 1   yahoo.com
# 2     cnn.com
# 3   toast.net

# name the single existing column, then duplicate it
df.columns = ['domain']
df['matching'] = df['domain']

df.to_csv('temp.csv', index=False)
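
For reference, the whole round trip on a real file could look roughly like the sketch below. The helper name add_duplicate_column and the throwaway temporary file are assumptions made for illustration only (they are not part of pandas or of the question); the sketch just combines names=[...] with the column copy shown above.

```python
import os
import tempfile

import pandas as pd


def add_duplicate_column(path, source='domain', copy='matching'):
    """Hypothetical helper: read a headerless one-column CSV,
    duplicate the column, and write the result back with headers."""
    # names=[source] tells pandas the file has no header row
    df = pd.read_csv(path, names=[source])
    df[copy] = df[source]
    df.to_csv(path, index=False)
    return df


# Demo on a throwaway file that stands in for temp.csv
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temp.csv')
    with open(path, 'w') as f:
        f.write('google.com\nyahoo.com\ncnn.com\ntoast.net\n')

    result = add_duplicate_column(path)

    with open(path) as f:
        written = f.read()

print(written)
```

Note that running the helper twice on the same file would treat the header row it wrote the first time as data, so it is only meant for the initial headerless file.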


2016-08-15 13:02:07 jezrael

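You only need to change io.StringIO(temp) to 'temp.csv', then it will work fine. – jezrael

The problem I'm having is that it needs a unicode input, which you specified with temp = u"""...""", but I'm using a CSV file as input. So when I do df = pd.read_csv(io.StringIO('temp.csv'), header=None), I get the error "TypeError: initial_value must be unicode or None, not str" –

I added code for handling real data. – jezrael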
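
To illustrate the point in the comments: io.StringIO wraps text that is already in memory; it does not open a file. The TypeError quoted above looks like Python 2 behaviour, where io.StringIO rejects a plain (byte) str. A minimal sketch, assuming Python 3, of what happens when a filename is wrapped in StringIO:

```python
import io

import pandas as pd

# StringIO treats the string itself as the file contents, so pandas parses
# the literal text 'temp.csv' as a one-row CSV instead of opening a file.
wrong = pd.read_csv(io.StringIO(u'temp.csv'), header=None)
print(wrong)

# For a file on disk, pass the path straight to read_csv instead:
# df = pd.read_csv('temp.csv', header=None)
```

So io.StringIO(temp) is only there to make the sample data reproducible; with a real file, the path goes directly to read_csv.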
