How do I insert a column into a DataFrame when the column's values come from a different file?

Currently I read from one file and generate this file (output.txt):

Atom nVa avgppm stddev delta 
1.H1' 2 5.73649 0.00104651803616 1.0952e-06 
1.H2' 1 4.85438 
1.H8 1 8.05367 
10.H1' 3 5.33823 0.136655138213 0.0186746268 
10.H2' 1 4.20449 
10.H5 3 5.27571333333 0.231624986634 0.0536501344333 
10.H6 5 7.49485 0.0285124165935 0.0008129579

This is the code that produces that file (it computes these values from a text file that I read in):

df = pd.read_csv(expAtoms, sep = ' ', header = None) 
df.columns = ["Atom","ppm"] 
gb = (df.groupby("Atom", as_index=False).agg({"ppm":["count","mean","std","var"]}).rename(columns={"count":"nVa", "mean":"avgppm","std":"stddev","var":"delta"})) 

gb.head() 

gb.columns = gb.columns.droplevel() 
gb = gb.rename(columns={"":"Atom"}) 

gb.to_csv("output.txt", sep =" ", index=False)

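Between my nVa column and my avgppm column, I want to insert another column called predppm. Its values should come from a file named file.txt, which looks like this: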

5.H6 7.72158 0.3 
6.H6 7.70272 0.3 
7.H8 8.16859 0.3 
1.H1' 7.65014 0.3 
9.H8 8.1053 0.3 
10.H6 7.5231 0.3

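How can I check whether a value in the first column of file.txt equals a value in the first column of output.txt and, if it does, insert the second-column value from file.txt into a new column between nVa and avgppm in my output file?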

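For example, 1.H1' appears in both output.txt and file.txt, so I want to create a column called predppm in output.txt and put the value 7.65014 (from the second column of file.txt) in the row for the 1.H1' atom.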

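I think I understand how to add columns, but only ones produced by functions that work with groupby. I don't know how to insert an arbitrary column into the output.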

Source

2017-08-29 user8290579

The simplest approach is to set an index on each pandas.DataFrame. pandas has good logic for aligning data on matching index values.

from io import StringIO 
import pandas as pd 

# if python2, do: 
# data = u"""\ 
data = """\ 
Atom nVa avgppm stddev delta 
1.H1' 2 5.73649 0.00104651803616 1.0952e-06 
1.H2' 1 4.85438 
1.H8 1 8.05367 
10.H1' 3 5.33823 0.136655138213 0.0186746268 
10.H2' 1 4.20449 
10.H5 3 5.27571333333 0.231624986634 0.0536501344333 
10.H6 5 7.49485 0.0285124165935 0.0008129579 
""" 

# if python2, do: 
# other_data = u"""\ 
other_data = """\ 
5.H6 7.72158 0.3 
6.H6 7.70272 0.3 
7.H8 8.16859 0.3 
1.H1' 7.65014 0.3 
9.H8 8.1053 0.3 
10.H6 7.5231 0.3 
""" 

# setup these strings so they can be read by pd.read_csv 
# (not necessary if these are actual files on disk) 
data_file = StringIO(data) 
other_data_file = StringIO(other_data) 

# don't say header=None because the first row has the column names 
df = pd.read_csv(data_file, sep=' ') 
# set the index to 'Atom' 
df = df.set_index('Atom') 

# header=None because the other_data doesn't have header info 
other_df = pd.read_csv(other_data_file, sep=' ', header=None) 
# set the column names since they're not specified in other_data 
other_df.columns = ['Atom', 'predppm', 'some_other_field'] 
# set the index to 'Atom' 
other_df = other_df.set_index('Atom') 

# this will assign other_df['predppm'] to the correct rows, 
# because pandas uses the index when assigning new columns 
df['predppm'] = other_df['predppm'] 

print(df) 
#         nVa    avgppm    stddev     delta  predppm
# Atom
# 1.H1'     2  5.736490  0.001047  0.000001  7.65014
# 1.H2'     1  4.854380       NaN       NaN      NaN
# 1.H8      1  8.053670       NaN       NaN      NaN
# 10.H1'    3  5.338230  0.136655  0.018675      NaN
# 10.H2'    1  4.204490       NaN       NaN      NaN
# 10.H5     3  5.275713  0.231625  0.053650      NaN
# 10.H6     5  7.494850  0.028512  0.000813  7.52310

# if you want to return 'Atom' to being a column: 
df = df.reset_index()
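
The code above appends predppm as the last column, whereas the question asks for it between nVa and avgppm. One possible way to control the position is `DataFrame.insert`, sketched below. The shortened sample data and the names `lookup` and `pos` are only illustrative, and `some_other_field` is the same placeholder column name used in the answer.

```python
from io import StringIO
import pandas as pd

# Shortened copies of output.txt and file.txt, inlined for illustration only
output_txt = """\
Atom nVa avgppm stddev delta
1.H1' 2 5.73649 0.00104651803616 1.0952e-06
1.H2' 1 4.85438
10.H6 5 7.49485 0.0285124165935 0.0008129579
"""
file_txt = """\
5.H6 7.72158 0.3
1.H1' 7.65014 0.3
10.H6 7.5231 0.3
"""

df = pd.read_csv(StringIO(output_txt), sep=" ")
pred = pd.read_csv(StringIO(file_txt), sep=" ", header=None,
                   names=["Atom", "predppm", "some_other_field"])

# Series indexed by Atom, used as a lookup table for the predicted values
lookup = pred.set_index("Atom")["predppm"]

# Position immediately after the nVa column
pos = df.columns.get_loc("nVa") + 1

# Atoms that are missing from file.txt get NaN
df.insert(pos, "predppm", df["Atom"].map(lookup))

print(df.columns.tolist())
# To write the result back out in the original format:
# df.to_csv("output.txt", sep=" ", index=False)
```

Here `Series.map` looks up each Atom in `lookup`, so this variant assumes that each atom appears at most once in file.txt.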

Source

2017-08-29 04:33:23 Hazzles

Now I get an error saying 'TypeError: initial_value must be unicode or None, not str'. I made a variable for my text file, so 'output = output.txt', and then did 'data_file = StringIO(output)', and that is the error I get. – user8290579

Sorry, my answer works in Python 3. To make it work in Python 2, see the new comments I added in the code above. – Hazzles

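Sorry, I'm just a bit confused. Are you saying that 'data' and 'other_data' are strings defined in the code itself? For me, 'data' and 'other_data' are text files that I am reading. Would writing 'data_file = StringIO(output)' with 'output = "output.txt"' cause an error? I'm confused about what I should actually enter, sorry! – user8290579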
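
On the last question: as the code comments in the answer note, StringIO is only needed to make an in-memory string behave like a file. When the data already lives in files on disk, the path can probably be passed straight to `pd.read_csv`. The sketch below first writes two small sample files into a temporary directory so that it runs on its own; in real use those lines would be dropped and the paths would point at the existing output.txt and file.txt. The temporary directory and shortened data are assumptions for illustration.

```python
import os
import tempfile
import pandas as pd

# Create small stand-ins for output.txt and file.txt (illustration only)
tmpdir = tempfile.mkdtemp()
output_path = os.path.join(tmpdir, "output.txt")
file_path = os.path.join(tmpdir, "file.txt")
with open(output_path, "w") as fh:
    fh.write("Atom nVa avgppm stddev delta\n"
             "1.H1' 2 5.73649 0.00104651803616 1.0952e-06\n"
             "1.H2' 1 4.85438\n")
with open(file_path, "w") as fh:
    fh.write("1.H1' 7.65014 0.3\n"
             "9.H8 8.1053 0.3\n")

# Pass the file paths directly; no StringIO wrapper is involved
df = pd.read_csv(output_path, sep=" ").set_index("Atom")
other_df = pd.read_csv(file_path, sep=" ", header=None,
                       names=["Atom", "predppm", "some_other_field"])
other_df = other_df.set_index("Atom")

# Index alignment assigns each predppm value to the matching Atom row
df["predppm"] = other_df["predppm"]
df = df.reset_index()
print(df)
```

Passing the string "output.txt" to StringIO would instead create a file-like object whose contents are the literal text "output.txt", which is not what is wanted here.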
