Python: How to extract strings from a text file to use as data

This is my first time writing a Python script, and I'm having some trouble getting started. Say I have a txt file named Test.txt that contains this information:

        x   y   z  Type of atom 
ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C 
ATOM 2  O1 GLN D 10  26.431  2.638  5.002 O 
ATOM 3  O2 GLN D 10  26.085  4.471  3.796 O 
ATOM 4  C2 GLN D 10  26.642  4.743  6.148 C

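What I ultimately want to do is write a script that finds the center of mass of these atoms. So basically I want to sum all of the x values in the txt file, with each number multiplied by a given value that depends on the type of atom.

I know I need to define the position of each x value, but I'm having trouble figuring out how to get these x values represented as numbers rather than as text in a string. I have to keep in mind that I'll need to multiply these numbers by a value for the atom type, so I need a way to define that value for each atom type. Can anyone push me in the right direction?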

2012-07-27 Cammen

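First of all, is this homework? – FallenAngel 2012-07-27 15:30:39

Welcome to SO! Can you show us the code you have so far? If you have code that reads the file and gets the 'x' values as strings, that's a great start! Basically, if you tell us what you have, we can help you improve it and get it to the point where you can use it. – mgilson 2012-07-27 15:30:43

Is this coming from your software as a tab-delimited file? If so, you could look at http://docs.python.org/library/csv.html – Jzl5325 2012-07-27 15:30:49

Basically, you can open any file in Python with the open function. So you can do something like the following. The snippet below is not a solution to the whole problem, just an approach.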

def read_file():
    f = open("filename", 'r')
    for line in f:
        line_list = line.split()
        # ....
        # ....
    f.close()

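From this point on, you are well set up to do whatever you want with these values. The second line simply opens the file for reading. The third line defines a for loop that reads the file one line at a time, and each line is assigned to the variable line.

The line.split() call in that snippet breaks the string apart at each run of whitespace and returns a list. So line_list[0] will be the value of your first column, and so on. From here, if you have any programming experience, you can use if statements and the like to get the logic you want.

**Keep in mind that the values stored in that list will all be strings, so if you want to perform any arithmetic on them (such as addition), you must be careful to convert them to numbers first, for example with float().

*Edited for syntax correction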
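
A minimal sketch of that conversion step, using one line from the sample data in the question. The variable names here are illustrative and not part of the original answer.

```python
# One line from the sample file in the question.
line = "ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C"

# split() with no argument splits on any run of whitespace.
line_list = line.split()

# Every element is still a string at this point.
x_as_text = line_list[6]

# Convert the coordinate columns (indices 6, 7, 8) to floats before doing arithmetic.
x, y, z = (float(value) for value in line_list[6:9])
atom_type = line_list[9]

print(type(x_as_text).__name__, type(x).__name__)  # str float
```

Once the values are floats, they can be multiplied by a per-atom-type value looked up from atom_type.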

2012-07-27 15:37:55

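You should use the term 'list' instead of 'array'. Also, you never call 'f.close()' (note that this kind of thing is exactly what the 'with' statement was designed to make easier to handle). – mgilson 2012-07-27 15:39:19

@mgilson You are right. Please see my edit. – 2012-07-27 15:43:58

I'm not a fan of 'with', but you should be familiar with it. However, the advice to never use 'file.close()' is bad; much of the time it is preferable to handle it that way. – ely 2012-07-27 15:49:41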
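
For readers unfamiliar with the 'with' statement discussed in these comments, here is a small sketch of how it closes a file automatically. The temporary file and its single line of content are made up so that the example is self-contained.

```python
import os
import tempfile

# Write a small throwaway file so the example runs on its own.
# (The path and contents are assumptions for illustration.)
path = os.path.join(tempfile.mkdtemp(), "Test.txt")
with open(path, "w") as out:
    out.write("ATOM 1  C1 GLN D 10  26.395  3.904  4.923 C\n")

# The file is closed when the with block exits, even if an exception is raised.
with open(path, "r") as f:
    rows = [line.split() for line in f]

print(f.closed)    # True
print(rows[0][0])  # ATOM
```

An explicit f.close() has the same effect on the normal path; the difference is that 'with' also closes the file when the loop body raises.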

mass_dictionary = {'C': 12.0107,
                   'O': 15.999,
                   # Others...?
                   }

# If your files are this structured, you can just 
# hardcode some column assumptions. 
coords_idxs = [6,7,8] 
type_idx = 9 

# Open file, get lines, close file. 
# Probably prudent to add try-except here for bad file names. 
f_open = open("Test.txt",'r') 
lines = f_open.readlines() 
f_open.close() 

# Initialize a list to hold needed intermediate data.
output_coms = []
total_mass = 0.0

# Loop through the lines of the file. 
for line in lines: 

    # Split the line on white space. 
    line_stuff = line.split() 

    # If the line is empty or fails to start with 'ATOM', skip it.
    if (not line_stuff) or (line_stuff[0] != 'ATOM'):
        continue

    # Otherwise, append the mass-weighted coordinates to a list and increment total mass.
    mass = mass_dictionary[line_stuff[type_idx]]
    output_coms.append([mass * float(line_stuff[i]) for i in coords_idxs])
    total_mass += mass

# After getting all the data, finish off the averages.
avg_x, avg_y, avg_z = [sum(axis) / total_mass for axis in zip(*output_coms)]


# A lot of this will be better with NumPy arrays if you'll be using this often or on 
# larger files. Python Pandas might be an even better option if you want to just 
# store the file data and play with it in Python.

2012-07-27 15:48:25 ely

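'line_stuff = line.replace("\n", "").split()' is equivalent to 'line_stuff = line.split()'. – mgilson 2012-07-27 15:50:47

When I use 'split()', I often end up with a trailing "\n" on my last item. I think it depends on how the lines are formatted; I just always feel it's prudent to include it. – ely 2012-07-27 15:51:37

Are you using 'split(' ')'? That can leave a trailing newline, but 'split()' will not. – mgilson 2012-07-27 15:54:33

If you have pandas installed, check out the read_fwf function, which reads a fixed-width file and creates a DataFrame (a 2-D tabular data structure). It can save you lines of code on import, and it also gives you a lot of data-munging functionality if you want to do any further data manipulation.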
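
The difference between the two calls can be seen on a short string. The sample line here is made up for illustration.

```python
line = "26.395  3.904  4.923 C\n"

# No argument: split on runs of whitespace and ignore leading/trailing
# whitespace, so the newline disappears.
print(line.split())     # ['26.395', '3.904', '4.923', 'C']

# Explicit single space: every space is a separator, so double spaces give
# empty strings and the newline stays attached to the last field.
print(line.split(' '))  # ['26.395', '', '3.904', '', '4.923', 'C\n']
```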

2012-07-28 04:53:36
