2012-04-16 111 views
1

Reading file contents into arrays

I'm somewhat familiar with Python. I have a data file that needs to be read in a specific way. Here is an example:

1 
6 
0.714285714286 
0 0 1.00000000000 
0 1 0.61356352337 
... 
-1 -1 0.00000000000 
0 0 5.13787636499 
0 1 0.97147643932 
... 
-1 -1 0.00000000000 
0 0 5.13787636499 
0 1 0.97147643932 
... 
-1 -1 0.00000000000 
0 0 0 0 5.13787636499 
0 0 0 1 0.97147643932 
.... 

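So every file will have this structure (tab-delimited).

  • The first line must be read in as a variable, as must the second and third lines.
  • Next come four blocks separated by -1 -1 0.0000000000. Each block is 'n' lines long. The first two numbers give the position in the array where the third number on the line is to be inserted. Only unique positions are listed (so position 0 1 would be the same as 1 0, but that entry is not shown).
  • Note: the fourth block has four index numbers.

I need:

  • The first 3 lines read in as unique variables.
  • Each block of data read into an array, using the first 2 (or 4) columns as the array indices and the remaining column as the value inserted into the array.
  • Only unique array elements are listed. I need the mirrored positions filled with the appropriate values (the 0 1 value should also appear at 1 0).
  • The last block needs to be inserted into a 4-dimensional array.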
+4

What have you tried? – yannis 2012-04-16 13:07:06

+0

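I haven't tried anything (lack of experience with Python), hence my post on SE. – LordStryker 2012-04-17 14:28:23

Answers

3

I rewrote the code. It is now almost what you need; you only have to fine-tune it.

I decided to leave the old answer in place; maybe it will help too, because the new one is feature-rich enough that it may not always be easy to follow.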

def the_function(filename):
    """
    Return a tuple: a list of the three header values and a list of
    sparse arrays stored as dicts,
    e.g. ([1, 2, 0.5], [{(0, 0): 1.0, (0, 1): 2.0}, ...]).
    On failure, print the reason and return None,
    e.g. "failed on text.txt: invalid literal for int() with base 10: '0.0', line: 5"
    """

    # open the file and split every line into fields
    try:
        with open(filename, "r") as f:
            data_txt = [line.split() for line in f]
    # no such file
    except IOError as e:
        print('fail on open ' + str(e))
        return

    # try to get the first 3 variables
    try:
        variables = [int(data_txt[0][0]), int(data_txt[1][0]), float(data_txt[2][0])]
    except (ValueError, IndexError) as e:
        print('failed on ' + filename + ': ' + str(e) + ', somewhere on lines 1-3')
        return

    # now get the arrays
    arrays = [dict()]
    for lineidx, item in enumerate(data_txt[3:]):
        try:
            # 2d array data
            if len(item) == 3:
                i, j = map(int, item[:2])
                val = float(item[-1])
                # check for the 'block separator'
                if (i, j, val) == (-1, -1, 0.0):
                    # start a new array
                    arrays.append(dict())
                else:
                    # update the last, existing array
                    arrays[-1][(i, j)] = val
            # almost the same for 4d array data
            elif len(item) == 5:
                i, j, k, m = map(int, item[:4])
                val = float(item[-1])
                arrays[-1][(i, j, k, m)] = val
        # the value is unparsable, like '0.00' for int, or 'text'
        except ValueError as e:
            # lineidx counts from 0 starting at the 4th line of the file
            print('failed on ' + filename + ': ' + str(e) + ', line: ' + str(lineidx + 4))
            return
    return variables, arrays
+0

Incredible. I'm adapting this code now. – LordStryker 2012-04-20 16:43:16

+0

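I can map the float to position (i,j) in the array, but not to (j,i). I tried inserting 'array[-1][(j,i)] = val' into the if/else statement, but the size of my array doesn't increase at all (21 elements instead of the desired 42). Any ideas? – LordStryker 2012-04-23 15:36:51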

+0

Strange. That should work. Did you check for typos? What about the i = 0, j = 0 case? – akaRem 2012-04-23 19:53:20
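The mirroring discussed in these comments can be sketched as follows. This is a minimal illustration, not the answerer's code: the helper name `mirror` and the 3x3 sample values are made up for the example. Two things may be worth checking against it. First, the list in the answer is named `arrays`, while the comment writes `array[-1]`. Second, assuming the 21 entries are the upper triangle of a 6x6 symmetric matrix (the thread does not confirm this), mirroring would give 36 entries rather than 42, because the 6 diagonal keys mirror onto themselves.

```python
def mirror(sparse):
    """Return a copy of a {(i, j): value} dict with every (j, i) filled in."""
    full = dict(sparse)
    for (i, j), val in sparse.items():
        full[(j, i)] = val  # for i == j this rewrites the same key
    return full


# Hypothetical upper triangle of a 3x3 symmetric matrix (6 unique entries).
upper = {(0, 0): 1.0, (0, 1): 0.5, (0, 2): 0.2,
         (1, 1): 1.0, (1, 2): 0.7, (2, 2): 1.0}
full = mirror(upper)
print(len(upper), len(full))  # prints: 6 9
```

Building a new dict instead of adding keys while looping avoids changing the dict's size during iteration.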

1

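To iterate over the lines of a file, you can use something like:

with open(filename, "r") as f:
    var1 = int(next(f))
    var2 = int(next(f))
    var3 = float(next(f))
    for line in f:
        pass  # do some stuff particular to the line we are on...

Just create some data structure outside the loop and fill it inside the loop above. To split a string into its elements, you can use: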

>>> "spam ham".split() 
['spam', 'ham'] 
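Putting the two pieces together, a minimal sketch might look like the one below. It assumes Python 3 and uses `io.StringIO` as a stand-in for the open file so the example runs on its own; the sample values are copied from the question, and the name `rows` is chosen only for illustration.

```python
import io

# Stand-in for an open file; the sample values are copied from the question.
sample = io.StringIO(
    "1\n6\n0.714285714286\n"
    "0\t0\t1.00000000000\n"
    "0\t1\t0.61356352337\n"
)

# The first three lines are single values.
var1 = int(next(sample))
var2 = int(next(sample))
var3 = float(next(sample))

rows = []  # created outside the loop, filled inside it
for line in sample:
    fields = line.split()  # splits on any whitespace, including tabs
    if fields:             # skip blank lines
        rows.append(fields)

print(var1, var2, var3)
print(rows)
```

With a real file, `sample` would be replaced by the `f` from `with open(filename, "r") as f:`.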

I also think you'll want to look at the array data structure in the numpy library, and possibly the SciPy library for the analysis.

+3

Better to use 'with open(filename, "r") as f:' and put the statements inside the 'with' block – jamylak 2012-04-16 13:17:07

+0

Edited the answer. I think the main advantage is that 'close' doesn't need to be called on the file connection. – 2012-04-16 13:20:01

+2

Is the repeated open(filename, "r") an editing mistake? – Levon 2012-04-16 13:24:27

2

As I understand what you're asking...

# read data from file into list
parsed = []
with open(filename, "r") as f:
    for line in f:
        # # you can exclude the separator here with code like this (uncomment) (1)
        # # be careful: one zero more or one zero less and it won't work
        # if line.split() == ['-1', '-1', '0.00000000000']:
        #     continue
        parsed.append(line.split())

# a simpler version
with open(filename, "r") as f:
    # # you can exclude the separator here with code like this (uncomment, replace) (2)
    # parsed = [line.split() for line in f if line.split() != ['-1', '-1', '0.00000000000']]
    parsed = [line.split() for line in f]

# at this point 'parsed' is a list of lists of strings. 
# [['1'],['6'],['0.714285714286'],['0', '0', '1.00000000000'],['0', '1', '0.61356352337'] .. ] 

# ALT 1 ------------------------------- 
# we do know the len of each data block 

# get the first 3 lines: 
head = parsed[:3] 

# get the body: 
body = parsed[3:-2] 

# get the last 2 lines: 
tail = parsed[-2:] 

# now you can do anything you want with your data 
# but remember to convert str to int or float 

# first3 as unique: 
unique0 = int(head[0][0]) 
unique1 = int(head[1][0]) 
unique2 = float(head[2][0]) 

# cast body:
# check that each item of body has 3 inner items
is_correct = all(len(item) == 3 for item in body)
# parse str and cast
if is_correct:
    for i, j, v in body:
        # # you can exclude the separator here (uncomment) (3)
        # # * 0. is the same as float(0)
        # if (int(i), int(j), float(v)) == (-1, -1, 0.):
        #     # here we skip the iteration for the line w/ '-1 -1 0.0...'
        #     # but you can place other code here that will be executed
        #     # at the point where block-termination lines appear
        #     continue

        some_body_cast_function(int(i), int(j), float(v))
else:
    raise Exception('incorrect body')


# cast tail
# check that each item of tail has 5 inner items
is_correct = all(len(item) == 5 for item in tail)
# parse str and cast
if is_correct:
    for i, j, k, m, v in tail:  # 'l' is a bad index name, because it looks like 1
        some_tail_cast_function(int(i), int(j), int(k), int(m), float(v))
else:
    raise Exception('incorrect tail')

# ALT 2 ----------------------------------- 
# we do NOT know the len of each data block 

# maybe we have some array? 
array = dict() # your array may be other type 

v1, v2, v3 = parsed[:3]
unique0 = int(v1[0]) 
unique1 = int(v2[0]) 
unique2 = float(v3[0]) 

for item in parsed[3:]:
    if len(item) == 3:
        i, j, v = item
        i = int(i)
        j = int(j)
        v = float(v)

        # # you can exclude the separator here (uncomment) (4)
        # # * 0. is the same as float(0)
        # # the logic is the same as in the 3rd variant
        # if (i, j, v) == (-1, -1, 0.):
        #     continue

        # do your stuff
        # for example,
        array[(i, j)] = v
        array[(j, i)] = v

    elif len(item) == 5:
        i, j, k, m, v = item
        i = int(i)
        j = int(j)
        k = int(k)
        m = int(m)
        v = float(v)

        # do your stuff

    else:
        raise Exception('unsupported')  # or, maybe just 'pass'
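For the five-column branch, one possible way to "do your stuff" is to write each value into a dense 4-dimensional structure. The sketch below uses plain nested lists to stay dependency-free; the helper names `make_4d` and `fill_4d` and the size `n = 2` are assumptions made for the example. It fills only the listed positions, because the thread does not say which mirrored positions a 4-index entry implies.

```python
def make_4d(n, fill=0.0):
    """Build an n x n x n x n nested list filled with `fill`."""
    return [[[[fill for _ in range(n)] for _ in range(n)]
             for _ in range(n)] for _ in range(n)]


def fill_4d(rows, n):
    """Place each 5-field row (i, j, k, m, value) into a dense 4-d list."""
    dense = make_4d(n)
    for i, j, k, m, v in rows:
        dense[int(i)][int(j)][int(k)][int(m)] = float(v)
    return dense


# Two rows copied from the question; n = 2 is a made-up size for the demo.
rows = [['0', '0', '0', '0', '5.13787636499'],
        ['0', '0', '0', '1', '0.97147643932']]
dense = fill_4d(rows, 2)
print(dense[0][0][0][1])
```

A numpy array created with `numpy.zeros((n, n, n, n))` and indexed as `dense[i, j, k, m]` would be the natural replacement for the nested lists if numpy is available.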
+0

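This is almost exactly what I need. I forgot to mention explicitly that the '-1 -1 0.00000' lines are just block-termination lines (when the iteration reaches a value of -1... end the current array and start a new one). I think I can adapt your example to get what I need. Of course, any help is always welcome. – LordStryker 2012-04-17 14:27:48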

+0

Added some code inserts (4 variants) where you can exclude the "block-termination lines" or handle them as needed. Hope you like it! – akaRem 2012-04-18 10:30:22

+0

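I appreciate your continued help. I can't work out how to tell the program to create a new array each time it reaches the -1 indicator and then fill that array with the block that follows. Right now it dumps all blocks that are 3 elements long into a single array. – LordStryker 2012-04-18 20:31:51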
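Starting a new array at every terminator line, as asked here, can be sketched like this. The function name `split_blocks` is hypothetical, and the sketch assumes a terminator is any three-field row whose two indices are both -1, so the number of zeros in the third field does not matter. The sample rows are taken from the question.

```python
def split_blocks(rows):
    """Split parsed rows into blocks, starting a new block at each terminator."""
    blocks = [[]]
    for fields in rows:
        if len(fields) == 3 and fields[0] == '-1' and fields[1] == '-1':
            blocks.append([])          # terminator: start the next block
        else:
            blocks[-1].append(fields)  # ordinary row: add to the current block
    return blocks


rows = [['0', '0', '1.00000000000'], ['0', '1', '0.61356352337'],
        ['-1', '-1', '0.00000000000'],
        ['0', '0', '5.13787636499'], ['0', '1', '0.97147643932'],
        ['-1', '-1', '0.00000000000'],
        ['0', '0', '0', '0', '5.13787636499']]
blocks = split_blocks(rows)
print(len(blocks))  # prints: 3
```

Each entry of `blocks` can then be converted and stored separately, for example one dict per block.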