Pythonic way to compare values in an array?

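The input is a tab-delimited file. Rows are variables, columns are samples. A variable can take one of three values (00, 01, 11), and the variables are listed in an order that must be preserved (v1 -> vN). There are a large number of rows and columns, so the input file needs to be read in chunks.

The input looks like this: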
```
s1 s2 s3 s4 
v1 00 00 11 01 
v2 00 00 00 00 
v3 01 11 00 00 
v4 00 00 00 00 
(...) 
```
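What I am trying to do is split the input into blocks of rows, where each block is just large enough for every sample to be unique. In the example above, starting from v1, the first block should end at v3, because at that point there is enough information to show that all samples are unique. The next block would start at v4 and repeat the process. The task ends when the last row is reached. The blocks should be printed to an output file.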
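The stopping rule described above can be sketched as follows. This is only an illustration, assuming each row is already a list of state strings; the helper name `first_unique_row` is hypothetical and does not come from the original post.

```python
def first_unique_row(rows):
    """Return the index of the first row at which all samples (columns)
    have distinct state sequences, or None if they never become distinct."""
    if not rows:
        return None
    # one growing tuple of states per sample (column)
    seen = [() for _ in rows[0]]
    for index, row in enumerate(rows):
        seen = [prefix + (state,) for prefix, state in zip(seen, row)]
        # a set drops duplicates, so equal sizes mean every sample is unique
        if len(set(seen)) == len(seen):
            return index
    return None


rows = [
    ['00', '00', '11', '01'],  # v1
    ['00', '00', '00', '00'],  # v2
    ['01', '11', '00', '00'],  # v3
    ['00', '00', '00', '00'],  # v4
]
print(first_unique_row(rows))  # 2, i.e. the first block ends at v3
```

Applying the same function again to the rows after the returned index would give the end of the next block.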

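What I tried:

I tried using the csv module to produce an array made of lists, each containing the states of a single variable (00, 01, 00) across all samples. Alternatively, by pivoting the input, I could create lists that hold the states of every variable for each sample. My question is whether this work should focus on rows or on columns, i.e. whether it is better to use v1 = ['00', '00', '11', '01'] or s1 = ['00', '00', '01', '00', ...].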
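To make the row-versus-column choice concrete, here is a small sketch of how the two layouts relate. It assumes the table is already in memory as a list of rows with the header and variable names stripped, which may not hold for a very large file; the names `by_variable` and `by_sample` are illustrative.

```python
# one list per variable (v1..v4), one entry per sample (s1..s4)
by_variable = [
    ['00', '00', '11', '01'],  # v1
    ['00', '00', '00', '00'],  # v2
    ['01', '11', '00', '00'],  # v3
    ['00', '00', '00', '00'],  # v4
]

# zip(*rows) pivots the table: one tuple per sample, one entry per variable
by_sample = list(zip(*by_variable))

print(by_sample[0])  # ('00', '00', '01', '00') -> states of s1 across v1..v4

# samples are unique exactly when no two pivoted tuples are equal
all_unique = len(set(by_sample)) == len(by_sample)
print(all_unique)  # True
```

Because the file arrives row by row, keeping the rows as they are and accumulating a per-sample prefix avoids having to pivot the whole file up front.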

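The following code is the pivoting operation with which I tried to turn the column problem into a row problem. (Sorry for my clumsy Python syntax; it is the best I can do.)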

import csv

my_infilename = 'my_file.txt'
csv_infile = csv.reader(open(my_infilename, 'r'), delimiter='\t')
out = open('transposed_' + my_infilename, 'w')
csv_infile = zip(*csv_infile)  # transpose: columns become rows
line_n = 0
for line in csv_infile:
    line_n += 1
    if line_n == 1:  # headers
        continue
    else:
        line = ','.join(line) + '\n'  # just to make it readable to me
        out.write(line)
out.close()

What is the best way to solve this problem? Can pivoting be of any help? Are there built-in functions I can rely on?

2012-04-17 cometarossa

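So, what is your question? – Marcin 2012-04-17 15:32:49

Basically, how to lay out a script with a loop that would let me identify the minimum number of ordered variables needed to distinguish every sample. – cometarossa 2012-04-17 15:48:59

How does this relate to the code you posted? Which part of the problem is giving you trouble? Right now this post reads like "I tried to solve this problem but couldn't, so can you do it for me?" – Marcin 2012-04-17 15:52:20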

Assuming you have the CSV data imported as a list of lists that are all the same length, how would this work for you...

def get_block(data_rows):
    # one accumulated state string per sample (column)
    samples = []

    for cell in data_rows[0]:
        samples.append('')

    # add one row at a time to each sample and see if all are unique
    for row_index, row in enumerate(data_rows):
        for cell_index, cell in enumerate(row):
            samples[cell_index] = '%s%s' % (samples[cell_index], cell)

        are_all_unique = True
        sample_dict = {}  # use dictionary keys to find repeats
        for sample in samples:
            if sample_dict.get(sample):
                # already there, so another row needed
                are_all_unique = False
                break
            sample_dict[sample] = True  # add the key to the dictionary
        if are_all_unique:
            return True, row_index

    return False, None

def get_all_blocks(all_rows):
    remaining_rows = all_rows[:]  # make a copy
    blocks = []

    # looping on remaining_rows also handles an empty input safely
    while remaining_rows:
        found_block, block_end_index = get_block(remaining_rows)
        if found_block:
            blocks.append(remaining_rows[:block_end_index + 1])
            remaining_rows = remaining_rows[block_end_index + 1:]
        else:
            # samples never become unique: keep the leftover rows as a final block
            blocks.append(remaining_rows[:])
            break

    return blocks


if __name__ == "__main__":
    v1 = ['00', '00', '11', '01']
    v2 = ['00', '00', '00', '00']
    v3 = ['01', '11', '00', '00']
    v4 = ['00', '00', '00', '00']

    all_rows = [v1, v2, v3, v4]

    blocks = get_all_blocks(all_rows)

    for index, block in enumerate(blocks):
        print("This is block %s." % index)
        for row in block:
            print(row)
        print()

Output:

This is block 0.
['00', '00', '11', '01']
['00', '00', '00', '00']
['01', '11', '00', '00']

This is block 1.
['00', '00', '00', '00']

2012-04-17 17:59:23 jcfollower

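It looks like exactly what I was looking for. Thanks for your help, I will let you know soon. – cometarossa 2012-04-17 21:34:01

Works wonders. I would give you an upvote, but my reputation is too low! – cometarossa 2012-04-18 09:12:35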

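I do not fully understand your question ("coordinated variables"? "univocally identify the samples"?), but I can see that you are using the csv module and that your indentation is also incorrect.

I do not know exactly what your input file looks like, but assuming it is tab-delimited, the following (untested) script shows how you can take blocks from the input file, transform them, and write them to your output file.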

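import csv

# this is not strictly necessary, but you can define a custom dialect for input and output

class SampleDialect(csv.Dialect):
    delimiter = "\t"
    quoting = csv.QUOTE_NONE
    lineterminator = "\n"  # a csv.Dialect subclass must set a line terminator

sampledialect = SampleDialect()

ifn = 'my_file.txt'
ofn = 'transposed_' + ifn

# in Python 3 the csv module expects text-mode files opened with newline=''
ifp = open(ifn, 'r', newline='')
ofp = open(ofn, 'w', newline='')

incsv = csv.reader(ifp, dialect=sampledialect)
outcsv = csv.writer(ofp, dialect=sampledialect)


header = None
block = []
for lineno, samples in enumerate(incsv):
    if lineno==0: #header
        header = samples
        continue
    block.append(samples)
    if lineno%3:  # placeholder condition: replace with your own end-of-block test
        # end of block
        # do something with block
        # then write it out
        outcsv.writerows(block)
        block = []

# write out any rows left over after the last complete block
if block:
    outcsv.writerows(block)

ifp.close()
ofp.close()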

2012-04-17 17:21:54

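Thank you. I will work along these lines. – cometarossa 2012-04-17 21:39:29

The blocks are not all the same size. He wants each block to have enough rows that every column is distinct from the other samples in the block. – agf 2012-04-18 14:22:22

I was mostly showing how to use the csv reader and writer. He will have to change the 'if lineno%3:' line to whatever his condition is. – 2012-04-18 15:35:45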
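One way the placeholder condition could be replaced is sketched below: rows are read lazily from the csv reader and a block is emitted as soon as the accumulated states of every sample are distinct. This is an illustration under stated assumptions rather than code from either answer: `iter_blocks` is a hypothetical name, the first column is assumed to hold the variable name, and `io.StringIO` stands in for the real file.

```python
import csv
import io


def iter_blocks(reader):
    """Yield lists of rows; each block ends at the first row where all
    samples (columns after the variable name) become distinguishable."""
    header = next(reader, None)  # skip the header line (sample names)
    if header is None:
        return
    block, prefixes = [], None
    for row in reader:
        if not row:
            continue  # ignore blank lines
        states = row[1:]  # row[0] is assumed to be the variable name
        if prefixes is None:
            prefixes = [()] * len(states)
        prefixes = [p + (s,) for p, s in zip(prefixes, states)]
        block.append(row)
        if len(set(prefixes)) == len(prefixes):
            yield block
            block, prefixes = [], None
    if block:
        yield block  # leftover rows that never became unique


data = (
    "\ts1\ts2\ts3\ts4\n"
    "v1\t00\t00\t11\t01\n"
    "v2\t00\t00\t00\t00\n"
    "v3\t01\t11\t00\t00\n"
    "v4\t00\t00\t00\t00\n"
)
reader = csv.reader(io.StringIO(data), delimiter="\t")
blocks = list(iter_blocks(reader))
for block in blocks:
    print([row[0] for row in block])
```

Since the generator holds only the current block in memory, it should fit the requirement that the file be processed in chunks; each yielded block can be passed straight to `csv.writer.writerows`.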
