2016-12-04 75 views
-2

Fisher Yates shuffle in python

I have the following dataset (this is a sample):

ID  Sub1 Sub2 Sub3 Sub4 
Creb3l1 10.14 9.67 10.14 10.42 
Chchd6 11.25 10.74 10.80 11.07 
Arih1 9.91 9.25 10.20 9.34 
Prpf8 11.54 11.58 11.14 11.36 
Rfng 11.71 11.56 10.81 10.72 
Rnf114 12.66 12.60 12.59 12.56 

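I want to apply a Fisher-Yates shuffle to this dataset 10 times (i.e. write 10 output files, each containing one randomization of the data produced with a Fisher-Yates shuffle).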
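For reference, the Fisher-Yates shuffle itself is short. Below is a minimal Python 3 sketch; the function name fisher_yates_shuffle and the rng parameter are made up for illustration. It walks each position from the end of the list and swaps it with a randomly chosen position at or before it, which (given a uniform random source) makes every ordering equally likely. In practice the standard library's random.shuffle already shuffles a list in place this way, so you normally would not write it yourself.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle the list `items` in place (Fisher-Yates / Durstenfeld) and return it."""
    # Walk from the last index down to 1, swapping each position with a
    # uniformly chosen position j in the range 0..i (inclusive).
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)
        items[i], items[j] = items[j], items[i]
    return items


# Usage: shuffle one row of expression values from the sample data
row = ['10.14', '9.67', '10.14', '10.42']
fisher_yates_shuffle(row)
print(row)
```

Note that it shuffles a single list once; it does not enumerate orderings, which is the key difference from itertools.permutations.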

I wrote this code:

import sys 
import itertools 
from itertools import permutations 

for line in open(sys.argv[1]).readlines()[2:]: 
    line = line.strip().split() 
    ID = line[0] 
    expression_values = line[1:] 
    for shuffle in permutations(expression_values): 
     print shuffle 

The output of this code looks like this (sample):

('11.25', '10.74', '10.80', '11.07') 
('11.25', '10.74', '11.07', '10.80') 
('11.25', '10.80', '10.74', '11.07') 
('11.25', '10.80', '11.07', '10.74') 
('11.25', '11.07', '10.74', '10.80') 
('11.25', '11.07', '10.80', '10.74') 
('10.74', '11.25', '10.80', '11.07') 
('10.74', '11.25', '11.07', '10.80') 
('10.74', '10.80', '11.25', '11.07') 
('10.74', '10.80', '11.07', '11.25') 
('10.74', '11.07', '11.25', '10.80') 
('10.74', '11.07', '10.80', '11.25') 
('10.80', '11.25', '10.74', '11.07') 
('10.80', '11.25', '11.07', '10.74') 
('10.80', '10.74', '11.25', '11.07') 
('10.80', '10.74', '11.07', '11.25') 
('10.80', '11.07', '11.25', '10.74') 
('10.80', '11.07', '10.74', '11.25') 
('11.07', '11.25', '10.74', '10.80') 
('11.07', '11.25', '10.80', '10.74') 
('11.07', '10.74', '11.25', '10.80') 
('11.07', '10.74', '10.80', '11.25') 
('11.07', '10.80', '11.25', '10.74') 
('11.07', '10.80', '10.74', '11.25') 
('9.91', '9.25', '10.20', '9.34') 
('9.91', '9.25', '9.34', '10.20') 

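I'm having trouble producing a specific chunk of the randomized data (e.g. giving me one set of 7 Fisher-Yates randomized lines that I can write to a file). I would be grateful if someone could show me how to edit the code above to produce 10 output files, each containing 7 lines of text (i.e. the same number of lines as the input file), and each holding one randomized, Fisher-Yates-shuffled set of values.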
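One way to think about a single output file is "the input table with each row's values shuffled once". A minimal Python 3 sketch of that, assuming the file is whitespace-separated with a single header line as in the sample (the function name shuffle_table is hypothetical, and the output uses tabs by choice). It uses random.shuffle, which CPython implements as a Fisher-Yates shuffle, instead of itertools.permutations:

```python
import random


def shuffle_table(lines, rng=random):
    """Return one randomised copy of a whitespace-separated table.

    The first line is treated as the header and kept as-is. For every
    other non-blank line the ID (first field) stays in place and the
    remaining values are shuffled once with rng.shuffle.
    """
    header, rows = lines[0].split(), lines[1:]
    out = ["\t".join(header)]
    for line in rows:
        fields = line.split()
        if not fields:              # skip blank lines
            continue
        gene_id, values = fields[0], fields[1:]
        rng.shuffle(values)         # one in-place shuffle per row
        out.append("\t".join([gene_id] + values))
    return out
```

Because each call produces a list with the same number of lines as the input, calling it ten times (for example on open(sys.argv[1]).read().splitlines()) and writing each result to its own file would give the ten randomized files; that wiring is left as an assumption about how the script is run.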

Edit 1: I have tried a few different approaches, for example the following code:

for line in open(sys.argv[1]).readlines()[2:]: 
    line = line.strip().split() 
    gene_name = line[0] 
    expression_values = line[1:] 
    RandomList = [] 
    for shuffle in permutations(expression_values): 
     while len(RandomList) <10:                                         
      RandomList.append(shuffle)                                        
    print RandomList                                             

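I thought this would give me back 10 randomizations per line. Instead, it gives me back the same line, in the same order, 10 times for each line: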

[('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07'), ('11.25', '10.74', '10.80', '11.07')] 
[('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34'), ('9.91', '9.25', '10.20', '9.34')] 
[('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36'), ('11.54', '11.58', '11.14', '11.36')] 
[('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72'), ('11.71', '11.56', '10.81', '10.72')] 
[('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56'), ('12.66', '12.60', '12.59', '12.56')] 
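The repetition comes from the loop structure in Edit 1: on the very first permutation (which is just the original order), the while loop keeps appending that same tuple until RandomList has 10 entries; for every later permutation the list is already full, so nothing else is added. A minimal Python 3 sketch that instead draws a fresh shuffle for each of the k entries (the name random_orderings is made up for illustration):

```python
import random


def random_orderings(values, k=10, rng=random):
    """Return k independently shuffled copies of `values`, as tuples."""
    orderings = []
    for _ in range(k):
        copy = list(values)     # shuffle a copy so `values` is left untouched
        rng.shuffle(copy)       # new random order on every pass
        orderings.append(tuple(copy))
    return orderings


print(random_orderings(['11.25', '10.74', '10.80', '11.07']))
```

Since the draws are independent and four values have only 24 orderings, repeats among the 10 are possible. If 10 different orderings are required, random.sample(list(permutations(values)), 10) picks 10 distinct entries from the 24 permutations instead.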

Edit 2: Sean, thank you very much for your help. I do know how to write to files in general; for example, I could write:

for i in range(10): 
    output_file = "random." + str(i) 
    open_output_file = open(output_file, 'a') 
    for line in randomised_lines:  # placeholder: the randomised lines for file i
        open_output_file.write(line + "\n")
    open_output_file.close() 

My problem with writing the files is that I can't even get what I want printed to the screen first. For example, if I run this code:

import sys
import itertools
from itertools import permutations

for i in range(10):
    for line in open(sys.argv[1]).readlines()[2:]:
        line = line.strip().split()
        gene_name = line[0]
        expression_values = line[1:]
        for shuffle in permutations(expression_values):
            print shuffle[:6]
        print "***"
i +=1

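I would expect the output to be 7 random lines, followed by "***", then another 7 random lines, and so on, 10 times. Instead, it prints every permutation of each line.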
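In the code above, for shuffle in permutations(...) sits inside the per-line loop, so every line is expanded into all 24 orderings of its four values. To get one random ordering per line per pass, each line needs to be shuffled exactly once inside the outer loop. A minimal Python 3 sketch, assuming the data rows have already been read into a list without the header (the function name shuffled_blocks is hypothetical, and only three sample rows are shown); it uses random.sample(values, len(values)), which returns a new randomly ordered list:

```python
import random


def shuffled_blocks(rows, repeats=10, rng=random):
    """Yield `repeats` blocks, each holding one shuffled line per input row."""
    for _ in range(repeats):
        block = []
        for line in rows:
            fields = line.split()
            if not fields:          # skip blank lines
                continue
            # sample() with k == len(values) gives a random ordering of all values
            values = rng.sample(fields[1:], len(fields) - 1)
            block.append(" ".join([fields[0]] + values))
        yield block


rows = [
    "Creb3l1 10.14 9.67 10.14 10.42",
    "Chchd6 11.25 10.74 10.80 11.07",
    "Arih1 9.91 9.25 10.20 9.34",
]
for block in shuffled_blocks(rows):
    print("\n".join(block))
    print("***")
```

Printing "***" after the inner loop (rather than inside it) is what separates the blocks; each block could equally be written to its own file.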

+0

Which part are you stuck on? Getting groups of seven? Writing them to files? There are answers for all of these things. – jonrsharpe

+0

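Thanks, I've edited the question. Yes, the output I get is 120 lines printed to the screen / written to the file. I'm confused about how to get groups of 7, e.g. print 7 lines at a time and write them to a file (and then do that 10 times). – user1288515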

+0

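What have you tried? Building a list, perhaps? Acting when it reaches the right length? If you've made an effort, show it. If you haven't, make one! Or just [do some research](http://stackoverflow.com/questions/3992735/python-generator-that-groups-another-iterable-into-groups-of-n). – jonrsharpe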
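The grouping idea in the linked question can be sketched by pulling up to n items at a time from a single iterator with itertools.islice until it runs dry. A minimal Python 3 version (the name groups_of is made up for this example); on Python 3.12+, itertools.batched does something similar but yields tuples:

```python
from itertools import islice


def groups_of(iterable, n):
    """Yield successive lists of up to n items from `iterable`."""
    it = iter(iterable)                 # one shared iterator, consumed in chunks
    while True:
        chunk = list(islice(it, n))     # take at most n items
        if not chunk:                   # iterator exhausted
            return
        yield chunk


for group in groups_of(range(15), 7):
    print(group)
```

The last group is shorter when the item count is not a multiple of n.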

Answers

-1

"each file containing 7 lines of text"

It sounds like you want to use slicing.

a = [ 1, 2, 3, 4, 5, 6 ] 
a[:3] 

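will yield [1, 2, 3].

A slice takes a start index, a stop index, and a step. In a[:3] the start index is omitted, so it starts at index 0 and stops before index 3 (the stop index is excluded).

a[1:3] will yield [2, 3].

a[1:5:2] starts at index 1, stops before index 5, and steps by 2, so it yields [2, 4].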
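The slicing rules above can be checked directly. A small Python 3 sketch; the last part also shows that a stop index past the end is clipped rather than raising an error, so slicing one four-value shuffle tuple never yields more than four values:

```python
a = [1, 2, 3, 4, 5, 6]

# a[start:stop:step]: start defaults to 0, stop is exclusive, step defaults to 1
assert a[:3] == [1, 2, 3]      # indices 0, 1, 2
assert a[1:3] == [2, 3]        # indices 1, 2
assert a[1:5:2] == [2, 4]      # indices 1, 3

# A stop beyond the end is clipped instead of raising IndexError,
# so slicing a 4-value shuffle tuple with [:7] still gives 4 values
shuffle = ('11.25', '10.74', '10.80', '11.07')
assert shuffle[:7] == shuffle
assert len(shuffle[:6]) == 4
```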

So, in your example, it looks like you want shuffle[:7] rather than shuffle[:6] (which gives at most six elements).

As for writing the files, you need some kind of loop:

for i in range(0, 10):
    filename = "output-%s.txt" % i

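This will produce the file names output-0.txt, output-1.txt, and so on.

See https://docs.python.org/2/tutorial/inputoutput.html for file input/output. Basically, you should use the with keyword together with open: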

with open(filename, 'w') as f: 
    f.write(str(shuffle[:7])) 
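Putting the loop, the output-%s.txt naming and the with statement together, a minimal Python 3 sketch might look like this. The function name write_output_files, the directory parameter and the demo blocks are hypothetical; it assumes each file's content is already a list of text lines. Opening with 'w' overwrites existing files, whereas the 'a' mode used in the question appends on every run:

```python
import os
import tempfile


def write_output_files(blocks, directory="."):
    """Write each block (a list of text lines) to output-<i>.txt in `directory`.

    Returns the list of paths written. Mode 'w' overwrites any existing file.
    """
    paths = []
    for i, lines in enumerate(blocks):
        filename = os.path.join(directory, "output-%s.txt" % i)
        # `with` closes the file automatically, even if an exception occurs
        with open(filename, "w") as f:
            for line in lines:
                f.write(line + "\n")
        paths.append(filename)
    return paths


if __name__ == "__main__":
    # Hypothetical demo: three tiny blocks written to a temporary directory
    demo_dir = tempfile.mkdtemp()
    demo_blocks = [["ID\tSub1", "Creb3l1\t10.14"] for _ in range(3)]
    for path in write_output_files(demo_blocks, demo_dir):
        print(path)
```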

That should point you in the right direction.

0

I think I have a solution:

import sys
from itertools import permutations

# Read the input file once
with open(sys.argv[1]) as infile:
    fileopen = infile.readlines()

# Write the header line to 10 output files (random0.txt ... random9.txt)
for i in range(10):
    file_name = "random" + str(i) + ".txt"
    with open(file_name, 'a') as open_file_name:
        open_file_name.write(fileopen[0].strip() + "\n")

# Write the rest of the info to the 10 output files
for line in fileopen:
    # Skip blank lines and the header line
    if line.strip() and "Sub" not in line:
        line = line.strip().split()
        ID = line[0]
        expression_values = line[1:]
        # permutations() yields orderings in a fixed (lexicographic) order,
        # so these are the first 10 permutations, not random shuffles
        ListOfShuffles = permutations(expression_values)
        for ind, i in enumerate(list(ListOfShuffles)[0:10]):
            file_name = "random" + str(ind) + ".txt"
            # 'a' appends, so re-running the script adds to existing files
            with open(file_name, 'a') as open_file_name:
                open_file_name.write(ID + "\t" + "\t".join(i) + "\n")
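It is worth being aware of what permutations() returns here: the orderings come out in a fixed order (lexicographic by input position), so list(...)[0:10] picks the same ten orderings on every run rather than ten random shuffles. A quick Python 3 check on the Chchd6 row from the question:

```python
from itertools import permutations
from math import factorial

values = ['11.25', '10.74', '10.80', '11.07']  # the Chchd6 row from the question

all_orderings = list(permutations(values))
print(len(all_orderings), factorial(len(values)))  # both 24

first_ten = all_orderings[:10]
# The first six orderings all keep values[0] in front, the next four values[1]
print({o[0] for o in first_ten[:6]})   # {'11.25'}
print({o[0] for o in first_ten[6:]})   # {'10.74'}

# Nothing here is random: calling permutations() again gives the same sequence
assert list(permutations(values))[:10] == first_ten
```

So in random0.txt to random5.txt the first value column would never move. If random orderings are wanted, one option (a suggested change, not what the script above does) is to shuffle a copy of expression_values with random.shuffle once per output file instead of slicing the permutations list.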