子進程：無法將'_io.BufferedReader'對象隱式轉換爲str

我正在編寫一個腳本，它是snakemake和python代碼的組合，可以自動化大量文件。更確切地說，我正致力於使用BWA MEM將讀取與成對末端讀取（http://bio-bwa.sourceforge.net/bwa.shtml）進行對齊。在腳本的第一部分，我重複了我的文件中的名稱列表（它們是fastq bunzipped文件），然後將它們相應地排序在列表中。下面是一些文件的快速瀏覽：子進程：無法將'_io.BufferedReader'對象隱式轉換爲str

['NG-8653_1A_lib95899_4332_7_1', 'NG-8653_1A_lib95899_4332_7_2', 'NG-8653_1B_lib95900_4332_7_1', 'NG-8653_1B_lib95900_4332_7_2', 'NG-8653_1N_lib95898_4332_7_1', 'NG-8653_1N_lib95898_4332_7_2']

正如你所看到的，讀的排序由二兩（1A _... 1和1A ..._ 2等）。現在使用子進程，我想通過使用bunzip2對它們進行解壓縮，然後通過標準輸入將它們傳遞給bwa mem。 bwa mem命令將fastq格式文件轉換爲.sam文件，然後我使用samtools將它們轉換爲.bam格式。這裏的腳本到目前爲止：

import re, os, subprocess, bz2 

WDIR = "/home/alaa/Documents/snakemake" 
workdir: WDIR 
SAMPLESDIR = "/home/alaa/Documents/snakemake/fastq/" 
REF = "/home/alaa/Documents/inputs/reference/hg19_ref_genome.fa" 

FILE_FASTQ = glob_wildcards("fastq/{samples}.fastq.bz2") 
LIST_FILE_SAMPLES = [] 

for x in FILE_FASTQ[0]: 
    LIST_FILE_SAMPLES.append(x) 

LIST_FILE_SAMPLES = sorted(LIST_FILE_SAMPLES) 
print(LIST_FILE_SAMPLES) 

rule fastq_to_bam: 
    run: 
     for x in range(0, len(LIST_FILE_SAMPLES), 2): 
      # get the name of the sample (1A, 1B ...) 
      samp = "" 
      samp += LIST_FILE_SAMPLES[x].split("_")[1] 

      # get the corresponding read (1 or 2) 
      r1 = SAMPLESDIR + LIST_FILE_SAMPLES[x] + ".fastq.bz2" 
      r2 = SAMPLESDIR + LIST_FILE_SAMPLES[x+1] + ".fastq.bz2" 

      # gunzipping the files and pipping them 
      p1 = subprocess.Popen(['bunzip2', '-kc', r1], stdout=subprocess.PIPE) 
      p2 = subprocess.Popen(['bunzip2', '-kc', r2], stdout=subprocess.PIPE)   


      # now write the output file to .bam format after aligning them 
      with open("sam/" + samp + ".bam", "w") as stdout: 
       fastq2sam = subprocess.Popen(["bwa", "mem", "-T 1", REF, p1.stdout, p2.stdout], stdout=subprocess.PIPE) 
       fastq2samOutput = subprocess.Popen(["samtools", "view", "-Sb", "-"], shell = True, stdin=fastq2sam.stdout, stdout=stdout)

我試圖通過逐行嘗試調試腳本。將bunzip2寫入輸出文件時，它工作正常。現在，如果我試圖管它，我得到一個錯誤：

Error in job fastq_to_bam while creating output file . 
RuleException: 
TypeError in line 39 of /home/alaa/Documents/snakemake/Snakefile: 
Can't convert '_io.BufferedReader' object to str implicitly 
    File "/home/alaa/Documents/snakemake/Snakefile", line 39, in __rule_fastq_to_bam 
    File "/usr/lib/python3.5/subprocess.py", line 947, in __init__ 
    File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child 
    File "/usr/lib/python3.5/concurrent/futures/thread.py", line 55, in run 
Exiting because a job execution failed. Look above for error message 
Will exit after finishing currently running jobs. 
Exiting because a job execution failed. Look above for error message

你能告訴我什麼是腳本的問題？自從今天上午我試圖尋找這個問題，我似乎無法弄清楚。任何幫助深表感謝。提前致謝。

編輯1：

從@bli和@Johannes閱讀更多有關反饋後，我做了這麼遠：

import re, os, subprocess, bz2, multiprocessing 
from os.path import join 
from contextlib import closing 

WDIR = "/home/alaa/Documents/snakemake" 
workdir: WDIR 
SAMPLESDIR = "fastq/" 
REF = "/home/alaa/Documents/inputs/reference/hg19_ref_genome.fa" 


FILE_FASTQ = glob_wildcards("fastq/{samples, NG-8653_\d+[a-zA-Z]+_.+}") 
LIST_FILE_SAMPLES = [] 

for x in FILE_FASTQ[0]: 
    LIST_FILE_SAMPLES.append("_".join(x.split("_")[0:5])) 

LIST_FILE_SAMPLES = sorted(LIST_FILE_SAMPLES) 
print(LIST_FILE_SAMPLES) 


rule final: 
    input: 
     expand('bam/' + '{sample}.bam', sample = LIST_FILE_SAMPLES) 

rule bunzip_fastq: 
    input: 
     r1 = SAMPLESDIR + '{sample}_1.fastq.bz2', 
     r2 = SAMPLESDIR + '{sample}_2.fastq.bz2' 
    output: 
     o1 = SAMPLESDIR + '{sample}_r1.fastq.gz', 
     o2 = SAMPLESDIR + '{sample}_r2.fastq.gz' 
    shell: 
     """ 
     bunzip2 -kc < {input.r1} | gzip -c > {output.o1} 
     bunzip2 -kc < {input.r2} | gzip -c > {output.o2} 
     """ 

rule fastq_to_bam: 
    input: 
     r1 = SAMPLESDIR + '{sample}_r1.fastq.gz', 
     r2 = SAMPLESDIR + '{sample}_r2.fastq.gz', 
     ref = REF 
    output: 
     'bam/' + '{sample}.bam' 
    shell: 
     """ 
     bwa mem {input.ref} {input.r1} {input.r2} | samtools -b > {output} 
     """

感謝很多的幫助！我想我可以從這裏管理。

最好的問候，阿拉

來源

2017-04-10 Zen

我第二次JohannesKöster在他的回答中發表了評論。你可能會考慮爲bunzip做一個單獨的規則，在這個規則中你可以使用「shell」部分，而不必用'subprocess'手動運行。然後，將此規則的輸出作爲映射規則的輸入（並刪除該循環並改用通配符）。 – bli

你的問題是在這裏：

["bwa", "mem", "-T 1", REF, p1.stdout, p2.stdout]

p1.stdout和p2.stdout是BufferedReader類型，但subprocess.Popen預計字符串列表。您可能想要使用的是p1.stdout.read()。

但是，請注意，您的方法不是使用Snakemake的慣用方式，實際上，腳本中沒有任何內容真正使用Snakemake的功能。

對於Snakemake，你寧願有一條規則，它會處理帶有bwa mem的單個示例，以fastq作爲輸入並將bam存儲爲輸出。請參閱Snakemake官方教程中的this example。它完全符合你在這裏完成的任務，但是用更少的必要樣板。只需讓Snakemake完成這項工作，不要試圖自己重新實現它。

來源

2017-04-11 08:41:51

謝謝您的回覆Johannes。您的評論是有幫助的。而我所發佈的只是劇本的開始。我完全理解Snakemake的工作原理，因爲我有進一步的下游規則！再次感謝您的幫助。我仍在研究該腳本，並在完成後發佈我的解決方法。 – Zen

子進程：無法將'_io.BufferedReader'對象隱式轉換爲str

回答

相關問題