2010-07-01

Processing XML files with Ruby and Nokogiri

I'm new to programming, so please bear with me. I have many XML documents that look like this:

File name: PRIDE_Exp_Complete_Ac_10094.xml.gz

<ExperimentCollection version="2.1"> 
<Experiment> 
    <ExperimentAccession>1015</ExperimentAccession> 
    <Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title> 
    <ShortLabel>GPM06600002310</ShortLabel> 
    <Protocol> 
     <ProtocolName>None</ProtocolName> 
    </Protocol> 
    <mzData version="1.05" accessionNumber="1015"> 
     <cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" /> 
     <cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" /> 
     <description> 
      <admin> 
       <sampleName>GPM06600002310</sampleName> 
       <sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3."> 
        <cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" /> 
       </sampleDescription> 
          </admin> 
     </description> 
     <spectrumList count="0" /> 
    </mzData> 
     </Experiment> 

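I want to pull out the text inside the "Title", "ProtocolName", and "SampleName" tags and save it to a text file with the same name as the .xml.gz file. So far I have the following code (based on posts I've seen on this site), but it doesn't seem to work: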

require 'rubygems' 
require 'nokogiri' 
doc = Nokogiri::XML(File.open("PRIDE_Exp_Complete_Ac_10094.xml.gz")) 
@ExperimentCollection = doc.css("ExperimentCollection Title").map {|node| node.children.text } 

Can anyone help me?

Thanks

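Regarding the 'please delete me' you left in place of your question (before I rolled it back): you can delete your own question using the links below the question text (you should see something like 'edit | close | delete'). If you want to delete it, you are free to do so, because you own this question. I rolled it back because it seems legitimate and worth answering. If you have already solved your problem, please post your solution. Otherwise, please give people time to see it and offer help. – 2010-07-01 17:50:15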

Answers

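If you're happy with REXML, and there is only one <Experiment> per file, then something like the following should help... (By the way, the text above is not valid XML, because there is no closing </ExperimentCollection> tag.)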

require "rexml/document" 
include REXML 
xml=<<EOD 
<Experiment> 
    <ExperimentAccession>1015</ExperimentAccession> 
    <Title>Protein complexes in Saccharomyces cerevisiae (GPM06600002310)</Title> 
    <ShortLabel>GPM06600002310</ShortLabel> 
    <Protocol> 
     <ProtocolName>None</ProtocolName> 
    </Protocol> 
    <mzData version="1.05" accessionNumber="1015"> 
     <cvLookup cvLabel="RESID" fullName="RESID Database of Protein Modifications" version="0.0" address="http://www.ebi.ac.uk/RESID/" /> 
     <cvLookup cvLabel="UNIMOD" fullName="UNIMOD Protein Modifications for Mass Spectrometry" version="0.0" address="http://www.unimod.org/" /> 
     <description> 
      <admin> 
       <sampleName>GPM06600002310</sampleName> 
       <sampleDescription comment="Ho, Y., et al., Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002 Jan 10;415(6868):180-3."> 
        <cvParam cvLabel="NEWT" accession="4932" name="Saccharomyces cerevisiae (Baker's yeast)" value="Saccharomyces cerevisiae" /> 
       </sampleDescription> 
          </admin> 
     </description> 
     <spectrumList count="0" /> 
    </mzData> 
     </Experiment> 
EOD 

# Parse the XML string, then print the text of each element of interest.
doc = Document.new xml
puts doc.elements["Experiment/Title"].text
puts doc.elements["Experiment/Protocol/ProtocolName"].text
puts doc.elements["Experiment/mzData/description/admin/sampleName"].text
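
To get from there to what the question asks for, here is a rough sketch that stays with REXML. Two things in the original attempt probably got in the way: File.open on a .gz file hands the parser compressed bytes, and nothing is ever written to disk. The sketch decompresses with Zlib::GzipReader from the standard library and writes a .txt file next to the input. The helper names (field_paths, extract_fields, summarize) and the "label: text" output format are made up for illustration. It also assumes the real files are well-formed, i.e. they do have the closing </ExperimentCollection> tag, and that a plain .txt extension is what "same name" means.

```ruby
require "zlib"
require "rexml/document"

# Paths of the wanted elements, relative to each <Experiment>
# (taken from the sample document in the question).
def field_paths
  {
    "Title"        => "Title",
    "ProtocolName" => "Protocol/ProtocolName",
    "sampleName"   => "mzData/description/admin/sampleName"
  }
end

# Hypothetical helper: returns one hash (label => text) per <Experiment>.
# A missing element yields nil instead of raising.
def extract_fields(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//Experiment").map do |experiment|
    field_paths.each_with_object({}) do |(label, path), row|
      node = experiment.elements[path]
      row[label] = node ? node.text : nil
    end
  end
end

# Hypothetical helper: decompresses a .xml.gz file, extracts the fields,
# and writes them to a .txt file with the same base name.
# Returns the path of the file it wrote.
def summarize(gz_path)
  xml = Zlib::GzipReader.open(gz_path) { |gz| gz.read }
  out_path = gz_path.sub(/\.xml\.gz\z/, ".txt")
  File.open(out_path, "w") do |out|
    extract_fields(xml).each do |row|
      row.each { |label, text| out.puts "#{label}: #{text}" }
    end
  end
  out_path
end
```

Calling summarize("PRIDE_Exp_Complete_Ac_10094.xml.gz") should then leave a PRIDE_Exp_Complete_Ac_10094.txt in the same directory, and Dir.glob("*.xml.gz").each { |f| summarize(f) } would cover a whole folder. Unlike the snippet above, this loops over every <Experiment>, so it does not depend on there being only one per file.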