2009-11-20 71 views
0

Possible duplicates:
How to parse this output and separate each field/word
How do I parse text with delimiters?

I want to parse the following data so that I get the output shown below.

Input:

 
RTRV-ALM-EQPT::ALL:RA01; 

    SIMULATOR 09-11-20 13:52:15 
M RA01 COMPLD 
    "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\"," 
    "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\"," 
    "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\"," 
    "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\"," 
    "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\"," 
    "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\"," 
    "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\"," 
; 

Output:

 
1) RTRV-ALM-EQPT::ALL:RA01; 
2) SIMULATOR 
3) 09-11-20 
4) 13:52:15 
5) M 
6) RA01 
7) COMPLD 
8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\"," 
9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\"," 
10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\"," 
11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\"," 
12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\"," 
13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\"," 
14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\"," 
+0

So you have whitespace-separated tokens, where the quoted strings may contain spaces and \-escaped quotes? – 2009-11-20 10:59:48

+0

I think this is the third time this question has been asked. Is this some kind of homework or what? – 2009-11-20 11:00:13

Answers

0

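To parse arbitrary input, you need to know its structure.

  1. Are the first four lines always present?
  2. What is the format of each of those four lines?

1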

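The best approach is probably not to convert the first text directly into the second.

Instead, first parse the first text into a set of Java objects that represent what the data actually is. For example, the second/third line of the input might be represented by a Test class with "area", "day" and "time" properties. (Only you can come up with a sensible model, based on your knowledge of what the data means.)

Then, once you have a good in-memory representation of the file's information, you can think about writing it out as text in the second form. At that point it should be easy to print the various fields and properties from the Java objects, rather than trying to transform the input text on the fly.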
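As a rough illustration of the object-model idea, here is a minimal sketch that turns one quoted alarm line into a small Java object. The class name `AlarmDemo`, the `Alarm` type and all of its field names (`slot`, `severity`, `serviceEffect`, and so on) are assumptions made for this example; the question never says what the fields mean. It also assumes that the description contains no commas.

```java
class AlarmDemo {
    /** Hypothetical model of one alarm record; the field names are guesses. */
    static final class Alarm {
        final String slot, type, severity, condition, serviceEffect, date, time, description;

        Alarm(String slot, String type, String severity, String condition,
              String serviceEffect, String date, String time, String description) {
            this.slot = slot; this.type = type; this.severity = severity;
            this.condition = condition; this.serviceEffect = serviceEffect;
            this.date = date; this.time = time; this.description = description;
        }
    }

    /** Parses one quoted line such as "SLOT-1-1-1,CMP:MN,...,:\"Fan-T\",". */
    static Alarm parse(String quotedLine) {
        String s = quotedLine.trim();
        // Drop the surrounding double quotes, if present.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        // Limit -1 keeps empty fields, including trailing ones.
        String[] f = s.split(",", -1);
        if (f.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 fields: " + quotedLine);
        }
        // The second field looks like TYPE:SEVERITY, e.g. CMP:MN.
        String[] typeSev = f[1].trim().split(":", 2);
        String severity = typeSev.length > 1 ? typeSev[1] : "";
        // The description looks like :\"Fan-T\"; strip the colon and the escaped quotes.
        String description = f[7].trim().replaceFirst("^:", "").replace("\\\"", "").trim();
        return new Alarm(f[0].trim(), typeSev[0], severity, f[2].trim(), f[3].trim(),
                         f[4].trim(), f[5].trim(), description);
    }

    public static void main(String[] args) {
        Alarm a = parse("\"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"");
        // prints: CR PROC_FAIL on SLOT-1-1-2: Processor Failure
        System.out.println(a.severity + " " + a.condition + " on " + a.slot + ": " + a.description);
    }
}
```

Once the data is held in objects like this, printing it in any other layout is a matter of formatting the fields rather than rewriting the input text.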

1

Assuming the file is relatively small, so that it can be read into memory, try something like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text = "RTRV-ALM-EQPT::ALL:RA01;\n" +
            "\n" +
            " SIMULATOR 09-11-20 13:52:15\n" +
            "M RA01 COMPLD\n" +
            " \"SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\\\"Fan-T\\\",\"\n" +
            " \"SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\\\"Battery-T\\\",\"\n" +
            " \"SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\\\"Processor Failure\\\",\"\n" +
            " \"SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\\\"Laser-T\\\",\"\n" +
            " \"SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\\\" Laser-T\\\",\"\n" +
            " \"SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\\\"Laser-T\\\",\"\n" +
            " \"SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\\\"Laser-T\\\",\"\n" +
            ";";
        // Match either a double-quoted string (backslash escapes allowed)
        // or a run of non-whitespace characters.
        Matcher m = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+").matcher(text);
        int n = 0;
        while (m.find()) {
            System.out.println((++n) + ") " + m.group());
        }
    }
}

Output:

1) RTRV-ALM-EQPT::ALL:RA01; 
2) SIMULATOR 
3) 09-11-20 
4) 13:52:15 
5) M 
6) RA01 
7) COMPLD 
8) "SLOT-1-1-1,CMP:MN,T-FANCURRENT-1-HIGH,NSA,01-10-09,00-00-00,,:\"Fan-T\"," 
9) "SLOT-1-1-1,CMP:MJ,T-BATTERYPWR-2-LOW,NSA,01-10-09,00-00-00,,:\"Battery-T\"," 
10) "SLOT-1-1-2,CMP:CR,PROC_FAIL,SA,09-11-20,13-51-55,,:\"Processor Failure\"," 
11) "SLOT-1-1-3,OLC:MN,T-LASERCURR-1-HIGH,SA, 01-10-07,13-21-03,,:\"Laser-T\"," 
12) "SLOT-1-1-3,OLC:MJ,T-LASERCURR-2-LOW,NSA, 01-10-02,21-32-11,,:\" Laser-T\"," 
13) "SLOT-1-1-4,OLC:MN,T-LASERCURR-1-HIGH,SA,01-10-05,02-14-03,,:\"Laser-T\"," 
14) "SLOT-1-1-4,OLC:MJ,T-LASERCURR-2-LOW,NSA,01-10-04,01-03-02,,:\"Laser-T\"," 
15) ; 

The only difference is that there is a 15th match, the final ;, which I believe you forgot.
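Since the answer assumes the input fits in memory, the same pattern can be applied to the contents of a file. The following is only a sketch: the class name `Tokenizer`, the helper `tokenize` and the default file name `alarms.txt` are hypothetical, and the file is assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokenizer {
    // A quoted string with backslash escapes, or a run of non-whitespace characters.
    private static final Pattern TOKEN =
        Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"|\\S+");

    /** Returns every token found in the text, in order of appearance. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // "alarms.txt" is a made-up default; pass the real path as the first argument.
        String file = args.length > 0 ? args[0] : "alarms.txt";
        // Read the whole file at once, which is fine for small inputs.
        String text = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
        List<String> tokens = tokenize(text);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println((i + 1) + ") " + tokens.get(i));
        }
    }
}
```

Returning a list instead of printing inside the loop makes it easy to drop unwanted tokens, such as the trailing ;, before producing the output.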

The raw regex (without all the Java string escaping) looks like this:

"(?:\\.|[^\\"])*"|\S+ 

which matches:

"          # match a double quote
(?:        # open a non-capturing group
    \\.    # match a backslash followed by any char (except line breaks)
    |      # OR
    [^\\"] # match any char except a backslash and a double quote
)*         # close the non-capturing group and repeat it zero or more times
"          # match a double quote
|          # OR
\S+        # match one or more non-whitespace chars

In other words: match a quoted string, or else match a run of one or more non-whitespace characters.
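The commented breakdown can also be kept inside the Java source by compiling the pattern with the `Pattern.COMMENTS` flag, which makes the regex engine ignore whitespace and `#` comments in the pattern. This is just one possible way to write it; the class name `CommentedRegex` and the helper `tokens` are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedRegex {
    // With Pattern.COMMENTS, whitespace and "# ..." comments in the pattern are ignored.
    static final Pattern TOKEN = Pattern.compile(
        "\"            # an opening double quote\n" +
        "(?:           # open a non-capturing group\n" +
        "  \\\\.       # a backslash followed by any char except a line break\n" +
        "  |           # or\n" +
        "  [^\\\\\"]   # any char except a backslash and a double quote\n" +
        ")*            # close the group and repeat it zero or more times\n" +
        "\"            # a closing double quote\n" +
        "|             # or\n" +
        "\\S+          # one or more non-whitespace chars\n",
        Pattern.COMMENTS);

    /** Collects all matches of TOKEN in the given text. */
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The input is: "a \" b" c  (a quoted string with an escaped quote, then a bare word)
        for (String token : tokens("\"a \\\" b\" c")) {
            System.out.println(token);
        }
    }
}
```

For the sample input in `main`, the quoted string (including its escaped quote and inner spaces) comes out as one token and the bare word `c` as a second one.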

+0

Great answer :) – 2009-11-20 11:42:28

+0

Thanks, Andreas. – 2009-11-20 12:28:41
