2009-12-23 44 views
1

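I have multiple strings in the following format:
12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]

From these strings I need to extract the date, the time, the person's first and last name, and the card number. The word "Admitted" can be omitted, and anything after the card number can be ignored.
I have a feeling I want to use StringTokenizer for this, but I'm not positive.
Any suggestions on splitting up a String like this in Java?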

+0

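If this is something you are reading from a file, I would be tempted to process it once and save it to a second file in, say, CSV format, which is easier to handle. The reason is that the fields can contain spaces. Alternatively, change the way the data is encoded in the first place. – 2009-12-23 13:49:43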

Answers

2

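Your record format is simple enough that I'd just use String's split method to get the date and time. As pointed out in the comments, names that can contain spaces complicate things, so splitting the record on spaces does not work for every field. I used a regular expression to get the other three pieces of information.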

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String record1 = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
    String record2 = "12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]";
    String record3 = "12/18/2009 02:08:26 Admitted Thoreau, Henry David (Card #333) at South Lobby [In]";

    summary(record1);
    summary(record2);
    summary(record3);
}

public static void summary(String record) {
    // The date and time never contain spaces, so a plain split is enough for them.
    String[] tokens = record.split(" ");

    String date = tokens[0];
    String time = tokens[1];

    // Names may contain spaces, so capture them (and the card number) with groups.
    String regEx = "Admitted (.*), (.*) \\(Card #(.*)\\)";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(record);
    if (!matcher.find()) {
        // Without this check, group() would throw IllegalStateException on a bad record.
        System.out.println("\nUnrecognized record: " + record);
        return;
    }

    String lastName = matcher.group(1);
    String firstName = matcher.group(2);
    String cardNumber = matcher.group(3);

    System.out.println("\nDate: " + date);
    System.out.println("Time: " + time);
    System.out.println("First Name: " + firstName);
    System.out.println("Last Name: " + lastName);
    System.out.println("Card Number: " + cardNumber);
}

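The regular expression "Admitted (.*), (.*) \\(Card #(.*)\\)" uses grouping parentheses to capture the information you want to extract. The literal parentheses that appear in your record must be escaped.

Running the code above gives me the following output: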

Date: 12/18/2009 
Time: 02:08:26 
First Name: John 
Last Name: Doe 
Card Number: 111 

Date: 12/18/2009 
Time: 02:08:26 
First Name: Eddie 
Last Name: Van Halen 
Card Number: 222 

Date: 12/18/2009 
Time: 02:08:26 
First Name: Henry David 
Last Name: Thoreau 
Card Number: 333 
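
As a variation on the grouping and escaping described above, the whole record can be matched with one pattern that also captures the date and time. This is only a sketch: the class name `AccessRecordSketch`, the method `parse`, and the group names are made up for illustration, and it assumes every record has exactly the shape shown in the question (two-digit month and day, a single word before the name).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class AccessRecordSketch {

    // [^,]+ stops the last name at the first comma; .+? stops the first
    // name at the earliest " (Card #", so neither group can run too far.
    private static final Pattern RECORD = Pattern.compile(
            "^(?<date>\\d{2}/\\d{2}/\\d{4}) (?<time>\\d{2}:\\d{2}:\\d{2}) \\w+ "
            + "(?<last>[^,]+), (?<first>.+?) \\(Card #(?<card>\\d+)\\)");

    /** Returns {date, time, firstName, lastName, cardNumber}, or null if the record does not match. */
    static String[] parse(String record) {
        Matcher m = RECORD.matcher(record);
        if (!m.find()) {
            return null;
        }
        return new String[] {
            m.group("date"), m.group("time"),
            m.group("first"), m.group("last"), m.group("card")
        };
    }

    public static void main(String[] args) {
        String record = "12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]";
        System.out.println(String.join(" | ", parse(record)));
    }
}
```

Restricting the card group to digits and returning null on a mismatch makes malformed lines easy to detect, at the cost of a stricter pattern than the one in the answer.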
+2

Nice, but the name can contain spaces, for example "Van Halen, Eddie". – 2009-12-23 09:43:45

+0

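@Adriaan: Thanks for pointing that out. Real-world data can be a pain sometimes! :) I changed my code to use a regular expression for the fields affected by names with spaces. – 2009-12-23 15:57:30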

+0

Thanks, Bill. This works perfectly. – clang1234 2009-12-24 06:38:39

-1

Trust your gut... :) With the StringTokenizer class:

import java.util.StringTokenizer;

public class Test {

    public void execute(String str) {
        String date, time, firstName, lastName, cardNo;
        StringTokenizer st = new StringTokenizer(str, " ");
        date = st.nextToken();
        time = st.nextToken();
        st.nextToken(); // skip "Admitted"
        // nextToken(delim) switches the delimiter set for this and later calls.
        lastName = st.nextToken(",").trim();
        firstName = st.nextToken(",(").trim();
        st.nextToken("#"); // skip "(Card "
        cardNo = st.nextToken(")#");
        System.out.println("date = " + date + "\ntime = " + time
                + "\nfirstName = " + firstName + "\nlastName = " + lastName
                + "\ncardNo = " + cardNo);
    }

    public static void main(String[] args) {
        Test t = new Test();
        String record1 = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        String record2 = "12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]";
        String record3 = "12/18/2009 02:08:26 Admitted Thoreau, Henry David (Card #333) at South Lobby [In]";
        t.execute(record1);
        t.execute(record2);
        t.execute(record3);
    }
}

+0

Thanks, but using StringTokenizer, how would I break up the string? – clang1234 2009-12-23 05:54:38

+0

I've edited the answer to show that :) – bhups 2009-12-24 06:35:02

3

StringTokenizer is great when you have a single common delimiter, but in this case I would opt for a regular expression.

+1

+1 for regex. – Ross 2009-12-23 06:01:33

+0

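So, as an example of extracting just the date from the string, I would want the following: Pattern datePattern = Pattern.compile("[0-9]{2}/[0-9]{2}/[0-9]{4}"); But when I use a matcher on the string with that pattern, I don't get any results. How would I format this regex properly? – clang1234 2009-12-23 07:26:04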

+0

Trial and error: http://www.regexplanet.com/simple/ – 2009-12-23 16:25:00
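
The comment above does not show how the matcher was called, so the cause is a guess. One frequent reason for getting no result from a correct pattern is calling `matches()`, which requires the entire string to match, instead of `find()`, which looks for the pattern anywhere in the input. A minimal sketch of the difference, with a hypothetical class name `DateFindSketch` and helper `firstDate`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the one quoted in the comment.
final class DateFindSketch {

    static final Pattern DATE = Pattern.compile("[0-9]{2}/[0-9]{2}/[0-9]{4}");

    /** Returns the first date-like substring, or null if there is none. */
    static String firstDate(String text) {
        Matcher m = DATE.matcher(text);
        // find() scans for a match anywhere; group() is only valid after it returns true.
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String record = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        // matches() fails because the record contains more than just the date.
        System.out.println(DATE.matcher(record).matches()); // prints false
        System.out.println(firstDate(record));              // prints 12/18/2009
    }
}
```

Calling `group()` before a successful `find()` or `matches()` throws an IllegalStateException, which is another way the same pattern can appear to give nothing.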

2

I would go for java.util.Scanner. This code will get you started. You should really use the Pattern form of the Scanner methods rather than the String form that I used.

import java.util.Scanner;

public class Main
{
    public static void main(String[] args)
    {
        final String  str;
        final Scanner scanner;
        final String  date;
        final String  time;
        final String  word;
        final String  lastName;
        final String  firstName;

        str       = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]";
        scanner   = new Scanner(str);
        // next(String) throws InputMismatchException if the next token does not match.
        date      = scanner.next("\\d+/\\d+/\\d+");
        time      = scanner.next("\\d+:\\d+:\\d+");
        word      = scanner.next();
        // Tokens are split on whitespace, so lastName keeps its trailing comma ("Doe,")
        // and a name containing a space would be cut at that space.
        lastName  = scanner.next();
        firstName = scanner.next();
        System.out.println("date : " + date);
        System.out.println("time : " + time);
        System.out.println("word : " + word);
        System.out.println("last : " + lastName);
        System.out.println("first: " + firstName);
    }
}
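
The Pattern form of the Scanner methods recommended in this answer might look like the sketch below. It compiles each expression once and uses `hasNext(Pattern)` to validate a token before consuming it. The class name `PatternScannerSketch` and the method `dateAndTime` are hypothetical, and the stricter digit counts are an assumption based on the sample record.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical sketch, not the answer author's code.
final class PatternScannerSketch {

    // Compiled once and reused for every record.
    private static final Pattern DATE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");
    private static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}:\\d{2}");

    /** Returns {date, time}, or null when the record does not start with both. */
    static String[] dateAndTime(String record) {
        try (Scanner scanner = new Scanner(record)) {
            // hasNext(Pattern) is true only if the whole next token matches.
            if (!scanner.hasNext(DATE)) {
                return null;
            }
            String date = scanner.next(DATE);
            if (!scanner.hasNext(TIME)) {
                return null;
            }
            String time = scanner.next(TIME);
            return new String[] {date, time};
        }
    }

    public static void main(String[] args) {
        String[] parts = dateAndTime(
                "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In]");
        System.out.println(parts[0] + " " + parts[1]);
    }
}
```

Checking with `hasNext` first avoids the InputMismatchException that `next(Pattern)` throws on a token that does not match.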
1

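There are a few things to keep in mind while you are parsing this line:

  • The last name can contain spaces, so you should look for the comma that ends it
  • The first name can contain a space, so look for the opening parenthesis that follows it

For this reason, I would build on TofuBeer's answer and adjust the next calls for the first and last names. Splitting the string on spaces would get messy because of the extra spaces.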
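
One way to adjust the Scanner calls for the names is to switch the delimiter once the fixed leading tokens have been read. This is a sketch under assumptions: the class name `DelimiterSwitchSketch` and the method `nameAndCard` are invented here, the record is assumed to follow the format in the question, and names are assumed to contain no commas, parentheses, or digits.

```java
import java.util.Scanner;

// Hypothetical sketch; not taken from any answer in this thread.
final class DelimiterSwitchSketch {

    /**
     * Returns {lastName, firstName, cardNumber}.
     * Throws NoSuchElementException if the record has too few tokens.
     */
    static String[] nameAndCard(String record) {
        try (Scanner scanner = new Scanner(record)) {
            scanner.next(); // date
            scanner.next(); // time
            scanner.next(); // "Admitted"

            // From here on a token ends at a comma or an opening parenthesis,
            // so a space inside a name no longer splits it.
            scanner.useDelimiter("\\s*[,(]\\s*");
            // trim() drops the space left over after "Admitted".
            String lastName = scanner.next().trim();
            String firstName = scanner.next();

            // The next run of digits on the line is taken as the card number
            // (null if there is none).
            String cardNumber = scanner.findInLine("\\d+");
            return new String[] {lastName, firstName, cardNumber};
        }
    }

    public static void main(String[] args) {
        String[] parts = nameAndCard(
                "12/18/2009 02:08:26 Admitted Van Halen, Eddie (Card #222) at South Lobby [In]");
        System.out.println(parts[1] + " " + parts[0] + ", card " + parts[2]);
    }
}
```

Compared with a single regular expression, this reads the record in order and keeps each step small, but it gives less precise validation of malformed lines.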

0

Shortest regex solution (with type conversion):

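import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String stringToParse = "12/18/2009 02:08:26 Admitted Doe, John (Card #111) at South Lobby [In] ";
// Groups: 1 = date and time, 5 = last name, 6 = first name, 7 = card number.
Pattern pattern = Pattern.compile("((\\d{2}/){2}\\d{4}\\s(\\d{2}:){2}\\d{2})\\s(\\w+)\\s(.*),\\s(.*)\\s\\(Card #(\\d+)");
Matcher matcher = pattern.matcher(stringToParse);
if (!matcher.find()) {
    throw new IllegalArgumentException("Unrecognized record: " + stringToParse);
}

String firstName = matcher.group(6);
String lastName = matcher.group(5);
int cardNumber = Integer.parseInt(matcher.group(7));

DateFormat df = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss");
Date date = df.parse(matcher.group(1)); // throws java.text.ParseException on a bad date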