2017-04-03

I want to parse this packet with a regular expression that joins related information across lines:

WGS AUFFUELLUNGEN 
ADMIN1   23.03. 
17:09 -20- 1500.00 
17:10 JD20 560.00 
17:11 -2.0- 112.00 
ADMIN1   24.03. 
14:51 JD50 500.00 
ADMIN2   27.03. 
08:58 JD50 500.00 
---------------------- 
       3172.00 

Parsing the user and the date is easy:

\r?\n(.*)\s+(\d\d\.\d\d\.) 

Parsing the time, denomination and amount is easy as well:

\r?\n(\d\d:\d\d)\s+(.*)\s+(\d+\.\d\d) 
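
As a rough illustration of why the two expressions above are not enough on their own, here is a minimal Java sketch. The class name `SeparatePatternsSketch` and the helper `findAll` are made up for this example, and the shortened sample is assumed to contain no trailing spaces. Each expression finds its lines, but the two result lists are not linked to each other:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparatePatternsSketch {
    // The two expressions from the question, escaped for Java string literals.
    static final Pattern USER_DATE =
        Pattern.compile("\\r?\\n(.*)\\s+(\\d\\d\\.\\d\\d\\.)");
    static final Pattern TIME_DENOM_AMOUNT =
        Pattern.compile("\\r?\\n(\\d\\d:\\d\\d)\\s+(.*)\\s+(\\d+\\.\\d\\d)");

    // Hypothetical helper: joins the trimmed groups of every match with ';'.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            List<String> groups = new ArrayList<>();
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i).trim());
            }
            out.add(String.join(";", groups));
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened sample (assumption: no trailing whitespace on any line).
        String text = "WGS AUFFUELLUNGEN\n"
            + "ADMIN1   23.03.\n"
            + "17:09 -20- 1500.00\n"
            + "ADMIN2   27.03.\n"
            + "08:58 JD50 500.00\n";
        System.out.println(findAll(USER_DATE, text));         // [ADMIN1;23.03., ADMIN2;27.03.]
        System.out.println(findAll(TIME_DENOM_AMOUNT, text)); // [17:09;-20-;1500.00, 08:58;JD50;500.00]
    }
}
```

Nothing in the second list says which user or date a booking belongs to; that association is the missing piece.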

But I need a single expression that captures the user, date, time, denomination and amount for every booking in one pass.

Any ideas?

Answers

0

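You will need some kind of intermediate structure that you can iterate over. If you cannot change your Java code, perhaps you can use one regular expression to first match each whole block of your sample string (a user/date line plus the booking lines that follow it). In a second step, you match the details inside each block.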

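// Assumes JUnit 4 (assertEquals with the message as first argument).
import static org.junit.Assert.assertEquals;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

public class RegexTestCase { 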

private static final String PACKAGE 
    = "WGS AUFFUELLUNGEN  \n" + 
    "ADMIN1   23.03.\n" + 
    "17:09 -20- 1500.00\n" + 
    "17:10 JD20 560.00\n" + 
    "17:11 -2.0- 112.00\n" + 
    "ADMIN1   24.03.\n" + 
    "14:51 JD50 500.00\n" + 
    "ADMIN2   27.03.\n" + 
    "08:58 JD50 500.00\n" + 
    "----------------------\n" + 
    "    3172.00\n"; 

private static final String NL = "\\r?\\n"; 

private static final String USER_DATE_REGEX 
= "(.*?)\\s+(\\d\\d\\.\\d\\d\\.)"; 

private static final String TIME_AMOUNT_REGEX 
= "(\\d\\d:\\d\\d)\\s+(.*?)\\s+(\\d+\\.\\d\\d)"; 

private static final String BLOCK_REGEX 
    = USER_DATE_REGEX + NL + "((" + TIME_AMOUNT_REGEX + NL + ")+)"; 


@Test 
public void testRegex() throws Exception { 
    Pattern blockPattern = Pattern.compile(BLOCK_REGEX); 
    Pattern timeAmountPattern = Pattern.compile(TIME_AMOUNT_REGEX); 

    int count = 0; 
    Matcher blockMatcher = blockPattern.matcher(PACKAGE); 
    while (blockMatcher.find()) { 
     String name = blockMatcher.group(1); 
     String date = blockMatcher.group(2); 
     String block = blockMatcher.group(3); 

     Matcher timeAmountMatcher = timeAmountPattern.matcher(block); 
     while (timeAmountMatcher.find()) { 
      String time = timeAmountMatcher.group(1); 
      String denom = timeAmountMatcher.group(2); 
      String amount = timeAmountMatcher.group(3); 

      assertEquals("wrong name", RESULTS[count].name, name); 
      assertEquals("wrong date", RESULTS[count].date, date); 
      assertEquals("wrong time", RESULTS[count].time, time); 
      assertEquals("wrong denom", RESULTS[count].denom, denom); 
      assertEquals("wrong amount", RESULTS[count].amount, amount); 
      count++; 
     } 
    } 
    assertEquals("wrong number of results", 5, count); 
} 

private static final Result[] RESULTS 
= { new Result("ADMIN1", "23.03.", "17:09", "-20-", "1500.00") 
    , new Result("ADMIN1", "23.03.", "17:10", "JD20", "560.00") 
    , new Result("ADMIN1", "23.03.", "17:11", "-2.0-", "112.00") 
    , new Result("ADMIN1", "24.03.", "14:51", "JD50", "500.00") 
    , new Result("ADMIN2", "27.03.", "08:58", "JD50", "500.00") 
    }; 

static final class Result { 
    private final String name; 
    private final String date; 
    private final String time; 
    private final String denom; 
    private final String amount; 
    Result(String name, String date, String time, String denom, String amount) { 
     this.name = name; 
     this.date = date; 
     this.time = time; 
     this.denom = denom; 
     this.amount = amount; 
    } 
} 
} 
+0

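Yes, that is the situation. The whole block (from the header to the sum) has already been parsed out of about 50K of text. The challenge now is parsing the details: combining the user, date, time, denomination and amount of each booking with one expression. – quero59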

0

Your second regular expression is too greedy, take a look at this.

I suggest changing it to \r?\n(\d\d:\d\d)\s+(.*?)\s+(\d+\.\d\d)
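
To make the difference between the greedy `(.*)` and the lazy `(.*?)` concrete, here is a small Java sketch. The class name `GreedyVsLazySketch`, the helper `firstMatch` and the input are assumptions made for illustration: in this hypothetical input the total follows the booking line directly, without the dashed separator from the question. On the sample as posted, both variants give the same result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazySketch {
    // Greedy middle group, as in the question.
    static final Pattern GREEDY =
        Pattern.compile("\\r?\\n(\\d\\d:\\d\\d)\\s+(.*)\\s+(\\d+\\.\\d\\d)");
    // Lazy middle group, as suggested in this answer.
    static final Pattern LAZY =
        Pattern.compile("\\r?\\n(\\d\\d:\\d\\d)\\s+(.*?)\\s+(\\d+\\.\\d\\d)");

    // Returns "time|denomination|amount" for the first match, or null if none.
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) + "|" + m.group(2) + "|" + m.group(3) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: the total line comes right after the booking.
        String text = "ADMIN2   27.03.\n08:58 JD50 500.00\n    3172.00\n";
        // Greedy: \s+ is allowed to cross the newline, so the total is captured.
        System.out.println(firstMatch(GREEDY, text)); // 08:58|JD50 500.00|3172.00
        // Lazy: the denomination stops at the first whitespace that works.
        System.out.println(firstMatch(LAZY, text));   // 08:58|JD50|500.00
    }
}
```

The greedy version goes wrong here because `\s+` also matches line breaks, so the amount group can end up on the following line.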

This regex matches the user, date, time, denomination and amount of each booking in one go, but note that I have added the multiline regex flag:

(^(.*)\s+(\d\d\.\d\d\.)$|^(\d\d:\d\d)\s+(.*)\s+(\d+\.\d\d)$)+ 
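
For readers who can run Java code, the following sketch shows one way this alternation might be consumed with `Pattern.MULTILINE`. The class name `MultilineAlternationSketch` and the method `parse` are invented for this example, and the input is assumed to have no trailing spaces (otherwise `$` would not match). Group 1 is the outer wrapper, so each match fills either groups 2 and 3 (user, date) or groups 4 to 6 (time, denomination, amount), and the calling code has to remember the last user and date itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineAlternationSketch {
    // The alternation from this answer, compiled with the multiline flag
    // so that ^ and $ match at the start and end of every line.
    static final Pattern LINE = Pattern.compile(
        "(^(.*)\\s+(\\d\\d\\.\\d\\d\\.)$|^(\\d\\d:\\d\\d)\\s+(.*)\\s+(\\d+\\.\\d\\d)$)+",
        Pattern.MULTILINE);

    // Returns one "user;date;time;denomination;amount" string per booking.
    static List<String> parse(String text) {
        List<String> rows = new ArrayList<>();
        String user = null;
        String date = null;
        Matcher m = LINE.matcher(text);
        while (m.find()) {
            if (m.group(2) != null) {
                // User/date line: remember it for the bookings that follow.
                user = m.group(2).trim(); // greedy (.*) keeps padding spaces
                date = m.group(3);
            } else {
                // Booking line: combine it with the last user and date seen.
                rows.add(String.join(";", user, date,
                    m.group(4), m.group(5), m.group(6)));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "WGS AUFFUELLUNGEN\n"
            + "ADMIN1   23.03.\n"
            + "17:09 -20- 1500.00\n"
            + "17:10 JD20 560.00\n"
            + "17:11 -2.0- 112.00\n"
            + "ADMIN1   24.03.\n"
            + "14:51 JD50 500.00\n"
            + "ADMIN2   27.03.\n"
            + "08:58 JD50 500.00\n"
            + "----------------------\n"
            + "    3172.00\n";
        parse(text).forEach(System.out::println);
    }
}
```

So a given field does not always arrive in the same group number for every booking; the user and date come from an earlier match.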
+0

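Thx freedev, your expression works neither in our Java tool nor in online tools such as https://regex101.com/. At the moment I am trying to learn more about the multiline option you mentioned... – quero59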

+0

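I have just added a working example to my post at https://regex101.com/r/yVTa5y/3 – freedev

+0

Sorry, I missed setting the regex option. However, I need this output format: group 1 is always the user, group 2 is always the date, group 3 is always the time, and so on. – quero59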

0
  1. Split the whole string on newlines.
  2. Iterate over each line and:

    a. Look for the user name and date with regex1; if it matches, extract the user name and date.
    b. If regex1 does not match, look for the time, denomination and amount with regex2; if it matches,
       extract the time, denomination and amount.
    
    
    final String userRegex = "^(\\w+)\\s+(\\d+\\.\\d+\\.)$"; 
    final String timeRegex = "^(\\d+:\\d+)\\s+([\\S]+)\\s+(\\d+\\.?\\d+)$"; 
    

Sample source:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineByLineParser {

    public static void main(String[] args) {
        // Compile both patterns once instead of once per line.
        final Pattern userPattern = Pattern.compile("^(\\w+)\\s+(\\d+\\.\\d+\\.)$");
        final Pattern timePattern = Pattern.compile("^(\\d+:\\d+)\\s+([\\S]+)\\s+(\\d+\\.?\\d+)$");

        final String string = "WGS AUFFUELLUNGEN\n"
            + "ADMIN1   23.03.\n"
            + "17:09 -20- 1500.00\n"
            + "17:10 JD20 560.00\n"
            + "17:11 -2.0- 112.00\n"
            + "ADMIN1   24.03.\n"
            + "14:51 JD50 500.00\n"
            + "ADMIN2   27.03.\n"
            + "08:58 JD50 500.00\n"
            + "----------------------\n"
            + "    3172.00\n";

        int cnt = 1;
        // Split on \n or \r\n so Windows line endings work as well.
        for (String s : string.split("\\r?\\n")) {
            Matcher m = userPattern.matcher(s);
            if (m.matches()) {
                // A user/date line starts a new list of bookings.
                System.out.println("##### List " + cnt + " ######");
                System.out.println("User Name:" + m.group(1));
                System.out.println("Date :" + m.group(2));
                cnt++;
                continue;
            }
            m = timePattern.matcher(s);
            if (m.matches()) {
                // A booking line belongs to the most recent user/date line.
                System.out.println("Time :" + m.group(1));
                System.out.println("Denomination :" + m.group(2));
                System.out.println("Amount :" + m.group(3));
                System.out.println("---------------------");
            }
            // Any other line (header, separator, total) is ignored.
        }
    }
}
+0

Thx Rizwan. Unfortunately I cannot write any code. I need one expression that handles all bookings. I have to supply this expression to a Java tool whose code is fixed. – quero59

+0

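This will work for any data in this format, so it is not tied to this one sample. Also, you cannot get every individual field the way you require with a single regular expression in Java! –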