What is the best way to strip certain strings from a file?

If I have a file with the following content:

11:17 GET this is my content #2013 
11:18 GET this is my content #2014 
11:19 GET this is my content #2015

How can I ignore certain parts of each line when reading it with String line = scanner.nextLine();?

I'd like to get this result:

this is my content 
this is my content 
this is my content

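So I want to strip everything from the start of the line up to and including GET, and then keep everything up to the # character.

How can this be done easily?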

2013-10-01 membersound

You can use the String.indexOf(String str) and String.indexOf(int ch) methods. For example:

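String line = scanner.nextLine();
int start = line.indexOf("GET");   // index of the "GET" marker
int end = line.indexOf('#');       // index of the '#' marker
// Skip "GET " (4 characters) and trim the space left before '#'
String result = line.substring(start + 4, end).trim();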
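
The snippet above assumes that both markers are present on every line; indexOf returns -1 when a marker is missing, and substring would then throw. A minimal sketch of a defensive variant follows, assuming that a line without "GET " or '#' should simply yield null. The class name MarkerStrip and the method extract are hypothetical names used only for illustration.

```java
class MarkerStrip {
    // Hypothetical helper: returns the text between "GET " and the next '#',
    // or null when either marker is missing from the line.
    static String extract(String line) {
        String startMarker = "GET ";
        int start = line.indexOf(startMarker);
        if (start < 0) {
            return null; // no "GET " on this line
        }
        int contentStart = start + startMarker.length();
        // Search for '#' only after the start marker
        int end = line.indexOf('#', contentStart);
        if (end < 0) {
            return null; // no '#' after the content
        }
        return line.substring(contentStart, end).trim();
    }

    public static void main(String[] args) {
        System.out.println(extract("11:17 GET this is my content #2013"));
        System.out.println(extract("no markers here"));
    }
}
```

Using the marker's length instead of the literal 4 keeps the offset correct if the marker is ever changed.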

2013-10-01 09:28:30

One way is:

String strippedStart = scanner.nextLine().split(" ", 3)[2]; 
String result = strippedStart.substring(0, strippedStart.lastIndexOf("#")).trim();

This assumes there are always two space-separated tokens at the beginning of the line (11:22 GET or POST 11:33, I don't know).
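
To make the role of the limit argument visible, here is a small sketch of the same idea wrapped in a method. With a limit of 3, split stops after the second space, so the third element holds the rest of the line unsplit. The class name SplitLimitDemo and the method stripLine are hypothetical, and the sketch assumes every line has at least three tokens and contains a '#'.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: drops the first two space-separated tokens,
    // then cuts everything from the last '#' onward.
    static String stripLine(String line) {
        String[] parts = line.split(" ", 3); // at most 3 elements
        String rest = parts[2];              // everything after the second space
        return rest.substring(0, rest.lastIndexOf('#')).trim();
    }

    public static void main(String[] args) {
        String line = "11:17 GET this is my content #2013";
        // Shows how the limit keeps the tail of the line in one piece
        System.out.println(Arrays.toString(line.split(" ", 3)));
        System.out.println(stripLine(line));
    }
}
```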

2013-10-01 09:28:38 kenor

You can do something like this:

String line = "11:17 GET this is my content #2013";
int startIndex = line.indexOf("GET ");
int endIndex = line.indexOf("#");
// startIndex + 4 skips "GET "; endIndex - 1 drops the space before '#'
line = line.substring(startIndex + 4, endIndex - 1);
System.out.println(line);

2013-10-01 09:30:14 SudoRahul

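In my opinion, the best solution to your problem is to use Java regex. With regular expressions you can define the group or groups of text you want to retrieve, and what kind of text appears where. I haven't worked with Java for a long time, so I'll do my best to help you out and point you in the right direction.

First, compile the regular expression pattern: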

Pattern pattern = Pattern.compile("^\\d{1,2}:\\d{1,2} GET (.*?) #\\d+$", Pattern.MULTILINE);

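The first part says you expect one or two digits, followed by a colon, followed again by one or two digits. After that comes GET (if you expect either of those words you could use GET|POST, wrapped in a non-capturing group as (?:GET|POST) so the alternation does not split the whole pattern; if you expect any word you could use \w+?). Then you define the group you want with parentheses. Finally, you put the hash followed by at least one digit. You might consider adding the DOTALL and CASE_INSENSITIVE flags, though I don't think you'll need them.

Then you go on to match: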

Matcher matcher = pattern.matcher(textToParse);
while (matcher.find()) {
    // extract groups here
    String group = matcher.group(1);
}

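Inside the while loop, matcher.group(1) returns the text of the group you marked with parentheses (the text you want to extract). matcher.group(0) returns the entire match, which is not what you're looking for here (I guess).

Sorry for any mistakes in the code; it hasn't been tested. Hope this gets you on the right track.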
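
Since the answer's code is untested, here is a minimal self-contained sketch of the same regex approach that collects group 1 from every matching line. It makes two assumptions that are not in the answer: the verb is either GET or POST, and a line may carry trailing spaces or tabs after the number (hence the extra [ \t]* before the end anchor). The class name RegexExtract and the method extractAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExtract {
    // MULTILINE makes ^ and $ match at the start and end of each line.
    // [ \t]* tolerates trailing blanks (an assumption, not from the answer).
    private static final Pattern LINE = Pattern.compile(
            "^\\d{1,2}:\\d{1,2} (?:GET|POST) (.*?) #\\d+[ \\t]*$",
            Pattern.MULTILINE);

    // Hypothetical helper: returns the captured content of every matching line.
    static List<String> extractAll(String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = LINE.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group(1)); // group 1 is the text in parentheses
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "11:17 GET this is my content #2013\n"
                + "11:18 GET this is my content #2014\n"
                + "11:19 GET this is my content #2015";
        for (String content : extractAll(text)) {
            System.out.println(content);
        }
    }
}
```

Lines that do not fit the pattern are skipped silently, which may or may not be what you want for malformed input.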

2013-10-01 09:51:09

You can try this more flexible solution:

Scanner s = new Scanner(new File("data"));
Pattern p = Pattern.compile("^(.+?)\\s+(.+?)\\s+(.*)\\s+(.+?)$");
Matcher m;
while (s.hasNextLine()) {
    m = p.matcher(s.nextLine());
    if (m.find()) {
        System.out.println(m.group(3));
    }
}

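This code drops the first, second, and last word of each line before printing it.

The advantage is that it relies on whitespace rather than on specific string literals to perform the stripping.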
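
To try the whitespace-based pattern without a file on disk, the following sketch feeds a String to the Scanner instead and closes it with try-with-resources. The class name WhitespaceStrip and the method strip are hypothetical; the pattern itself is the one from the answer. Lines with too few whitespace-separated words do not match and are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceStrip {
    // Groups: 1 = first word, 2 = second word, 3 = middle part, 4 = last word
    private static final Pattern P =
            Pattern.compile("^(.+?)\\s+(.+?)\\s+(.*)\\s+(.+?)$");

    // Hypothetical helper: returns the middle part of every matching line.
    static List<String> strip(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner s = new Scanner(text)) { // Scanner over a String, not a File
            while (s.hasNextLine()) {
                Matcher m = P.matcher(s.nextLine());
                if (m.find()) {
                    out.add(m.group(3));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "11:17 GET this is my content #2013\n"
                + "11:18 POST another entry #2014";
        strip(text).forEach(System.out::println);
    }
}
```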

2013-10-01 09:53:16
