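As requested by Victor, I'll post another answer. In this case the CharSequence is implemented as a wrapper around another CharSequence. As the Matcher instance requests characters, the CountingCharSequence reports them to a listener interface.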
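It's slightly dangerous to do this, as the CharSequence.toString() method returns a real String instance, which cannot be monitored. On the other hand, the current implementation seems relatively simple, and it works. toString() is called, but that seems to be to populate the groups when a match is found. Better write some unit tests around it, though.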
Oh, and as I have to print the "100%" mark manually, there may be a rounding error or an off-by-one error. Happy debugging :P
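One way to sidestep the rounding and off-by-one issues with the 100% mark is to compute whole percentages with integer arithmetic instead of an epsilon-adjusted double block size. The sketch below is a hypothetical alternative to the SyncedProgressListener in the code that follows, not the answer's own implementation; StepPercentListener, step, and finish() are made-up names, and the ProgressListener interface is repeated only so the block compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class StepPercentListenerDemo {

    // Same shape as the answer's ProgressListener, repeated so this sketch compiles on its own.
    interface ProgressListener {
        void listen(int characterOffset);
    }

    // Hypothetical alternative to SyncedProgressListener: whole percentages are computed
    // with integer arithmetic, and each multiple of `step` is reported at most once.
    static final class StepPercentListener implements ProgressListener {
        private final int size;
        private final int step;
        private int lastReported;
        final List<Integer> reported = new ArrayList<>();

        StepPercentListener(int size, int step) {
            if (size <= 0 || step <= 0) {
                throw new IllegalArgumentException("size and step must be positive");
            }
            this.size = size;
            this.step = step;
            report(0);
        }

        @Override
        public synchronized void listen(int characterOffset) {
            // Offsets are 0-based: reading index size - 1 means every character was seen.
            long seen = Math.min((long) characterOffset + 1, size);
            int percent = (int) (seen * 100 / size);
            // Floor to a multiple of step, but always let a full 100% through.
            int rounded = percent == 100 ? 100 : percent / step * step;
            if (rounded > lastReported) {
                report(rounded);
            }
        }

        // The regex engine is not guaranteed to read the last character, so call this after matching.
        synchronized void finish() {
            if (lastReported < 100) {
                report(100);
            }
        }

        private void report(int percent) {
            lastReported = percent;
            reported.add(percent);
            System.out.printf("%d%%%n", percent);
        }
    }

    public static void main(String[] args) {
        StepPercentListener listener = new StepPercentListener(1000, 10);
        for (int offset = 0; offset < 1000; offset++) {
            listener.listen(offset);
        }
        listener.finish();
        System.out.println(listener.reported); // prints [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    }
}
```

Calling finish() after scanner.scan(...) would take the place of printing "100%" by hand, and because a step that does not divide 100 (say 30) is floored, the final 100% is still reported exactly once.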
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExProgress {

    // the original LinkScanner provided by Victor
    public static class LinkScanner {
        private static final Pattern hrefPattern = Pattern.compile("<a\\b[^>]*href=\"(.*?)\".*?>(.*?)</a>");

        public Collection<String> scan(CharSequence html) {
            ArrayList<String> links = new ArrayList<>();
            Matcher hrefMatcher = hrefPattern.matcher(html);
            while (hrefMatcher.find()) {
                String link = hrefMatcher.group(1);
                links.add(link);
            }
            return links;
        }
    }

    interface ProgressListener {
        void listen(int characterOffset);
    }

    // Prints a percentage each time the reported offset enters the next of `blocks` equal blocks
    static class SyncedProgressListener implements ProgressListener {
        private final int size;
        private final double blockSize;
        private final double percentageOfBlock;
        private int block;

        public SyncedProgressListener(int max, int blocks) {
            this.size = max;
            this.blockSize = (double) size / (double) blocks - 0.000_001d;
            // each block is worth 100 / blocks percent (size / blockSize only happens to give 10 for 10 blocks)
            this.percentageOfBlock = 100.0d / blocks;
            this.block = 0;
            print();
        }

        @Override
        public synchronized void listen(int characterOffset) {
            if (characterOffset >= blockSize * (block + 1)) {
                this.block = (int) ((double) characterOffset / blockSize);
                print();
            }
        }

        private void print() {
            System.out.printf("%d%%%n", (int) (block * percentageOfBlock));
        }
    }

    // Wraps another CharSequence (or a slice of it) and reports every character the Matcher reads
    static class CountingCharSequence implements CharSequence {
        private final CharSequence wrapped;
        private final int start; // absolute start index in wrapped (inclusive)
        private final int end;   // absolute end index in wrapped (exclusive)
        private final ProgressListener progressListener;

        public CountingCharSequence(CharSequence wrapped, ProgressListener progressListener) {
            this(wrapped, 0, wrapped.length(), progressListener);
        }

        public CountingCharSequence(CharSequence wrapped, int start, int end, ProgressListener pl) {
            this.wrapped = wrapped;
            this.progressListener = pl;
            this.start = start;
            this.end = end;
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            // this may not be needed, as charAt() has to be called eventually
            System.out.printf("subSequence(%d, %d)%n", start, end);
            // start and end are relative to this slice; convert them to absolute indices
            int newStart = this.start + start;
            int newEnd = this.start + end;
            progressListener.listen(newStart);
            return new CountingCharSequence(wrapped, newStart, newEnd, progressListener);
        }

        @Override
        public int length() {
            System.out.printf("length(): %d%n", end - start);
            return end - start;
        }

        @Override
        public char charAt(int index) {
            //System.out.printf("charAt(%d)%n", index);
            int realIndex = start + index;
            progressListener.listen(realIndex);
            return this.wrapped.charAt(realIndex);
        }

        @Override
        public String toString() {
            System.out.printf(" >>> toString(%d, %d) <<< %n", start, end);
            // return only this slice, otherwise Matcher.group() would return the whole input
            return wrapped.subSequence(start, end).toString();
        }
    }

    public static void main(String[] args) throws Exception {
        LinkScanner scanner = new LinkScanner();
        String content = new String(Files.readAllBytes(Paths.get("regex - Java - How to measure a Matcher processing - Stack Overflow.htm")));
        SyncedProgressListener pl = new SyncedProgressListener(content.length(), 10);
        CountingCharSequence ccs = new CountingCharSequence(content, pl);
        Collection<String> urls = scanner.scan(ccs);
        // OK, I admit, this is because of an off-by-one error
        System.out.printf("100%% - %d%n", urls.size());
    }
}
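Following the advice to write unit tests around the toString() behaviour, a minimal self-contained check could look like this. TrackingCharSequence and firstHref are hypothetical names; instead of calling a listener, the wrapper only records the highest index read. The check assumes Matcher.group() builds its result from subSequence(...).toString(), which is how the JDK versions I have looked at implement it, so a wrapper whose toString() returned the whole input would make group(1) return the whole HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SliceToStringCheck {

    private static final Pattern HREF = Pattern.compile("href=\"(.*?)\"");

    // Hypothetical minimal wrapper: records the highest absolute index read (not thread-safe).
    static final class TrackingCharSequence implements CharSequence {
        private final CharSequence wrapped;
        private final int start;      // absolute start in wrapped (inclusive)
        private final int end;        // absolute end in wrapped (exclusive)
        private final int[] maxIndex; // shared by all slices of the same input

        TrackingCharSequence(CharSequence wrapped) {
            this(wrapped, 0, wrapped.length(), new int[] {-1});
        }

        private TrackingCharSequence(CharSequence wrapped, int start, int end, int[] maxIndex) {
            this.wrapped = wrapped;
            this.start = start;
            this.end = end;
            this.maxIndex = maxIndex;
        }

        @Override
        public int length() {
            return end - start;
        }

        @Override
        public char charAt(int index) {
            int realIndex = start + index;
            maxIndex[0] = Math.max(maxIndex[0], realIndex);
            return wrapped.charAt(realIndex);
        }

        @Override
        public CharSequence subSequence(int from, int to) {
            // from/to are relative to this slice; translate them to absolute positions
            return new TrackingCharSequence(wrapped, start + from, start + to, maxIndex);
        }

        @Override
        public String toString() {
            // only this slice, never the whole wrapped input
            return wrapped.subSequence(start, end).toString();
        }

        int maxIndexRead() {
            return maxIndex[0];
        }
    }

    static String firstHref(CharSequence html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"https://example.com/a\">A</a> tail</p>";
        TrackingCharSequence tracked = new TrackingCharSequence(html);
        String direct = firstHref(html);
        String viaWrapper = firstHref(tracked);
        if (!direct.equals(viaWrapper)) {
            throw new AssertionError("group(1) differs: " + viaWrapper);
        }
        System.out.println(viaWrapper + " (max index read: " + tracked.maxIndexRead() + ")");
    }
}
```

Comparing the wrapped and unwrapped results catches both the slice-offset arithmetic in subSequence() and the toString() problem with a single assertion.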
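If you want a really lateral-thinking idea: implement a CharSequence and check which characters are requested from it to track the progress. Not sure it can be done cleanly; you may lose track if anybody calls 'toString'. If it can be done, it would be my preferred solution. –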
OK, implemented this, but I'm not sure it is good enough; I may add another answer later, after some thinking. –
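@MaartenBodewes It would be nice to see an example... of course, if you have the time. I can't see what I would do with a 'CharSequence' in this case... Still, you gave me an idea for knowing which part of the HTML the Matcher is processing. There is a method 'hrefMatcher.end()' that returns the end index of the previous match. Combined with the total size of the HTML (available through a simple 'html.length();' call), I think this could be an inaccurate but cheap solution – Victor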
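Victor's cheaper idea can be sketched as follows, assuming the same href pattern as LinkScanner. The progress list parameter and the MatchEndProgress class are hypothetical additions for illustration. Progress only advances when a match is found, so it is coarse: long stretches without links report nothing, and nothing is reported after the last match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchEndProgress {

    private static final Pattern HREF =
            Pattern.compile("<a\\b[^>]*href=\"(.*?)\".*?>(.*?)</a>");

    // After each match, end() / length() estimates how far through the input the Matcher is.
    static List<String> scan(CharSequence html, List<Integer> progress) {
        List<String> links = new ArrayList<>();
        int length = html.length();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
            // end() is the offset just after the last character of this match
            int percent = length == 0 ? 100 : (int) ((long) m.end() * 100 / length);
            progress.add(percent);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"a\">A</a><p>text</p><a href=\"b\">B</a>";
        List<Integer> progress = new ArrayList<>();
        List<String> links = scan(html, progress);
        System.out.println(links + " " + progress); // prints [a, b] [37, 100]
    }
}
```

Unlike the CharSequence wrapper, this needs no changes to the input type and costs one division per match, which fits the "inaccurate but cheap" description.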