因此,我爲學校創建了一個正則表達式解析器,該解析器創建負責匹配的對象的層次結構。我決定做到面向對象,因爲我更容易想象這種語法的實現。所以,這些是我組成正則表達式的類。這些都是用Java編寫的,但是我認爲如果你精通任何面向對象的語言,你可以繼續學習。我們需要實現的唯一操作符是Union(+),Kleene-Star(*),表達式的連接(ab或者可能是(a + b)c),當然括括號中的例子。這就是我現在實施的方式,而且我的工作方式就像在主體中有一點額外開銷的魅力一樣。使用多態對正則表達式解析器建模
父類,Regexp.java
public abstract class Regexp {
//Print out the regular expression it's holding
//Used for debugging purposes
abstract public void print();
//Checks if the string matches the expression it's holding
abstract public Boolean match(String text);
//Adds a regular expression to be operated upon by the operators
abstract public void add(Regexp regexp);
/*
*To help the main with the overhead to help it decide which regexp will
*hold the other
*/
abstract public Boolean isEmpty();
}
有最簡單的正則表達式,Base.java,持有一個字符,如果字符串的字符匹配返回true。
public class Base extends Regexp{
char c;
public Base(char c){
this.c = c;
}
public Base(){
c = null;
}
@Override
public void print() {
System.out.println(c);
}
//If the string is the char, return true
@Override
public Boolean match(String text) {
if(text.length() > 1) return false;
return text.startsWith(""+c);
}
//Not utilized, since base is only contained and cannot contain
@Override
public void add(Regexp regexp) {
}
@Override
public Boolean isEmpty() {
return c == null;
}
}
一個圓括號Paren.java,在裏面放置一個正則表達式。這裏沒有什麼特別的想法,但說明了匹配的工作原理。
public class Paren extends Regexp{
//Member variables: What it's holding and if it's holding something
private Regexp regexp;
Boolean empty;
//Parenthesis starts out empty
public Paren(){
empty = true;
}
//Unless you create it with something to hold
public Paren(Regexp regexp){
this.regexp = regexp;
empty = false;
}
//Print out what it's holding
@Override
public void print() {
regexp.print();
}
//Real simple; either what you're holding matches the string or it doesn't
@Override
public Boolean match(String text) {
return regexp.match(text);
}
//Pass something for it to hold, then it's not empty
@Override
public void add(Regexp regexp) {
this.regexp = regexp;
empty = false;
}
//Return if it's holding something
@Override
public Boolean isEmpty() {
return empty;
}
}
Union.java,它是兩個可匹配的正則表達式。如果其中一個匹配,整個聯盟就是一場比賽。
public class Union extends Regexp{
//Members
Regexp lhs;
Regexp rhs;
//Indicating if there's room to push more stuff in
private Boolean lhsEmpty;
private Boolean rhsEmpty;
public Union(){
lhsEmpty = true;
rhsEmpty = true;
}
//Can start out with something on the left side
public Union(Regexp lhs){
this.lhs = lhs;
lhsEmpty = false;
rhsEmpty = true;
}
//Or with both members set
public Union(Regexp lhs, Regexp rhs) {
this.lhs = lhs;
this.rhs = rhs;
lhsEmpty = false;
rhsEmpty = false;
}
//Some stuff to help me see the unions format when I'm debugging
@Override
public void print() {
System.out.println("(");
lhs.print();
System.out.println("union");
rhs.print();
System.out.println(")");
}
//If the string matches the left side or right side, it's a match
@Override
public Boolean match(String text) {
if(lhs.match(text) || rhs.match(text)) return true;
return false;
}
/*
*If the left side is not set, add the member there first
*If not, and right side is empty, add the member there
*If they're both full, merge it with the right side
*(This is a consequence of left-to-right parsing)
*/
@Override
public void add(Regexp regexp) {
if(lhsEmpty){
lhs = regexp;
lhsEmpty = false;
}else if(rhsEmpty){
rhs = regexp;
rhsEmpty = false;
}else{
rhs.add(regexp);
}
}
//If it's not full, it's empty
@Override
public Boolean isEmpty() {
return (lhsEmpty || rhsEmpty);
}
}
甲級聯,Concat.java,這基本上是鏈接在一起正則表達式的清單。這個很複雜。
public class Concat extends Regexp{
/*
*The list of regexps is called product and the
*regexps inside called factors
*/
List<Regexp> product;
public Concat(){
product = new ArrayList<Regexp>();
}
public Concat(Regexp regexp){
product = new ArrayList<Regexp>();
pushRegexp(regexp);
}
public Concat(List<Regexp> product) {
this.product = product;
}
//Adding a new regexp pushes it into the list
public void pushRegexp(Regexp regexp){
product.add(regexp);
}
//Loops over and prints them
@Override
public void print() {
for(Regexp factor: product){
factor.print();
}
}
/*
*Builds up a substring approaching the input string.
*When it matches, it builds another substring from where it
*stopped. If the entire string has been pushed, it checks if
*there's an equal amount of matches and factors.
*/
@Override
public Boolean match(String text) {
ArrayList<Boolean> bools = new ArrayList<Boolean>();
int start = 0;
ListIterator<Regexp> itr = product.listIterator();
Regexp factor = itr.next();
for(int i = 0; i <= text.length(); i++){
String test = text.substring(start, i);
if(factor.match(test)){
start = i;
bools.add(true);
if(itr.hasNext())
factor = itr.next();
}
}
return (allTrue(bools) && (start == text.length()));
}
private Boolean allTrue(List<Boolean> bools){
return product.size() == bools.size();
}
@Override
public void add(Regexp regexp) {
pushRegexp(regexp);
}
@Override
public Boolean isEmpty() {
return product.isEmpty();
}
}
再一次,我已經得到了這些工作讓我滿意我的開銷,標記化和所有那些好東西。現在我想介紹一下Kleene-star的操作。它匹配文本中出現的任何數字,甚至是0。所以,ba *會匹配b,ba,baa,baaa等,而(ba)*會匹配ba,baba,bababa等。它甚至看起來可能擴展我的正則表達式到這個,或者你看到解決這個問題的另一種方式? PS:有一些getters,setter和其他所有我沒有寫出來的支持函數,但這主要是爲了讓你快速瞭解這些類如何工作。
讀者:Russ Cox的散文很漂亮。 – 2015-02-08 18:44:28
謝謝!這非常有幫助! – pimmen 2015-02-08 19:30:45
我已經通過使用你的方法工作,但它非常緩慢並且隨着每一層呈指數級增長(例如,a(b(a + b))有兩層括號),並且我已將它交給並通過了它,但如果這是一個分級任務,我不會使用這個非常慢的算法。五分鐘需要一分鐘,七分鐘需要半小時。我會給這篇文章一看,也許會實現一個更好的。 – pimmen 2015-02-10 13:34:04