2008-10-29 64 views
11

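Regex to match sloppy fractions / mixed numbers

I have a series of text strings that contain mixed numbers (i.e., a whole part and a fractional part). The problem is that the text is full of human-entered sloppiness: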

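  1. The whole part may or may not be present (e.g., "10")
  2. The fractional part may or may not be present (e.g., "1/3")
  3. The two parts may be separated by spaces and/or a hyphen (e.g., "10 1/3", "10-1/3", "10 - 1/3").
  4. The fraction itself may or may not have spaces between the numbers and the slash (e.g., "1 /3", "1/ 3", "1 / 3").
  5. There may be other text after the fraction that needs to be ignored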

I need a regex that can parse these elements so that I can build a proper number out of this mess.

+0

I already have a regex solution that works really well, so I'm going to share it in the hope that it saves someone else a lot of work. – 2008-10-29 00:14:14

+0

Which language and/or regex engine is this for? – 2008-10-30 04:21:33

Answers

10

Here is a regex that handles all of the data I can throw at it:

(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$ 

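This puts the numbers into the following groups:

  1. The whole part of the mixed number, if it exists
  2. The numerator, if a fraction exists
  3. The denominator, if a fraction exists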
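
As a rough illustration of how these three groups might be combined into a single number, here is a minimal Perl sketch. The helper name `mixed_to_number` is hypothetical (it is not part of the answer), the possessive `\d++` assumes Perl 5.10 or later, and treating a zero denominator as unparseable is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the regex from the answer and combines
# the three captured groups into one number.
sub mixed_to_number {
    my ($text) = @_;

    # Same pattern as in the answer; the possessive \d++ needs Perl 5.10+.
    return unless $text =~ m{(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};

    my ($whole, $num, $den) = ($1, $2, $3);

    # Every part is optional, so the pattern also matches text without a number.
    return unless defined $whole or defined $num;

    my $value = defined $whole ? $whole : 0;
    if (defined $num) {
        return if $den == 0;    # assumption: a zero denominator is unparseable
        $value += $num / $den;
    }
    return $value;
}

print mixed_to_number("10 - 1 / 3 cups"), "\n";
```

Because every part of the pattern is optional, the match itself always succeeds; the sketch therefore checks the captured groups rather than the match result to decide whether a number was found.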

And here is RegexBuddy's explanation of the elements (it helped me immensely when constructing the regex):

Match the regular expression below and capture its match into backreference number 1 «(\d++(?! */))?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
    Match a single digit 0..9 «\d++» 
     Between one and unlimited times, as many times as possible, without giving back (possessive) «++» 
    Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?! */)» 
     Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
     Match the character “/” literally «/» 
Match the character “ ” literally « *» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Match the character “-” literally «-?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
Match the character “ ” literally « *» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Match the regular expression below «(?:(\d+) */ *(\d+))?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
    Match the regular expression below and capture its match into backreference number 2 «(\d+)» 
     Match a single digit 0..9 «\d+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
    Match the character “/” literally «/» 
    Match the character “ ” literally « *» 
     Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
    Match the regular expression below and capture its match into backreference number 3 «(\d+)» 
     Match a single digit 0..9 «\d+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
Match any single character that is not a line break character «.*» 
    Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*» 
Assert position at the end of the string (or before the line break at the end of the string, if any) «$» 
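
The possessive `\d++` in the first group is what keeps an improper fraction such as "10/3" from being split into a whole part and a fraction. The Perl sketch below (5.10 or later) compares the answer's pattern with a greedy `\d+` variant that is introduced here purely for contrast; `capture_parts` and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Pattern from the answer: the possessive \d++ never gives digits back.
my $possessive = qr{(\d++(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};

# Contrast variant (not from the answer): the greedy \d+ may backtrack.
my $greedy = qr{(\d+(?! */))? *-? *(?:(\d+) */ *(\d+))?.*$};

# Hypothetical helper: returns (whole, numerator, denominator),
# with "-" standing in for a group that did not take part in the match.
sub capture_parts {
    my ($re, $text) = @_;
    return unless $text =~ $re;
    return map { defined $_ ? $_ : "-" } ($1, $2, $3);
}

print "possessive: ", join(",", capture_parts($possessive, "10/3")), "\n";
print "greedy:     ", join(",", capture_parts($greedy, "10/3")), "\n";
```

With the possessive pattern the groups come out as `-,10,3`. With the greedy variant the engine backtracks from "10" to "1", the lookahead then succeeds, and the groups become `1,0,3`, which would evaluate to 1 rather than 10/3.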
2

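I think it may be easier to handle the different cases (a full mixed number, a fraction only, a whole number only) separately from each other. For example: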

sub parse_mixed { 
    my($mixed) = @_; 

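    if ($mixed =~ /^ *(\d+)[- ]+(\d+) *\/ *(\d+)(\D.*)?$/) {
        # whole part plus fraction, e.g. "10 1/3" or "10 - 1/3"
        return $1 + $2 / $3;
    } elsif ($mixed =~ /^ *(\d+) *\/ *(\d+)(\D.*)?$/) {
        # fraction only, e.g. "1/3"
        return $1 / $2;
    } elsif ($mixed =~ /^ *(\d+)(\D.*)?$/) {
        # whole number only, e.g. "10"
        return $1;
    }
    return;    # no recognizable number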
} 

print parse_mixed("10"), "\n"; 
print parse_mixed("1/3"), "\n"; 
print parse_mixed("1 / 3"), "\n"; 
print parse_mixed("10 1/3"), "\n"; 
print parse_mixed("10-1/3"), "\n"; 
print parse_mixed("10 - 1/3"), "\n"; 
1

If you are using Perl 5.10, this is how I would write it.

 
m{
    ^
    \s*                     # skip leading spaces

    (?'whole'
        \d++
        (?! \s*[\/] )       # a whole number must not be immediately followed by a slash
    )?                      # optional, so a bare fraction such as "1/3" still matches

    \s*

    (?:                     # the rest should fail or succeed as a group

        -?                  # optional hyphen between the whole part and the fraction
        \s*

        (?'numerator'
            \d+
        )

        \s*
        [\/]
        \s*

        (?'denominator'
            \d+
        )
    )?
}x

You can then access the values through the %+ variable like this:

$+{whole}; 
$+{numerator}; 
$+{denominator};
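
To show how the `%+` entries might be turned into a number, here is a minimal sketch. The helper `parse_named` is hypothetical, the pattern is a condensed form of the one above with the whole part made optional (an assumption of this sketch, so that a bare fraction also matches), and rejecting a zero denominator is likewise an assumption.

```perl
use strict;
use warnings;
use 5.010;    # named captures, possessive quantifiers, and the // operator

# Hypothetical helper: matches with named groups and reads them from %+.
sub parse_named {
    my ($text) = @_;

    return unless $text =~ m{
        ^ \s*
        (?<whole> \d++ (?! \s* / ) )?    # assumption: whole part is optional
        \s*
        (?: -? \s* (?<numerator> \d+ ) \s* / \s* (?<denominator> \d+ ) )?
    }x;

    # %+ yields undef for a named group that did not take part in the match.
    my $whole = $+{whole};
    my $num   = $+{numerator};
    my $den   = $+{denominator};

    return unless defined $whole or defined $num;    # nothing numeric found
    return if defined $num and $den == 0;            # assumption: reject zero denominators

    return ($whole // 0) + (defined $num ? $num / $den : 0);
}

say parse_named(" 10 - 1/3 tbsp");
```

Reading the groups by name keeps the arithmetic readable and avoids depending on the order of the capturing parentheses.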