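How to get only the filename (not the full path) into $1 using just a Perl regex

I only want to keep the filename (not the full path) and put that filename into some bbcode.

Here is the HTML to convert: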

<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>

Note that I can't have rel="foo"; the attributes have no double quotes.

This is what I have in Perl to perform the conversion:

s/\<a href=(.+?)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gi;

This converts the HTML to:

[box]/path/to/full/image.jpg[/box]

But this is the result I want:

[box]image.jpg[/box]

The HTML has to stay as it is. So how do I change my Perl so that $1 contains only the filename?

2011-03-03 Scott

s/\<a href=(?:.*\/)?(.+?)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gi;

(?:.*\/)?

matches the longest possible part that ends in a /. The final ? makes this part optional.
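A minimal sketch of how the greedy optional prefix behaves, run against the sample link from the question. The variable names and the second input (an href with no directory part) are illustrative assumptions, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First input: the sample link from the question.
# Second input: a hypothetical href without any slash, to exercise the trailing "?".
my @inputs = (
    '<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>',
    '<a href=image.jpg rel=prettyPhoto><img rel=prettyPhoto src=thumb.jpg /></a>',
);

my @outputs;
for my $html (@inputs) {
    # (?:.*/)? swallows the directory part up to the last slash before the
    # filename without capturing it, so $1 holds only the filename.
    (my $bb = $html) =~ s{<a href=(?:.*/)?(.+?) rel=prettyPhoto><img rel=prettyPhoto src=(.+?) /></a>}{\[box\]$1\[/box\]}gi;
    push @outputs, $bb;
    print "$bb\n";
}
```

Because the group is optional, the second input still matches even though there is nothing in front of the filename.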

2011-03-03 15:29:04 Goug

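This worked best for me and is my preferred solution - I had to do it all in the regex. Thank you very much, I had been at this for about 2 hours! At that point...... – Scott 2011-03-03 16:05:32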

Don't capture the whole thing. Use a non-capturing group, (?:...). That way you can separate the part you merely match from the part you capture.
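As a rough illustration of the difference, the sketch below matches the same href twice, once with a capturing group for the directory and once with a non-capturing one. The variable names are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $href = '/path/to/full/image.jpg';

# Capturing group for the directory: the filename ends up in the second capture.
my ($dir, $name_as_second) = $href =~ m{^(.*/)?(.+)$};

# Non-capturing group for the directory: the filename is the only capture,
# so it is what $1 would contain inside a substitution.
my ($name_as_first) = $href =~ m{^(?:.*/)?(.+)$};

print "capturing:     dir=$dir file=$name_as_second\n";
print "non-capturing: file=$name_as_first\n";
```

With the non-capturing form, the replacement side of the substitution can keep using $1 unchanged.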

2011-03-03 15:31:15 0xC0000022L

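This obviously doesn't happen inside the regex itself, but you could run the split function on $1 and take the last element of the resulting array.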
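One way this might look, as a sketch only: the question's original pattern is kept, and the /e modifier lets the replacement call a small helper that splits the captured path. The helper name last_segment is hypothetical and introduced just for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the last "/"-separated piece of a path.
sub last_segment {
    my ($path) = @_;                  # copy the argument before any further matching
    my @parts = split m{/}, $path;
    return $parts[-1];
}

my $html = '<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>';

# $1 still holds the full href; /e evaluates the replacement as Perl code,
# so the path can be reduced to its filename there.
(my $bb = $html) =~ s{<a href=(.+?) rel=prettyPhoto><img rel=prettyPhoto src=(.+?) /></a>}
                     {'[box]' . last_segment($1) . '[/box]'}gie;

print "$bb\n";
```

This keeps the regex itself simple at the cost of moving part of the work into ordinary Perl code.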

2011-03-03 15:33:07

What about:

s/\<a href=.*\/(.+?)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gsi;

2011-03-03 15:36:46 Kevin

I don't know whether it handles edge cases, but I got this to work:

#!/usr/bin/perl 

use strict; 
use warnings; 

my $in = '<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>'; 

$in =~ s/\<a href=.*?([^\/]+)\ rel=prettyPhoto\>\<img rel=prettyPhoto src=(.+?) \/>\<\/a\>/\[box\]$1\[\/box\]/gi; 

print $in . "\n";

But wouldn't you rather do something like this:

#!/usr/bin/perl 

use strict; 
use warnings; 

use HTML::TokeParser; 
my $p = HTML::TokeParser->new(\*DATA); 

my $token = $p->get_tag("a"); 
my $token_attribs = $token->[1]; 
my $bb_code; 

if ($token_attribs->{rel} eq 'prettyPhoto') { 

    my $url = $token_attribs->{href}; 
    my @split_path = split(m'/', $url); 

    $bb_code = '[box]' . $split_path[-1] . '[/box]'; 
} 

print $bb_code . "\n"; 
__DATA__ 
<a href=/path/to/full/image.jpg rel=prettyPhoto><img rel=prettyPhoto src=/path/to/thumb/image.jpg /></a>

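That is, use an HTML parser (such as HTML::TokeParser, which has examples in its documentation) to find the URL for you? That is much better than relying on hand-written regexes against the HTML.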

2011-03-03 15:38:51

I suggest you use the right tool for the job, like this:

use HTML::PullParser;
use URI;
use Data::Dumper;

# $doc_handle is assumed to be an open filehandle (or a reference to a
# string) that holds the HTML; it is not defined in this snippet.
die '' . $! || $@
    unless my $p = HTML::PullParser->new(
     doc   => $doc_handle
    , start  => 'tag, attr'
    , report_tags => ['a']
    );

my @file_names;
while (my $t = $p->get_token) {
    # Each token is [ tag name, attribute hash ] for an <a> start tag.
    next unless $t and my ($tag_name, $attr) = @$t;
    next unless $attr and my $href = $attr->{href};
    next unless my $uri = URI->new($attr->{href});
    next unless my $path = $uri->path;
    # Keep everything after the last slash of the path.
    push @file_names, substr($path, rindex($path, '/') + 1);
    # or it's safe to use a regex here:
    # push @file_names, $path =~ m{([^/]+)$};
}

print Data::Dumper->Dump([ \@file_names ], [ '*file_names' ]);

Friends don't let friends parse HTML with regexes.

2011-03-03 16:26:21 Axeman
