2016-07-08 551 views
0

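Regex: match all occurrences between specified strings

I'm working with a bunch of text files that reference image filenames. The filenames have been sanitized (lowercased, with spaces replaced by hyphens), but the text that references them has not.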

The strings I need to convert look like this:

(image: uploaded IMAGE.jpg caption: this is my caption) 
(image: uploaded IMAGE copy.jpeg caption: this is my caption) 
(image: IMG_6087.png caption: this is my caption) 
(image: IMG_6087 copy.gif) 
(image: IMG_9999_copy.jpg) 
(image: somehow, a comma.jpg) 
(image: other ridic'ulous characters!.jpg) 

into:

(image: uploaded-image.jpg caption: this is my caption) 
(image: uploaded-image-copy.jpeg caption: this is my caption) 
(image: img_6087.png caption: this is my caption) 
(image: img_6087-copy.gif) 
(image: img_9999_copy.jpg) 
(image: somehow-a-comma.jpg) 
(image: other-ridiculous-characters.jpg) 

These strings are part of larger blocks of text, but each one sits on its own line, like this:

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: manhattan photo.jpg) 

Drive till sunset and say goodbye to your body, because this is not a photograph. I saw sixteen americans, raised by wolves, probably lost in paradise city. I found your head — Do you still want it? 

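I'm using Sublime Text and plan to do several Replace All passes:

  1. Replace spaces with hyphens
  2. Strip characters that are not alphanumeric, _ or -
  3. Lowercase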
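Applied to just the filename without its extension, the three passes can be sketched in Python like this. The function name `sanitize_stem` is hypothetical. Order matters: spaces have to become hyphens before step 2 deletes everything else, otherwise they would simply disappear.

```python
import re


def sanitize_stem(stem: str) -> str:
    """Hypothetical helper: apply the three passes to a filename without its extension."""
    # Step 1: every space becomes a hyphen
    stem = stem.replace(" ", "-")
    # Step 2: delete anything that is not a letter, digit, underscore or hyphen
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    # Step 3: lowercase what is left
    return stem.lower()


for name in ["uploaded IMAGE copy", "IMG_6087 copy", "somehow, a comma", "other ridic'ulous characters!"]:
    print(sanitize_stem(name))
```

In Sublime Text each step is a separate Replace All; the difficulty is restricting each one to the text between "(image: " and the extension.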

But I can't manage to capture every instance of the text between the two delimiters:

(?<=^\(image:)[what do I do here??](?=\.jpe?g|png|gif)

Answers

0

You can match everything non-greedily with .*?

So ^\(image: (.*?\.(?:jpe?g|png|gif)) captures the filename, including the extension.
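Here is a quick Python check of that pattern against a few of the sample lines. It uses (?: for the non-capturing group, which is what the (:? in the answer appears to intend. re.MULTILINE is needed so ^ matches at the start of every line, not only at the start of the text.

```python
import re

sample = """(image: uploaded IMAGE.jpg caption: this is my caption)
(image: IMG_6087 copy.gif)
(image: somehow, a comma.jpg)"""

# .*? is lazy: it stops at the first .jpg/.jpeg/.png/.gif on the line
pattern = re.compile(r"^\(image: (.*?\.(?:jpe?g|png|gif))", re.MULTILINE)

filenames = pattern.findall(sample)
print(filenames)  # ['uploaded IMAGE.jpg', 'IMG_6087 copy.gif', 'somehow, a comma.jpg']
```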

+0

This captures the whole filename, so I can lowercase it with search: '(?<=^\(image:)(.*?)(?=\.jpe?g|png|gif)' replace: '\L$1'. That solves step 3, but how do I find all the spaces and replace them with hyphens?

-1

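You could try JetBrains' WebStorm front-end IDE. It has a lot of features for doing regex operations in a readable way. Select the text you want to split and check for the delimiters or any whitespace.

You get a 30-day trial version. I'll also share a regex query shortly.

Also check out http://myregexp.com/ or a plugin for validating regex queries.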

Online Regex editor

0

You can grab the filename with:

(?<=image:\s)([^.]++)(?=\.jpe?g|\.png|\.gif) 

After that, the conversion depends on the language you're working in. Add file extensions as needed; right now it supports jpg, jpeg, png and gif.
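In Python, for example, the lookarounds mean the match is only the part that needs converting, so re.sub can pass it straight to a function. Two assumptions: Python's re only accepts possessive quantifiers such as ++ from version 3.11, so this sketch uses a plain +; and [^.\n] is used instead of [^.] so a match cannot run onto the next line. The .lower() callback is only a placeholder for the real conversion.

```python
import re

text = """Some prose before the image.

(image: manhattan photo.jpg)
(image: IMG_6087 copy.gif)

More prose after it."""

# Match only the filename stem: preceded by "image: ", followed by a known extension
stem_re = re.compile(r"(?<=image:\s)([^.\n]+)(?=\.jpe?g|\.png|\.gif)")

print(stem_re.findall(text))  # ['manhattan photo', 'IMG_6087 copy']

# Nothing around the stem is consumed, so the replacement only returns the new stem
print(stem_re.sub(lambda m: m.group(1).lower(), text))
```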

0

Here is a working way to do it in PHP:

<?php 
$string = 
"This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded IMAGE.jpg caption: this is my caption) 
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded IMAGE copy.jpeg caption: this is my caption) 
(image: IMG_6087.png caption: this is my caption) 
(image: IMG_6087 copy.gif) blah blah 
(image: IMG_9999_copy.jpg) 
(image: somehow, a comma.jpg) 
(image: other ridic'ulous characters!.jpg)"; 

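echo preg_replace_callback('~(?<=\(image: )(.*?)\.(jpg|jpeg|png|gif)~', function($matches) 
{ 
    // $matches[1] = filename without extension, $matches[2] = extension
    return preg_replace('~\W~', '-', stripslashes(strtolower($matches[1]))) . ".$matches[2]"; 
}, $string); 

?> 

[Edit] Added an explanation of the regex:

  • (?<=\(image: ): a positive lookbehind, so it checks that "(image: " is present without capturing it.
  • (.*?): lazily captures everything before the image extension, matching as little text as possible.
  • \.(jpg|jpeg|png|gif): matches a literal . followed by one of the given extensions, capturing the extension so it can be reused.
  • ~: the delimiter, chosen only because it is rarely used in strings, so / doesn't need to be escaped as \/.
  • \W: the opposite of \w; it matches any character that is not a letter, digit or underscore.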
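For comparison, here is a rough Python port of the same callback idea, using re.sub with a function where PHP uses preg_replace_callback(). stripslashes() is left out because it does nothing to these strings, and the lookbehind includes the space after the colon so only the filename itself is rewritten. It reproduces the PHP output, including the double and trailing hyphens.

```python
import re

# Same idea as the PHP pattern: lookbehind for "(image: ", lazy stem, captured extension
IMAGE_RE = re.compile(r"(?<=\(image: )(.*?)\.(jpg|jpeg|png|gif)")


def convert(match: re.Match) -> str:
    stem, ext = match.group(1), match.group(2)
    # Lowercase, then turn every non-word character (not a letter, digit or _) into "-"
    return re.sub(r"\W", "-", stem.lower()) + "." + ext


lines = """(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: IMG_6087 copy.gif) blah blah
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)"""

print(IMAGE_RE.sub(convert, lines))
```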

This outputs (as seen in View Source):

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded-image.jpg caption: this is my caption) 
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan. 

(image: uploaded-image-copy.jpeg caption: this is my caption) 
(image: img_6087.png caption: this is my caption) 
(image: img_6087-copy.gif) blah blah 
(image: img_9999_copy.jpg) 
(image: somehow--a-comma.jpg) 
(image: other-ridic-ulous-characters-.jpg) 

You can then fine-tune, inside the callback, which characters get turned into what, for example with str_replace().

Hope it helps! ;)
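One way to do that fine-tuning is sketched below in Python rather than PHP, so it can be checked on its own. It deletes apostrophes outright (so ridic'ulous stays one word), collapses each run of other non-word characters into a single hyphen, and trims hyphens from the ends. Which characters to delete and which to turn into hyphens is a judgment call; these rules are an assumption chosen to match the target list in the question, and tidy_stem is a hypothetical name.

```python
import re


def tidy_stem(stem: str) -> str:
    """Hypothetical callback body: turn a raw filename stem into the sanitized form."""
    stem = stem.lower().replace("'", "")  # apostrophes vanish instead of becoming "-"
    stem = re.sub(r"\W+", "-", stem)       # a run such as ", " collapses to one "-"
    return stem.strip("-")                 # drop leading/trailing "-" (e.g. from "!")


IMAGE_RE = re.compile(r"(?<=\(image: )(.*?)\.(jpg|jpeg|png|gif)")

text = "(image: somehow, a comma.jpg)\n(image: other ridic'ulous characters!.jpg)"
print(IMAGE_RE.sub(lambda m: tidy_stem(m.group(1)) + "." + m.group(2), text))
```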