How do I sensibly strip all trailing hashtags from an Instagram caption?

Many Instagram posts end with an excessive number of hashtags, for example:

"This is one of the amazing Mountains you can find in the National Forest Park in #Zhangjiajie #Chinawhich is where James Cameron drew his inspiration for the flying mountains in #Avatar.. 

Credit: @phototravelnomads  
#pictoura #gydr  
#destinationearth #earthpix #ourlonelyplanet#wonderful_earthLife #timeoutsociety#fantastic_earthpics #liveoutdoors #igglobalclub#awesomeearth #mist_vision #earthdeluxe 
# #worldbestgram #mthrworld #fantastic_earth#famouscaptures #destination_wow #dreamlifepix#wonderful_places #igworldclub #ig_global_life 
#natureaddict #beautifuldestinations #traveler #guider#locals"

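I'd like to process the caption so that the collection of hashtags at the end is stripped while the rest is left untouched. What would be a good way to do this? I'm sure I could figure out a brute-force way, but I'm hoping for some ideas for an elegant solution. It doesn't have to be actual code. :)

Edit, per burna's comment: the expected result would be:

"This is one of the amazing Mountains you can find in the National Forest Park in #Zhangjiajie #Chinawhich is where James Cameron drew his inspiration for the flying mountains in #Avatar.. 

Credit: @phototravelnomads"

Edit, per Alan Moore's answer: this works great, but not in all cases. For example, if the input text is: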

"This is one of the amazing Mountains you can find in the National Forest Park in #Zhangjiajie #Chinawhich is where James Cameron drew his inspiration for the flying mountains in #Avatar"

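...it would be cut off from "#Zhangjiajie" onward.

I suppose more logic may be needed, perhaps: split the string into an array; check whether it ends with hashtags; if so, count how many; and if there are more than X (4?), strip them, starting from the first tag of the final unbroken run.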
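The steps above (split into tokens, count the trailing hashtags, strip them only past a threshold) could be sketched roughly as follows. This is only one possible reading of the idea: the function name `stripTrailingHashtags` and the `$maxKept` parameter are hypothetical, and the default of 4 is just the guess mentioned above.

```php
<?php
// Hypothetical helper (not from the thread): strip the trailing run of
// hashtags only when it contains more than $maxKept tags.
function stripTrailingHashtags(string $caption, int $maxKept = 4): string
{
    // Split into whitespace-separated tokens, remembering each token's offset.
    preg_match_all('/\S+/', $caption, $matches, PREG_OFFSET_CAPTURE);
    $tokens = $matches[0];

    $count = 0;     // number of hashtags in the trailing run
    $cutAt = null;  // byte offset where the trailing run starts

    // Walk backwards while a token is made up purely of hashtags
    // (this also covers glued tags like "#guider#locals" and a lone "#").
    for ($i = count($tokens) - 1; $i >= 0; $i--) {
        [$text, $offset] = $tokens[$i];
        if (!preg_match('/^(?:#\w*)+$/', $text)) {
            break;
        }
        $count += preg_match_all('/#\w+/', $text);
        $cutAt = $offset;
    }

    // Short trailing runs are assumed to be part of the sentence: keep them.
    if ($cutAt === null || $count <= $maxKept) {
        return $caption;
    }
    return rtrim(substr($caption, 0, $cutAt));
}

// A caption that merely ends with one tag is returned unchanged.
echo stripTrailingHashtags("the flying mountains in #Avatar"), "\n";
```

With this approach a tag used as a word at the end of a sentence survives, because the trailing run is too short to cross the threshold, while a long block of tags after the credit line is dropped as a whole.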


2015-11-07 Walter Vos

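Can you add the expected result after processing? – burna

Welcome to SO! Your question was mangled by auto-formatting; you can check the edit history to see what I did to fix it (while you're at it, check out the [help page](http://stackoverflow.com/editing-help) to see what else is available). You're supposed to *preview* your posts here as well as proofread them. ;) –

@burna I edited the question, but I think Alan Moore has already answered it :) –

If I understand correctly, the following should work: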

$hashTag="pictoura #gydr 

destinationearth #earthpix #ourlonelyplanet#wonderful_earthLife #timeoutsociety#fantastic_earthpics #liveoutdoors #igglobalclub#awesomeearth #mist_vision #earthdeluxe 

#worldbestgram #mthrworld #fantastic_earth#famouscaptures #destination_wow #dreamlifepix#wonderful_places #igworldclub #ig_global_life 

natureaddict #beautifuldestinations #traveler #guider#locals"; 

echo preg_replace('/(#.*\s*)/','',$hashTag);

Output:

pictoura destinationearth natureaddict

Good luck!


2015-11-07 17:25:32

Thanks angelcool.net, but that just chops everything off from the first hashtag onward (and then stops at the line break). See: https://regex101.com/r/jF1cI5/1 –

It looks like this will do it:

$result = preg_replace('/#[#\w\s]*\z/', '', $subject);

DEMO

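The regex matches a hash (#), followed by zero or more of the characters that make up hashtags plus the whitespace separating them ([#\w\s]*), followed by the end of the string (\z).

\w is equivalent to [A-Za-z0-9_]. If other characters are allowed in hashtags, or digits are not allowed, let me know and I'll update the regex.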
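To see how the pattern above behaves, here is a small sketch that runs it on two shortened captions. The wrapper name `stripTrailingTagBlock`, the sample strings, and the extra `rtrim()` call are assumptions added for illustration; only the regex itself comes from the answer.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function stripTrailingTagBlock(string $subject): string
{
    // Remove the block of hashtags that runs to the end of the string,
    // then trim the whitespace left behind (the rtrim is an addition).
    return rtrim(preg_replace('/#[#\w\s]*\z/', '', $subject));
}

// Punctuation ("..") and "@" are not in [#\w\s], so the match can only
// start at "#pictoura".
$withPunctuation = "flying mountains in #Avatar..\n\nCredit: @phototravelnomads\n#pictoura #gydr\n#earthpix#locals";

// Nothing but word characters, spaces and "#" follows the first tag,
// so the match starts at "#Zhangjiajie".
$withoutPunctuation = "Park in #Zhangjiajie #China which inspired #Avatar";

echo stripTrailingTagBlock($withPunctuation), "\n";
// prints the caption up to and including "Credit: @phototravelnomads"
echo stripTrailingTagBlock($withoutPunctuation), "\n";
// prints "Park in"
```

The second call shows the limitation: when only letters, digits, underscores, whitespace and hashes separate an inline tag from the end of the string, everything from that tag onward is removed.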

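UPDATE: If you want to remove all the robo-tags while keeping the legitimate ones, there is probably no reliable way to do it, and certainly not with a regex alone. However, this will remove all but the first line of hashtags: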

$result = preg_replace('/^(#[#\w\h]+\R)#[#\w\s]*\z/m', '$1', $subject);

DEMO

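\h matches only horizontal whitespace (space, tab, NBSP...), and \R matches any line separator (\r\n or any single vertical whitespace character).

As for hashtag-like content within the text, this won't touch it, because the match is anchored at the end of the text. The start-of-line anchor (^ in multiline mode) isn't really necessary, but it may help future readers of the regex (including yourself) understand what it does. Of course, comments would help even more. ;)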
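As a rough check of the multiline pattern, the sketch below applies it to a shortened multi-line caption and to a single-line caption that ends in a tag. The wrapper name `keepFirstTagLine` and the sample strings are hypothetical; the regex and the `$1` replacement are the ones from the update.

```php
<?php
// Hypothetical wrapper around the multiline pattern from the update.
function keepFirstTagLine(string $subject): string
{
    // Group 1 is the first line consisting solely of hashtags and horizontal
    // whitespace, including its line break; every tag line after it is dropped.
    return preg_replace('/^(#[#\w\h]+\R)#[#\w\s]*\z/m', '$1', $subject);
}

$multiLine = "Mountains in #Avatar..\n\nCredit: @phototravelnomads\n"
    . "#pictoura #gydr\n#earthpix #ourlonelyplanet\n#traveler #guider#locals";
$singleLine = "the flying mountains in #Avatar";

echo keepFirstTagLine($multiLine);
// keeps everything through the "#pictoura #gydr" line, drops the later tag lines
echo keepFirstTagLine($singleLine), "\n";
// prints the caption unchanged: no line starts with "#"
```

Because the pattern needs at least two consecutive lines that begin with a hash, a caption that simply ends with an inline tag is left alone.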


2015-11-07 18:48:58

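Hi Alan, this works great, but it's still a bit crude. If the text doesn't contain a "ton of hashtags" but does end with one or two hashtags, those get stripped as well. The problem is that people also use hashtags as words (ugh, people). For example, if the input is: "This is one of the amazing Mountains you can find in the National Forest Park in #Zhangjiajie #Chinawhich is where James Cameron drew his inspiration for the flying mountains in #Avatar" –

To add: maybe regex isn't the way to go? –

Regex is definitely not the *best* solution; it never is. But I think it's good enough for this job. Check my edit. –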
