Remove words with 3 characters or fewer using PHP

I am using a PHP class to build a tag cloud from articles, but I want to remove words that have only 3 characters or fewer, and also remove numeric words.

For example, given the tags: 1111 monkey deer cat pig buffalo

The result I want is: monkey deer buffalo

PHP code from the class (the full code is here):

function keywords_extract($text) 
{ 
    $text = strtolower($text); 
    $text = strip_tags($text); 

    /* 
    * Handle common words first because they have punctuation and we need to remove them 
    * before removing punctuation. 
    */ 
    $commonWords = "'tis,'twas,a,able,about,across,after,ain't,all,almost,also,am,among,an,and,any,are,aren't," . 
     "as,at,be,because,been,but,by,can,can't,cannot,could,could've,couldn't,dear,did,didn't,do,does,doesn't," . 
     "don't,either,else,ever,every,for,from,get,got,had,has,hasn't,have,he,he'd,he'll,he's,her,hers,him,his," . 
     "how,how'd,how'll,how's,however,i,i'd,i'll,i'm,i've,if,in,into,is,isn't,it,it's,its,just,least,let,like," . 
     "likely,may,me,might,might've,mightn't,most,must,must've,mustn't,my,neither,no,nor,not,o'clock,of,off," . 
     "often,on,only,or,other,our,own,rather,said,say,says,shan't,she,she'd,she'll,she's,should,should've," . 
     "shouldn't,since,so,some,than,that,that'll,that's,the,their,them,then,there,there's,these,they,they'd," . 
     "they'll,they're,they've,this,tis,to,too,twas,us,wants,was,wasn't,we,we'd,we'll,we're,were,weren't,what," . 
     "what'd,what's,when,when,when'd,when'll,when's,where,where'd,where'll,where's,which,while,who,who'd," . 
     "who'll,who's,whom,why,why'd,why'll,why's,will,with,won't,would,would've,wouldn't,yet,you,you'd,you'll"; 

    $commonWords = strtolower($commonWords); 
    $commonWords = explode(",", $commonWords); 
    foreach($commonWords as $commonWord) 
    { 
     $text = $this->str_replace_word($commonWord, "", $text); 
    } 

    /* remove punctuation and newlines */ 
    /* 
    * Changed to handle international characters 
    */ 
    if ($this->m_bUTF8) 
     $text = preg_replace('/[^\p{L}0-9\s]|\n|\r/u',' ',$text); 
    else 
     $text = preg_replace('/[^a-zA-Z0-9\s]|\n|\r/',' ',$text); 

    /* remove extra spaces created */ 
    $text = preg_replace('/ +/',' ',$text); 
    $text = trim($text); 
    $words = explode(" ", $text); 
    foreach ($words as $value) 
    { 
     $temp = trim($value); 
     if (is_numeric($temp)) 
      continue; 
     $keywords[] = trim($temp); 
    } 
    return $keywords; 
}

I have tried various approaches, such as using if (strlen($words)<3 && is_numeric($words)==true), but it did not work.

Please help me.

Source

2012-07-08 Masykur KonHollow

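'is_numeric($words) == true' is unreliable. It should be 'if (strlen($words) < 3 && is_numeric($words))'. More precisely, you should perform the numeric check first if you want to check it this way: 'if (is_numeric($words) && strlen($words) < 3)'. – Lion 2012-07-08 04:21:05

@Lion: But even the former should work. [The Manual](http://php.net/manual/en/function.is-numeric.php) says it only returns true or false. – Shubham 2012-07-08 04:28:07

I would modify your process slightly so that it runs faster (I believe it should).

Step 1: Instead of replacing every common word in $text with an empty string (the replacement process is expensive), I would store each common word in a hash table and filter against it later.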

$commonWords = explode(",", $commonWords); 
foreach($commonWords as $commonWord) 
    $hashWord[$commonWord] = $commonWord;
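
As a small variation on the loop above, the same lookup table can be built in a single call. This is only a sketch: the stop-word list is shortened for illustration, and the use of `array_flip()` is a swapped-in alternative, not what the answer itself does.

```php
<?php
// Shortened, illustrative stop-word list (the real list is much longer).
$commonWords = "a,able,about,the,to,too";

// array_flip() swaps values and keys: ['a' => 0, 'able' => 1, ...].
$hashWord = array_flip(explode(",", $commonWords));

// isset() on an array key is a hash lookup, not a scan of the whole list.
var_dump(isset($hashWord['the']));    // bool(true)
var_dump(isset($hashWord['monkey'])); // bool(false)
```

Either way, the point of Step 1 is the same: membership is tested with `isset()` once per word instead of running one replacement over the whole text per stop word.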

Step 2: Filter out common words, numbers, and words with fewer than 4 characters, all in the same pass.

$words = preg_split("/[\s\n\r]/", $text); 
foreach ($words as $value) 
{ 
    // Skip if it is a common word 
    if (isset($hashWord[$value])) continue; 
    // Skip if it is numeric 
    if (is_numeric($value)) continue; 
    // Skip if the word has fewer than 4 characters 
    if (strlen($value) < 4) continue; 

    $keywords[] = preg_replace('/[^a-zA-Z0-9\s].+/', '', $value); 
}

Here is the complete source code of the function (in case you want to copy and paste it):

function keywords_extract($text) { 
    $text = strtolower($text); 
    $text = strip_tags($text); 

    $commonWords = "'tis,'twas,a,able,about,across,after,ain't,all,almost,also,am,among,an,and,any,are,aren't," . 
     "as,at,be,because,been,but,by,can,can't,cannot,could,could've,couldn't,dear,did,didn't,do,does,doesn't," . 
     "don't,either,else,ever,every,for,from,get,got,had,has,hasn't,have,he,he'd,he'll,he's,her,hers,him,his," . 
     "how,how'd,how'll,how's,however,i,i'd,i'll,i'm,i've,if,in,into,is,isn't,it,it's,its,just,least,let,like," . 
     "likely,may,me,might,might've,mightn't,most,must,must've,mustn't,my,neither,no,nor,not,o'clock,of,off," . 
     "often,on,only,or,other,our,own,rather,said,say,says,shan't,she,she'd,she'll,she's,should,should've," . 
     "shouldn't,since,so,some,than,that,that'll,that's,the,their,them,then,there,there's,these,they,they'd," . 
     "they'll,they're,they've,this,tis,to,too,twas,us,wants,was,wasn't,we,we'd,we'll,we're,were,weren't,what," . 
     "what'd,what's,when,when,when'd,when'll,when's,where,where'd,where'll,where's,which,while,who,who'd," . 
     "who'll,who's,whom,why,why'd,why'll,why's,will,with,won't,would,would've,wouldn't,yet,you,you'd,you'll,"; 

    $commonWords = explode(",", $commonWords); 
    $hashWord = array(); 
    foreach($commonWords as $commonWord) 
     $hashWord[$commonWord] = $commonWord; 

    // Start with an empty list so the function always returns an array 
    $keywords = array(); 
    $words = preg_split("/[\s\n\r]/", $text); 
    foreach ($words as $value) 
    { 
     // Skip if it is a common word 
     if (isset($hashWord[$value])) continue; 
     // Skip if it is numeric 
     if (is_numeric($value)) continue; 
     // Skip if the word has fewer than 4 characters 
     if (strlen($value) < 4) continue; 

     $keywords[] = preg_replace('/[^a-zA-Z0-9\s].+/', '', $value); 
    } 
    return $keywords; 
}

Demo: ideone.com/obG6n
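
One thing the function above does not cover is multibyte text: `strlen()` counts bytes, so a short non-ASCII word can pass the length check. The following is a rough sketch of the same one-pass filter using the `mb_*` functions. The helper name `filter_keywords`, its parameters, and the decision to keep apostrophes (so that stop words such as "don't" still match) are assumptions made for illustration, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper; not part of the original tag-cloud class.
function filter_keywords($text, array $stopWords, $minLength = 4)
{
    $lookup   = array_flip($stopWords);
    $keywords = array();

    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/u', mb_strtolower($text, 'UTF-8'), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($words as $word) {
        // Strip everything except letters, digits and apostrophes.
        $word = preg_replace("/[^\\p{L}\\p{N}']+/u", '', $word);

        if ($word === '' || isset($lookup[$word])) continue;  // empty or stop word
        if (is_numeric($word)) continue;                      // numeric word
        if (mb_strlen($word, 'UTF-8') < $minLength) continue; // too short (in characters)

        $keywords[] = $word;
    }
    return $keywords;
}

// Sample input from the question, plus one stop word and a trailing comma.
print_r(filter_keywords("1111 monkey deer cat pig buffalo, about", array("about")));
// keeps: monkey, deer, buffalo
```

Punctuation is stripped before the checks here, so a token such as "buffalo," is compared and measured as "buffalo".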

Source

2012-07-08 05:17:01 invisal

Thanks for your help – 2012-07-08 06:37:14

When I run my page it goes blank (an error), but when I add { and } around $hashWord[$commonWord] = $commonWord, it shows the errors 'Warning: preg_replace() ...' and 'Warning: Invalid argument supplied for foreach()' – 2012-07-08 07:00:50

It works fine on my computer. Check this: http://ideone.com/obG6n – invisal 2012-07-08 07:28:44

if ((strlen($word) <= 3) && is_numeric($word)) { 
    //Don't add in the list 
}

Source

2012-07-08 04:22:25 Shubham

You should change && to ||:
From:
if (strlen($words)<3 && is_numeric($words)==true)
To:
if (strlen($words)<3 || is_numeric($words)==true)

And if you want to remove words that have 3 characters or fewer,
then you should use <= instead of <:

/* remove extra spaces created */ 
$text = preg_replace('/ +/',' ',$text); 
$text = trim($text); 
$words = explode(" ", $text);

To:
if (strlen($words) <= 3 || is_numeric($words)==true)
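
To see the effect of `||` on the sample input from the question, here is a minimal sketch. It assumes the check is applied to each word inside the loop; in the question's code `$words` is the array returned by `explode()`, so calling `strlen()` on it does not measure any single word.

```php
<?php
$words    = explode(" ", "1111 monkey deer cat pig buffalo");
$keywords = array();

foreach ($words as $word) {
    // Skip the word if EITHER condition holds: too short OR numeric.
    if (strlen($word) <= 3 || is_numeric($word)) {
        continue;
    }
    $keywords[] = $word;
}

echo implode(" ", $keywords); // monkey deer buffalo
```

With `&&` instead, a word would be skipped only if it were both short and numeric, so "1111" (numeric but 4 characters long) and "cat" (short but not numeric) would both be kept.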

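Source

2012-07-08 04:37:09 alfasin

Where should it be changed? – Lion 2012-07-08 04:39:11

@Lion I updated my answer – alfasin 2012-07-08 04:42:34

How? It must be '&&' rather than '||'. The variable '$words' must be numeric **as well as** have a length less than or equal to 3 (**not** either one of them; both must be satisfied at the same time). – Lion 2012-07-08 04:50:38

You can do this with a regular expression.

Change: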

/* remove extra spaces created */ 
// PHP's PCRE has no "g" modifier; preg_replace() already replaces every match 
$words = preg_replace('/\b\w{1,3}\s|[0-9]/','',$text); 
return $words;

and remove the foreach section below it, including the return;

Here is an explanation of the regular expression:

\b = Match a word boundary position (whitespace or the beginning/end of the string). 
\w = Match any word character (alphanumeric & underscore). 
{1,3} = Matches 1 to 3 of the preceding token. 
\s = Match any whitespace character (spaces, tabs, line breaks). 
| = or. 
[0-9] = Match any numeric character.

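Here is a human-readable explanation of the pattern: "Starting at a word boundary, find any run of 1 to 3 word characters followed by a whitespace character, or any numeric character, and replace it with an empty string."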
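
As a quick check of what this pattern does in practice, the sketch below runs it on the sample tags from the question with one extra short word ("ox") appended at the end. The second pattern is a variant, not taken from the answer, that ends on a word boundary instead of requiring a trailing whitespace character.

```php
<?php
$text = "1111 monkey deer cat pig buffalo ox";

// Pattern from the answer: short word followed by whitespace, or any digit.
$a = preg_replace('/\b\w{1,3}\s|[0-9]/', '', $text);

// Variant: short word or all-digit word, delimited by word boundaries.
$b = preg_replace('/\b(?:\w{1,3}|\d+)\b/', ' ', $text);
$b = trim(preg_replace('/\s+/', ' ', $b)); // collapse the leftover spaces

var_dump($a); // " monkey deer buffalo ox"  (the final "ox" has no whitespace after it)
var_dump($b); // "monkey deer buffalo"
```

Note that `[0-9]` in the first pattern removes digits wherever they occur, including inside longer words, and that the result is a string, so it still has to be split if an array of keywords is needed.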

Source

2012-07-08 04:54:07 htbasaran

Thanks for your help – 2012-07-08 06:37:34

Now I use '$text = preg_replace('!\\b\\w{1,3}\\b!', '', $text);' and it works for me :) – 2012-07-08 14:55:56

Now I add $text = preg_replace('!\\b\\w{1,3}\\b!', ' ', $text);

before

$text = preg_replace('/ +/',' ',$text); 
    $text = trim($text); 
    $words = explode(" ", $text);

if you want to use this PHP class without errors :)

source

You can get the code here.

Thanks, everyone :)

Source

2012-07-08 15:15:41
