2012-07-09
0

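Here is my problem: I have a large string (almost 8000 characters) and I want to split it into sentences of a specific size with boundary detection. I need two things:

  1. Detect sentence boundaries (sentence-ending punctuation such as a period)
  2. Have no more than 600 characters per sentence

I know that in some cases it will not be possible to satisfy both conditions. In that case, find a space and split the sentence there.

The solution by ridgerunner works like a charm for condition 1 (see the original link, http://goo.gl/PqI6d), but it often outputs sentences longer than 600 characters. Any ideas? Thanks in advance!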

+0

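Check whether this expression does what you want: '/(?:[^.]{1,20}(?: |\.)|\w{20,}(?: |\.)?)/'. You can replace '20' with '600' to fit your case. Test case: 'This is a short sentence. This is a very very very very very long long long long long sentence. Andthisisaverylongwordwithoutspaces.' – nhahtdh 2012-07-09 05:49:26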

Answers

0

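You might be better off matching the strings. Your regex for the match might look like this:

(.{0,600}?\.)|(.{0,600}(?= ))

In short, you first look for the shortest possible string that ends in a period. If there is none, you look for the longest possible string that is followed by a space. The next match then picks up where the previous one left off.

Note that this is a generic regex. Your PHP implementation may differ.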
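
As a rough PHP sketch of the idea above, assuming `preg_match_all` is used to collect the chunks: the function name `chunk_text`, the `$max` parameter, the third catch-all branch, and the `$max - 1` bound on the period branch are illustrative additions, not part of the regex in this answer. Lengths are counted in bytes, which is fine for plain ASCII text.

```php
<?php
// Hypothetical helper: cut $text into chunks of at most $max bytes,
// preferring a period as the cut point, then a space, then a hard cut.
// Assumes $max >= 2.
function chunk_text(string $text, int $max = 600): array
{
    $pattern = '/'
        . '.{0,' . ($max - 1) . '}?\.'  // shortest run ending in a period
        . '|.{1,' . $max . '}(?= )'     // else longest run followed by a space
        . '|.{1,' . $max . '}'          // else whatever is left (hard cut)
        . '/s';

    preg_match_all($pattern, $text, $matches);

    // Trim the leading space that each later chunk starts with and
    // drop chunks that consisted only of whitespace.
    $chunks = array_map('trim', $matches[0]);
    return array_values(array_filter($chunks, 'strlen'));
}

// Small limit so the behaviour is easy to see.
print_r(chunk_text('One. Two three four five six seven. End', 10));
```

Because the first branch is lazy, every period ends a chunk, so abbreviations and decimal numbers would also be treated as sentence ends. Handling those would need a more careful boundary pattern, such as the one in the linked ridgerunner solution.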

0

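Thanks, nhahtdh. Please take a look and tell me whether I am missing something. Below is an excerpt of my string and the output I get using your suggestion.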

<?php
    // Pattern suggested by nhahtdh, with 20 replaced by 600.
    $ptn = "/(?:[^.]{1,600}(?: |\.)|\w{600,}(?: |\.)?)/";
    // The inner double quotes around "lazy eye" must be escaped.
    $str = "Amblyopia occurs when the nerve pathway from one eye to the brain does not develop during childhood. This occurs because the abnormal eye sends a blurred image or the wrong image to the brain. This confuses the brain, and the brain may learn to ignore the image from the weaker eye. Strabismus is the most common cause of amblyopia. There is often a family history of this condition. The term \"lazy eye\" refers to amblyopia, which often occurs along with strabismus. However, amblyopia can occur without strabismus and people can have strabismus without amblyopia.First, any eye condition that is causing poor vision in the amblyopic eye (such as cataracts) needs to be corrected. Children with a refractive error (nearsightedness, farsightedness, or astigmatism) will need glasses. Next, a patch is placed on the normal eye. This forces the brain to recognize the image from the eye with amblyopia. Sometimes, drops are used to blur the vision of the normal eye instead of putting a patch on it. Children whose vision will not fully recover, and those with only good eye due to any disorder should wear glasses with protective polycarbonate lenses. Polycarbonate glasses are shatter- and scratch-resistant. Children who get treated before age 5 will usually recover almost completely normal vision, although they may continue to have problems with depth perception. Delaying treatment can result in permanent vision problems. After age 10, only a partial recovery of vision can be expected. Early recognition and treatment of the problem in children can help to prevent permanent visual loss. All children should have a complete eye examination at least once between ages 3 and 5. Special techniques are needed to measure visual acuity in a child who is too young to speak. Most eye care professionals can perform these techniques.";
    // preg_split() returns the text between matches of $ptn; keep the result.
    $result = preg_split($ptn, $str, -1, PREG_SPLIT_NO_EMPTY);
    print_r($result);
    ?>

Result (what I need is sentences from my string that are shorter than 600 characters):

Array 
(
[0] => childhood. 
[1] => brain. 
[2] => eye. 
[3] => amblyopia. 
[4] => condition. 
[5] => strabismus. 
[6] => amblyopia. 
[7] => corrected. 
[8] => glasses. 
[9] => eye. 
[10] => amblyopia. 
[11] => it. 
[12] => lenses. 
[13] => scratch-resistant. 
[14] => perception. 
[15] => problems. 
[16] => expected. 
[17] => loss. 
[18] => 5. 
[19] => speak. 
[20] => techniques 
)
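
One thing that may be worth checking here: `preg_split()` treats every match of the pattern as a delimiter and returns only the text between matches, whereas `preg_match_all()` returns the matched text itself. The sketch below contrasts the two using the 20-character variant of nhahtdh's pattern on a shortened version of the test case from the comment. The variable names are illustrative, and this is not a claim about exactly how the output above was produced.

```php
<?php
// 20-character variant of the pattern from the comment.
$ptn = '/(?:[^.]{1,20}(?: |\.)|\w{20,}(?: |\.)?)/';
$str = 'This is a short sentence. Andthisisaverylongwordwithoutspaces.';

// preg_split(): the matched chunks are consumed as delimiters, so only
// the leftover text between them (here a single space) is returned.
$pieces = preg_split($ptn, $str, -1, PREG_SPLIT_NO_EMPTY);

// preg_match_all(): $found[0] holds the matched chunks themselves:
// "This is a short ", "sentence." and the long word with its period.
preg_match_all($ptn, $str, $found);

print_r($pieces);
print_r($found[0]);
```

If the chunks are what is wanted, collecting `$found[0]` from `preg_match_all()` is the more direct route. With `preg_split()`, the `PREG_SPLIT_DELIM_CAPTURE` flag together with a capturing group would be needed to keep the matched text.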