2011-09-23 38 views
6

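Javascript (jQuery) remove last sentence of long text

I am looking for a JavaScript function that is smart enough to remove the last sentence of a long chunk of text (one paragraph, actually). Some example text to show the complexity: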

<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."</p> 

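Now I could probably split on . and remove the last entry of the array, but that would not work for sentences ending with ? or !, and some sentences end with a quote, e.g. something: "stuff."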
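To make the limitation concrete, here is a minimal sketch of that naive approach. The function name naiveRemoveLastSentence and the sample strings are made up for illustration only:

```javascript
// Naive approach: split on '.' and drop the last piece.
// naiveRemoveLastSentence is a hypothetical name used only for this illustration.
function naiveRemoveLastSentence(text) {
    var parts = text.split('.');
    if (parts[parts.length - 1].trim() === '') {
        parts.pop();        // discard the empty piece after a trailing '.'
    }
    parts.pop();            // discard what the split considers the last sentence
    return parts.length ? parts.join('.') + '.' : '';
}

// Works when every sentence ends in '.'
console.log(naiveRemoveLastSentence('One. Two. Three.'));

// Removes too much when a sentence ends in '?' followed by a quote
console.log(naiveRemoveLastSentence(
    'I saw a plane. I asked: "What is it doing up there?" She did not know.'
));
```

In the second call the question and the sentence after it are treated as one piece, so both disappear, which is exactly the case described above.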

function removeLastSentence(text) { 
    sWithoutLastSentence = ...; // ?? 
    return sWithoutLastSentence; 
} 

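How can this be done? What would be a proper algorithm?

Edit - By long text I mean the content of my paragraph, and by sentence I mean an actual sentence (not a line), so in my example the last sentence is: He later described it as: "Something insane." When that one is removed, the next one is: She did not know, "I think we should move past the fence!", she quickly said.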

+0

Define "last sentence" and "long string". If you are looking for a way to limit the number of lines in a text, see **[this answer](http://stackoverflow.com/questions/7519337/given-a-textarea-is-there-a-方式對限制長度基於上的線/ 7521855#7521855)**. –

+0

Edited my question; by sentence I mean a real sentence, see above. :) – bartolsthoorn

+0

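***He later described it as: "Something insane."*** I am not an English major, but is this correct? Or should it be ***He later described it as "something insane".*** – rlemon

Answers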

2

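Define your rules:
// 1. A sentence starts with a capital letter
// 2. A sentence is preceded by nothing or by [.!?], but not by [,:;]
// 3. A sentence may be preceded by a quote character such as ["'] if the text is not well formatted
// 4. A sentence boundary can be detected wrongly if the word following a quote is a name

Any additional rules?

Define your purpose:
// 1. Remove the last sentence

Assumptions:
If you start from the last character of the text string and work backwards, you identify the beginning of a sentence where:
1. The text before the character is [.?!], or
2. The text before the character is ["'] and the word after it starts with a capital letter
3. Every [.] is followed by a space
4. We are not correcting for html tags
5. These assumptions are not robust and will need to be adapted regularly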
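The rules above can be sketched as a small predicate. This is only an illustration under the stated assumptions, not the code used later in this answer: the name endsSentence is hypothetical, and the quote rule is slightly stricter here (a closing quote only counts when a terminator sits directly before it).

```javascript
// Hypothetical helper: does `chunk` end a sentence, given the chunk that follows it?
// Both arguments are whitespace-delimited pieces of the paragraph.
function endsSentence(chunk, nextChunk) {
    if (!chunk || !nextChunk) { return false; }
    // Rules 1 and 3: the next sentence starts with a capital, optionally after a quote
    var nextStartsUpper = /^["']?[A-Z]/.test(nextChunk);
    if (!nextStartsUpper) { return false; }
    var last = chunk.charAt(chunk.length - 1);
    // Rule 2: a terminator directly before the next sentence
    if (last === '.' || last === '!' || last === '?') { return true; }
    // Stricter variant of the quote rule: terminator followed by a closing quote
    if (last === '"' || last === "'") {
        return /[.!?]["']$/.test(chunk);
    }
    return false;
}

console.log(endsSentence('said.', 'He'));          // boundary
console.log(endsSentence('as:', '"Something'));    // no boundary, [:] is not a terminator
```

As rule 4 warns, a capitalized name after an abbreviation (for example "Mr." followed by "Smith") would still be reported as a boundary by this sketch.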

Possible solution: read in your string and split it on the space character, which gives us chunks of text to review in reverse.

var characterGroups = $('#this-paragraph').html().split(' ').reverse(); 

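If your string is:

Blabla, some more text here. Sometimes basic html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."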

var originalString = 'Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."'; 

Then your characterGroups array would be:

["insane."", ""Something", "as:", "it", "described", "later", "He", 
"said.", "quickly", "she", "fence!",", "the", "past", "move", "should", "we", 
"think", ""I", "know,", "not", "did", "She", "there?"", "up", "doing", "it", 
"is", ""What", "mind:", "to", "came", "that", "thing", "first", "the", "asked", 
"I", "over.", "flying", "plane", "a", "saw", "I", "and", "window", "the", "up", 
"looked", "I", "harder!", "any", "sentence", "the", "of", ""selection"", "the", 
"make", "not", "should", "that", "but", "used", "is", "code", "html", "basic", 
"Sometimes", "here.", "text", "more", "some", "Blabla,"] 

Note: the HTML tags were removed using the .text() method in jQuery.

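Each chunk is followed by a space, so once we have identified where our sentence starts (by array index), we know which space precedes it, and we can cut the original string at the position of that space, which is where the previous sentence ends.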
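One way to picture the mapping from an array index back to a position in the string is the sketch below. It is an alternative illustration, not the lookup used later in this answer, and the helper name offsetOfReversedChunk and the sample text are assumptions made for the example.

```javascript
// Hypothetical helper: character offset in `text` where the chunk at
// `reversedIndex` (an index into text.split(' ').reverse()) begins.
function offsetOfReversedChunk(text, reversedIndex) {
    var chunks = text.split(' ');
    var forwardIndex = chunks.length - 1 - reversedIndex;
    var offset = 0;
    for (var i = 0; i < forwardIndex; i++) {
        offset += chunks[i].length + 1;   // +1 for the space that followed the chunk
    }
    return offset;
}

var sampleText = 'She quickly said. He later described it.';
// In the reversed array, index 3 is "He", the start of the last sentence
var sentenceStart = offsetOfReversedChunk(sampleText, 3);
console.log(sampleText.slice(0, sentenceStart).trim());
```

Counting chunk lengths avoids searching the string a second time, at the cost of splitting it again; either approach relies on the chunks being separated by single space characters.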

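Give ourselves a variable to mark whether we have found it or not, and a variable to hold the index of the array element we identify as the start of the last sentence: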

var found = false; 
var index = null; 

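Loop through the array and look for any element ending in [.!?], or ending in " where the previously visited element started with a capital letter.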

var position     = 1, // skip the first one since we know that's the end anyway
    elements     = characterGroups.length,
    element      = null,
    prevHadUpper = false,
    last         = null,
    lookFor      = ''; // terminator + ' ' + first word of the last sentence, set in the loop

while(!found && position < elements) { 
    element = characterGroups[position].split(''); 

    if(element.length > 0) { 
     last = element[element.length-1]; 

     // test last character rule 
     if(
      last=='.'      // ends in '.' 
      || last=='!'     // ends in '!' 
      || last=='?'     // ends in '?' 
      || (last=='"' && prevHadUpper) // ends in '"' and previous started [A-Z] 
     ) { 
      found = true; 
      index = position-1; 
      lookFor = last+' '+characterGroups[position-1]; 
     } else { 
      if(element[0] == element[0].toUpperCase()) { 
      prevHadUpper = true; 
      } else { 
      prevHadUpper = false; 
      } 
     } 
    } else { 
     prevHadUpper = false; 
    } 
    position++; 
} 

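If you run the script above, it will correctly identify "He" as the start of the last sentence:

console.log(characterGroups[index]); // He at index=6

Now you can run it against the string you started with:

var trimPosition = originalString.lastIndexOf(lookFor)+1; 
var updatedString = originalString.substr(0,trimPosition); 
console.log(updatedString); 

// Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. 

Run it again and get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?"

Run it again and get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over.

Run it again and get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder!

Run it again and get: Blabla, some more text here.

Run it again and get: Blabla, some more text here.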

So, I think this matches what you are looking for?

As a function:

function trimSentence(string){ 
    var found = false; 
    var index = null; 

    var characterGroups = string.split(' ').reverse(); 

    var position  = 1,//skip the first one since we know that's the end anyway 
     elements  = characterGroups.length, 
     element  = null, 
     prevHadUpper = false, 
     last   = null, 
     lookFor  = ''; 

    while(!found && position < elements) { 
     element = characterGroups[position].split(''); 

     if(element.length > 0) { 
      last = element[element.length-1]; 

      // test last character rule 
      if(
       last=='.' ||    // ends in '.' 
       last=='!' ||    // ends in '!' 
       last=='?' ||    // ends in '?' 
       (last=='"' && prevHadUpper) // ends in '"' and previous started [A-Z] 
      ) { 
       found = true; 
       index = position-1; 
       lookFor = last+' '+characterGroups[position-1]; 
      } else { 
       if(element[0] == element[0].toUpperCase()) { 
       prevHadUpper = true; 
       } else { 
       prevHadUpper = false; 
       } 
      } 
     } else { 
      prevHadUpper = false; 
     } 
     position++; 
    } 


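    // cut just after the terminator that precedes the last sentence;
    // slice(0, n) behaves like the legacy substr(0, n) here
    var trimPosition = string.lastIndexOf(lookFor)+1; 
    return string.slice(0,trimPosition); 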
} 

It would be trivial to turn this into a plugin if you like, but beware of the ASSUMPTIONS! :)

Does this help?

Thanks, AE

0

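This is a good one. Why don't you create a temp variable, convert all '!' and '?' into '.', split that temp variable, remove the last sentence, merge that temp array back into a string and take its length? Then substring the original paragraph up to that length.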
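A minimal sketch of that idea, assuming plain text without html tags. The function name removeLastSentenceByLength is hypothetical, and dropping trailing pieces that contain no letters or digits (an empty string or a lone closing quote) is an addition not mentioned above:

```javascript
// Hypothetical sketch of the "normalize, split, measure, substring" idea.
function removeLastSentenceByLength(paragraph) {
    // Work on a copy in which every terminator is a '.'; the length is unchanged
    // because each replacement swaps one character for one character.
    var temp = paragraph.replace(/[!?]/g, '.');
    var pieces = temp.split('.');
    // Drop trailing pieces without a letter or digit (empty string, closing quote).
    while (pieces.length && !/[A-Za-z0-9]/.test(pieces[pieces.length - 1])) {
        pieces.pop();
    }
    pieces.pop();                                   // drop the last sentence
    if (pieces.length === 0) { return ''; }
    var keepLength = pieces.join('.').length + 1;   // +1 keeps the terminator
    return paragraph.substring(0, keepLength);      // cut the ORIGINAL text
}

console.log(removeLastSentenceByLength('First one. Second one! Third one?'));
```

Because the cut is made on the original paragraph, the original '!' and '?' characters survive. The sketch still splits inside quoted speech and loses a closing quote that directly follows a kept terminator, so it is not a complete answer to the question.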

+0

Or hey, just use a regex, it's easier =P – EHorodyski

+0

Actually, by replacing '."' at the end of a sentence, I could probably just use '/[\.!?]/', i.e. @omnosis's regex – bartolsthoorn

+0

You will still run into problems with sentences that contain other sentences with terminating punctuation, as in your example. – samiz
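For comparison, a regex-based variant could look like the sketch below. It is not the regex referred to in the comments (which is not shown in this thread); the function name removeLastSentenceRegex and the boundary pattern are assumptions: a terminator, an optional closing quote, then whitespace and a capital letter.

```javascript
// Hypothetical regex-based variant. A boundary is a terminator, an optional
// closing quote, and then (not consumed) whitespace plus an optionally quoted capital.
function removeLastSentenceRegex(text) {
    var boundary = /[.!?]["']?(?=\s+["']?[A-Z])/g;
    var cut = -1;
    var match;
    while ((match = boundary.exec(text)) !== null) {
        cut = match.index + match[0].length;   // end of the latest boundary found
    }
    // No boundary means the text holds at most one sentence
    return cut === -1 ? '' : text.slice(0, cut);
}

var quoted = 'I asked: "What is it doing up there?" She did not know, ' +
    '"I think we should move past the fence!", she quickly said. ' +
    'He later described it as: "Something insane."';
console.log(removeLastSentenceRegex(quoted));
```

Requiring a capital letter after the whitespace is what keeps the quoted exclamation in the middle of the second sentence from being treated as a boundary. Abbreviations followed by a name would still be split incorrectly.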

1

This should do it.

/* 
Assumptions: 
- Sentence separators are a combination of terminators (.!?) + doublequote (optional) + spaces + capital letter. 
- I haven't preserved tags if it gets down to removing the last sentence. 
*/ 
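function removeLastSentence(text) {

    // index of the last sentence terminator in the original text
    var lastSeparator = Math.max(
        text.lastIndexOf("."),
        text.lastIndexOf("!"),
        text.lastIndexOf("?")
    );

    // search the reversed text so the first match is the last sentence boundary:
    // capital letter, spaces, optional double quote, terminator (all mirrored)
    var revtext = text.split('').reverse().join('');
    var sep = revtext.search(/[A-Z]\s+(\")?[\.\!\?]/);
    // index in the original text of the last closing tag ("</")
    var lastTag = text.length-revtext.search(/\/\</) - 2;

    // keep the trailing closing tag only if it comes after the last terminator
    var lastPtr = (lastTag > lastSeparator) ? lastTag : text.length;
    var sWithoutLastSentence;

    if (sep > -1) {
        // everything before the capital letter that starts the last sentence
        var text1 = revtext.substring(sep+1, revtext.length).trim().split('').reverse().join('');
        // trailing closing tag, with stray quotes removed
        var text2 = text.substring(lastPtr, text.length).replace(/['"]/g,'').trim();

        sWithoutLastSentence = text1 + text2;
    } else {
        sWithoutLastSentence = '';
    }
    return sWithoutLastSentence;
}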

/* 
TESTS: 

var text = '<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the text any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane. "</p>'; 

alert(text + '\n\n' + removeLastSentence(text)); 
alert(text + '\n\n' + removeLastSentence(removeLastSentence(text))); 
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(text)))); 
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text))))); 
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text)))))); 
alert(text + '\n\n' + removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(removeLastSentence(text))))))); 
alert(text + '\n\n' + removeLastSentence('<p>Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the text any harder! I looked up the ')); 
*/ 
+0

Thanks for the code! – bartolsthoorn

+0

I have rewritten your entry in CoffeeScript: https://gist.github.com/1270335 – bartolsthoorn