2012-08-01 106 views
0

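Removing several words from an array - Javascript

I have a list of words that I need to sort by frequency. Before I do that, I need to strip out words like "the", "it", etc. (anything shorter than three letters), as well as all numbers and any word that starts with # (the words come from Twitter, although the example below is just a random paragraph from Wikipedia).

I can remove a single word, but I have been going crazy trying to remove several words or a range. Any suggestions? Thanks!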

http://jsfiddle.net/9NzAC/6/

HTML:

<div id="text" style="background-color:Teal;position:absolute;left:100px;top:10px;height:500px;width:500px;"> 
Phrenology is a pseudoscience primarily focused on measurements of the human skull, based on the concept that the brain is the organ of the mind, and that certain brain areas have localized, specific functions or modules. The distinguishing feature of phrenology is the idea that the sizes of brain areas were meaningful and could be inferred by examining the skull of an individual. 
</div> 

JS:

//this is the function to remove words 
<script type="text/javascript"> 
    function removeA(arr){ 
     var what, a= arguments, L= a.length, ax; 
     while(L> 1 && arr.length){ 
      what= a[--L]; 
      while((ax= arr.indexOf(what))!= -1){ 
       arr.splice(ax, 1); 
      } 
     } 
      return arr; 
     } 
</script> 

//and this does the sorting & counting 
<script type="text/javascript"> 
    var getMostFrequentWords = function(words) { 
     var freq={}, freqArr=[], i; 

     // Map each word to its frequency in "freq". 
      for (i=0; i<words.length; i++) { 
      freq[words[i]] = (freq[words[i]]||0) + 1; 
     } 

     // Sort from most to least frequent. 
      for (i in freq) freqArr.push([i, freq[i]]); 
      return freqArr.sort(function(a,b) { return b[1] - a[1]; }); 
     }; 

     var words = $('#text').get(0).innerText.split(/\s+/); 

     //Remove articles & words we don't care about. 
     var badWords = "the"; 
      removeA(words,badWords); 
     var mostUsed = getMostFrequentWords(words); 
     alert(words); 

</script> 
+0

I suggest you set 'array[i] = null' (or '""') and then clean the empty nodes out of your array. You can do that easily with 'Array#filter'. – 2012-08-01 03:24:30

+1

If you run into any problems, have a look at this for help. http://jsfiddle.net/n2jj4/1/ – 2012-08-01 05:00:26

+0

This is a really useful and thorough piece of code. Thank you very much, this is very helpful. – user1307028 2012-08-01 05:48:33

Answers

2

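Instead of removing items from the original array, just push the words you want to keep into a new one. It is simpler, and it makes your code shorter and more readable.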

var words = ['the', 'it', '12', '#twit', 'aloha', 'hello', 'bye'] 
var filteredWords = [] 

for (var i = 0, l = words.length, w; i < l; i++) { 
    w = words[i] 
    if (!/^(#|\d+)/.test(w) && w.length > 3) 
     filteredWords.push(w) 
} 

console.log(filteredWords) // ['aloha', 'hello'] 

Demo: http://jsfiddle.net/VcfvU/
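
The same filtering can also be written with Array.prototype.filter and combined with an explicit stop-word list. This is only a minimal sketch: the names STOP_WORDS and keepWord and the sample data are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical stop-word list (an assumption, extend as needed).
var STOP_WORDS = { the: true, it: true, and: true, that: true };

// Returns true when a word should be kept for the frequency count.
function keepWord(w) {
    return w.length > 3 &&                           // same length threshold as the loop above
        !/^(#|\d)/.test(w) &&                        // drop hashtags and words starting with a digit
        !STOP_WORDS.hasOwnProperty(w.toLowerCase()); // drop listed words, ignoring case
}

var sample = ['the', 'it', '12', '#twit', 'aloha', 'That', 'hello', 'bye', '2012-08'];
var kept = sample.filter(keepWord);

console.log(kept); // ['aloha', 'hello']
```

Keeping the rules in one predicate makes it easy to add or drop a condition without touching the loop that walks the array.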

+0

Wow. That's it. Thank you so much, much appreciated. – user1307028 2012-08-01 03:46:50

+0

Omitting the braces is strongly discouraged, and adding semicolons is recommended as well D: – 2012-08-01 16:00:14

1

I suggest you set array[i] = null (or "") and then clean the empty nodes out of your array. You can do that easily with Array#filter.

Test: http://jsfiddle.net/6LPep/ Code:

var FORGETABLE_WORDS = ',the,of,an,and,that,which,is,was,'; 

var words = text.innerText.split(" "); 

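// Use an index-based loop: a truthiness test on the word itself would stop
// early at the first empty string produced by split(" ").
for (var i = 0; i < words.length; i++) { 
    // blank out listed words and anything shorter than three letters
    if (FORGETABLE_WORDS.indexOf(',' + words[i] + ',') > -1 || words[i].length < 3) { 
     words[i] = ""; 
    } 
} 

// filter returns a new array, so keep the result; falsy entries get dropped
words = words.filter(function(e){return e}); 
// as example 
output.innerHTML = words.join(" "); 

// just continue doing your stuff with "words" array. 
// ...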

I think this is cleaner than the way you are doing it now. If you need anything else, I will update this answer.
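
To tie this back to the goal in the question, the cleaned array can be passed straight to a frequency counter. The sketch below is an illustration under assumptions: the sample string, the FORGETTABLE list and the countWords helper (modeled on the question's getMostFrequentWords) are hypothetical, and it filters in one pass instead of blanking entries first.

```javascript
// Hypothetical sample input; on the page this would come from the element's text.
var sampleText = 'the brain is the organ of the mind and the brain has areas';
var FORGETTABLE = ',the,of,an,and,that,which,is,was,has,';

// Keep only words of at least three letters that are not in the list.
var cleaned = sampleText.split(/\s+/).filter(function (word) {
    return word.length >= 3 && FORGETTABLE.indexOf(',' + word + ',') === -1;
});

// Count occurrences and return [word, count] pairs, most frequent first.
function countWords(words) {
    // Object.create(null) avoids clashes with inherited keys such as "constructor".
    var freq = Object.create(null), pairs = [], i, key;
    for (i = 0; i < words.length; i++) {
        freq[words[i]] = (freq[words[i]] || 0) + 1;
    }
    for (key in freq) {
        pairs.push([key, freq[key]]);
    }
    return pairs.sort(function (a, b) { return b[1] - a[1]; });
}

var ranked = countWords(cleaned);
console.log(ranked[0]); // ['brain', 2]
```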

+1

Thank you very much for your help with this! I learned a new technique, thank you! – user1307028 2012-08-01 03:48:53