2016-12-04 179 views

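List index problem when converting letters (Devanagari to English)

I am currently trying to map Devanagari script to English letters, but every now and then I run into "list index out of range". I don't want to miss any entries, which is why I don't want to use error handling unless it's necessary. Could you look at my script and help explain why this error occurs? I found the word in my file that causes the error, but if I take just the two sentences above and below that word, the error doesn't occur. So I think the error happens at a particular string length.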

import unicodedata
from functools import reduce

clean = []
dafuq = []
clean_list = []
replacements = {'अ': 'A', 'आ': 'AA', 'इ': 'I', 'ई': 'II', 'उ': 'U', 'ऊ': 'UU', 'ए': 'E', 'ऐ': 'AI',
                'ओ': 'O', 'औ': 'OU', 'क': 'KA', 'ख': 'KHA', 'ग': 'GA', 'घ': 'GHA', 'ङ': 'NGA',
                'च': 'CA', 'छ': 'CHHA', 'ज': 'JA', 'झ': 'JHA', 'ञ': 'NIA', 'ट': 'TA', 'ठ': 'THA',
                'ड': 'DHA', 'ढ': 'DHHA', 'ण': 'NAE', 'त': 'TA', 'थ': 'THA', 'द': 'DA', 'ध': 'DHA',
                'न': 'NA', 'प': 'PA', 'फ': 'FA', 'ब': 'B', 'भ': 'BHA', 'म': 'MA', 'य': 'YA', 'र': 'RA',
                'ल': 'L', 'व': 'WA', 'स': 'SA', 'ष': 'SHHA', 'श': 'SHA', 'ह': 'HA', '्': 'A',
                'ऋ': 'RI', 'ॠ': 'RI', 'ऌ': 'LI', 'ॐ': 'OMS', 'ः': ' ', 'ँ': 'U',
                'ं': 'M', 'ृ': 'RI', 'ा': 'AA', 'ी': 'II', 'ि': 'I', 'े': 'E', 'ै': 'AI',
                'ो': 'O', 'ौ': 'OU', 'ु': 'U', 'ू': 'UU'}


def reducer(r, v):
    # Glue combining marks (vowel signs, virama, anusvara, ...) onto the previous character
    if unicodedata.category(v) in ('Mc', 'Mn'):
        r[-1] = r[-1] + v
    else:
        r.append(v)
    return r


with open('words_original.txt', mode='r', encoding='utf-8') as f:
    with open('alphabeths.txt', mode='w+', encoding='utf-8') as d:
        with open('only_words.txt', mode='w+', encoding='utf-8') as e:
            chunk_size = 4096
            f_chunk = f.read(chunk_size)

            while len(f_chunk) > 0:
                for word in f_chunk.split():
                    # Strip punctuation, ASCII digits, Devanagari digits and the BOM
                    for char in ['।', ',', '’', '‘', '?', '#', '.',
                                 '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
                                 '०', '१', '२', '३', '४', '५', '६', '७', '८', '९',
                                 '\ufeff']:
                        if char in word:
                            word = word.replace(char, '')

                    if word.strip():
                        clean_list.append(word)

                f_chunk = f.read(chunk_size)

                for clean_word in clean_list:
                    test_word = reduce(reducer, clean_word, [])
                    final_word = ''.join(test_word)
                    dafuq.append(final_word)
                    print(final_word)

Here is words_original.txt.


Stack trace of the error:

Traceback (most recent call last):
  File "C:\Users\KUSHAL\Desktop\EARTHQUAKE_PYTHON\test.py", line 82, in <module>
    test_word= reduce(reducer,clean_word,[])
  File "C:\Users\KUSHAL\Desktop\EARTHQUAKE_PYTHON\test.py", line 27, in reducer
    r[-1] = r[-1] + v
IndexError: list index out of range
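The traceback points at the `r[-1]` lookup in `reducer`, which can only fail while `r` is still empty, i.e. when the very first code point of `clean_word` is a combining mark (Unicode category `Mn` or `Mc`), such as a stray vowel sign or virama. A minimal reproduction is sketched below; the sample word and the simulated chunk boundary are illustrative assumptions, not taken from the asker's file. Since `f.read(4096)` reads a fixed number of characters, a boundary could fall inside a word and leave the next chunk starting with a mark, which would be consistent with the observation that the error seems to depend on string length.

```python
import unicodedata
from functools import reduce


def reducer(r, v):
    # Same logic as in the question: a combining mark (Mn/Mc) is appended to
    # the previous cluster, anything else starts a new cluster.
    if unicodedata.category(v) in ('Mc', 'Mn'):
        r[-1] = r[-1] + v  # IndexError if r is still empty
    else:
        r.append(v)
    return r


word = 'नमस्ते'
print(reduce(reducer, word, []))  # ['न', 'म', 'स्', 'ते']

# Simulate a chunk boundary falling between 'स' and the virama '्'
tail = word[3:]
print(unicodedata.category(tail[0]))  # Mn

try:
    reduce(reducer, tail, [])
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: list index out of range
```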

Remove all the unnecessary comments from your code, run it, and give us the full stack trace so we can help. – thefourtheye


@thefourtheye Sure. Done! – choman

Answer


The problem was with some Unicode characters in the file I was testing it on. It runs after removing them.
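Removing the characters by hand works, but if no word should be lost, one option is to make the reducer tolerate a leading combining mark and to flag such words for inspection. The sketch below assumes that the crashing words are exactly those whose first code point is in category `Mn` or `Mc`, and that no word is split across lines. `safe_reducer`, `clusters`, `starts_with_mark` and `iter_words` are hypothetical helper names, not part of the original script.

```python
import unicodedata
from functools import reduce

COMBINING = ('Mn', 'Mc')


def safe_reducer(r, v):
    # Attach a combining mark to the previous cluster only if there is one;
    # a leading mark starts its own cluster instead of raising IndexError.
    if unicodedata.category(v) in COMBINING and r:
        r[-1] += v
    else:
        r.append(v)
    return r


def clusters(word):
    # Split a word into base character + combining marks groups.
    return reduce(safe_reducer, word, [])


def starts_with_mark(word):
    # True for exactly the words that make the original reducer crash.
    return bool(word) and unicodedata.category(word[0]) in COMBINING


def iter_words(path):
    # Reading line by line never cuts a word in half, unlike read(4096)
    # (assumes no word spans two lines). 'utf-8-sig' drops a leading BOM.
    with open(path, encoding='utf-8-sig') as f:
        for line in f:
            yield from line.split()


for word in ['नमस्ते', '\u093eम']:
    if starts_with_mark(word):
        print('suspicious word:', ascii(word))
    print(clusters(word))
```

Printing suspicious words with `ascii()` shows the exact code points (for example `'\u093e\u092e'`), which makes invisible stray marks easy to find in the source file.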