2013-03-10 132 views

raw_inputting Unicode strings

I have read the Python 2.7 "Unicode HOWTO" more than once and searched this forum thoroughly, but nothing I found and tried has made my program work.

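It is supposed to convert dictionary.com entries into a set of example sentences plus word-pronunciation pairs. However, it fails right at the start: the IPA (i.e. Unicode) characters turn into gibberish immediately after being input.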

# -*- coding: utf-8 -*- 

""" HERE'S HOW A TYPICAL DICTIONARY.COM ENTRY LOOKS LIKE 
white·wash 
/ˈʰwaɪtˌwɒʃ, -ˌwɔʃ, ˈwaɪt-/ Show Spelled 
noun 
1. 
a composition, as of lime and water or of whiting, size, and water, used for whitening walls, woodwork, etc. 
2. 
anything, as deceptive words or actions, used to cover up or gloss over faults, errors, or wrongdoings, or absolve a wrongdoer from blame. 
3. 
Sports Informal. a defeat in which the loser fails to score. 
verb (used with object) 
4. 
to whiten with whitewash. 
5. 
to cover up or gloss over the faults or errors of; absolve from blame. 
6. 
Sports Informal. to defeat by keeping the opponent from scoring: The home team whitewashed the visitors eight to nothing. 
""" 

def wdefinp():  # word definition input
    wdef = u''
    emptylines = 0
    print '\nREADY\n\n'
    while True:
        cinp = raw_input()  # current input line
        if cinp == '':
            emptylines += 1
            if emptylines >= 3:  # break out after pressing Enter three times
                wdef = wdef[:-2]
                return wdef
        else:
            emptylines = 0
        wdef = wdef + '\n' + cinp
    return wdef

wdef=wdefinp() 
print wdef.decode('utf-8') 

This produces: white·wash /EE°waÉŞtËŚwÉ'Ę,-ËŚwÉ」 E,ËwaÉŞt-/ Show Spelled ...

Any help would be appreciated.


Works for me, running from Eclipse with Python 2.7 and your test data – Vorsprung 2013-03-10 19:11:59

Answer


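OK, I managed to reproduce a couple of errors with your program.

First, if I run it in a terminal and paste in the sample text, I get an error on this line (sorry, my line numbers don't match yours):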

File "unicod.py", line 22, in wdefinp 
    wdef=wdef + '\n' + cinp 
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 5: ordinal not in range(128) 
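
The error arises because wdef is a unicode object (u'') while raw_input() returns a byte string, and Python 2 decodes the byte string with the ASCII codec when the two are concatenated. Here is a minimal sketch that makes that implicit step explicit, assuming the terminal sends UTF-8 and the pasted line is white·wash. The variable names are illustrative only, and the snippet is written so that it also runs on Python 3:

```python
# -*- coding: utf-8 -*-
# Bytes as raw_input() would return them on a UTF-8 terminal (assumption).
raw_line = u'white\xb7wash'.encode('utf-8')

try:
    # Python 2 performs this decode implicitly in: u'' + '\n' + raw_line
    raw_line.decode('ascii')
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# The middle dot (U+00B7) is encoded as the two bytes 0xC2 0xB7, and
# 0xC2 is the first byte in the line that falls outside the ASCII range.
print(failure)
```

The failing byte and its position are available as failure.object and failure.start if you want to inspect them.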

To fix this I used the answer from this Stack Overflow question: How to read Unicode input and compare Unicode strings in Python?

The fixed line (it needs import sys at the top of the script):

cinp = raw_input().decode(sys.stdin.encoding) 

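Basically, you need to know the encoding of the input so that the bytes can be decoded to unicode; you can then encode the result to UTF-8 if you need to.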
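
One caveat is that sys.stdin.encoding can be None, for example when the input is piped in rather than typed. A small sketch of a helper that decodes a raw line with a fallback follows. decode_input_line is a hypothetical name, and the UTF-8 fallback is an assumption that may not suit every system:

```python
import sys


def decode_input_line(raw_line, encoding=None):
    """Decode the bytes returned by raw_input() into a unicode string.

    decode_input_line is a hypothetical helper, not part of the original script.
    """
    if encoding is None:
        # sys.stdin.encoding can be None, e.g. when input is piped in;
        # falling back to UTF-8 is an assumption, not a universal rule.
        encoding = getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return raw_line.decode(encoding)


# Example with an explicit encoding: the UTF-8 bytes for "white·wash".
sample = decode_input_line(b'white\xc2\xb7wash', 'utf-8')
print(repr(sample))
```

In the script above this would be used as cinp = decode_input_line(raw_input()).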

Once that is fixed, the next problem is a similar one:

File "unicod.py", line 28, in <module> 
    print wdef.decode('utf-8') 
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode 
    return codecs.utf_8_decode(input, errors, True) 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 6: ordinal not in range(128) 
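
An encode error raised from a decode call can look surprising. In Python 2, calling .decode() on an object that is already unicode first encodes it to bytes with the ASCII codec, and that hidden step is what fails. The sketch below reproduces the hidden step on its own, assuming wdef holds the first pasted line preceded by the newline that wdefinp() adds; it is written so that it also runs on Python 3:

```python
# Assumed value of wdef once raw_input() is decoded: already unicode, with
# the leading newline that wdefinp() puts in front of every line.
wdef = u'\nwhite\xb7wash'

try:
    # Python 2 runs this implicitly when .decode('utf-8') is called on a
    # unicode object, before any decoding takes place.
    wdef.encode('ascii')
    failure = None
except UnicodeEncodeError as exc:
    failure = exc

print(repr(failure))
```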

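Because the data returned from the function has already been decoded into a unicode object, "double decoding" it does not work. Just remove the .decode('utf-8') and it works fine.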
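
Putting both changes together, the input loop might look like the sketch below. It is not the asker's exact program: collect_definition and its read_line parameter are hypothetical names, the reader is passed in so the loop can be exercised without a terminal, and UTF-8 is only an assumed default. Unlike the original, it joins the lines instead of prefixing each one with a newline.

```python
def collect_definition(read_line, encoding='utf-8'):
    """Collect lines until three consecutive empty lines are read.

    read_line is any callable returning one line (hypothetical parameter);
    byte strings are decoded once, here, and nowhere else.
    """
    lines = []
    empty_streak = 0
    while True:
        line = read_line()
        if isinstance(line, bytes):
            # Decode exactly once, as soon as the data enters the program.
            line = line.decode(encoding)
        if line == u'':
            empty_streak += 1
            if empty_streak >= 3:
                break
        else:
            empty_streak = 0
        lines.append(line)
    # Drop the two blank lines collected before the terminating one.
    return u'\n'.join(lines[:-2])


# Simulated terminal input: UTF-8 bytes followed by three empty lines.
fake_input = iter([b'white\xc2\xb7wash', b'noun', b'', b'', b''])
entry = collect_definition(lambda: next(fake_input))
print(repr(entry))
```

In Python 2.7 the call would be collect_definition(raw_input, sys.stdin.encoding), and the result can be printed directly without any further .decode().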