2013-03-14 121 views

Please help! Python Pig Latin converter

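I am converting a multi-line text file to Pig Latin.

For example, the Pig Latin translation of: This is an example. should be: Histay siay naay xampleeay.

I need any punctuation to stay where it belongs (in most cases, at the end of the sentence). I also need any word that starts with a capital letter to start with a capital letter in the Pig Latin version, with the rest of its letters lowercase.

Here is my code: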

def main():
    fileName = input('Please enter the file name: ')

    validate_file(fileName)
    newWords = convert_file(fileName)
    print(newWords)


def validate_file(fileName):
    try:
        inputFile = open(fileName, 'r')
        inputFile.close()
    except IOError:
        print('File not found.')


def convert_file(fileName):
    inputFile = open(fileName, 'r')
    line_string = [line.split() for line in inputFile]

    for line in line_string:
        for word in line:
            endString = str(word[1:])
            them = endString, str(word[0:1]), 'ay'
            newWords = "".join(them)
            return newWords

My text file is:

This is an example. 

My name is Kara! 

And the program returns:

Please enter the file name: piglatin tester.py 
hisTay 
siay 
naay 
xample.eay 
yMay 
amenay 
siay 
ara!Kay 
None 

How do I get the words to print on the lines they came from? And how do I handle the punctuation and the capitalization?

Answers

1

Here is my modification of your code. You should also consider working with nltk, which has much more robust word tokenization.

def main():
    # Python 3: input() returns the entered text as a string
    fileName = input('Please enter the file name: ')

    validate_file(fileName)
    new_lines = convert_file(fileName)
    for line in new_lines:
        print(line)

def validate_file(fileName):
    try:
        inputFile = open(fileName, 'r')
        inputFile.close()
    except IOError:
        print('File not found.')

def strip_punctuation(line):
    # Split off a single sentence-ending mark, if the line has one
    punctuation = ''
    line = line.strip()
    if len(line) > 0:
        if line[-1] in ('.', '!', '?'):
            punctuation = line[-1]
            line = line[:-1]
    return line, punctuation

def convert_file(fileName):
    converted_lines = []
    # The with block closes the file once every line has been read
    with open(fileName, 'r') as inputFile:
        for line in inputFile:
            line, punctuation = strip_punctuation(line)
            line = line.split()
            new_words = []
            for word in line:
                # Move the first letter to the end and append 'ay'
                endString = str(word[1:])
                them = endString, str(word[0:1]), 'ay'
                new_word = "".join(them)
                new_words.append(new_word)
            new_sentence = ' '.join(new_words)
            # Lowercase everything, then capitalize only the first letter of the line
            new_sentence = new_sentence.lower()
            if len(new_sentence):
                new_sentence = new_sentence[0].upper() + new_sentence[1:]
            converted_lines.append(new_sentence + punctuation)
    return converted_lines
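
The code above capitalizes only the first letter of each line and handles only one punctuation mark at the end of a line. As a rough sketch of the per-word behavior the question asks for, the standard-library `re` module can be used as a stand-in for the nltk tokenizer mentioned above. The helper names `pig_latin_word` and `pig_latin_line` are made up for this illustration, and the rule (move the first letter to the end, append 'ay') is simply the one used in the question.

```python
import re

# Runs of ASCII letters count as words; everything else (spaces,
# punctuation) is left exactly where it was.
WORD_RE = re.compile(r"[A-Za-z]+")

def pig_latin_word(word):
    """Move the first letter to the end, append 'ay', keep initial capitalization."""
    converted = (word[1:] + word[0] + "ay").lower()
    if word[0].isupper():
        converted = converted.capitalize()
    return converted

def pig_latin_line(line):
    """Convert each word in the line, leaving punctuation in place."""
    return WORD_RE.sub(lambda match: pig_latin_word(match.group(0)), line)

print(pig_latin_line("This is an example."))  # Histay siay naay xampleeay.
print(pig_latin_line("My name is Kara!"))     # Ymay amenay siay Arakay!
```

Because the substitution only touches the letters, a trailing period or exclamation mark stays at the end of its word without any special handling.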

Thank you! However, I get this error: File "/Users/tinydancer9454/Documents/python/pigLatinFile.py", line 17, in main strip_punc(line) UnboundLocalError: local variable 'line' referenced before assignment – tinydancer9454 2013-03-14 03:21:10


Also, what does new_lines refer to? – tinydancer9454 2013-03-14 03:26:11


new_lines refers to the lines converted from English to Pig Latin. – ChrisGuest 2013-03-14 03:30:05
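
The commenter's own file is not shown, so the actual cause of the UnboundLocalError reported above can't be determined from this thread. As a general illustration only, this kind of error appears when a function uses a local name before any statement has assigned it. The function `broken_main` below is a made-up example, not the commenter's code.

```python
def broken_main(lines):
    # 'line' is assigned only by the for loop. If 'lines' is empty,
    # the loop body never runs and 'line' is never bound.
    for line in lines:
        pass
    return line.strip()

try:
    broken_main([])
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

Calling a helper such as strip_punctuation(line) inside the loop that defines `line`, as the answer's convert_file does, avoids the problem.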

0

I got it working except for the punctuation. I'm still thinking about a solution for that. Here is my code:

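def convert_file(fileName):
    punctuations = ['.', ',', '!', '?', ':', ';']  # not used yet
    newWords = []
    linenum = 1

    # The with block closes the file once every line has been read
    with open(fileName, 'r') as inputFile:
        for line in inputFile:
            line_string = line.split()
            for word in line_string:
                # Slice word[1:2] so one-letter words don't raise IndexError
                endString = str(word[1:2]).upper() + str(word[2:])
                them = endString, str(word[0:1]).lower(), 'ay'
                word = ''.join(them)
                # Pair each converted word with the line it came from
                wordAndline = [word, linenum]
                newWords.append(wordAndline)
            linenum += 1
    return newWords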

The difference is that it returns each word together with the number of the line it came from in the file.

['Histay', 1], ['Siay', 1], ['Naay', 1], ['Xample.eay', 1], ['Ymay', 3], ['Amenay', 3], ['Siay', 3], ['Ara!kay', 3]
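
Two pieces are still open in this answer: the punctuation, and turning the [word, linenum] pairs back into lines. The sketch below shows one possible way to do each, assuming punctuation only appears at the end of a word. The helper names `split_trailing_punctuation` and `join_by_line` are hypothetical, and `PUNCTUATION` simply mirrors the `punctuations` list above.

```python
from itertools import groupby

PUNCTUATION = '.,!?:;'  # same characters as the answer's punctuations list

def split_trailing_punctuation(word):
    """Return (core, trailing), where trailing is any punctuation ending the word."""
    core = word.rstrip(PUNCTUATION)
    return core, word[len(core):]

def join_by_line(word_line_pairs):
    """Group consecutive [word, linenum] pairs by line number and join each group."""
    lines = []
    for _linenum, group in groupby(word_line_pairs, key=lambda pair: pair[1]):
        lines.append(' '.join(pair[0] for pair in group))
    return lines

print(split_trailing_punctuation('example.'))  # ('example', '.')

pairs = [['Histay', 1], ['Siay', 1], ['Ymay', 3], ['Amenay', 3]]
for line in join_by_line(pairs):
    print(line)
```

Splitting the punctuation off before converting a word, then appending it again afterwards, would give 'Xampleeay.' rather than 'Xample.eay'. Note that `groupby` only merges neighbouring pairs, which is fine here because convert_file appends the words in file order.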