2014-11-02 70 views

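My code includes a helper function named clean_up; the full code is below. I would like to know what I need to fix, add, or remove to make it work. The code runs, but it does not satisfy the precondition in the contract.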

def clean_up(s): 
    """ (str) -> str 

    Return a new string based on s in which all letters have been 
    converted to lowercase and punctuation characters have been stripped 
    from both ends. Inner punctuation is left untouched. 

    >>> clean_up('Happy Birthday!!!') 
    'happy birthday' 
    >>> clean_up("-> It's on your left-hand side.") 
    " it's on your left-hand side" 
    """ 

    punctuation = """!"',;:.-?)([]<>*#\n\t\r""" 
    result = s.lower().strip(punctuation) 
    return result 


########## Complete the following functions. ############ 

def type_token_ratio(text): 
    """ (list of str) -> float 

    Precondition: text is non-empty. Each str in text ends with \n and 
    text contains at least one word. 

    Return the Type Token Ratio (TTR) for this text. TTR is the number of 
    different words divided by the total number of words. 

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 
     'James Gosling\n'] 
    >>> type_token_ratio(text) 
    0.8888888888888888 
    """ 

    # To do: Fill in this function's body to meet its specification. 

    distinctwords = dict() 
    words = 0 
    for line in text.splitlines(): 
     line = line.strip().split() 
     for word in line: 
      words+=1 
      if word in distinctwords: 
       distinctwords[word]+=1 
      else: 
       distinctwords[word]=1 
    TTR= len(distinctwords)/words 
    return TTR 

What is the problem? – 2014-11-02 22:52:02


I asked my teacher, but he did not explain it to me. He only said to make it meet the precondition, but my code runs, so I am confused. – JerryMichaels 2014-11-02 22:56:41
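One way to see why the precondition matters: the contract says text is a list of str, and a list has no splitlines method, so the loop in the question fails as soon as the function receives the documented input. A minimal sketch, where call_with_list is a hypothetical helper name used only for illustration:

```python
# The precondition says `text` is a list of str, each ending with a newline.
text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'James Gosling\n']


def call_with_list(lines):
    """Hypothetical helper: mimic the first step of the loop in the question."""
    try:
        lines.splitlines()  # str has splitlines(); list does not
    except AttributeError as exc:
        return type(exc).__name__
    return 'no error'


error_for_list = call_with_list(text)          # the documented input type
error_for_str = call_with_list(''.join(text))  # one big str, which the contract does not describe
print(error_for_list, error_for_str)
```

The question's code only appears to run when it is handed a single string, which is not the input the contract describes.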

Answer


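Your code will not run when it is given the input the contract describes: for line in text.splitlines() tries to call splitlines on a list. You need to iterate over the list of strings passed in as text. Using collections.defaultdict also simplifies the counting: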

def type_token_ratio(text):
    from collections import defaultdict
    distinctwords = defaultdict(int)
    for line in text:  # get each string in the list
        for word in line.split():  # split into individual words
            word = clean_up(word)  # lowercase and strip punctuation from each word
            if word:  # skip tokens that were only punctuation
                distinctwords[word] += 1  # increase the count for each word
    # sum(distinctwords.values()) gives the total number of words
    TTR = len(distinctwords) / sum(distinctwords.values())
    return TTR
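As a further sketch (not part of the original answer, and assuming Python 3 division), cleaning each word rather than each whole line stops punctuation that sits between words, such as the commas in 'Peter, Paul, and Mary', from being counted as part of a word. The name ttr_per_word is hypothetical, and clean_up is copied from the question so that the block is self-contained:

```python
def clean_up(s):
    """Lowercase s and strip punctuation from both ends (copied from the question)."""
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)


def ttr_per_word(text):
    """Hypothetical variant: Type Token Ratio with per-word cleaning.

    text is a list of str, as the precondition in the question states.
    """
    words = []
    for line in text:
        for token in line.split():
            word = clean_up(token)
            if word:  # drop tokens that consisted only of punctuation
                words.append(word)
    return len(set(words)) / len(words)


text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n', 'James Gosling\n']
print(ttr_per_word(text))

# With per-word cleaning, 'Paul,' and 'Paul' count as the same word.
print(ttr_per_word(['Paul, Paul\n']))
```

Cleaning a whole line at once strips punctuation only from the ends of that line, so a token such as 'paul,' would otherwise be counted separately from 'paul'.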

No worries, you're welcome. – 2014-11-02 23:34:07