2015-11-08 152 views

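How to iterate over a Python list and compare its items against a string or another list

Following up on my previous question, I am trying to write code that returns a sentence whenever one of the search terms in a list occurs in it, as shown below.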

import re 
from nltk import tokenize 
from nltk.tokenize import sent_tokenize 
def foo(): 
    List1 = ['risk','cancer','ocp','hormone','OCP',] 
    txt = "Risk factors for breast cancer have been well characterized. Breast cancer is 100 times more frequent in women than in men.\ 
    Factors associated with an increased exposure to estrogen have also been elucidated including early menarche, late menopause, later age\ 
    at first pregnancy, or nulliparity. The use of hormone replacement therapy has been confirmed as a risk factor, although mostly limited to \ 
    the combined use of estrogen and progesterone, as demonstrated in the WHI (2). Analysis showed that the risk of breast cancer among women using \ 
    estrogen and progesterone was increased by 24% compared to placebo. A separate arm of the WHI randomized women with a prior hysterectomy to \ 
    conjugated equine estrogen (CEE) versus placebo, and in that study, the use of CEE was not associated with an increased risk of breast cancer (3).\ 
    Unlike hormone replacement therapy, there is no evidence that oral contraceptive (OCP) use increases risk. A large population-based case-control study \ 
    examining the risk of breast cancer among women who previously used or were currently using OCPs included over 9,000 women aged 35 to 64 \ 
    (half of whom had breast cancer) (4). The reported relative risk was 1.0 (95% CI, 0.8 to 1.3) among women currently using OCPs and 0.9 \ 
    (95% CI, 0.8 to 1.0) among prior users. In addition, neither race nor family history was associated with a greater risk of breast cancer among OCP users." 
    words = txt 
    corpus = " ".join(words).lower() 
    sentences1 = sent_tokenize(corpus) 
    a = [" ".join([sentences1[i-1],j]) for i,j in enumerate(sentences1) if [item in List1] in word_tokenize(j)] 


    for i in a: 
     print i,'\n','\n' 

foo() 

The problem is that Python IDLE does not print anything. I am probably doing something wrong. When I run the code, all I get is this:

>>>

Answers

1

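Your question is not entirely clear to me, so please correct me if I have got this wrong. Are you trying to match a list of keywords (in List1) against the text (in txt)? That is,

  • For each keyword in List1
  • Match it against every sentence in txt.
  • Print the sentences that match?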

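Rather than writing one complicated regular expression to solve your problem, I have broken it down into two parts.

First I split the whole text into a list of sentences. Then I wrote a simple regular expression and looped over the sentences with it. The trouble with this approach is that it is not very efficient, but hey, it solves your problem.

Hopefully this small piece of code can guide you toward a real solution.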

import re

def foo():
    List1 = ['risk','cancer','ocp','hormone','OCP',]
    txt = "blah blah blah - truncated"

    matches = []
    # Naive sentence split on full stops
    sentences = re.split(r'\.', txt)
    keyword = List1[0]
    # Compile the pattern once and reuse it for every sentence
    pattern = re.compile(keyword)

    for sentence in sentences:
        if pattern.search(sentence):
            matches.append(sentence)

    print("Sentence matching the word (" + keyword + "):")
    for match in matches:
        print(match)

foo()
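
The code above only checks the first keyword, List1[0]. As a minimal sketch of the three steps listed earlier (every keyword, every sentence, collect the matches), something like the following could work. The function name find_sentences, the sentence-splitting regex, and the choice of whole-word, case-insensitive matching are assumptions made for illustration, not part of the original answer; the sample text is three sentences taken from the question.

```python
import re

def find_sentences(keywords, text):
    """Map each keyword to the sentences that contain it as a whole word."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    results = {}
    for keyword in keywords:
        # \b restricts the match to whole words; IGNORECASE lets 'ocp' match 'OCP'.
        pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
        results[keyword] = [s for s in sentences if pattern.search(s)]
    return results

sample = ("Risk factors for breast cancer have been well characterized. "
          "Breast cancer is 100 times more frequent in women than in men. "
          "Unlike hormone replacement therapy, there is no evidence that "
          "oral contraceptive (OCP) use increases risk.")

for keyword, hits in find_sentences(['risk', 'cancer', 'ocp', 'hormone'], sample).items():
    print(keyword + ":")
    for sentence in hits:
        print("  " + sentence)
```

Because of the word boundaries, 'ocp' will not match the plural "OCPs" in the full text; drop the \b on the right-hand side if that is wanted. With case-insensitive matching, listing both 'ocp' and 'OCP' is no longer necessary.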

--------- Generating a random number -----

from random import randint 

List1 = ['risk','cancer','ocp','hormone','OCP',] 
print(randint(0, len(List1) - 1)) # gives a random valid index; use it to index into List1

You did me a huge favor! Thank you!! It works. However, I would like it to pick an item from List1 at random rather than a fixed one such as List1[0] or List1[3]. – wakamdr


Maybe try: from random import randint. I have updated the solution to include sample code. –


Great. It works with keyword = List1[(randint(0, len(List1) - 1))] .... and it also works inside a while loop – wakamdr
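
As an aside to the randint approach in the comment above, the standard library can pick the element directly. This is only a sketch of an alternative, not what the thread settled on; the variable names keyword and two_keywords are illustrative.

```python
import random

List1 = ['risk', 'cancer', 'ocp', 'hormone', 'OCP']

# random.choice returns one element directly, so no index arithmetic is needed.
keyword = random.choice(List1)
print(keyword)

# random.sample picks several items from distinct positions in the list.
two_keywords = random.sample(List1, 2)
print(two_keywords)
```

random.choice(List1) is equivalent in effect to List1[randint(0, len(List1) - 1)], and it avoids the off-by-one risk of computing the upper bound by hand.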
