Working with arrays in Python


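I have 2 arrays: query_array and stem_array. stem_array has elements in the format (word|synonyms). query_array contains words.

I want to check whether query_array contains any word that appears in the synonym part of stem_array and, if so, map the synonym to the actual word.

For example: query_array (presidential, authenticity), stem_array (president|presidential, authentic|authenticity, authentication), so my final array should return (president, authentic).

Please help me, as I am new to Python and I have a project to finish.

Thanks.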

Source

2012-01-31 Nerd

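Is your data literally in this format? For example, a string "(president | presidential)" – KobeJohn 2012-01-31 00:58:50

Paste your actual code. The "arrays" you describe don't make sense. – JBernardo 2012-01-31 00:59:38

@yakiimo: Yes. It is literally in this format. – Nerd 2012-01-31 01:00:06

I'm going to assume you are completely lost here and go step by step, so you can figure out how to do this yourself in the future. But it sounds like you need to work through some Python tutorials. All of this is untested code.

Get the queries into a reasonable format: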

query_string = query_array[1:-1]  # remove the parentheses with slicing
queries_with_whitespace = query_string.split(",")  # split the string into a list
queries = [query.strip() for query in queries_with_whitespace]  # remove whitespace
# queries = [item.strip() for item in query_array[1:-1].split(",")]  # all in one
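
As a quick check of the parsing steps above, here is a minimal sketch. It assumes the query arrives as a literal string such as "(presidential, authenticity)"; the sample value and the helper name parse_queries are made up for illustration.

```python
def parse_queries(query_array):
    """Turn a string like "(a, b)" into the list ["a", "b"]."""
    inner = query_array.strip()[1:-1]  # drop the surrounding parentheses
    # split on commas, trim whitespace, and skip empty pieces such as in "()"
    return [item.strip() for item in inner.split(",") if item.strip()]


queries = parse_queries("(presidential, authenticity)")
print(queries)  # ['presidential', 'authenticity']
```

Filtering out empty pieces means an empty "()" gives an empty list rather than a list containing one empty string.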

Do the same for the synonyms. The following handles one of your stem strings:

def stem_and_syns(unformatted_string):  # unformatted_string is one entry of your stem_array
    stem_string = unformatted_string[1:-1]  # same as before
    stem, synonyms_string = stem_string.split("|")  # split the stem and synonyms
    stem = stem.strip()  # clean the stem
    synonyms = [synonym.strip() for synonym in synonyms_string.split(",")]  # same as before
    return stem, synonyms

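However, you need the synonyms for a reverse lookup. Do you realize that any given word could be both a stem and a synonym? And that any word could have more than one stem? You need to figure out what to do in those cases. Anyway, here is the reverse lookup: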

stem_lookup = {}
for stem_string in stem_strings:  # stem_strings is the set of all of your non-formatted stem strings
    stem, synonyms = stem_and_syns(stem_string)
    for synonym in synonyms:
        # point all synonyms to a list of possible stems
        stem_lookup.setdefault(synonym, []).append(stem)  # make a new list if this synonym is not used yet
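
To illustrate why each synonym points to a list, here is a small sketch in which one synonym belongs to two stems. The pairs below are hypothetical data, not taken from the question, and are assumed to be already parsed into (stem, synonyms) tuples.

```python
# Hypothetical, already-parsed (stem, synonyms) pairs; not from the question.
parsed = [
    ("authentic", ["authenticity", "authentication"]),
    ("authenticate", ["authentication"]),
]

stem_lookup = {}
for stem, synonyms in parsed:
    for synonym in synonyms:
        # each synonym maps to a list of every stem it was listed under
        stem_lookup.setdefault(synonym, []).append(stem)

print(stem_lookup["authentication"])  # ['authentic', 'authenticate']
print(stem_lookup["authenticity"])    # ['authentic']
```

Which of the candidate stems to use for an ambiguous synonym is a decision the lookup cannot make for you.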

Finally, look up each of your original queries (again, this rests on an assumption that is convenient for me but may not fit your needs):

result = [stem_lookup.get(original, [original]) for original in queries]  # each entry is a list of stems; falls back to [original] if it's not a synonym
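
Putting the steps together, here is a self-contained sketch. It assumes each stem entry is its own string of the form "(stem|syn1, syn2)" and, as one possible way to resolve ambiguity, keeps only the first stem seen for a synonym. The function name map_to_stems and the sample data are illustrative.

```python
def stem_and_syns(stem_string):
    """Split "(stem|syn1, syn2)" into ("stem", ["syn1", "syn2"])."""
    stem, synonyms_string = stem_string.strip()[1:-1].split("|")
    return stem.strip(), [s.strip() for s in synonyms_string.split(",")]


def map_to_stems(queries, stem_strings):
    """Replace each query word with its stem when it is a known synonym."""
    stem_lookup = {}
    for stem_string in stem_strings:
        stem, synonyms = stem_and_syns(stem_string)
        for synonym in synonyms:
            # assumption: the first stem seen for a synonym wins
            stem_lookup.setdefault(synonym, stem)
    # words that are not synonyms are returned unchanged
    return [stem_lookup.get(word, word) for word in queries]


# Sample data is an assumption based on the example in the question.
stem_strings = ["(president|presidential)", "(authentic|authenticity, authentication)"]
print(map_to_stems(["presidential", "authenticity"], stem_strings))
# ['president', 'authentic']
```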

Source

2012-01-31 01:28:38 KobeJohn

Expand the stem array into a dictionary in which every word is a key and its value is the list of all of its synonyms:

 
stem_dict = { 
    'presidential': ['president'], 
    'president': ['presidential'], 
    'authentic': ['authenticity', 'authentication'], 
    'authenticity': ['authentic', 'authentication'], 
    'authentication': ['authentic', 'authenticity'], 
}

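Now the answer for any word is simply stem_dict[word].

It looks like each synonym list is a string with delimiters, so use str.split to separate the pieces before putting them into the dictionary.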
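
A minimal sketch of building such a dictionary with str.split, assuming each entry is a string of the form "word|syn1, syn2" with no surrounding parentheses. The sample strings and the name build_stem_dict are illustrative.

```python
def build_stem_dict(stem_strings):
    """Map every word in a group to the other words in that group."""
    stem_dict = {}
    for stem_string in stem_strings:
        stem, synonyms_string = stem_string.split("|")
        group = [stem.strip()] + [s.strip() for s in synonyms_string.split(",")]
        for word in group:
            # every word points at all the other members of its group
            stem_dict[word] = [other for other in group if other != word]
    return stem_dict


stem_dict = build_stem_dict(
    ["president|presidential", "authentic|authenticity, authentication"]
)
print(stem_dict["authenticity"])  # ['authentic', 'authentication']
```

Note that in this sketch a word appearing in two groups keeps only the entry from the later group, so overlapping groups would need extra handling.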

Source

2012-01-31 01:04:56 Glenn
