2010-12-12 63 views
1

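Python sorting problem - given a list of ['url', 'tag1', 'tag2', ...]s and a search specification ['tag3', 'tag1', ...], return a list of relevant urls

I'm fairly new to programming, so I'm sure there is a better way to pose this question, but I'm trying to create a personal bookmarking program. Given multiple urls, each with a list of tags ordered by relevance, I want to be able to create a search consisting of a list of tags that returns a list of the most relevant urls. My first solution, below, is to give the first search tag a value of 1, the second a value of 2, and so on, and let the Python list sort function do the rest. Two questions: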

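1) Is there a more elegant/efficient way of doing this (embarrass me!)
2) Are there any other general approaches to sorting by relevance, given the inputs above?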

Much obliged.

# Given a list of saved urls each with a corresponding user-generated taglist 
# (ordered by relevance), the user enters a "search" list-of-tags, and is 
# returned a sorted list of urls. 

# Generate a sample "content" list of dictionaries. The rationale is to
# be able to add things like 'title' etc at later stages and to
# treat each url/note as an independent entity. But a single dictionary
# approach like "note['url1']=['b','a','c','d']" might work better?

content = [] 
note = {'url':'url1', 'taglist':['b','a','c','d']} 
content.append(note) 
note = {'url':'url2', 'taglist':['c','a','b','d']} 
content.append(note) 
note = {'url':'url3', 'taglist':['a','b','c','d']} 
content.append(note) 
note = {'url':'url4', 'taglist':['a','b','d','c']} 
content.append(note) 
note = {'url':'url5', 'taglist':['d','a','c','b']} 
content.append(note) 

# An example search term of tags, ordered by importance 
# I'm using a dictionary with an ordinal number system 
# This seems clumsy 
search = {'d':1,'a':2,'b':3} 

# Create a tagCloud with one entry for each tag that occurs
tagCloud = []
for note in content:
    for tag in note['taglist']:
        if tag not in tagCloud:
            tagCloud.append(tag)

# Create a dictionary that associates an integer value denoting
# relevance (1 is most relevant etc) for each existing tag

d = {}
for tag in tagCloud:
    try:
        d[tag] = search[tag]
    except KeyError:
        d[tag] = 100

# Create a [[relevance, tag], ..., url] entry for each note & sort
result = []
for note in content:
    resultNote = []
    for tag in note['taglist']:
        resultNote.append([d[tag], tag])
    resultNote.append(note['url'])
    result.append(resultNote)
result.sort()

# Remove the relevance values & recreate a list containing
# the url string followed by corresponding tags.
# It's so hacky I've forgotten how it works!
# It's mostly for display, but suggestions on "best-practice"
# intermediate-form data storage?

finalResult = []
for note in result:
    temp = []
    temp.append(note.pop())
    for tag in note:
        temp.append(tag[1])
    finalResult.append(temp)

print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)

Answers

2

1) Is there a more elegant/efficient way of doing this (embarrass me!)

Sure. The basic idea: don't try to tell Python what to do, just ask it for what you want.
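
As a small illustration of that idea, here is a sketch that builds the tag cloud from the question in two ways: step by step with nested loops, and as a single set comprehension that simply states what is wanted. The names `tag_cloud_loop` and `tag_cloud_set` are made up for this example, and the two-entry `content` list is a shortened copy of the question's data.

```python
# Shortened copy of the question's sample data.
content = [
    {'url': 'url1', 'taglist': ['b', 'a', 'c', 'd']},
    {'url': 'url2', 'taglist': ['c', 'a', 'b', 'd']},
]

# "Telling Python what to do": walk every tag and append unseen ones.
tag_cloud_loop = []
for note in content:
    for tag in note['taglist']:
        if tag not in tag_cloud_loop:
            tag_cloud_loop.append(tag)

# "Asking for what you want": the set of all tags of all notes.
# (Hypothetical name; a set does not preserve insertion order.)
tag_cloud_set = {tag for note in content for tag in note['taglist']}

# Both approaches collect the same tags.
print(sorted(tag_cloud_loop) == sorted(tag_cloud_set))
```

The set version drops the order in which tags were first seen, which does not matter here because the tag cloud is only used for lookups.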

content = [ 
    {'url':'url1', 'taglist':['b','a','c','d']}, 
    {'url':'url2', 'taglist':['c','a','b','d']}, 
    {'url':'url3', 'taglist':['a','b','c','d']}, 
    {'url':'url4', 'taglist':['a','b','d','c']}, 
    {'url':'url5', 'taglist':['d','a','c','b']} 
] 

search = {'d' : 1, 'a' : 2, 'b' : 3} 

# We can create the tag cloud like this: 
# tagCloud = set(sum((note['taglist'] for note in content), [])) 
# But we don't actually need it: instead, we'll just use a default value 
# when looking things up in the 'search' dict. 

# Create a [[relevance, tag],[],[],...] result list & sort
result = sorted(
    [
        [search.get(tag, 100), tag]
        for tag in note['taglist']
    ] + [[note['url']]]
    # The result will look like [ [relevance, tag],... , [url] ]
    # Note that the url is wrapped in a list too. This makes the
    # last processing step easier: we just take the last element of
    # each nested list.
    for note in content
)

# Remove the relevance values & recreate a list containing
# the corresponding tags followed by the url string (the url
# comes last here, because it was appended after the tags).
finalResult = [
    [x[-1] for x in note]
    for note in result
]

print("Content:", content)
print("Search:", search)
print("Final Result:", finalResult)
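
A possible variation, not part of the answer above: the question calls its ordinal-number dictionary clumsy, so this sketch assumes the search is given as a plain list ordered by importance and derives the ranks with `enumerate`. It then sorts the notes themselves with a key function instead of building intermediate lists. The names `rank`, `missing`, `sort_key`, `ranked` and `final_result` are hypothetical, and using `len(search) + 1` as the rank of unlisted tags is an assumption that replaces the fixed value 100.

```python
content = [
    {'url': 'url1', 'taglist': ['b', 'a', 'c', 'd']},
    {'url': 'url2', 'taglist': ['c', 'a', 'b', 'd']},
    {'url': 'url3', 'taglist': ['a', 'b', 'c', 'd']},
    {'url': 'url4', 'taglist': ['a', 'b', 'd', 'c']},
    {'url': 'url5', 'taglist': ['d', 'a', 'c', 'b']},
]

# Search tags as a plain list, most important first (assumed input form).
search = ['d', 'a', 'b']

# Rank 1 for the first search tag, 2 for the second, and so on.
rank = {tag: position for position, tag in enumerate(search, 1)}
# Tags that are not in the search sort after every searched tag.
missing = len(search) + 1


def sort_key(note):
    """Compare notes by their (rank, tag) pairs, then by url."""
    pairs = [(rank.get(tag, missing), tag) for tag in note['taglist']]
    return (pairs, note['url'])


ranked = sorted(content, key=sort_key)

# Url first, followed by its tags, as in the question's output.
final_result = [[note['url']] + note['taglist'] for note in ranked]
print(final_result)
```

Because the key keeps the url in its own tuple slot, notes with tag lists of different lengths can be compared without mixing numbers and strings, and the original dictionaries stay untouched.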

Great, thanks. The explanatory comments help a lot. Cheers – 2010-12-12 04:13:06

@David: If the answer meets your requirements, it is considered polite to accept it. – user225312 2010-12-12 04:49:28

Haha yes, it wouldn't let me upvote & I missed the transparent little tick box. – 2010-12-12 05:06:28

0

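I suggest you also give each tag a weight that depends on how rare it is (for example, a "tarantula" tag would weigh more than a "nature" tag¹). For a given URL, rare tags that it shares with other URLs should indicate stronger relevance, while frequently used tags should contribute less to the relevance.

The rules described above can easily be converted into a calculation of a numeric relevance for every other URL.

¹ Unless all of your URLs are related to "tarantula", of course :)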
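
One way the rarity rule might be turned into numbers is sketched below. The answer does not give a formula, so the weight log(N / number of URLs carrying the tag) is an assumption borrowed from inverse-document-frequency weighting, and the sketch covers only the "shared rare tags count more" part. The function names `rarity_weights` and `relatedness` and the sample data are hypothetical.

```python
import math
from collections import Counter


def rarity_weights(content):
    """Map each tag to log(N / urls_with_tag): rarer tags weigh more."""
    total = len(content)
    counts = Counter(tag for note in content for tag in set(note['taglist']))
    return {tag: math.log(float(total) / count) for tag, count in counts.items()}


def relatedness(note_a, note_b, weights):
    """Sum the weights of the tags that both notes carry."""
    shared = set(note_a['taglist']) & set(note_b['taglist'])
    return sum(weights[tag] for tag in shared)


# Hypothetical sample data: "nature" is common, "tarantula" is rarer.
content = [
    {'url': 'url1', 'taglist': ['tarantula', 'nature']},
    {'url': 'url2', 'taglist': ['tarantula', 'nature', 'pets']},
    {'url': 'url3', 'taglist': ['nature', 'hiking']},
    {'url': 'url4', 'taglist': ['nature', 'photography']},
]

weights = rarity_weights(content)
given = content[0]

# Rank every other URL by how strongly it relates to the given one.
ranked = sorted(
    (note for note in content if note is not given),
    key=lambda note: relatedness(given, note, weights),
    reverse=True,
)
print([note['url'] for note in ranked])
```

With this weighting a tag carried by every URL gets weight 0, which matches the footnote: if all URLs were tagged "tarantula", sharing that tag would say nothing about relevance.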

Yes, interesting approach. Cheers – 2010-12-14 19:53:49