2011-06-15 55 views
16

在我的工作場所,我偶然發現了以下問題,我被要求解決。雖然不是絕對需要,但解決方案是優選的一個具有挑戰性的算法問題

有一個包含故事集的數據庫,每個故事都有一組與其關聯的主題。主題存儲在表單(storyid,topicid)的單獨表格中。

問題是如何理想地選擇5個主題(或者至少2個,如果5個是不可能的),使得每個主題有2個故事(或1,如果2是不可能的)在其他選擇中不重複主題。該算法還必須返回與每個主題相關的恰當故事。

這實際上是一個NP完全問題,沒有有效的解決方案,不僅僅是簡單列舉所有可能性,還是它有一個有效的解決方案?

如果它沒有一個有效的解決方案,請嘗試證明它(儘管不是絕對必要的)。

如果它確實有一個有效的解決方案,請善待和張貼。

非常感謝,

安東

+1

你想用什麼語言來完成? – CraigW 2011-06-15 21:13:40

+3

這似乎是一個主題和故事之間的二分圖上的一種匹配問題。目標是找到一個解決方案還是列出所有可能的解決方案? – hardmath 2011-06-15 21:17:53

+0

@CraigW:語言是PHP和MySQL。 – akanevsky 2011-06-15 21:18:28

回答

5

問題的更一般的版本是選擇所有主題(或儘可能多的p可能的)兩個故事,以便同一個故事從來沒有被選擇爲兩個不同的主題。

馬克帶S 故事內容S,並以T 的主題...牛逼ñ。重複每一個主題,那就是引入新的故事T「 ... T」 ñ,其中T」 其S Ĵ當且僅當T 包含它。現在可以通過這種方式來解釋問題:爲所有主題選擇不同的故事(或儘可能多)。一旦你有你的主題故事對,你可以再次加入重複的主題,每個主題將有兩個故事。

發現可能的最大數量對的問題,而從未選擇任何元素兩次的問題稱爲最大二分配匹配問題。 (你可以認爲它是從bipartite graph中選擇最大的非連接邊的數量。)有一個簡單的算法,稱爲擴充路徑,它在O((m + n)e)個步驟中解決它(e是邊的數量)和一些更復雜的(例如Hopcroft–Karp algorithm),它們在O((m + n)^ 2.5)步驟左右解決它。增強路徑算法包括搜索圖中「未交替」路徑,其中路徑的第一條邊未被選中,第二條路是第三條不是,依此類推,而不是反轉路徑上的選擇。這可能可能適用於您的情況,而無需實際分割和加入主題。

這是一個有點矯枉過正,因爲這將返回所有題目兩個故事,而不是隻有五個;如果只需要爲有限數量的主題查找故事,則可以做得更好。 (除了一些邊緣情況,您可以選擇故事數量最多的五個主題,丟棄未包含的故事,然後運行該算法。)無論如何,它表明問題遠不是NP -硬。

+0

爲什麼我需要複製每一個主題?你能否詳細說明你的算法是如何工作的? – akanevsky 2011-06-16 01:16:43

+0

@akanevsky:我更詳細地解釋了它。 – Tgr 2011-06-16 06:19:14

+0

+1「無論如何,這表明問題遠非NP難。」 – csl 2011-06-16 06:22:44

-3

如果wona選擇類似不同的故事類似5個話題,因爲我明白: 這樣就可以使一個連接的2個表,並在查詢中使用top 5與在條件主題標題=

「你想的話題,」如果不幫助好心說清楚,我>>>

+0

我不僅需要5個主題,還需要5個主題,每個主題有2個故事,其他4個主題都不重複。 – akanevsky 2011-06-15 21:18:07

+0

你真的讓我感到困惑,,你可以發佈信息和例子如:story1 --topic1,標題2,topic3 ... story2 ---標題2,topic5 .... – Tamer 2011-06-15 21:34:49

+0

爲什麼你需要具體的例子?我有一個表格(storyid,title)和一個表格(storyid,topicid)。我想向用戶展示5個主題,當他選擇其中2個主題時,我不希望他在兩個主題下看到相同的故事。現在更清楚了嗎? – akanevsky 2011-06-15 21:42:03

0

試着將這個以滿足您的需求:

SELECT topic, story 
FROM story_topic 
WHERE story IN (SELECT story FROM story_topic GROUP BY story HAVING COUNT(*) = 1); 

這裏的關鍵是要知道什麼樣的故事只發生只有一個話題。您可能需要預先計算消除子選擇的主題數量。

+0

可能是所有故事都出現在多個主題中。 – akanevsky 2011-06-16 01:08:51

+0

你原來的問題會如何處理?也許我只是誤解了這個問題。 – jond3k 2011-06-16 08:38:52

0

這個怎麼樣? (如果我理解你的問題)

(我沒有真正運行它 - 只是一個想法 - 所以......可能是錯誤,或者我可能明顯錯過了某些東西但是 - 目前,我疲倦的頭腦認爲它的工作:)

$num_topics = 5; 
$stories_per = 5; 
$stories = array(); //array to store story ids 

//select 5 topics 
$query = mysql_query("SELECT * FROM topics ORDER BY RAND() LIMIT ".$num_topics); 

//run repeat as many times as you want stories 
for($i=0; $i<$stories_per; $i++) { 

    //repeat through each selected topic 
    while($row = mysql_fetch_array($query)) { 

     $q_addon = ""; 
     foreach($stories as $value) { 
      $q_addon .= "id <> '".$value."' AND "; 
     } 

     //find a story not yet chosen for each topic 
     $q = mysql_query("SELECT storyid FROM stories_topics WHERE ".$q_addon." topicid='".$row['id']."' LIMIT 1"); 

     //add that id to your $stories array 
     $tmp_id = mysql_result($q,0,'storyid'); 
     array_push($stories, $tmp_id); 

    } 
} 
3

假設我們有涉及TopicID和StoryID, 與一對形成化合物主鍵列的SQL表TopicStory。

的目標是找到某種一套TopicID的組成。發佈的問題最多需要五個 。深度優先搜索 這些概述如下。

隨着搜索的深度限定在五,指出問題不能 比多項式複雜性更差。然而,要求爲最大的話題集合,可以與像 約束(選擇每個主題不相關的 任何其他選擇的主題至少兩層)可以發現一個普遍問題 可能是NP完全問題。

一詞的使用「搜索」中提出了一種回溯算法。在 以下,我們通過嵌套循環實現了回溯,其中每個循環是 ,由其外部的循環定義和參數化。

在我們給出詳細的細節之前,可能會激發描述 「蠻力」的方法,接下來可以更容易地理解更復雜的方法 。

BRUTE_FORCE: 

Generate all possible subsets of five topics. 
Test each of these sets for feasibility (each topic has 
at least two stories unrelated to any of the other topics). 

我們的深度優先搜索的草圖假設議題一共有排序, 例如按照TopicID的整數值排序。這允許生成主題 而不重複(由於主題的排列)。

NESTED_LOOPS: 

(Loop_1) Select into List_1 all topics with at least two stories. 
     Iterate through List_1, choosing the first topic %T1%. 
     PASS control into Loop_2. 
     CONTINUE in Loop_1. 
     If the end of List_1 is reached, EXIT with failure. 

(Loop_2) Select into List_2 all topics > %T1% with at least two 
     stories unrelated to %T1%. 
     Iterate through List_2, choosing the second topic %T2%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, PASS control into Loop_3. 
     CONTINUE in Loop_2. 
     If the end of List_2 is reached, go BACK to Loop_1. 

(Loop 3) Select into List_3 all topics > %T2% with at least two 
     stories unrelated to %T1% or %T2%. 
     Iterate through List_3, choosing the third topic %T3%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2% or %T3%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1% or %T3%, PASS control into Loop_4. 
     CONTINUE in Loop_3. 
     If the end of List_3 is reached, go BACK to Loop_2. 

(Loop 4) Select into List_4 all topics > %T3% with at least two 
     stories unrelated to %T1%, %T2%, or %T3%. 
     Iterate through List_4, choosing the fourth topic %T4%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, %T3%, or %T4%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1%, %T3%, or %T4%, 
     and topic %T3% still has at least two stories unrelated 
     to %T1%, %T2%, or %T4%, PASS control into Loop_5. 
     CONTINUE in Loop_4. 
     If the end of List_4 is reached, go BACK to Loop_3. 

(Loop 5) Select into List_5 all topics > %T4% with at least two 
     stories unrelated to %T1%, %T2%, %T3%, or %T4%. 
     Iterate through List_5, choosing the fifh topic %T5%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, %T3%, %T4%, or %T5%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1%, %T3%, %T4%, or %T5%, 
     and topic %T3% still has at least two stories unrelated 
     to %T1%, %T2%, %T4%, or %T5%, 
     and topic %T4% still has at least two stories unrelated 
     to %T1%, %T2%, %T3%, or %T5%, EXIT with success 
     returning five topics %T1%, %T2%, %T3%, %T4%, and %T5%. 
     CONTINUE in Loop_5. 
     If the end of List_5 is reached, go BACK to Loop_4. 

使用「選擇」在每個嵌套循環的開口是爲了喚起 SQL查詢的可能性,以實現許多邏輯的。對於 例如最外層循環基本上是剛開結果集中 此查詢:

SELECT TS1.TopicID, Count(*) 
From TopicStory TS1 
Group By TS1.TopicID 
Having Count(*) > 1 

內部循環的相應的列表可以類似 通過SQL查詢取決於在 所選主題的參數值來構造外環。爲了說明沒有不必要的重複讓我們跳 權內部循環,並給出List_5適當的查詢:

SELECT TS5.TopicID, Count(*) 
From TopicStory TS5 
Where TS5.TopicID > %T4% 
    and NOT EXISTS (SELECT * 
         From TopicStory TSX 
         Where TSX.TopicID in (%T1%,%T2%,%T3%,%T4%) 
         and TSX.StoryID = TS5.StoryID 
        ) 
Group By TS5.TopicID 
Having Count(*) > 1 

這之後將在List_5檢查%T5%產生 計數至少兩個故事的左對於主題%T1%:

SELECT Count(*) 
From TopicStory TZ1 
Where TZ1.TopicID = %T1% 
    and NOT EXISTS (SELECT * 
        From TopicStory TX1 
        Where TX1.StoryID = TZ1.StoryID 
         and TX1.TopicID in (%T2%,%T3%,%T4%,TS5.TopicID) 
       ) 

並且對於其他先前話題選擇進行了必要的修改。

雖然它可能會降低性能不可接受,限制相關%T5%主題額外的邏輯 (使前面的話題選擇 仍然保留至少兩層的選擇)可以推入一個查詢。 它應該是這樣的:

/* 
    Given %T1%, %T2%, %T3$, and %T4% from queries above, find all topics %T5% > %T4% 
    with at least 2 stories not related to %T1%, %T2%, %T3%, or %T4% and such that 
    %T1% still has at least 2 stories not related to %T2%, %T3%, %T4%, or %T5% and 
    %T2% still has at least 2 stories not related to %T1%, %T3%, %T4%, or %T5% and 
    %T3% still has at least 2 stories not related to %T1%, %T2%, %T4%, or %T5% and 
    %T4% still has at least 2 stories not related to %T1%, %T2%, %T3%, or %T5% 
*/ 

SELECT TS5.TopicID, Count(*) 
From TopicStory TS5 
Where TS5.TopicID > %T4% 
    and NOT EXISTS (SELECT * 
         From TopicStory TSX 
         Where TSX.TopicID in (%T1%,%T2%,%T3%,%T4%) 
         and TSX.StoryID = TS5.StoryID 
        ) 
    and (SELECT Count(*) 
      From TopicStory TZ1 
      Where TZ1.TopicID = %T1% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX1 
           Where TX1.StoryID = TZ1.StoryID 
           and TX1.TopicID in (%T2%,%T3%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ2 
      Where TZ2.TopicID = %T2% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX2 
           Where TX2.StoryID = TZ2.StoryID 
           and TX2.TopicID in (%T1%,%T3%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ3 
      Where TZ3.TopicID = %T3% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX3 
           Where TX3.StoryID = TZ3.StoryID 
           and TX3.TopicID in (%T1%,%T2%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ4 
      Where TZ4.TopicID = %T4% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX3 
           Where TX3.StoryID = TZ3.StoryID 
           and TX3.TopicID in (%T1%,%T2%,%T3%,TS5.TopicID) 
          ) 
     ) > 1 
Group By TS5.TopicID 
Having Count(*) > 1 

的功能集的MySQL是一個進展中的工作,所以可以想象在存儲過程中的高效 實現是可能的,其中光標可以採取主題列出的 作用。然而,如果 「遊標」是外部管理列表(例如PHP),並且數據庫查詢 保持儘可能簡單,我會對自己的良好業績充滿信心。