一個具有挑戰性的算法問題

在我的工作場所，我偶然發現了以下問題，我被要求解決。雖然不是絕對需要，但解決方案是優選的一個具有挑戰性的算法問題

有一個包含故事集的數據庫，每個故事都有一組與其關聯的主題。主題存儲在表單（storyid，topicid）的單獨表格中。

問題是如何理想地選擇5個主題（或者至少2個，如果5個是不可能的），使得每個主題有2個故事（或1，如果2是不可能的）在其他選擇中不重複主題。該算法還必須返回與每個主題相關的恰當故事。

這實際上是一個NP完全問題，沒有有效的解決方案，不僅僅是簡單列舉所有可能性，還是它有一個有效的解決方案？

如果它沒有一個有效的解決方案，請嘗試證明它（儘管不是絕對必要的）。

如果它確實有一個有效的解決方案，請善待和張貼。

非常感謝，

安東

來源

2011-06-15 akanevsky

你想用什麼語言來完成？ – CraigW 2011-06-15 21:13:40

這似乎是一個主題和故事之間的二分圖上的一種匹配問題。目標是找到一個解決方案還是列出所有可能的解決方案？ – hardmath 2011-06-15 21:17:53

@CraigW：語言是PHP和MySQL。 – akanevsky 2011-06-15 21:18:28

問題的更一般的版本是選擇所有主題（或儘可能多的p可能的）兩個故事，以便同一個故事從來沒有被選擇爲兩個不同的主題。

馬克帶S 故事內容S_米，並以T 的主題...牛逼_ñ。重複每一個主題，那就是引入新的故事T「 ... T」 _ñ，其中T」 _我其S _Ĵ當且僅當T _我包含它。現在可以通過這種方式來解釋問題：爲所有主題選擇不同的故事（或儘可能多）。一旦你有你的主題故事對，你可以再次加入重複的主題，每個主題將有兩個故事。

發現可能的最大數量對的問題，而從未選擇任何元素兩次的問題稱爲最大二分配匹配問題。（你可以認爲它是從bipartite graph中選擇最大的非連接邊的數量。）有一個簡單的算法，稱爲擴充路徑，它在O（（m + n）e）個步驟中解決它（e是邊的數量）和一些更復雜的（例如Hopcroft–Karp algorithm），它們在O（（m + n）^ 2.5）步驟左右解決它。增強路徑算法包括搜索圖中「未交替」路徑，其中路徑的第一條邊未被選中，第二條路是第三條不是，依此類推，而不是反轉路徑上的選擇。這可能可能適用於您的情況，而無需實際分割和加入主題。

這是一個有點矯枉過正，因爲這將返回所有題目兩個故事，而不是隻有五個;如果只需要爲有限數量的主題查找故事，則可以做得更好。（除了一些邊緣情況，您可以選擇故事數量最多的五個主題，丟棄未包含的故事，然後運行該算法。）無論如何，它表明問題遠不是NP -硬。

來源

2011-06-15 23:39:10 Tgr

爲什麼我需要複製每一個主題？你能否詳細說明你的算法是如何工作的？ – akanevsky 2011-06-16 01:16:43

@akanevsky：我更詳細地解釋了它。 – Tgr 2011-06-16 06:19:14

+1「無論如何，這表明問題遠非NP難。」 – csl 2011-06-16 06:22:44

-3

如果wona選擇類似不同的故事類似5個話題，因爲我明白：這樣就可以使一個連接的2個表，並在查詢中使用top 5與在條件主題標題=

「你想的話題，」如果不幫助好心說清楚，我>>>

來源

2011-06-15 21:16:02 Tamer

我不僅需要5個主題，還需要5個主題，每個主題有2個故事，其他4個主題都不重複。 – akanevsky 2011-06-15 21:18:07

你真的讓我感到困惑,,你可以發佈信息和例子如：story1 --topic1，標題2，topic3 ... story2 ---標題2，topic5 .... – Tamer 2011-06-15 21:34:49

爲什麼你需要具體的例子？我有一個表格（storyid，title）和一個表格（storyid，topicid）。我想向用戶展示5個主題，當他選擇其中2個主題時，我不希望他在兩個主題下看到相同的故事。現在更清楚了嗎？ – akanevsky 2011-06-15 21:42:03

試着將這個以滿足您的需求：

SELECT topic, story 
FROM story_topic 
WHERE story IN (SELECT story FROM story_topic GROUP BY story HAVING COUNT(*) = 1);

這裏的關鍵是要知道什麼樣的故事只發生只有一個話題。您可能需要預先計算消除子選擇的主題數量。

來源

2011-06-15 23:41:56 jond3k

可能是所有故事都出現在多個主題中。 – akanevsky 2011-06-16 01:08:51

你原來的問題會如何處理？也許我只是誤解了這個問題。 – jond3k 2011-06-16 08:38:52

這個怎麼樣？（如果我理解你的問題）

（我沒有真正運行它 - 只是一個想法 - 所以......可能是錯誤，或者我可能明顯錯過了某些東西但是 - 目前，我疲倦的頭腦認爲它的工作:)

$num_topics = 5; 
$stories_per = 5; 
$stories = array(); //array to store story ids 

//select 5 topics 
$query = mysql_query("SELECT * FROM topics ORDER BY RAND() LIMIT ".$num_topics); 

//run repeat as many times as you want stories 
for($i=0; $i<$stories_per; $i++) { 

    //repeat through each selected topic 
    while($row = mysql_fetch_array($query)) { 

     $q_addon = ""; 
     foreach($stories as $value) { 
      $q_addon .= "id <> '".$value."' AND "; 
     } 

     //find a story not yet chosen for each topic 
     $q = mysql_query("SELECT storyid FROM stories_topics WHERE ".$q_addon." topicid='".$row['id']."' LIMIT 1"); 

     //add that id to your $stories array 
     $tmp_id = mysql_result($q,0,'storyid'); 
     array_push($stories, $tmp_id); 

    } 
}

來源

2011-06-16 00:17:01 Dave

假設我們有涉及TopicID和StoryID，與一對形成化合物主鍵列的SQL表TopicStory。

的目標是找到某種一套TopicID的組成。發佈的問題最多需要五個。深度優先搜索這些概述如下。

隨着搜索的深度限定在五，指出問題不能比多項式複雜性更差。然而，要求爲最大的話題集合，可以與像約束（選擇每個主題不相關的任何其他選擇的主題至少兩層）可以發現一個普遍問題可能是NP完全問題。

一詞的使用「搜索」中提出了一種回溯算法。在以下，我們通過嵌套循環實現了回溯，其中每個循環是，由其外部的循環定義和參數化。

在我們給出詳細的細節之前，可能會激發描述「蠻力」的方法，接下來可以更容易地理解更復雜的方法。

BRUTE_FORCE: 

Generate all possible subsets of five topics. 
Test each of these sets for feasibility (each topic has 
at least two stories unrelated to any of the other topics).

我們的深度優先搜索的草圖假設議題一共有排序，例如按照TopicID的整數值排序。這允許生成主題而不重複（由於主題的排列）。

NESTED_LOOPS: 

(Loop_1) Select into List_1 all topics with at least two stories. 
     Iterate through List_1, choosing the first topic %T1%. 
     PASS control into Loop_2. 
     CONTINUE in Loop_1. 
     If the end of List_1 is reached, EXIT with failure. 

(Loop_2) Select into List_2 all topics > %T1% with at least two 
     stories unrelated to %T1%. 
     Iterate through List_2, choosing the second topic %T2%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, PASS control into Loop_3. 
     CONTINUE in Loop_2. 
     If the end of List_2 is reached, go BACK to Loop_1. 

(Loop 3) Select into List_3 all topics > %T2% with at least two 
     stories unrelated to %T1% or %T2%. 
     Iterate through List_3, choosing the third topic %T3%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2% or %T3%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1% or %T3%, PASS control into Loop_4. 
     CONTINUE in Loop_3. 
     If the end of List_3 is reached, go BACK to Loop_2. 

(Loop 4) Select into List_4 all topics > %T3% with at least two 
     stories unrelated to %T1%, %T2%, or %T3%. 
     Iterate through List_4, choosing the fourth topic %T4%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, %T3%, or %T4%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1%, %T3%, or %T4%, 
     and topic %T3% still has at least two stories unrelated 
     to %T1%, %T2%, or %T4%, PASS control into Loop_5. 
     CONTINUE in Loop_4. 
     If the end of List_4 is reached, go BACK to Loop_3. 

(Loop 5) Select into List_5 all topics > %T4% with at least two 
     stories unrelated to %T1%, %T2%, %T3%, or %T4%. 
     Iterate through List_5, choosing the fifh topic %T5%. 
     If topic %T1% still has at least two stories unrelated 
     to %T2%, %T3%, %T4%, or %T5%, 
     and topic %T2% still has at least two stories unrelated 
     to %T1%, %T3%, %T4%, or %T5%, 
     and topic %T3% still has at least two stories unrelated 
     to %T1%, %T2%, %T4%, or %T5%, 
     and topic %T4% still has at least two stories unrelated 
     to %T1%, %T2%, %T3%, or %T5%, EXIT with success 
     returning five topics %T1%, %T2%, %T3%, %T4%, and %T5%. 
     CONTINUE in Loop_5. 
     If the end of List_5 is reached, go BACK to Loop_4.

使用「選擇」在每個嵌套循環的開口是爲了喚起 SQL查詢的可能性，以實現許多邏輯的。對於例如最外層循環基本上是剛開結果集中此查詢：

SELECT TS1.TopicID, Count(*) 
From TopicStory TS1 
Group By TS1.TopicID 
Having Count(*) > 1

內部循環的相應的列表可以類似通過SQL查詢取決於在所選主題的參數值來構造外環。爲了說明沒有不必要的重複讓我們跳權內部循環，並給出List_5適當的查詢：

SELECT TS5.TopicID, Count(*) 
From TopicStory TS5 
Where TS5.TopicID > %T4% 
    and NOT EXISTS (SELECT * 
         From TopicStory TSX 
         Where TSX.TopicID in (%T1%,%T2%,%T3%,%T4%) 
         and TSX.StoryID = TS5.StoryID 
        ) 
Group By TS5.TopicID 
Having Count(*) > 1

這之後將在List_5檢查％T5％產生計數至少兩個故事的左對於主題％T1％：

SELECT Count(*) 
From TopicStory TZ1 
Where TZ1.TopicID = %T1% 
    and NOT EXISTS (SELECT * 
        From TopicStory TX1 
        Where TX1.StoryID = TZ1.StoryID 
         and TX1.TopicID in (%T2%,%T3%,%T4%,TS5.TopicID) 
       )

並且對於其他先前話題選擇進行了必要的修改。

雖然它可能會降低性能不可接受，限制相關％T5％主題額外的邏輯（使前面的話題選擇仍然保留至少兩層的選擇）可以推入一個查詢。它應該是這樣的：

/* 
    Given %T1%, %T2%, %T3$, and %T4% from queries above, find all topics %T5% > %T4% 
    with at least 2 stories not related to %T1%, %T2%, %T3%, or %T4% and such that 
    %T1% still has at least 2 stories not related to %T2%, %T3%, %T4%, or %T5% and 
    %T2% still has at least 2 stories not related to %T1%, %T3%, %T4%, or %T5% and 
    %T3% still has at least 2 stories not related to %T1%, %T2%, %T4%, or %T5% and 
    %T4% still has at least 2 stories not related to %T1%, %T2%, %T3%, or %T5% 
*/ 

SELECT TS5.TopicID, Count(*) 
From TopicStory TS5 
Where TS5.TopicID > %T4% 
    and NOT EXISTS (SELECT * 
         From TopicStory TSX 
         Where TSX.TopicID in (%T1%,%T2%,%T3%,%T4%) 
         and TSX.StoryID = TS5.StoryID 
        ) 
    and (SELECT Count(*) 
      From TopicStory TZ1 
      Where TZ1.TopicID = %T1% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX1 
           Where TX1.StoryID = TZ1.StoryID 
           and TX1.TopicID in (%T2%,%T3%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ2 
      Where TZ2.TopicID = %T2% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX2 
           Where TX2.StoryID = TZ2.StoryID 
           and TX2.TopicID in (%T1%,%T3%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ3 
      Where TZ3.TopicID = %T3% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX3 
           Where TX3.StoryID = TZ3.StoryID 
           and TX3.TopicID in (%T1%,%T2%,%T4%,TS5.TopicID) 
          ) 
     ) > 1 
    and (SELECT Count(*) 
      From TopicStory TZ4 
      Where TZ4.TopicID = %T4% 
      and NOT EXISTS (SELECT * 
           From TopicStory TX3 
           Where TX3.StoryID = TZ3.StoryID 
           and TX3.TopicID in (%T1%,%T2%,%T3%,TS5.TopicID) 
          ) 
     ) > 1 
Group By TS5.TopicID 
Having Count(*) > 1

的功能集的MySQL是一個進展中的工作，所以可以想象在存儲過程中的高效實現是可能的，其中光標可以採取主題列出的作用。然而，如果「遊標」是外部管理列表（例如PHP），並且數據庫查詢保持儘可能簡單，我會對自己的良好業績充滿信心。

來源

2011-06-17 19:41:57 hardmath

一個具有挑戰性的算法問題

回答

相關問題