C++ Intel TBB和Microsoft PPL，如何在並行循環中使用next_permutation？

我有安裝了英特爾並行工作室2013的Visual Studio 2012，所以我有英特爾TBB。C++ Intel TBB和Microsoft PPL，如何在並行循環中使用next_permutation？

說我有下面的代碼：

const int cardsCount = 12; // will be READ by all threads 
// the required number of cards of each colour to complete its set: 
// NOTE that the required number of cards of each colour is not the same as the total number of cards of this colour available 
int required[] = {2,3,4}; // will be READ by all threads 
Card cards[cardsCount]; // will be READ by all threads 
int cardsIndices[cardsCount];// this will be permuted, permutations need to be split among threads ! 

// set "cards" to 4 cards of each colour (3 colours total = 12 cards) 
// set cardsIndices to {0,1,2,3...,11} 

// this variable will be written to by all threads, maybe have one for each thread and combine them later?? or can I use concurrent_vector<int> instead !? 
int logColours[] = {0,0,0}; 

int permutationsCount = fact(cardsCount); 

for (int pNum=0; pNum<permutationsCount; pNum++) // I want to make this loop parallel !! 
{ 
    int countColours[3] = {0,0,0}; // local loop variable, no problem with multithreading 
    for (int i=0; i<cardsCount; i++) 
    { 
     Card c = cards[cardsIndices[i]]; // accessed "cards" 

     countColours[c.Colour]++; // local loop variable, np. 
      // we got the required number of cards of this colour to complete it 
     if (countColours[c.Colour] == required[c.Colour]) // read global variable "required" ! 
     { 
        // log that we completed this colour and go to next permutation 
      logColours[c.Colour] ++; // should I use a concurrent_vector<int> for this shared variable? 
      break; 
     } 
    } 
    std::next_permutation(cardsIndices, cardsIndices+cardsCount); // !! this is my main issue 
}

什麼我計算多少次，我們將完成一個顏色，如果我們從現有的卡隨機挑選，而這通過每個去詳盡地做可能的排列和順序選擇，當顏色「完成」時，我們打破並進入下一個排列。請注意，我們有4種不同顏色的卡片，但紅色，綠色和藍色所需的卡片數量爲{2,3,4}。 2張紅牌足以完成紅色並且我們有4張可用，因此紅色比藍色更有可能完成，這需要全部4張牌被選中。

我想使這個for-loop並行，但我的主要問題是如何處理「卡」排列？如果我有4個線程，我可以把它分成4個不同的區域，並讓每個線程都通過它們。

如果我不知道機器的內核數量，並且我希望程序自動選擇正確的併發線程數，該怎麼辦？肯定有一種方法可以使用英特爾或微軟工具來做到這一點？

這是我的名片結構以防萬一：

struct Card 
{ 
public: 
    int Colour; 
    int Symbol; 
}

來源

2013-04-27 Space monkey

你可以很容易通過固定排列的第一個元素，並呼籲其他元素獨立於每個線程std::next_permutation使你的代碼運行在平行1,2, ..., or cardsCount線程。考慮下面的代碼：

// declarations 

// #pragma omp parallel may be here 
{ // start of a parallel section 
    const int start = (cardsCount * threadIndex)/threadNumber; 
    const int end = (cardsCount * (threadIndex + 1))/threadNumber; 

    int cardsIndices[cardsCount]; // a local array for each thread 

    for (const int firstElement = start; firstElement < end; ++firstElement) { 
     cardsIndices[0] = firstElement; 
     // fill other cardsIndices with elements [0-cardsCount], but skipping firstElement 
     do { 
      // your calculations go here 
     } while (std::next_permutation(cardsIndices + 1, cardsIndices + cardsCount)); // note the +1 here 
    } 
}

如果您希望使用的OpenMP作爲並行化工具，你只需要只是並行部分前加#pragma omp parallel。並使用 omp_get_thread_num()函數獲取線程索引。
你也不必在這裏使用一個concurrent_vector，這將可能使你的程序非常緩慢，使用特定線程積累數組：
```
logColours[threadNumber][3] = {}; 
++logColours[threadIndex][c.Colour]; 
```
如果Card是一個相當沉重的類，我會建議使用const Card& c = ...而不是每次複製Card c = ...。

來源

2013-04-27 10:46:29 Ixanezis

您可以使用std::thread::hardware_ concurrency()從<thread>。從「C++併發在行動」引用的A.Williams -

一個C++標準庫的功能，可以幫助這裏是 std::thread::hardware_ concurrency()。對於給定的程序執行，此函數返回指示可真正並行運行的線程數。例如，在多核系統上，CPU核的數量可能是。

來源

2013-04-27 10:47:58 SChepurin

讓N = cardsNumber，M = required[0] * required[1] * ... * required[maxColor]。然後，實際上，您的問題可以在O（N * M）時間內輕鬆解決。在你的情況下，這是12 * 2 * 3 * 4 = 288操作。 :)

可能的方法之一是使用循環關係。考慮一個函數logColours f(n, required)。假設n是已經考慮的卡的當前數目; required是你的例子中的一個向量。函數返回向量logColours中的答案。你有興趣f(12, {2,3,4})。功能f簡裏面反覆的計算可以這樣寫：

std::vector<int> f(int n, std::vector<int> require) { 
    if (cache[n].count(require)) { 
     // we have already calculated function with same arguments, do not recalculate it again 
     return cache[n][require]; 
    } 

    std::vector<int> logColours(maxColor, 0); // maxColor = 3 in your example 

    for (int putColor=0; putColor<maxColor; ++putColor) { 
     if (/* there is still at least one card with color 'putColor'*/) { 
       // put a card of color 'putColor' on place 'n' 
       if (require[putColor] == 1) { 
        // means we've reached needed amount of cards of color 'putColor' 
        ++logColours[putColor]; 
       } else { 
        --require[putColor]; 
        std::vector<int> logColoursRec = f(n+1, require); 
        ++require[putColor]; 
        // merge child array into your own. 
        for (int i=0; i<maxColor; ++i) 
         logColours[i] += logColoursRec[i]; 
       } 
      } 
    } 

    // store logColours in a cache corresponding to this function arguments 
    cache[n][required] = std::move(logColours); 
    return cache[n][required]; 
}

緩存可以作爲std::unordered_map<int, std::unordered_map<std::vector<int>, std::vector<int>>>來實現。

一旦你理解了主要思想，你就可以用更高效的代碼實現它。

來源

2013-04-27 11:54:14 Ixanezis

嗯..我不明白這個主意，但我可以看到你不關心我擁有的卡片。如果你不知道這些卡片，這個如何工作？在我的例子中，我有4種顏色的卡片，總共3種顏色，所以有12張卡片。我編輯了我的問題，以表明.. – 2013-04-27 13:22:38

我很在乎。在'f'函數中有一個'if'，它檢查剩餘所需顏色的卡片數量。我以爲你可以自己填寫代碼。 – Ixanezis 2013-04-27 17:33:12

請看看完整的實現：http://ideone.com/9ibXaa – Ixanezis 2013-04-27 18:29:57

我猜這是什麼意思@Ixanezis

如果紅色勝

的最終結果將是一個業餘友好版本：2紅，綠0-2，0-3藍色

說中獎紅色爲A，其他紅色B，有12種方法讓A和B.

以下是可能的情況：

Cases:   #Cards after A #Cards before A #pick green #pick blue 
0 green, 0 blue: 10! = 3628800  1! = 1   1   1 
0 green, 1 blue: 9 ! = 362880  2! = 2   1   4 
0 green, 2 blue: 8 ! = 40320  3! = 6   1   6 
0 green, 3 blue: 7 ! = 5040  4! = 24   1   4 
1 green, 0 blue: 9 ! = 362880  2! = 2   4   1 
1 green, 1 blue: 8 ! = 40320  3! = 6   4   4 
1 green, 2 blue: 7 ! = 5040  4! = 24   4   6 
1 green, 3 blue: 6 ! = 720   5! = 120  4   4 
2 green, 0 blue: 8 ! = 40320  3! = 6   6   1 
2 green, 1 blue: 7 ! = 5040  4! = 24   6   4 
2 green, 2 blue: 6 ! = 720   5! = 120  6   6 
2 green, 3 blue: 5 ! = 120   6! = 720  6   4

允許sumproduct那些4個數組：= 29064960，然後乘以12 = 348779520

類似地，可以計算值綠色勝用於藍色勝。

來源

2013-04-28 16:07:13 colinfang

C++ Intel TBB和Microsoft PPL，如何在並行循環中使用next_permutation？

回答

相關問題