2013-04-06 115 views
1

我當時正在玩一個MPMC FIFO-Queue的雙向鏈表(主要用於演示目的)。我現在的目標是試圖在一段時間內糾正我的錯誤,而我並沒有真正取得進展。並行雙鏈表 - 多生產者/消費者FIFO隊列 - 死鎖

在所有Producer-Threads產生所有值後不久,我遇到了死鎖。我牽制的問題如下:

獲取鎖:

<THREAD ID> | <MESSAGE> | <MUTEX ADDRESS> 

4460175360 : TRYLOCK HEAD: 0x7fb380c039d0 
4460175360 : LOCKED HEAD:  0x7fb380c039d0 
4460175360 : RELEASED HEAD: 0x7fb380c039d0 

4460175360 : TRYLOCK HEAD: 0x7fb380c039d0 
4460175360 : LOCKED HEAD:  0x7fb380c039d0 
4460175360 : RELEASED HEAD: 0x7fb380c039d0 

4460175360 : TRYLOCK HEAD: 0x7fb380c039d0 
4460175360 : LOCKED HEAD:  0x7fb380c039d0 
4460175360 : RELEASED HEAD: 0x7fb380c039d0 

4460175360 : TRYLOCK HEAD: 0x7fb380c039d0 
4459638784 : TRYLOCK TAIL:  0x7fb380c039d0 
4460711936 : TRYLOCK TAIL:  0x7fb380c039d0 
4460175360 : LOCKED HEAD:  0x7fb380c039d0 
4460175360 : RELEASED HEAD: 0x7fb380c039d0 

4460175360 : TRYLOCK HEAD: 0x7fb380c039d0 <---- THIS TRY-LOCK-HEAD NEVER GETS CONFIRMED! 
4459638784 : LOCKED TAIL: 0x7fb380c039d0 
4459638784 : RELEASE MUTEX: 0x7fb380c039d0 <---- THE PRODUCER THREAD RELEASED 0x7fb380c039d0 
Producer-Thread 4459638784 produced: 0 

---- NOW ONE ELEMENT IN QUEUE ---- 

4460711936 : LOCKED TAIL: 0x7fb381800020 <---- ADDRESS OF LOCK HAS CHANGED WHICH IS FINE (because an element has been added) 
4460711936 : RELEASE MUTEX AT ADDR: 0x7fb381800020 
Producer-Thread 4460711936 produced: 0 

---- NOW TWO ELEMENTS IN QUEUE ---- 

4460711936 : TRYLOCK TAIL: 0x7fb381a00020 
4459638784 : TRYLOCK TAIL: 0x7fb381a00020 
4460711936 : LOCKED TAIL: 0x7fb381a00020 
4460711936 : RELEASE MUTEX AT ADDR: 0x7fb381a00020 
Producer-Thread 4460711936 produced: 1 

---- NOW THREE ELEMENTS IN QUEUE ---- 

4459638784 : LOCKED TAIL: 0x7fb380d00060 <---- AGAIN ADDRESS HAS CHANGED -- FINE 
4459638784 : RELEASE MUTEX AT ADDR: 0x7fb380d00060 
Producer-Thread 4459638784 produced: 1 

---- NOW FOUR ELEMENTS IN QUEUE ---- PRODUCERS ARE DONE ---- 
---- CONSUMER THREAD BLOCKS - STILL TRYING TO LOCK 0x7fb380c039d0 ---- 

---- BUT NOBODY ELSE HOLDS 0x7fb380c039d0 ---- ALSO NO SELF-DEADLOCK? 

GDB告訴我,兩個生產者線程已經終止,並等待從addr 0x7fb380c039d0的mutx唯一的線索是線程2(僅消費者線程):

(gdb) info threads 
* 2       0x00007fff8edc5dfd in pthread_mutex_lock() 
    1 "com.apple.main-thread" 0x00007fff8e715386 in __semwait_signal() 
(gdb) bt 
#0 0x00007fff8e715122 in __psynch_mutexwait() 
#1 0x00007fff8edc5dfd in pthread_mutex_lock() 
#2 0x00000001000011a8 in ConcurrentDoublyLinkedList<int>::consumeNode (this=0x1001000e0) at ConcurrentDList.h:142 
#3 0x0000000100000bd4 in consumeValues (ctx=0x7fff5fbffa78) at ConcurrentDList.cpp:29 
#4 0x00007fff8edc07a2 in _pthread_start() 
#5 0x00007fff8edad1e1 in thread_start() 
(gdb) f 2 
#2 0x00000001000011a8 in ConcurrentDoublyLinkedList<int>::consumeNode (this=0x1001000e0) at ConcurrentDList.h:142 
142    pthread_mutex_lock(&head_->mutex); 
(gdb) info reg 
rax   0x200012d 33554733 
rbx   0x100281000 4297592832 
rcx   0x100280da8 4297592232 
rdx   0x100 256 
rsi   0x403 1027 
rdi   0x1001039f0 4296030704 
rbp   0x100280ed0 0x100280ed0 
rsp   0x100280e00 0x100280e00 
r8    0x2060 8288 
r9    0x100280720 4297590560 
r10   0xf9d79 1023353 
r11   0x206 518 
r12   0x1303 4867 
r13   0x0 0 
r14   0x7fff5fbffa78 140734799805048 
r15   0x100000bb0 4294970288 
rip   0x1000011a8 0x1000011a8 <ConcurrentDoublyLinkedList<int>::consumeNode()+110> 
eflags   0x206 518 
cs    0x7 7 
ss    0x0 0 
ds    0x0 0 
es    0x0 0 
fs    0x0 0 
gs    0x0 0 
(gdb) p *(pthread_mutex_t*) 0x1001039f0 
$34 = { 
    __sig = 1297437784, 
    __opaque = "\000\000\000\000` \000\000\000\000\000\000\000\000\000\000\003\004\000\000\000\002\000\000{?\017\000\000\000\000\000\b:\020\000\001\000\000\000\f:\020\000\001\000\000\000\000\000\000\000\000\000\000" 
} 

可惜我不能昆蟲pthread_mutex_t因爲它是一個不透明的類型。 (雖然我在GDB中打開了不透明的類型?)。否則,很高興看到該互斥鎖的所有者目前是誰。

我的列表的代碼在下面給出 - 它被過度評論,解釋我對實現的想法。請注意,這遠遠不是一個好的或高性能的實現 - 它主要用於說明目的。

CPP-代碼:

#include <stdlib.h> 
#include <iostream> 

#include <pthread.h> 

template <class T> 
class Node 
{ 
    private: 
     T value_; 

     Node<T> *next_; 
     Node<T> *prev_; 

    public: 

     pthread_mutex_t mutex; 

     Node(T value, Node<T>* next, Node<T>* prev) 
     { 
      value_ = value; 

      next_ = next; 
      prev_ = prev; 

      pthread_mutex_init(&mutex, NULL); 
     } 

     ~Node() 
     { 
      pthread_mutex_destroy(&mutex); 
     } 

     void setNext(Node<T>* next) 
     { 
      next_ = next; 
     }   

     void setPrevious(Node<T>* prev) 
     { 
      prev_ = prev; 
     } 

     Node<T>* getNext() 
     { 
      return next_; 
     } 

     Node<T>* getPrevious() 
     { 
      return prev_; 
     } 


     virtual bool isSentinel() 
     { 
      return false; 
     } 
}; 

template <class T> 
class SentinelNode : public Node<T> 
{ 
    public: 
     SentinelNode(T val, Node<T>* next, Node<T>* prev) 
      : Node<T>(val, next, prev) 
     { 
     } 

     virtual bool isSentinel() 
     { 
      return true; 
     } 
}; 

template <class T> 
class ConcurrentDoublyLinkedList 
{ 
    private: 
     Node<T> *head_; 
     Node<T> *tail_; 

    public: 
     ConcurrentDoublyLinkedList() 
     { 
      head_ = tail_ = new SentinelNode<T>(NULL, NULL, NULL); 
     } 

     void produceNode(Node<T>* n) 
     { 
      pthread_mutex_lock(&tail_->mutex); 

      Node<T>* old = tail_; 

      if(head_->isSentinel()) 
      { 
       head_ = tail_ = n; 
       pthread_mutex_unlock(&old->mutex); 

       delete old; 
       old = NULL; 
      } 
      else 
      { 
       n->setNext(tail_); 
       tail_->setPrevious(n); 
       tail_ = n; 

       pthread_mutex_unlock(&old->mutex); 
      } 
     } 

     Node<T>* consumeNode() 
     { 
      pthread_mutex_lock(&head_->mutex); 

      if(head_->isSentinel()) 
      { 
       /* the head can only be a Sentinel if there is 
        * no element within the list 
        */ 

        /* this also means that the list is currently 
        * implicitly fully locked */ 

        /* return NULL and let the thread decide what to do */ 

        pthread_mutex_unlock(&head_->mutex); 

        return NULL; 
      } 
      else 
      { 
       /* the head is not a Sentinel, which implies on of the following: 
       *  - The list has exactly one element, which means: 
       *   => head_ == tail_ && !head_->isSentinel() 
       *  
       *  OR 
       *  
       *  - The list has at least two elements, which means: 
       *   => head_ != tail_ && !head_->isSentinel() 
       * 
       *  - The absence of a Sentinel guarantees that the list 
       *  is NOT empty 
       */ 

       if(head_ == tail_) 
       { 
         /* single element within the list 
         * the list is still fully locked, 
         * implicit because head_ == tail_ 
         */ 

         /* we replace the only element 
         * with a Sentinel, because 
         * the list is empty afterwards 
         */ 

         Node<T>* p = head_; 
         head_ = tail_ = new SentinelNode<T>(NULL, NULL, NULL); 

         pthread_mutex_unlock(&p->mutex); 

         return p; 
       } 
       else 
       { 
         /* at least two elements are in the list 
         * which means that the current head 
         * must have a valid predecessor 
         */ 

         /* Producer and Consumer could 
         * hold a lock on two adjacent 
         * nodes. Thus, the Consumer 
         * must acquire head->prev 
         */ 

         Node<T>* p = head_; 
         Node<T>* n = head_->getPrevious(); 

         pthread_mutex_lock(&n->mutex); 

         /* head_->prev can now not be owned 
         * by a producer. 
         */ 

         head_ = n; 
         head_->setNext(NULL); 

         pthread_mutex_unlock(&p->mutex); 
         pthread_mutex_unlock(&n->mutex); 

         return p; 
       } 
      } 
     } 
}; 

我會很感激上的東西你的想法和意見。我現在看了一會兒,也可能完全走錯了路。幫助理解..

謝謝, 塞巴斯蒂安

+0

對於初學者來說,失去了哨兵節點的概念。他們是非常罕見的,這也不是例外。 NULL與其他任何東西一樣好,並且顯着降低了列表管理的複雜性。在這種情況下,只要將新節點添加到列表中,就會發生泄漏。即一旦清單從空到便籤,該哨兵就會泄露。 – WhozCraig 2013-04-06 02:23:54

+0

好的 - 但我們也可以決定正確釋放它(代碼被編輯)!使用Sentinel節點可以很容易地鎖定空列表,所以我沒有看到列表管理複雜性的「顯着」降低... – 2013-04-06 10:51:57

+0

我想你是對的。只要這些錯誤是固定的,它並不是很重要。而「我們」並沒有寫出這個漏洞,所以它在這方面甚至不那麼重要。 – WhozCraig 2013-04-06 15:30:54

回答

0

的消耗和產生兩個鎖定假定節點是靜止的頭部和尾部之後的頭部和尾部的節點。也就是說,線程可能鎖定了不再是頭節點或尾節點的節點。如果在鎖定之前存儲「舊」指針,則至少知道您鎖定了哪個節點,但鎖定的節點可能已經被佔用,並且您的某些邏輯可能不再成立。

我認爲鎖定頭部和尾部指針而不是實際節點會更容易,假設您只需要生產和使用方法。當然,最簡單的解決方案是鎖定整個數據結構。

+0

你有這種情況的例子嗎?我看不到這種不變性應該被侵犯。如果頭被鎖定,則不能被任何其他線程使用,並且被替換爲標記,或將前任設置爲新頭(任何生產者或任何其他使用者都不能更改 - 因爲已鎖定)。如果一個生產者鎖定尾部,並且它不是一個哨兵,那麼只有當尾部至少是列表中的第三個元素時,它才能改變它。你會如何以鎖定形式表達你的疑慮? – 2013-04-06 13:33:48

+0

我有這樣的感覺,問題確實在於你提到的情況......但我仍然看不到我認爲不變的不變量...你能提供一個例子嗎? – 2013-04-06 20:56:44

相關問題