Why does this code deadlock?

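I created two Linux kernel threads in my loadable module and bound them to separate CPU cores on a dual-core Android device. After running this a few times, I noticed that the device reboots with a hardware watchdog timer reset. I can reproduce the problem consistently. What could be causing the deadlock?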

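Basically, what I need is to make sure that both threads run do_something() at the same time on different cores, without anyone stealing CPU cycles (i.e. with interrupts disabled). I am using a spinlock and a volatile variable for this. I also have a semaphore on which the parent thread waits for the child thread.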

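#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/smp.h>
#include <linux/spinlock.h>
#include <linux/semaphore.h>
#include <linux/irqflags.h>

#define CPU_COUNT 2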

/* Globals */ 
spinlock_t lock; 
struct semaphore sem; 
volatile unsigned long count; 

/* Thread util function for binding the thread to CPU */
struct task_struct *thread_init(int (*fn)(void *), void *data, int cpu)
{
    struct task_struct *ts;

    ts = kthread_create(fn, data, "per_cpu_thread");
    if (!IS_ERR(ts)) {
        /* Bind only after checking that kthread_create() succeeded */
        kthread_bind(ts, cpu);
        wake_up_process(ts);
    }
    else {
        ERR("Failed to create thread for CPU %d\n", cpu);
    }
    return ts; 
} 

/* Sync both threads */ 
void thread_sync(void)
{ 
    spin_lock(&lock); 
    ++count; 
    spin_unlock(&lock); 

    while (count != CPU_COUNT); 
} 

void do_something(void)
{ 
} 

/* Child thread */ 
int per_cpu_thread_fn(void* data) 
{ 
    int i = 0; 
    unsigned long flags = 0; 
    int cpu = smp_processor_id(); 

    DBG("per_cpu_thread entering (cpu:%d)...\n", cpu); 

    /* Disable local interrupts */ 
    local_irq_save(flags); 

    /* sync threads */ 
    thread_sync(); 

    /* Do something */ 
    do_something(); 

    /* Enable interrupts */ 
    local_irq_restore(flags); 

    /* Notify parent about exit */ 
    up(&sem); 
    DBG("per_cpu_thread exiting (cpu:%d)...\n", cpu); 
    return 0;
} 

/* Main thread */ 
int main_thread(void)
{ 
    int cpuB; 
    int cpu = smp_processor_id(); 
    unsigned long flags = 0; 

    DBG("main thread running (cpu:%d)...\n", cpu); 

    /* Init globals*/ 
    sema_init(&sem, 0); 
    spin_lock_init(&lock); 
    count = 0; 

    /* Launch child thread and bind to the other CPU core */ 
    if (cpu == 0) cpuB = 1; else cpuB = 0;   
    thread_init(per_cpu_thread_fn, NULL, cpuB); 

    /* Disable local interrupts */ 
    local_irq_save(flags); 

    /* thread sync */ 
    thread_sync(); 

    /* Do something here */ 
    do_something(); 

    /* Enable interrupts */ 
    local_irq_restore(flags); 

    /* Wait for child to join */ 
    DBG("main thread waiting for all child threads to finish ...\n"); 
    if (down_interruptible(&sem))
        return -EINTR;

    return 0;
}


2013-08-02 Gupta

I'm not sure this is the real cause, but your code contains some serious errors.

The first one is in while (count != CPU_COUNT);. You cannot read a shared variable without holding the lock unless the read is atomic, and for count that is not guaranteed.

You must protect the read of count with the lock. You can replace your while loop with the following code:

unsigned long local_count; 
do { 
    spin_lock(&lock); 
    local_count = count; 
    spin_unlock(&lock); 
} while (local_count != CPU_COUNT);
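To try the lock-protected polling loop in isolation, here is a user-space analogue, not kernel code: POSIX pthread_spinlock_t stands in for spinlock_t, and two pthreads stand in for the two kernel threads. Each thread increments count under the lock, then keeps re-reading it under the lock until both have arrived. There are no interrupts or CPU binding here, so this only illustrates the counter logic; run_barrier() is a hypothetical test helper.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>

#define CPU_COUNT 2

/* User-space stand-ins for the kernel globals (assumption: not kernel code). */
static pthread_spinlock_t lock;
static unsigned long count;

/* Same shape as thread_sync(): increment under the lock,
 * then poll the counter, taking the lock for every read. */
static void thread_sync(void)
{
    unsigned long local_count;

    pthread_spin_lock(&lock);
    ++count;
    pthread_spin_unlock(&lock);

    do {
        pthread_spin_lock(&lock);
        local_count = count;
        pthread_spin_unlock(&lock);
    } while (local_count != CPU_COUNT);
}

static void *worker(void *arg)
{
    (void)arg;
    thread_sync();
    return NULL;
}

/* Hypothetical helper: runs CPU_COUNT threads through the barrier
 * and returns the final value of count. */
static unsigned long run_barrier(void)
{
    pthread_t threads[CPU_COUNT];

    count = 0;
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
    for (int i = 0; i < CPU_COUNT; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < CPU_COUNT; i++)
        pthread_join(threads[i], NULL);
    pthread_spin_destroy(&lock);
    return count;
}
```

Both threads return from thread_sync() only after the second increment, so run_barrier() reports CPU_COUNT once both joins complete.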

Alternatively, you can use an atomic type. Note that no lock is needed:

atomic_t count = ATOMIC_INIT(0); 

... 

void thread_sync(void) { 
    atomic_inc(&count); 
    while (atomic_read(&count) != CPU_COUNT)
        cpu_relax();    /* hint to the CPU that this is a spin-wait loop */
}
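The same rendezvous can be tried in user space with C11 <stdatomic.h> in place of the kernel's atomic_t: atomic_fetch_add() and atomic_load() play the roles of atomic_inc() and atomic_read(). This is a sketch under that substitution, and run_barrier() is again a hypothetical test helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define CPU_COUNT 2

/* User-space stand-in for the kernel's atomic_t count. */
static atomic_ulong count;

static void thread_sync(void)
{
    atomic_fetch_add(&count, 1);              /* like atomic_inc(&count) */
    while (atomic_load(&count) != CPU_COUNT)  /* like atomic_read(&count) */
        ;                                     /* busy-wait until both arrive */
}

static void *worker(void *arg)
{
    (void)arg;
    thread_sync();
    return NULL;
}

/* Hypothetical helper: resets the counter, runs both threads, returns count. */
static unsigned long run_barrier(void)
{
    pthread_t threads[CPU_COUNT];

    atomic_store(&count, 0);
    for (int i = 0; i < CPU_COUNT; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < CPU_COUNT; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&count);
}
```

Like the kernel version, this barrier is one-shot: nothing inside thread_sync() resets count, so a second rendezvous needs the counter reset first (as run_barrier() does) or a generation number.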

The second problem is interrupts. I don't think you understand what you are doing here.

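local_irq_save() saves the interrupt state and disables interrupts. Then you disable interrupts again with local_irq_disable(). After some work, you restore the previous state with local_irq_restore() and then enable interrupts explicitly. That enabling is completely wrong: it turns interrupts on regardless of what the previous state was.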
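Why an unconditional enable is wrong shows up as soon as the code runs inside a section whose caller had already disabled interrupts. The model below is hypothetical: irqs_on and the model_irq_* functions are invented for illustration and are not kernel APIs; a single flag stands in for the CPU's interrupt-enable bit. With save/restore only, the outer section still has interrupts off after the inner one returns; with the extra enable, they are back on while the caller still assumes they are off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag standing in for the interrupt-enable bit. */
static bool irqs_on = true;

static unsigned long model_irq_save(void)
{
    unsigned long flags = irqs_on;  /* remember the previous state */
    irqs_on = false;                /* then disable */
    return flags;
}

static void model_irq_restore(unsigned long flags)
{
    irqs_on = flags != 0;           /* put back whatever was there before */
}

static void model_irq_enable(void)
{
    irqs_on = true;                 /* enables unconditionally */
}

/* Inner critical section using save/restore only. */
static void inner_correct(void)
{
    unsigned long flags = model_irq_save();
    /* ... work ... */
    model_irq_restore(flags);
}

/* Inner critical section that restores and then also enables. */
static void inner_wrong(void)
{
    unsigned long flags = model_irq_save();
    /* ... work ... */
    model_irq_restore(flags);
    model_irq_enable();
}

/* Runs inner() inside an outer section that disabled interrupts and
 * reports whether they were still off when inner() returned. */
static bool outer_stays_disabled(void (*inner)(void))
{
    unsigned long flags;
    bool still_off;

    irqs_on = true;
    flags = model_irq_save();
    inner();
    still_off = !irqs_on;
    model_irq_restore(flags);
    return still_off;
}
```

outer_stays_disabled(inner_correct) returns true and outer_stays_disabled(inner_wrong) returns false; in both cases the outer restore leaves interrupts on at the end, as they were at the start.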

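The third problem: if the main thread is not bound to a CPU, you should not use smp_processor_id() unless you are sure the kernel will not reschedule the thread onto another CPU after you read the CPU number. It is better to use get_cpu(), which disables kernel preemption and then returns the CPU id. When you are done, call put_cpu().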

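However, creating and starting other threads between get_cpu() and put_cpu() is wrong: preemption is disabled in that window, and kthread_create() can sleep. That's why you should set the affinity of the main thread instead.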
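In user space, the analogue of setting the main thread's affinity before trusting the CPU number is to call sched_setaffinity() and only then sched_getcpu(); both are Linux/glibc-specific, hence _GNU_SOURCE. The sketch below pins the calling thread to the first CPU it is currently allowed to use. The kernel-side counterpart would be something along the lines of set_cpus_allowed_ptr(current, cpumask_of(cpu)), which is an assumption not shown or tested here; pin_to_first_allowed_cpu() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Picks the first CPU the calling thread is currently allowed to run on,
 * pins the thread to it, and returns that CPU number (or -1 on failure). */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;
    int cpu;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            break;
    if (cpu == CPU_SETSIZE)
        return -1;

    CPU_ZERO(&target);
    CPU_SET(cpu, &target);
    if (sched_setaffinity(0, sizeof(target), &target) != 0)
        return -1;

    return cpu;
}
```

Once the helper returns a CPU number, sched_getcpu() should report that same CPU, because the thread can no longer be migrated elsewhere.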

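Fourth, local_irq_save() and local_irq_restore() are macros that take a variable, not a pointer to unsigned long. (Passing a pointer gave me an error and some warnings; I don't know how you compiled your code.) Regarding your reply: http://pastebin.com/Ven6wqWf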
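The reason a pointer does not work is that the kernel macros assign the saved state into their argument, roughly flags = <saved state>. The toy macros below follow the same pattern; SAVE_AND_CLEAR, RESTORE and fake_irq_state are hypothetical names, not the real arch-specific implementation. SAVE_AND_CLEAR(flags) compiles, while SAVE_AND_CLEAR(&flags) would expand to (&flags) = ..., which is not an assignable lvalue.

```c
#include <assert.h>

/* Hypothetical stand-in for the saved interrupt state. */
static unsigned long fake_irq_state = 1;

/* Like local_irq_save(flags): assigns to its argument, so it needs the
 * variable itself, not its address. */
#define SAVE_AND_CLEAR(flags)          \
    do {                               \
        (flags) = fake_irq_state;      \
        fake_irq_state = 0;            \
    } while (0)

/* Like local_irq_restore(flags): writes the saved value back. */
#define RESTORE(flags)                 \
    do {                               \
        fake_irq_state = (flags);      \
    } while (0)

static unsigned long demo(void)
{
    unsigned long flags;        /* plain variable, as the macros expect */

    SAVE_AND_CLEAR(flags);      /* not SAVE_AND_CLEAR(&flags) */
    assert(fake_irq_state == 0);
    RESTORE(flags);
    return fake_irq_state;
}
```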


2013-08-03 01:07:01

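Thanks Rasen: I removed the references.

The final code can be found here. I fixed the interrupt calls but still see the problem. I can't move (count != CPU_COUNT) inside spin_lock() because it deadlocks immediately. Do you have any other suggestions? My requirement is that both threads start executing do_something() at the same time. – Gupta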

You must protect the read of 'count' with the lock. I edited my post to show how to do this. –

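Thanks again for the pointers. This still hasn't solved my problem. I see one of the threads spinning while the other thread never increments the counter. Do you see a problem with the way the threads are created? – Gupta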
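On the thread-creation question: kthread_create() returns a task that does not run until wake_up_process() is called, and kthread_bind() is only safe after IS_ERR() has confirmed that creation succeeded. A rough user-space parallel, assuming glibc's pthread_attr_setaffinity_np() (a GNU extension, not a kernel API), is to fix the affinity in the thread attributes before the thread ever runs and to check the return value before using the handle. first_allowed_cpu(), start_bound_thread() and report_cpu() are hypothetical helpers for this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Written by the new thread so the caller can check where it ran. */
static int ran_on_cpu = -1;

static void *report_cpu(void *arg)
{
    (void)arg;
    ran_on_cpu = sched_getcpu();
    return NULL;
}

/* Hypothetical helper: first CPU the caller may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t allowed;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &allowed))
            return cpu;
    return -1;
}

/* Creates a thread whose CPU affinity is fixed before it starts running.
 * Returns 0 on success, otherwise an error number; *tid is only valid
 * when 0 is returned (the analogue of checking IS_ERR() first). */
static int start_bound_thread(pthread_t *tid, int cpu)
{
    pthread_attr_t attr;
    cpu_set_t set;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    err = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
    if (err == 0)
        err = pthread_create(tid, &attr, report_cpu, NULL);

    pthread_attr_destroy(&attr);
    return err;
}
```

After pthread_join(), ran_on_cpu should equal the CPU passed to start_bound_thread(), since the thread was never allowed to run anywhere else.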
