Segmentation fault in malloc

Here is a piece of code where the segmentation fault occurs (perror is not called):

job = malloc(sizeof(task_t)); 
if(job == NULL) 
    perror("malloc");

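To be more precise, GDB says the segfault happens inside a call to __int_malloc, which is a subroutine that malloc calls internally.

Since malloc is called in parallel from other threads, I initially thought that might be the problem. I am using glibc version 2.19.

The data structures: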

typedef struct rv_thread thread_wrapper_t; 

typedef struct future 
{ 
    pthread_cond_t wait; 
    pthread_mutex_t mutex; 
    long completed; 
} future_t; 

typedef struct task 
{ 
    future_t * f; 
    void * data; 
    void * (*fun)(thread_wrapper_t *, void *);
} task_t; 

typedef struct 
{ 
    queue_t * queue; 
} pool_worker_t; 

typedef struct 
{ 
    task_t * t; 
} sfuture_t; 

struct rv_thread 
{ 
    pool_worker_t * pool; 
};

Now the future implementation:

future_t * 
create_future() 
{ 
    future_t * new_f = malloc(sizeof(future_t)); 
    if(new_f == NULL)
        perror("malloc");
    new_f->completed = 0; 
    pthread_mutex_init(&(new_f->mutex), NULL); 
    pthread_cond_init(&(new_f->wait), NULL); 
    return new_f; 
} 

int 
wait_future(future_t * f) 
{ 
    pthread_mutex_lock(&(f->mutex)); 
    while (!f->completed)
    {
        pthread_cond_wait(&(f->wait),&(f->mutex));
    }
    pthread_mutex_unlock(&(f->mutex)); 
    return 0; 
} 

void 
complete(future_t * f) 
{ 
    pthread_mutex_lock(&(f->mutex)); 
    f->completed = 1; 
    pthread_mutex_unlock(&(f->mutex)); 
    pthread_cond_broadcast(&(f->wait)); 
}

The thread pool itself:

pool_worker_t * 
create_work_pool(int threads) 
{ 
    pool_worker_t * new_p = malloc(sizeof(pool_worker_t)); 
    if(new_p == NULL)
        perror("malloc");
    threads = 1;
    new_p->queue = create_queue();
    int i;
    for (i = 0; i < threads; i++){
        thread_wrapper_t * w = malloc(sizeof(thread_wrapper_t));
        if(w == NULL)
            perror("malloc");
        w->pool = new_p;
        pthread_t n;
        pthread_create(&n, NULL, work, w);
    }
    return new_p; 
} 

task_t * 
try_get_new_task(thread_wrapper_t * thr) 
{ 
    task_t * t = NULL; 
    try_dequeue(thr->pool->queue, t); 
    return t; 
} 

void 
submit_job(pool_worker_t * p, task_t * t) 
{ 
    enqueue(p->queue, t); 
} 

void * 
work(void * data) 
{ 
    thread_wrapper_t * thr = (thread_wrapper_t *) data; 
    while (1){
        task_t * t = NULL;
        while ((t = (task_t *) try_get_new_task(thr)) == NULL);
        future_t * f = t->f;
        (*(t->fun))(thr,t->data);
        complete(f);
    }
    pthread_exit(NULL); 
}

Finally, task.c:

pool_worker_t * 
create_tpool() 
{ 
    return (create_work_pool(8)); 
} 

sfuture_t * 
async(pool_worker_t * p, thread_wrapper_t * thr, void * 
(*fun)(thread_wrapper_t *, void *), void * data) 
{ 
    task_t * job = NULL; 
    job = malloc(sizeof(task_t)); 
    if(job == NULL)
        perror("malloc");
    job->data = data;
    job->fun = fun;
    job->f = create_future();
    submit_job(p, job);
    sfuture_t * new_t = malloc(sizeof(sfuture_t));
    if(new_t == NULL)
        perror("malloc");
    new_t->t = job; 
    return (new_t); 
} 

void 
mywait(thread_wrapper_t * thr, sfuture_t * sf) 
{ 
    if (sf == NULL)
        return;
    if (thr != NULL)
    {
        while (!sf->t->f->completed)
        {
            task_t * t_n = try_get_new_task(thr);
            if (t_n != NULL)
            {
                future_t * f = t_n->f;
                (*(t_n->fun))(thr,t_n->data);
                complete(f);
            }
        }
        return;
    }
    wait_future(sf->t->f);
    return;
}

The queue is an lfds lock-free queue.

#define enqueue(q,t) {                               \
    if(!lfds611_queue_enqueue(q->lq, t))             \
    {                                                \
        lfds611_queue_guaranteed_enqueue(q->lq, t);  \
    }                                                \
}

#define try_dequeue(q,t) {                           \
    lfds611_queue_dequeue(q->lq, &t);                \
}

The problem shows up whenever the number of calls to async is very high.

Valgrind output:

Process terminating with default action of signal 11 (SIGSEGV) 
==12022== Bad permissions for mapped region at address 0x5AF9FF8 
==12022== at 0x4C28737: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

Source

2014-02-26 guilhermemtr

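Is there anything else that could be messing up malloc's bookkeeping? – cnicutar

This sounds like memory is being corrupted somewhere else. – imreal

That is the only explanation. I will post the whole code. (It really is a minimal model, with memory leaks and so on.) – guilhermemtr

I have figured out what the problem was: a stack overflow. First, let me explain why the stack overflow occurs inside malloc (which is probably why you are reading this). While my program ran, the stack kept growing every time it started executing another task recursively (because of the way I had written the program). Each time, I also had to allocate a new task with malloc. malloc makes subroutine calls of its own, which push the stack deeper than the plain call that executes another task. So I would have hit a stack overflow even without malloc. But because malloc was there, the overflow happened inside malloc, before the stack could overflow on the next recursive call. The illustration below shows what was happening: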

Initial stack state:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
|        malloc         |
-------------------------
|     __int_malloc      | <- If the stack passes this point, the stack overflows.
-------------------------

Then the stack shrank again (malloc returned, leaving garbage behind in its old frames), and my code entered a new recursive call:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
|        garbage        |
-------------------------
|        garbage        | <- If the stack passes this point, the stack overflows.
-------------------------

Stack once the new recursive call is on it, just before it calls malloc:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
| recursive call n      |
-------------------------
|        garbage        | <- If the stack passes this point, the stack overflows.
-------------------------

Then it called malloc again inside this new recursive call. This time, however, it overflowed:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
| recursive call n      |
-------------------------
|        malloc         | <- If the stack passes this point, the stack overflows.
-------------------------
|     __int_malloc      | <- This is when the stack overflow occurs.
-------------------------
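The effect in the diagrams can be reproduced without crashing anything. The following is only a sketch under stated assumptions: it assumes a downward-growing stack (as on x86-64 Linux), and helper() is a hypothetical stand-in for the extra frames that malloc pushes, not glibc's real internals. All function names here are made up for illustration. It records the lowest stack address reached during a recursion, with and without the nested helper call.

```c
#include <stddef.h>
#include <stdint.h>

static uintptr_t lowest;            /* lowest stack address seen so far */

/* Record the address of a local; assumes the stack grows downward. */
static void note_stack(void)
{
    volatile char marker = 0;
    uintptr_t here = (uintptr_t)&marker;
    if (lowest == 0 || here < lowest)
        lowest = here;
}

static void helper(int levels);
static void recurse(int depth, int with_helper);

/* Calls go through volatile pointers to discourage inlining. */
static void (*volatile helper_ptr)(int) = helper;
static void (*volatile recurse_ptr)(int, int) = recurse;

/* Hypothetical stand-in for malloc and its internal subroutine. */
static void helper(int levels)
{
    volatile char pad[256];
    pad[0] = (char)levels;
    note_stack();
    if (levels > 1)
        helper_ptr(levels - 1);
    pad[255] = pad[0];              /* work after the call: no tail call */
}

/* One "task" per level; optionally calls the helper before recursing. */
static void recurse(int depth, int with_helper)
{
    volatile char frame[128];
    frame[0] = (char)depth;
    note_stack();
    if (with_helper)
        helper_ptr(2);              /* two nested frames, then they unwind */
    if (depth > 1)
        recurse_ptr(depth - 1, with_helper);
    frame[127] = frame[0];
}

/* Bytes of stack touched below the caller for a recursion of `depth`. */
size_t stack_used(int depth, int with_helper)
{
    volatile char top = 0;
    lowest = 0;
    recurse_ptr(depth, with_helper);
    return (size_t)((uintptr_t)&top - lowest);
}
```

Comparing stack_used(50, 1) with stack_used(50, 0) should show the first value larger: the deepest point is reached inside the helper, which is why the fault is reported there rather than in the recursive function itself.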

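[The rest of the answer focuses on why I had this problem in my particular code]

Usually, when you compute Fibonacci of some number n recursively, for example, the stack size grows linearly with n. In this case, however, I am creating tasks, storing them in a queue, and dequeuing a (fib) task to execute. If you draw this on paper, you will see that the number of tasks grows exponentially with n rather than linearly. (Note also that if I had used a stack to store the tasks as they were created, the number of allocated tasks and the stack size would only grow linearly with n.) So the stack grows exponentially with n, which leads to the stack overflow. As for why the overflow happens inside the call to malloc: as explained above, that is where the stack is deepest. The stack was already about to blow up, and since malloc calls its own internal functions, it grows the stack more than a call to mywait or fib alone.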
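To put numbers on the "exponential versus linear" claim, here is a small sketch. The fib task function itself is not shown in the question, so this assumes a task-based fib(n) that spawns one task each for fib(n-1) and fib(n-2) when n >= 2. Both function names are hypothetical.

```c
/* Tasks created by a task-based fib(n), assuming each task with n >= 2
 * spawns two subtasks (an assumption; the real fib is not shown). */
unsigned long long fib_task_count(unsigned n)
{
    if (n < 2)
        return 1;                   /* a leaf task spawns nothing */
    return 1 + fib_task_count(n - 1) + fib_task_count(n - 2);
}

/* Deepest chain of nested calls in plain recursive fib(n). */
unsigned recursion_depth(unsigned n)
{
    if (n < 2)
        return 1;
    return 1 + recursion_depth(n - 1);
}
```

For n = 10 this gives 177 tasks against a call depth of 10, and for n = 20 it gives 21891 tasks against a depth of 20. If a waiting thread picks up arbitrary queued tasks and runs them on top of its current frame, as mywait does, the nesting is no longer bounded by the depth of the fib tree and can follow the task count instead.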

Thank you all! If it weren't for your help, I would not have been able to figure it out!

Source

2014-02-27 11:43:53 guilhermemtr

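Should I mark my own answer as correct? – guilhermemtr

That is what I guessed, since I could not find any problem. But to make sure this is the issue, could you dump the output of top to a file and check how memory usage increases? +1 for the answer and the question. – Jekyll

When I removed all the threads, valgrind said it might be a stack overflow, although that was unlikely. I set ulimit higher, and then I could run larger fib numbers. When I doubled the stack size, I could only add 1 to the previous number. But I will do what you said, just to confirm – guilhermemtr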
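As a way to repeat that experiment from inside the program rather than through ulimit, the sketch below reads the soft stack limit with getrlimit and runs a deep recursion on a thread whose stack size is set explicitly with pthread_attr_setstacksize. The function names (main_stack_limit, descend, depth_reached_with_stack) are hypothetical, and the recursion is only a placeholder for the real task code.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/resource.h>

/* Soft stack limit in bytes; 0 if unlimited or on error. */
size_t main_stack_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t)rl.rlim_cur;
}

/* Placeholder workload: recurse `remaining` levels and count them. */
static long descend(long remaining)
{
    volatile long here = remaining;
    if (here == 0)
        return 0;
    return 1 + descend(here - 1);
}

struct probe { long depth; long reached; };

static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    p->reached = descend(p->depth);
    return NULL;
}

/* Runs descend(depth) on a thread with the given stack size.
 * Returns the depth reached, or -1 if the thread could not be set up. */
long depth_reached_with_stack(long depth, size_t stack_bytes)
{
    struct probe p = { depth, -1 };
    pthread_attr_t attr;
    pthread_t tid;

    if (pthread_attr_init(&attr) != 0)
        return -1;
    if (pthread_attr_setstacksize(&attr, stack_bytes) != 0 ||
        pthread_create(&tid, &attr, probe_thread, &p) != 0) {
        pthread_attr_destroy(&attr);
        return -1;
    }
    pthread_attr_destroy(&attr);
    pthread_join(tid, NULL);
    return p.reached;
}
```

A call such as depth_reached_with_stack(100000, 64u * 1024 * 1024) gives the recursion a 64 MiB stack. Since doubling the stack only bought one more fib step here, a bigger stack postpones the overflow but does not address the exponential growth.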

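A SIGSEGV (segmentation fault) raised inside malloc is usually caused by heap corruption. Heap corruption does not cause a segmentation fault by itself, so you only see the error when malloc tries to access the damaged area. The problem is that the code that corrupts the heap can be anywhere, possibly far from the malloc call that crashes. Typically it is the next-block pointer inside malloc's data that gets overwritten with an invalid address, so when you call malloc the invalid pointer is dereferenced and you get a segfault.

I think you could try isolating parts of your code from the rest of the program to narrow down where the bug is.

Also, I see that you never free the memory here, so you may have a memory leak.

To check for memory leaks, you can run top, for example top -b -n 1, and check: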

RPRVT - resident private address space size 
RSHRD - resident shared address space size 
RSIZE - resident memory size 
VPRVT - private address space size 
VSIZE - total memory size
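On the missing free mentioned above: the futures in the question are allocated but never released, and the allocation code keeps going after perror reports a failure. A minimal sketch of a matching create/destroy pair could look like this. The future_t layout is copied from the question; the names future_new and future_free are hypothetical and not part of the original code.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct future
{
    pthread_cond_t wait;
    pthread_mutex_t mutex;
    long completed;
} future_t;

/* Allocate and initialise a future; returns NULL on any failure. */
future_t *future_new(void)
{
    future_t *f = malloc(sizeof *f);
    if (f == NULL) {
        perror("malloc");
        return NULL;                /* do not go on to dereference NULL */
    }
    f->completed = 0;
    if (pthread_mutex_init(&f->mutex, NULL) != 0) {
        free(f);
        return NULL;
    }
    if (pthread_cond_init(&f->wait, NULL) != 0) {
        pthread_mutex_destroy(&f->mutex);
        free(f);
        return NULL;
    }
    return f;
}

/* Release a future once no thread waits on it any more. */
void future_free(future_t *f)
{
    if (f == NULL)
        return;
    pthread_cond_destroy(&f->wait);
    pthread_mutex_destroy(&f->mutex);
    free(f);
}
```

The caller would release the task_t and sfuture_t the same way after the wait returns. This tidies up memory usage, but it does not change the stack overflow described in the accepted answer.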

Source

2014-02-26 20:48:48 Jekyll

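The problem is that the segmentation fault only happens after a lot of calls. – guilhermemtr

Did you check whether there is a memory leak? I don't see any free here... do you have one? – Jekyll

I would run into a problem sooner or later if I never freed the memory... because this program only allocates here... – Jekyll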
