2013-03-06 85 views

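Unable to set OpenMP thread affinity in a forked process

I am trying to run two processes that use OpenMP on separate CPUs. In this case each CPU has 6 cores with hyper-threading (so 12 hardware threads). The processes need to do some synchronization, which seems easier if they know each other's PID. So I start a process, sigC, from sigS using fork() and execve(), passing a different value for the GOMP_CPU_AFFINITY environment variable. After the fork()/execve() call, sigS still has the correct affinity, but sigC prints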

libgomp: no cpus left for affinity setting 

and all of its threads end up on the same core.

Code for sigS:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>   // exit()
#include <unistd.h>
#include <errno.h>
#include <omp.h>
#include <sched.h>

int main(void)
{
    omp_set_num_threads(12); //12 hardware threads per CPU
    //this loop runs as expected
    #pragma omp parallel for
    for(int i = 0; i < 12; i++) {
        #pragma omp critical
        {
            printf("TEST PRE-FORK: I am thread %2d running on core %d\n",
                   omp_get_thread_num(), sched_getcpu());
        }
    }

    pid_t childpid = fork();

    if(childpid < 0) {
        perror("Fork failed");
    } else {
        if(childpid == 0) { //<------ attempt to set affinity for child
            //change the affinity for the other process so it runs
            //on the other cpu
            char ompEnv[] = "GOMP_CPU_AFFINITY=6-11 18-23";
            char * const args[] = { "./sigC", (char*)0 };
            char * const envArgs[] = { ompEnv, (char*)0 };
            execve(args[0], args, envArgs);
            perror("Returned from execve");
            exit(1);
        } else {
            omp_set_num_threads(12);
            printf("PARENT: my pid  = %d\n", getpid());
            printf("PARENT: child pid = %d\n", childpid);
            sleep(5); //sleep for a bit so child process prints first

            //This loop gives the same thread core/pairings as above
            //this is expected
            #pragma omp parallel for
            for(int i = 0; i < 12; i++) {
                #pragma omp critical
                {
                    printf("PARENT: I'm thread %2d, on core %d.\n",
                           omp_get_thread_num(), sched_getcpu());
                }
            }
        }
    }
    return 0;
}

The code for sigC is just an OpenMP parallel for loop, but for completeness:

#define _GNU_SOURCE 
#include <stdio.h> 
#include <unistd.h> 
#include <errno.h> 
#include <omp.h> 
#include <sched.h> 

int main(void) 
{ 
    omp_set_num_threads(12); 
    printf("CHILD: my pid  = %d\n", getpid()); 
    printf("CHILD: parent pid = %d\n", getppid()); 
    //I expect this loop to have the core pairings as I specified in execve 
    //i.e thread 0 -> core 6, 1 -> 7, ... 6 -> 18, 7 -> 19 ... 11 -> 23 
    #pragma omp parallel for 
    for(int i = 0; i < 12; i++) { 
        #pragma omp critical
        {
            printf("CHILD: I'm thread %2d, on core %d.\n",
                   omp_get_thread_num(), sched_getcpu());
        }
    } 
    return 0; 
} 

Output:

$ env GOMP_CPU_AFFINITY="0-5 12-17" ./sigS 

This part is as expected:

TEST PRE-FORK: I'm thread 0, on core 0. 
TEST PRE-FORK: I'm thread 11, on core 17. 
TEST PRE-FORK: I'm thread 5, on core 5. 
TEST PRE-FORK: I'm thread 6, on core 12. 
TEST PRE-FORK: I'm thread 3, on core 3. 
TEST PRE-FORK: I'm thread 1, on core 1. 
TEST PRE-FORK: I'm thread 8, on core 14. 
TEST PRE-FORK: I'm thread 10, on core 16. 
TEST PRE-FORK: I'm thread 7, on core 13. 
TEST PRE-FORK: I'm thread 2, on core 2. 
TEST PRE-FORK: I'm thread 4, on core 4. 
TEST PRE-FORK: I'm thread 9, on core 15. 
PARENT: my pid  = 11009 
PARENT: child pid = 11021 

This is the problem: all of the child's threads run on core 0.

libgomp: no CPUs left for affinity setting 
CHILD: my pid  = 11021 
CHILD: parent pid = 11009 
CHILD: I'm thread 1, on core 0. 
CHILD: I'm thread 0, on core 0. 
CHILD: I'm thread 4, on core 0. 
CHILD: I'm thread 5, on core 0. 
CHILD: I'm thread 6, on core 0. 
CHILD: I'm thread 7, on core 0. 
CHILD: I'm thread 8, on core 0. 
CHILD: I'm thread 9, on core 0. 
CHILD: I'm thread 10, on core 0. 
CHILD: I'm thread 11, on core 0. 
CHILD: I'm thread 3, on core 0. 

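(I have omitted the parent's thread output because it is the same as the pre-fork output.)

Any ideas on how to fix this, or on whether this is even the right approach?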


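Please always check that your code samples compile before posting them here, since compilation errors can frustrate people who try to solve your problem. In both source files there was a comma instead of a dot in 'errno.h', and the source of 'sigS' contained a typo ('/' instead of '//'). – 2013-03-06 21:27:48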

Answer


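A fork()-ed child process inherits its parent's affinity mask, and that mask survives execve(). libgomp intersects the inherited mask with the set given in GOMP_CPU_AFFINITY. Here the two sets are disjoint (0-5 and 12-17 for the parent, 6-11 and 18-23 for the child), so the intersection is empty and libgomp reports that no CPUs are left. This behaviour is not documented, but a look at the libgomp source code confirms that this is indeed what happens.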
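To make the intersection argument concrete, here is a minimal sketch in C. The helper names `usable_cpus` and `fill_parent_mask` are hypothetical and are not part of libgomp; the sketch only mimics the set intersection described above, using the CPU lists from the question, and does not claim to reproduce libgomp's actual implementation.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: count how many of the requested CPUs are
 * also present in the given affinity mask (the set intersection). */
int usable_cpus(const cpu_set_t *mask, const int *requested, size_t n)
{
    int usable = 0;
    for (size_t i = 0; i < n; i++) {
        if (requested[i] >= 0 && requested[i] < CPU_SETSIZE &&
            CPU_ISSET(requested[i], mask))
            usable++;
    }
    return usable;
}

/* Hypothetical helper: build the mask the parent in the question
 * runs with, i.e. CPUs 0-5 and 12-17. */
void fill_parent_mask(cpu_set_t *mask)
{
    CPU_ZERO(mask);
    for (int cpu = 0; cpu <= 5; cpu++)
        CPU_SET(cpu, mask);
    for (int cpu = 12; cpu <= 17; cpu++)
        CPU_SET(cpu, mask);
}
```

With the parent's mask, `usable_cpus` yields 0 for the child's list (6-11, 18-23) and 12 for the parent's own list, which matches the symptom: the child has no permitted CPU among the ones it asks for. In a real process the inherited mask could be read with sched_getaffinity() instead of being built by hand.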

The solution is to reset the child process's affinity mask before it calls execve():

if (childpid == 0) { //<------ attempt to set affinity for child
    cpu_set_t *mask;
    size_t size;
    int nrcpus = 256; // 256 CPUs should be more than enough

    // Reset the CPU affinity mask so that it allows every CPU
    mask = CPU_ALLOC(nrcpus);
    if (mask == NULL) {
        perror("CPU_ALLOC");
        exit(1);
    }
    size = CPU_ALLOC_SIZE(nrcpus);
    CPU_ZERO_S(size, mask);
    for (int i = 0; i < nrcpus; i++)
        CPU_SET_S(i, size, mask);
    if (sched_setaffinity(0, size, mask) == -1) {
        perror("sched_setaffinity");
        exit(1);
    }
    CPU_FREE(mask);

    //change the affinity for the other process so it runs
    //on the other cpu
    char ompEnv[] = "GOMP_CPU_AFFINITY=6-11 18-23";
    char * const args[] = {"./sigC", (char*)0};
    char * const envArgs[] = {ompEnv, (char*)0};
    execve(args[0], args, envArgs);
    perror("Returned from execve");
    exit(1);
} else {