2015-04-23 57 views
-1

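Mandelbrot optimization in OpenMP

I have to write a Mandelbrot program in C using OpenMP. I think I have done it reasonably well, but I cannot get better timings. My question is whether anyone has an idea for improving the code. I have been wondering about nested parallel regions, one for the outer loop and one for the inner loop.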
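As a point of comparison for the nested-region idea, the `collapse(2)` clause merges the two pixel loops into a single iteration space, so one parallel region covers both. The sketch below is only an illustration: `escape_count`, `render_counts`, the chunk size of 64, and the fixed view (zoom 1, moveX -0.5, moveY 0) are assumptions made for this example, not part of the program in the question.

```c
/* Number of iterations completed before |z| exceeds 2, or max_iter if it never does. */
static int escape_count(double pr, double pi, int max_iter)
{
    double re = 0.0, im = 0.0;
    int i;
    for (i = 0; i < max_iter; i++) {
        double new_re = re * re - im * im + pr;
        im = 2.0 * re * im + pi;
        re = new_re;
        if (re * re + im * im > 4.0)
            break;
    }
    return i;
}

/* Hypothetical helper: fills counts[y * width + x] for a fixed view. */
void render_counts(int *counts, int width, int height, int max_iter)
{
    /* collapse(2) turns the two perfectly nested loops into one
       iteration space of width * height pixels. */
    #pragma omp parallel for collapse(2) schedule(dynamic, 64)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            double pr = 1.5 * (x - width / 2) / (0.5 * width) - 0.5;
            double pi = (y - height / 2) / (0.5 * height);
            counts[y * width + x] = escape_count(pr, pi, max_iter);
        }
    }
}
```

Whether this beats parallelizing only the outer loop depends on the image size and the machine, so it is worth timing both.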

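I am also unsure whether it is more elegant, or more advisable, to put everything in a single pragma or to write separate pragmas (one for omp parallel with the shared and private variables and the condition, and another for omp for with the dynamic schedule).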
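For reference, the two spellings can be put side by side. This is a minimal sketch with made-up names (`fill_split`, `fill_combined`, `ncpu`) and an arbitrary chunk size of 16; the loop body is a placeholder rather than the Mandelbrot computation.

```c
/* Split form: one pragma opens the parallel region, a second one
   distributes the loop iterations among the threads of that region. */
void fill_split(double *out, int n, int ncpu)
{
    #pragma omp parallel shared(out) if(ncpu > 1)
    {
        #pragma omp for schedule(dynamic, 16)
        for (int i = 0; i < n; i++)
            out[i] = (double)i * i;
    }
}

/* Combined form: shorthand for a parallel region that contains
   nothing but a single work-shared loop. */
void fill_combined(double *out, int n, int ncpu)
{
    #pragma omp parallel for schedule(dynamic, 16) if(ncpu > 1)
    for (int i = 0; i < n; i++)
        out[i] = (double)i * i;
}
```

Both functions compute the same result. The split form becomes useful once the region needs per-thread setup before the loop or contains more than one loop. Note that the clause is written `if(condition)`; when the condition is false the region runs on a single thread.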

I also wonder whether constants can be used as private variables, because I think using constants is cleaner than defining variables.
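One way to look at the scoping question: values that are only read can stay shared, because concurrent reads do not race, while anything a thread writes should be private to it. Variables declared inside the loop body are private automatically. The sketch below illustrates this under assumptions; `count_inside`, the chunk size of 4, and the use of a `reduction` to combine a per-thread counter are choices made for this example only.

```c
/* Hypothetical helper: counts the pixels that never escape. */
long count_inside(int width, int height, int max_iter)
{
    /* Read-only view parameters: safe to share between threads. */
    const double zoom = 1.0, move_x = -0.5, move_y = 0.0;
    long inside = 0;

    #pragma omp parallel for reduction(+:inside) schedule(dynamic, 4)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Declared inside the loop body, so each thread has its own copy. */
            double pr = 1.5 * (x - width / 2) / (0.5 * zoom * width) + move_x;
            double pi = (y - height / 2) / (0.5 * zoom * height) + move_y;
            double re = 0.0, im = 0.0;
            int i;
            for (i = 0; i < max_iter; i++) {
                double new_re = re * re - im * im + pr;
                im = 2.0 * re * im + pi;
                re = new_re;
                if (re * re + im * im > 4.0)
                    break;
            }
            if (i == max_iter)
                inside++;   /* each thread updates its own copy; OpenMP sums them */
        }
    }
    return inside;
}
```

With this style there is no long `private(...)` list to keep in sync with the code, which removes one common source of data races.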

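I also added a condition (if numcpu > 1): with a single processor there is no point in using a parallel region, so the code falls back to normal sequential execution.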

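Finally, I have read that the best chunk size for the dynamic schedule depends on the hardware and the system configuration, so I made it a constant that can be changed easily.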
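Two ways of keeping the chunk size adjustable can be sketched as follows. The default value of 8 and the function names are hypothetical; the loop body only imitates rows of uneven cost. The first version keeps a compile-time constant that can be overridden on the compiler command line with `-DCHUNK=...`; the second uses `schedule(runtime)`, which reads the schedule from the `OMP_SCHEDULE` environment variable (for example `OMP_SCHEDULE="dynamic,4"`), so no recompilation is needed while experimenting.

```c
#ifndef CHUNK
#define CHUNK 8   /* hypothetical default; override with -DCHUNK=<n> */
#endif

/* Chunk size fixed at compile time. */
void triangular_fixed(long *out, int n)
{
    #pragma omp parallel for schedule(dynamic, CHUNK)
    for (int y = 0; y < n; y++) {
        long sum = 0;
        for (int k = 1; k <= y; k++)   /* cost grows with y, like uneven image rows */
            sum += k;
        out[y] = sum;
    }
}

/* Schedule kind and chunk size taken from OMP_SCHEDULE at run time. */
void triangular_runtime(long *out, int n)
{
    #pragma omp parallel for schedule(runtime)
    for (int y = 0; y < n; y++) {
        long sum = 0;
        for (int k = 1; k <= y; k++)
            sum += k;
        out[y] = sum;
    }
}
```

The results are identical; only the distribution of iterations over threads changes.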

I also set the number of threads to the number of available processors.

#include <math.h>   /* sqrt, log2 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* IMAGEWIDTH, IMAGEHEIGHT, ITERATIONS, CHUNK and pixel_t are defined
   elsewhere in the program (not shown in the question). */

int main(int argc, char *argv[])
{
    omp_set_dynamic(1);

    int xactual, yactual;

    //each iteration calculates newz = oldz*oldz + p, where p is the current pixel and oldz starts at the origin
    double pr, pi;                              //real and imaginary part of the pixel p
    double newRe, newIm, oldRe, oldIm;          //real and imaginary parts of new and old z
    double zoom = 1, moveX = -0.5, moveY = 0;   //change these to zoom and change position

    pixel_t *pixels = malloc(sizeof(pixel_t) * IMAGEHEIGHT * IMAGEWIDTH);
    if (pixels == NULL)
    {
        fprintf(stderr, "Out of memory.\n");
        return 1;
    }

    //clock() adds up the CPU time of all threads, so it hides any speedup;
    //omp_get_wtime() returns wall-clock time
    double begin, end, time_spent;
    begin = omp_get_wtime();

    int numcpu = omp_get_num_procs();
    printf("El número de procesadores que utilizaremos es: %d\n", numcpu);

    omp_set_num_threads(numcpu);

    //oldRe and oldIm are written by every thread, so they must be private too;
    //the if clause is written if(condition), not (if condition)
    #pragma omp parallel shared(pixels, moveX, moveY, zoom) private(xactual, yactual, pr, pi, newRe, newIm, oldRe, oldIm) if(numcpu > 1)
    {
        //loop through every pixel
        #pragma omp for schedule(dynamic, CHUNK)
        for (yactual = 0; yactual < IMAGEHEIGHT; yactual++)
            for (xactual = 0; xactual < IMAGEWIDTH; xactual++)
            {
                //calculate the real and imaginary part of the pixel p, based on the pixel location, zoom and position values
                pr = 1.5 * (xactual - IMAGEWIDTH / 2) / (0.5 * zoom * IMAGEWIDTH) + moveX;
                pi = (yactual - IMAGEHEIGHT / 2) / (0.5 * zoom * IMAGEHEIGHT) + moveY;
                newRe = newIm = oldRe = oldIm = 0;  //z starts at 0,0

                //"i" is the number of iterations; declared here, so it is private
                int i;
                for (i = 0; i < ITERATIONS; i++)
                {
                    //remember the value of the previous iteration
                    oldRe = newRe;
                    oldIm = newIm;
                    //the actual iteration: compute the real and imaginary part
                    newRe = oldRe * oldRe - oldIm * oldIm + pr;
                    newIm = 2 * oldRe * oldIm + pi;
                    //if the point is outside the circle with radius 2: stop
                    if ((newRe * newRe + newIm * newIm) > 4) break;
                }

                if (i == ITERATIONS)
                {
                    //point never escaped: black
                    pixels[yactual * IMAGEWIDTH + xactual][0] = 0;
                    pixels[yactual * IMAGEWIDTH + xactual][1] = 0;
                    pixels[yactual * IMAGEWIDTH + xactual][2] = 0;
                }
                else
                {
                    //smooth colouring based on the escape iteration
                    double z = sqrt(newRe * newRe + newIm * newIm);
                    int brightness = 256 * log2(1.75 + i - log2(log2(z))) / log2((double)ITERATIONS);

                    pixels[yactual * IMAGEWIDTH + xactual][0] = brightness;
                    pixels[yactual * IMAGEWIDTH + xactual][1] = brightness;
                    pixels[yactual * IMAGEWIDTH + xactual][2] = 255;
                }
            }
    } //end of parallel region

    end = omp_get_wtime();

    time_spent = end - begin;
    fprintf(stderr, "Elapsed time: %.2f seconds.\n", time_spent);

    //writing the image to a file is not shown in the question

    free(pixels);
    return 0;
}
+0

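Yes, one way to improve the speed of the code is to use SIMD. I do this with SSE and AVX. Is this for an x86 processor? If you add the SIMD and/or SSE or AVX tags, you may get better answers.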

+0

You may also be interested in the code here: https://stackoverflow.com/questions/48069990/multithreaded-simd-vectorized-mandelbrot-in-r-using-rcpp-openmp

Answer

0

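You could extend the implementation to take advantage of SIMD extensions. As far as I know, the latest OpenMP standard includes vector constructs. See this article, which describes the new features.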

This whitepaper explains how to use SSE3 when computing the Mandelbrot set.
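To give an idea of what the OpenMP vector constructs look like, here is a rough sketch using the `simd` construct that OpenMP 4.0 added. It is not taken from the article or the whitepaper. The function name `escape_counts_row` and its parameters are assumptions for illustration. Because a vectorized loop cannot `break` per pixel, the early exit is replaced by an `alive` mask: every pixel runs all `max_iter` steps, but a pixel stops counting and freezes its value once it has escaped.

```c
/* Hypothetical helper: escape counts for one image row.
   pr[x] is the real part of pixel x, pi the shared imaginary part. */
void escape_counts_row(const double *pr, double pi, int *counts,
                       int n, int max_iter)
{
    /* Ask the compiler to vectorize across pixels of the row. */
    #pragma omp simd
    for (int x = 0; x < n; x++) {
        double re = 0.0, im = 0.0;
        int count = 0;
        int alive = 1;                 /* 1 while the point has not escaped */
        for (int i = 0; i < max_iter; i++) {
            double new_re = re * re - im * im + pr[x];
            double new_im = 2.0 * re * im + pi;
            alive = alive & (new_re * new_re + new_im * new_im <= 4.0);
            count += alive;            /* only live points keep counting */
            re = alive ? new_re : re;  /* freeze z once escaped */
            im = alive ? new_im : im;
        }
        counts[x] = count;
    }
}
```

The counts match those of the scalar loop with `break`, but whether the compiler actually emits vector instructions, and whether the extra iterations for escaped pixels pay off, has to be checked with the compiler's vectorization report and a timing run.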

+0

SIMD is a good suggestion. It is one of the tools I used to speed up my Mandelbrot computation.