2014-10-16 111 views
1

Comparison operations in SSE

I am new to SSE programming. I want to write SSE code for my algorithm by converting the following C code to SSE:

for(int i=1;i<height-1;i++)        // row i+1 is read below, so stop one row early
{
    for(int j=1;j<width-1;j++)     // column j+1 is read below, so stop one column early
    {
        int index = 0;
        if(input[width*i + j]<=input[width*(i-1)+(j-1)]) index += 0x80;
        if(input[width*i + j]<=input[width*(i-1)+(j  )]) index += 0x40;
        if(input[width*i + j]<=input[width*(i-1)+(j+1)]) index += 0x20;
        if(input[width*i + j]<=input[width*(i  )+(j-1)]) index += 0x10;
        if(input[width*i + j]<=input[width*(i  )+(j+1)]) index += 0x08;
        if(input[width*i + j]<=input[width*(i+1)+(j-1)]) index += 0x04;
        if(input[width*i + j]<=input[width*(i+1)+(j  )]) index += 0x02;
        if(input[width*i + j]<=input[width*(i+1)+(j+1)]) index ++;
        output[width*(i-1)+(j-1)] = index;
    }
}

Here is my SSE code:

unsigned char *dst_d = outputbuffer;
float *CT_image_0 = inputbuffer; 
float *CT_image_1 = CT_image_0 + width; 
float *CT_image_2 = CT_image_1 + width; 
for(int i=1;i<height-1;i++)   // rows i-1, i and i+1 are read, so stop one row early
{ 
    for(int j=1;j<width;j+=4) 
    { 

     __m128 CT_current_00 = _mm_loadu_ps((CT_image_0+j-1)); 
     __m128 CT_current_10 = _mm_loadu_ps((CT_image_1+j-1)); 
     __m128 CT_current_20 = _mm_loadu_ps((CT_image_2+j-1)); 

     __m128 CT_current_01 = _mm_loadu_ps(((CT_image_0+1)+j-1)); 
     __m128 CT_current_11 = _mm_loadu_ps(((CT_image_1+1)+j-1)); 
     __m128 CT_current_21 = _mm_loadu_ps(((CT_image_2+1)+j-1)); 

     __m128 CT_current_02 = _mm_loadu_ps(((CT_image_0+2)+j-1)); 
     __m128 CT_current_12 = _mm_loadu_ps(((CT_image_1+2)+j-1)); 
     __m128 CT_current_22 = _mm_loadu_ps(((CT_image_2+2)+j-1)); 

     __m128 val = CT_current_11; 

     //Below I tried to write the SSE instruction but that was wrong :( 
     //--How I can do index + ...operation with this _mm_cmple_ss return value ???? 
     __m128 sample6= _mm_cmple_ss(val,CT_current_00); 
     sample6 += _mm_cmple_ss(val,CT_current_01); 
     sample6 += _mm_cmple_ss(val,CT_current_02); 
     sample6 += _mm_cmple_ss(val,CT_current_10); 
     sample6 +=_mm_cmple_ss(val,CT_current_12); 
     sample6 +=_mm_cmple_ss(val,CT_current_20); 
     sample6 +=_mm_cmple_ss(val,CT_current_21); 
     sample6 +=_mm_cmple_ss(val,CT_current_22); 
    } 
    CT_image_0 +=width; 
    CT_image_1 +=width; 
    CT_image_2 +=width; 
    dst_d += (width-2); 
} 

I have been racking my brain over this and (as a layman) tried to handle the if conditions ... Please give me some ideas on how to do this.

Answers

1

The part that needs work is evidently this:

    __m128 sample6 = _mm_cmple_ss(val,CT_current_00);
    sample6 += _mm_cmple_ss(val,CT_current_01);
    sample6 += _mm_cmple_ss(val,CT_current_02);
    sample6 += _mm_cmple_ss(val,CT_current_10);
    sample6 += _mm_cmple_ss(val,CT_current_12);
    sample6 += _mm_cmple_ss(val,CT_current_20);
    sample6 += _mm_cmple_ss(val,CT_current_21);
    sample6 += _mm_cmple_ss(val,CT_current_22);
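
As background on why the fragment above cannot produce the index: `_mm_cmple_ss` compares only the lowest lane and copies the other three lanes from its first argument, and a "true" result is a lane with all 32 bits set (a NaN when read as a float), not the value 1, so adding the results is not meaningful. The small sketch below contrasts it with `_mm_cmple_ps`, which compares all four lanes. The helper names `le_bits_ss` and `le_bits_ps` are made up for this illustration, and an x86 target with SSE is assumed.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_cmple_ss, _mm_cmple_ps, _mm_movemask_ps */

/* Both helpers (hypothetical names) return a 4-bit summary of the comparison
   result: bit k of the return value is the top bit of lane k. */

int le_bits_ss(const float *a, const float *b)
{
    /* Only lane 0 is compared; lanes 1..3 are copied unchanged from a. */
    return _mm_movemask_ps(_mm_cmple_ss(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

int le_bits_ps(const float *a, const float *b)
{
    /* All four lanes are compared; each lane becomes all ones or all zeros. */
    return _mm_movemask_ps(_mm_cmple_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

For example, with a = {1, 5, 1, 5} and b = {2, 2, 2, 2}, `le_bits_ss` returns 1 (only lane 0 was tested), while `le_bits_ps` returns 5 (lanes 0 and 2 satisfy a <= b).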

You need to combine all the comparison results into a set of flags, e.g. like this:

    __m128i out = _mm_setzero_si128();                          // init output flags to all zeroes
    __m128i test;

    // _mm_cmple_ps compares all 4 lanes (_mm_cmple_ss compares only the lowest one);
    // _mm_castps_si128 reinterprets the float mask as integers for the bitwise ops
    test = _mm_castps_si128(_mm_cmple_ps(val, CT_current_00));  // compare
    test = _mm_and_si128(test, _mm_set1_epi32(0x80));           // mask all but required flag
    out = _mm_or_si128(out, test);                              // merge flags to output mask
    test = _mm_castps_si128(_mm_cmple_ps(val, CT_current_01));
    test = _mm_and_si128(test, _mm_set1_epi32(0x40));
    out = _mm_or_si128(out, test);
    // ... repeat for each offset and flag value
    // ... then finally extract 4 bytes from `out`
    // ... and store at output[width*(i-1)+(j-1)]
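
The steps outlined above (compare, mask, merge, then extract 4 bytes) can be sketched end to end as follows. This is only one possible arrangement under stated assumptions, not the definitive implementation: the function names `census8_sse2`, `census8_scalar` and `flag_if_le` are hypothetical, the output buffer is assumed to use the same row stride as the input (as in the scalar code in the question), SSE2 is assumed to be available, and `_mm_cmple_ps` with `_mm_castps_si128` is used so that all four lanes are compared. Only interior pixels are processed, since the neighbours at row i+1 and column j+1 are read.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar version of the eight comparisons for the pixel at column j;
   r0, r1, r2 point at the rows above, at, and below that pixel. */
static uint8_t census8_scalar(const float *r0, const float *r1, const float *r2, int j)
{
    const float c = r1[j];
    int index = 0;
    if (c <= r0[j - 1]) index |= 0x80;
    if (c <= r0[j    ]) index |= 0x40;
    if (c <= r0[j + 1]) index |= 0x20;
    if (c <= r1[j - 1]) index |= 0x10;
    if (c <= r1[j + 1]) index |= 0x08;
    if (c <= r2[j - 1]) index |= 0x04;
    if (c <= r2[j    ]) index |= 0x02;
    if (c <= r2[j + 1]) index |= 0x01;
    return (uint8_t)index;
}

/* Compare 4 centre pixels with the 4 neighbours n[0..3]; each 32-bit lane of
   the result holds `flag` where centre <= neighbour and 0 elsewhere. */
static __m128i flag_if_le(__m128 c, const float *n, int flag)
{
    __m128 mask = _mm_cmple_ps(c, _mm_loadu_ps(n));
    return _mm_and_si128(_mm_castps_si128(mask), _mm_set1_epi32(flag));
}

/* Hypothetical entry point: writes the 8-bit index of every interior pixel
   (i, j) to output[width*(i-1)+(j-1)]. */
void census8_sse2(const float *input, uint8_t *output, int width, int height)
{
    assert(width >= 3 && height >= 3);
    for (int i = 1; i < height - 1; i++) {
        const float *r0 = input + (size_t)width * (size_t)(i - 1);
        const float *r1 = r0 + width;
        const float *r2 = r1 + width;
        uint8_t *dst = output + (size_t)width * (size_t)(i - 1);
        int j = 1;

        /* Vector part: 4 pixels per iteration; loads touch columns j-1 .. j+4. */
        for (; j + 4 <= width - 1; j += 4) {
            const __m128 c = _mm_loadu_ps(r1 + j);
            __m128i out = flag_if_le(c, r0 + j - 1, 0x80);
            out = _mm_or_si128(out, flag_if_le(c, r0 + j,     0x40));
            out = _mm_or_si128(out, flag_if_le(c, r0 + j + 1, 0x20));
            out = _mm_or_si128(out, flag_if_le(c, r1 + j - 1, 0x10));
            out = _mm_or_si128(out, flag_if_le(c, r1 + j + 1, 0x08));
            out = _mm_or_si128(out, flag_if_le(c, r2 + j - 1, 0x04));
            out = _mm_or_si128(out, flag_if_le(c, r2 + j,     0x02));
            out = _mm_or_si128(out, flag_if_le(c, r2 + j + 1, 0x01));

            /* Narrow 4 x int32 to 4 x uint8; values are 0..255, so neither
               pack saturates. The low 32 bits then hold the 4 result bytes. */
            out = _mm_packs_epi32(out, out);
            out = _mm_packus_epi16(out, out);
            const int32_t packed = _mm_cvtsi128_si32(out);
            memcpy(dst + j - 1, &packed, sizeof packed);
        }

        /* Scalar tail for the columns that do not fill a group of 4. */
        for (; j < width - 1; j++)
            dst[j - 1] = census8_scalar(r0, r1, r2, j);
    }
}
```

A call such as `census8_sse2(input, output, width, height)` would then replace both loops in the question. The scalar tail is a design choice that avoids reading past the end of a row when the interior width is not a multiple of 4; padding the rows would be an alternative.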
-3

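I don't know what SSE code is, but most likely you want to run one, or some combination, that combines the CT_current variables into a string array and then concatenates them into a List with the previously mentioned list (via your code), CT = **spec (where CT** is everything you place afterwards); to iterate back to the _m128 you printed, and then, as you know, you can iterate twice following the previous approach.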

Good luck.

+1

-1: it is completely unclear what you are trying to say here – 2014-10-16 11:13:03