C++ function optimization

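I have the function shown below. It is called many times, which makes my program run slowly. Is there any way to optimize it, for example with SIMD instructions or other techniques? The getRay() function retrieves a 3-vector from a precomputed lookup table, given a 2-vector query. The code is compiled with Visual Studio 2013, and the target configuration is an x64 machine.

By the way, the for loop that calls this function many times has already been parallelized with OpenMP.

Thank you very much.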

bool warpPlanarHomography(
const Eigen::Matrix3d& H_camera2_camera1 
, const cv::Mat& image1 
, const cv::Mat& image2 
, FisheyeCameraUnified& cam1 
, FisheyeCameraUnified& cam2 
, const Eigen::Vector2i& patchCenter 
, const int patchSize 
, Eigen::Matrix<unsigned char, 7, 7>& patch1) 
{ 
const int patchSize_2 = 3; 
for (int v = 0; v < patchSize; ++v) // row 
{ 
    for (int u = 0; u < patchSize; ++u) 
    { 
     Eigen::Vector2i p1 = Eigen::Vector2i(u - patchSize_2, v - patchSize_2).cast<int>() + patchCenter; 

     if (p1(0, 0) < 0 || p1(1, 0) < 0 || p1(0, 0) >= image1.cols || p1(1, 0) >= image1.rows) return false; 

     Eigen::Vector3d ray1; 
     cam1.getRay(p1(1, 0), p1(0, 0), ray1); 
     Eigen::Vector2d p2; 
     if (!cam2.project(H_camera2_camera1 * ray1, p2)) 
     { 
      return false; 
     } 
     if (p2.x() < 0.0 || p2.x() >= image2.cols - 1 || 
      p2.y() < 0.0 || p2.y() >= image2.rows - 1) 
     { 
      return false; 
     } 
     getInterpolatedPixel(image2, p2, &patch1(v, u)); 
    } 
} 
return true; 
}

The project() function looks like this:

bool FisheyeCameraUnified::project(const Eigen::Vector3d& ray, Eigen::Vector2d& pt) 
{ 
    double fx, fy, cx, cy, xi; 
    fx = m_K(0, 0); 
    fy = m_K(1, 1); 
    cx = m_K(0, 2); 
    cy = m_K(1, 2); 
    xi = m_xi; 

    double d = ray.norm(); 
    double rz = 1.0/(ray(2) + xi * d); 

    // Project the scene point to the normalized plane. 
    Eigen::Vector2d m_d(ray(0) * rz, ray(1) * rz); 

    // Apply the projection matrix. 
    pt(0) = fx * m_d(0) + cx; 
    pt(1) = fy * m_d(1) + cy; 
    return true; 
}

and the getInterpolatedPixel() function is as follows:

void getInterpolatedPixel(const cv::Mat& image, const Eigen::Vector2d& coords, unsigned char* pixel) 
{ 
    int ix = static_cast<int>(coords.x()); 
    int iy = static_cast<int>(coords.y()); 
    double dx = coords.x() - ix; 
    double dy = coords.y() - iy; 
    double dxdy = dx * dy; 

    const double w00 = 1.0 - dx - dy + dxdy; 
    const double w01 = dx - dxdy; 
    const double w10 = dy - dxdy; 
    const double w11 = dxdy; 

    const unsigned char* p00 = image.data + iy * image.step.p[0] + ix * image.channels(); 
    const unsigned char* p01 = p00 + image.channels(); 
    const unsigned char* p10 = p00 + image.step.p[0]; 
    const unsigned char* p11 = p10 + image.channels(); 

    for (int i = 0; i < image.channels(); ++i) 
    { 
     double value = w11 * p11[i] + w10 * p10[i] + w01 * p01[i] + w00 * p00[i]; 
     pixel[i] = cv::saturate_cast<unsigned char>(value); 
    } 
}

2016-08-19 Peidong

Since you say that 'getInterpolatedPixel' is your bottleneck loop, have you tried using OpenMP on it? Is OpenMP an option for you? For simple use of SIMD instructions, try [Vc](https://github.com/VcDevel/Vc).

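I have not tried OpenMP on the getInterpolatedPixel function, but I did try it on warpPlanarHomography. It gave me no benefit. I guess the reason is that the for loop is small, so the speedup cannot make up for the overhead OpenMP introduces. So I think it might be a good idea to optimize this small function with SIMD and to keep OpenMP on the big outer for loop. And yes, thanks for the library pointer. I will try it. – Peidong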
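The split described in this comment (OpenMP on the outer loop over patches, plain serial code inside each patch) could look roughly like the sketch below. It is only an illustration: processPatch and processAllPatches are hypothetical names, and processPatch is a dummy stand-in for the real per-patch work such as warpPlanarHomography. The loop index is a signed int because the OpenMP 2.0 support in Visual Studio requires it.

```cpp
#include <vector>

// Hypothetical stand-in for the per-patch work (warpPlanarHomography in the
// question). It only walks a 7x7 patch and accumulates a dummy value.
inline int processPatch(int patchIndex)
{
    int sum = 0;
    for (int v = 0; v < 7; ++v)          // rows of the patch
        for (int u = 0; u < 7; ++u)      // columns of the patch
            sum += patchIndex + u + v;
    return sum;
}

// Parallelize only the outer loop: each iteration handles one whole patch,
// so the threading overhead is paid once for the loop, not per pixel.
// Assumes patchCount >= 0.
inline std::vector<int> processAllPatches(int patchCount)
{
    std::vector<int> results(patchCount);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < patchCount; ++i)
        results[i] = processPatch(i);    // each thread writes its own slot
    return results;
}
```

If OpenMP is not enabled at compile time, the pragma is ignored and the loop simply runs serially, which makes it easy to compare both variants with a timer.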

I think your 'project' function got messed up by copy and paste.

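Measure where the bottleneck is, and try to optimize that place first.
Can you use float instead of double?
What are m_K(0, 0), m_K(1, 1), ...? Can you replace them with constants?
Unroll the for (int i = 0; i < image.channels(); ++i) loop if the image can only have a specific number of channels (1, 3 and 4 are typical numbers).
Call image.channels() only once and use the stored value afterwards.
Try adding the inline modifier to small functions.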
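Several of these suggestions can be combined in one place. The following is a minimal sketch, assuming the image is single-channel 8-bit and that float precision is acceptable. interpolateGray is a hypothetical name, and the raw data pointer and row step stand in for cv::Mat::data and cv::Mat::step so that the example compiles without OpenCV.

```cpp
#include <cstddef>

// Bilinear interpolation specialized for a single-channel 8-bit image.
// data: pointer to the first pixel; step: number of bytes per image row.
// The caller must guarantee 0 <= x < cols - 1 and 0 <= y < rows - 1,
// as the bounds check in warpPlanarHomography already does.
inline unsigned char interpolateGray(const unsigned char* data, std::size_t step,
                                     float x, float y)
{
    const int ix = static_cast<int>(x);
    const int iy = static_cast<int>(y);
    const float dx = x - ix;
    const float dy = y - iy;
    const float dxdy = dx * dy;

    // Same weights as the original function, computed in float.
    const float w00 = 1.0f - dx - dy + dxdy;
    const float w01 = dx - dxdy;
    const float w10 = dy - dxdy;
    const float w11 = dxdy;

    const unsigned char* p00 = data + iy * step + ix;  // top-left neighbour
    const unsigned char* p10 = p00 + step;             // bottom-left neighbour

    // Channel loop unrolled by hand: exactly one channel, no channels() call.
    const float value = w00 * p00[0] + w01 * p00[1] + w10 * p10[0] + w11 * p10[1];

    // Round by adding 0.5 and truncating. Exact ties may round differently
    // from cv::saturate_cast, which is used in the original.
    return static_cast<unsigned char>(value + 0.5f);
}
```

Whether this is actually faster than the double version has to be measured on the target machine, which is the first point of the list above.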

2016-08-19 10:11:17

On most of today's architectures, doubles are faster than floats.

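Thanks for your reply. I measured, and the bottlenecks are the project() function and the getInterpolatedPixel function. As for the rest, it is possible. I will try it. – Peidong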

I am wondering whether the for loop can be parallelized, since for a 7x7 patch it is called 49 times, applying the same instructions to different data? – Peidong
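One way to expose this "same instructions, different data" pattern is to process all rays of a patch in a single loop over separate x/y/z arrays, so that the compiler's auto-vectorizer (or hand-written SIMD) has a chance to work on it. The sketch below is an assumption-laden illustration, not the question's code: UnifiedIntrinsics and projectBatch are hypothetical names, and the intrinsics are read once from m_K and m_xi by the caller. The arithmetic follows the project() function from the question.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical container for the intrinsics, filled once from m_K and m_xi.
struct UnifiedIntrinsics { double fx, fy, cx, cy, xi; };

// Projects n rays stored as separate x/y/z arrays (structure of arrays).
// The loop body has no branches and no function calls other than sqrt,
// which keeps it a candidate for vectorization.
inline void projectBatch(const UnifiedIntrinsics& k,
                         const double* rx, const double* ry, const double* rz,
                         std::size_t n, double* px, double* py)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        // Norm of the ray, as ray.norm() in the original project().
        const double d = std::sqrt(rx[i] * rx[i] + ry[i] * ry[i] + rz[i] * rz[i]);
        const double inv = 1.0 / (rz[i] + k.xi * d);

        // Normalized plane coordinates, then the projection matrix.
        px[i] = k.fx * (rx[i] * inv) + k.cx;
        py[i] = k.fy * (ry[i] * inv) + k.cy;
    }
}
```

For a 7x7 patch, n would be 49. Whether the compiler really vectorizes this loop depends on the compiler and its settings, so it is worth checking the vectorization report or the generated assembly and timing the result.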

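This should be considered in addition to the other, more broadly focused answers.

Since getInterpolatedPixel is used in a tight loop, I focused on reducing the number of function calls in it: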

void getInterpolatedPixel(const cv::Mat& image, const Eigen::Vector2d& coords, unsigned char* pixel) 
{ 
    //save two function calls here 
    double dx = coords.x(); 
    double dy = coords.y(); 
    int ix = static_cast<int>(dx); 
    int iy = static_cast<int>(dy); 
    dx -= ix; 
    dy -= iy; 
    //make this const 
    const double dxdy = dx * dy; 

    const double w00 = 1.0 - dx - dy + dxdy; 
    const double w01 = dx - dxdy; 
    const double w10 = dy - dxdy; 
    const double w11 = dxdy; 

    //cache image.channels() 
    const int channels = image.channels(); 

    const unsigned char* p00 = image.data + iy * image.step.p[0] + ix * channels; 
    const unsigned char* p01 = p00 + channels; 
    const unsigned char* p10 = p00 + image.step.p[0]; 
    const unsigned char* p11 = p10 + channels; 

    for (int i = 0; i < channels; ++i) 
    { 
     double value = w11 * p11[i] + w10 * p10[i] + w01 * p01[i] + w00 * p00[i]; 
     pixel[i] = cv::saturate_cast<unsigned char>(value); 
    } 
}

2016-08-19 10:46:18
