計算64×64 int產品的高64位C

我希望我的C函數有效地計算兩個64位帶符號整數的乘積的高64位。我知道如何在x86-64程序集中做到這一點，並使用imulq並將結果從％rdx中提取出來。但是我完全不知道如何用C編寫它，更不用說哄編譯器來高效地完成它了。計算64×64 int產品的高64位C

有沒有人有任何建議在C寫這個？這是對性能敏感的，所以「手動方法」（如俄羅斯農民或者bignum庫）已經不在了。

這傻乎乎的內聯彙編函數我寫的作品和大致經過我的代碼生成：

static long mull_hi(long inp1, long inp2) { 
    long output = -1; 
    __asm__("movq %[inp1], %%rax;" 
      "imulq %[inp2];" 
      "movq %%rdx, %[output];" 
      : [output] "=r" (output) 
      : [inp1] "r" (inp1), [inp2] "r" (inp2) 
      :"%rax", "%rdx"); 
    return output; 
}

來源

2009-10-09 fish

一般的答案是x * y可以分解成(a + b) * (c + d)，其中a和c是高階部分。

一是擴大到ac + ad + bc + bd

現在，你乘的條款存儲爲long long 32位數字（或更好，但uint64_t），你只要記住，當你乘更高的訂單號碼，您需要縮放32位。然後你做了補充，記住要檢測進位。跟蹤標誌。當然，你需要做一些補充。

來源

2009-10-09 01:52:05 DigitalRoss

我喜歡用因子h。這給出（ha + b）*（hc + d）= hhac + had + hbc + bd。 'h'基本上是一種跟蹤32位的方式。每個術語都需要64位（忽略h因子），給出32位格雷克斯，但是（2^n）-1 *（2^n）-1 =（2^2n）-2（2^n）+ 1，這是<（2^2n）-1，留下淨空以增加一個較低期限的進位。 hhac術語是純粹的溢出，正如had和hbc術語中的那樣。你可以使用h（ad + bc）而不是+ hbc - 它的超過64位，但溢出並不重要 - 你放棄了它。 – Steve314 2009-10-09 02:38:50

Steve314：你以前做過！好點。我昨晚輸入了一個實現，並將其作爲新的答案發送。 – DigitalRoss 2009-10-09 22:00:11

等待，你有一個完美的優化組裝解決方案，已經爲此工作了，並且你想退出並嘗試將它寫入這個不支持128位數學的環境？我沒有跟隨。

正如你所知道的，這個操作是一個關於 x86-64的單指令。顯然，你所做的任何事情都不會讓它工作得更好。如果你真的想要便攜式C，你需要做一些類似於 DigitalRoss的代碼，並希望你的優化器能夠找出你正在做什麼。

如果您需要架構便攜性，而且願意自己限制到GCC平臺，有__int128_t（和__uint128_t）類型的編譯器內在會做你想要什麼。

來源

2009-10-09 02:32:43

如果你在x86_64使用相對較新的GCC：

int64_t mulHi(int64_t x, int64_t y) { 
    return (int64_t)((__int128_t)x*y >> 64); 
}

在-O1及更高版本，編譯到你想要的東西：

_mulHi: 
0000000000000000 movq %rsi,%rax 
0000000000000003 imulq %rdi 
0000000000000006 movq %rdx,%rax 
0000000000000009 ret

我相信鐺和VC++也有__int128_t類型的支持，所以這也應該在這些平臺上工作，通常有關於自己嘗試的警告。

來源

2009-10-09 02:41:46

關於您的組裝解決方案，請勿硬編碼mov說明！讓編譯器爲你做。這是你的代碼的修改版本：

static long mull_hi(long inp1, long inp2) { 
    long output; 
    __asm__("imulq %2" 
      : "=d" (output) 
      : "a" (inp1), "r" (inp2)); 
    return output; 
}

有益的參考：Machine Constraints

來源

2009-10-09 02:52:46

既然你做了很好的工作解決了與機器代碼你自己的問題，我想你應該得到一些幫助，便攜式版本。如果在x86上的gnu中，我會在ifdef中只使用程序集。

無論如何，這裏是一個實現...我敢肯定這是正確的，但沒有保證，昨晚我剛剛敲出了這個......你可能應該擺脫靜態positive_result []和result_negative，這些只是我的單元測試的文物......

#include <stdlib.h> 
#include <stdio.h> 

// stdarg.h doesn't help much here because we need to call llabs() 

typedef unsigned long long uint64_t; 
typedef signed long long int64_t; 

#define B32 0xffffffffUL 

static uint64_t positive_result[2]; // used for testing 
static int result_negative;   // used for testing 

static void mixed(uint64_t *result, uint64_t innerTerm) 
{ 
    // the high part of innerTerm is actually the easy part 

    result[1] += innerTerm >> 32; 

    // the low order a*d might carry out of the low order result 

    uint64_t was = result[0]; 

    result[0] += (innerTerm & B32) << 32; 

    if (result[0] < was) // carry! 
     ++result[1]; 
} 


static uint64_t negate(uint64_t *result) 
{ 
    uint64_t t = result[0] = ~result[0]; 
    result[1] = ~result[1]; 
    if (++result[0] < t) 
    ++result[1]; 
    return result[1]; 
} 

uint64_t higherMul(int64_t sx, int64_t sy) 
{ 
    uint64_t x, y, result[2] = { 0 }, a, b, c, d; 

    x = (uint64_t)llabs(sx); 
    y = (uint64_t)llabs(sy); 

    a = x >> 32; 
    b = x & B32; 
    c = y >> 32; 
    d = y & B32; 

    // the highest and lowest order terms are easy 

    result[1] = a * c; 
    result[0] = b * d; 

    // now have the mixed terms ad + bc to worry about 

    mixed(result, a * d); 
    mixed(result, b * c); 

    // now deal with the sign 

    positive_result[0] = result[0]; 
    positive_result[1] = result[1]; 
    result_negative = sx < 0^sy < 0; 
    return result_negative ? negate(result) : result[1]; 
}

來源

2009-10-09 21:34:11 DigitalRoss

計算64×64 int產品的高64位C

回答

相關問題