2010-06-08 71 views
2

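Perfect hash function for URLs

Does anyone know of a perfect hash function for URLs that produces 64-bit integers and behaves well for most URLs?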

+7

If it is a perfect hash, then by definition it behaves well. – 2010-06-08 05:31:40

+1

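Why wouldn't any basic string hash function do? URLs are just strings, and they look much like any other strings to me. Any good string hash function should perform very well, provided the number of buckets is reasonable for the load factor. – 2010-06-08 05:33:07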

+1

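@Ira Baxter Sorry, I meant behaving well in terms of hash size relative to the acceptable URL patterns. As I understand it, a "perfect hash function" maps a given set of inputs without collisions. – 2010-06-08 06:22:33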

Answers

2

Found this, labelled "Base52 url shortener perfect hash function in C": http://lambdajones.com/b52

const char *b52idx[52] = { 
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "B", "C", "D", "F", "G", "H", "J", "K", "L", "M", 
    "N", "P", "Q", "R", "S", "T", "V", "W", "X", "Y", 
    "Z", "b", "c", "d", "f", "g", "h", "j", "k", "l", 
    "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", 
    "y", "z" 
}; 

#define X 0xff 
const int b52map[128] = { 
    X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, 
    X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, 
    X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, 
//  0  1  2  3  4  5  6  7  8  9
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, X, X, X, X, X, X,
//        B  C  D     F  G  H     J  K  L  M  N
    X, X,10,11,12, X,13,14,15, X,16,17,18,19,20, X,
//  P  Q  R  S  T     V  W  X  Y  Z
   21,22,23,24,25, X,26,27,28,29,30, X, X, X, X, X,
//        b  c  d     f  g  h     j  k  l  m  n
    X, X,31,32,33, X,34,35,36, X,37,38,39,40,41, X,
//  p  q  r  s  t     v  w  x  y  z
   42,43,44,45,46, X,47,48,49,50,51, X, X, X, X, X
}; 

#ifdef __GNUC__ 
#define likely(x) __builtin_expect((x),1) 
#else 
#define likely(x) (x) 
#endif 

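/*
    valid from 00000 -> zzzzz, good for 52^5 = 380204032 urls
    returns the integral short url id (0 .. 380204031),
    or 380204033 if the id contains a character outside the alphabet
*/
unsigned long long b52(const char *c) {
    unsigned long long x = 0;
    unsigned long long y = 0;
    unsigned long long z = 0;
    int i;

    /* pack the five digits 6 bits apiece, rejecting bytes that are
       non-ASCII or marked X in b52map (this also stops at an early NUL) */
    for (i = 0; i < 5; i++) {
        unsigned char ch = (unsigned char)c[i];
        if (ch > 127 || b52map[ch] == X) return 380204033;
        x = (x << 6) | (unsigned long long)b52map[ch];
    }

    /* subtract the gap between the base-64 and base-52 place values */
    y += (x/64) * 12;
    if (x > 4095) y += 624 * (x/4096);
    if (x > 262143) y += 32448 * (x/262144);
    if (x > 16777215) y += 1687296 * (x/16777216);

    if (likely((z = x - y) < 380204032)) return z;
    else return 380204033;
}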

/* advances id to its successor in place; assumes id holds five valid characters */
void b52inc(char *id) {
    int x[5] = {
    b52map[id[0]], b52map[id[1]], b52map[id[2]],b52map[id[3]], b52map[id[4]]
    };
    int y = 5;

    // search from the right for the first character we can increment (51 == 'z')
    while (y--) if (x[y] < 51) break;

    // every character is 'z': there is no successor, so leave id unchanged
    if (y < 0) return;

    // increment from the b52idx table and update id
    id[y] = *b52idx[++x[y]];

    // the characters we passed over are all 'z'; roll them over to '0'
    while (++y < 5) id[y] = '0';
}
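
For comparison, the same id can be computed without the 6-bit packing and the correction constants by reading the five characters directly as base-52 digits (Horner's rule). This is only a minimal sketch, not the linked author's code: `B52_ALPHABET`, `B52_INVALID`, `b52_decode` and `b52_encode` are hypothetical names, and the alphabet is assumed to match the b52idx table above.

```c
#include <string.h>

/* assumed to be the same 52-character alphabet as b52idx: digits plus consonants */
static const char B52_ALPHABET[] =
    "0123456789BCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz";

/* hypothetical sentinel, chosen to match the error value returned by b52() */
#define B52_INVALID 380204033ULL

/* five-character id -> integer in 0 .. 52^5 - 1, or B52_INVALID */
unsigned long long b52_decode(const char *id) {
    unsigned long long value = 0;
    int i;
    for (i = 0; i < 5; i++) {
        /* strchr would also match the terminating NUL, so test for it first */
        const char *p = id[i] ? strchr(B52_ALPHABET, id[i]) : NULL;
        if (p == NULL) return B52_INVALID;
        value = value * 52 + (unsigned long long)(p - B52_ALPHABET);
    }
    return value;
}

/* integer -> five-character id; returns 0 if value is out of range */
int b52_encode(unsigned long long value, char out[6]) {
    int i;
    if (value >= 380204032ULL) return 0;   /* 52^5 ids exist */
    for (i = 4; i >= 0; i--) {
        out[i] = B52_ALPHABET[value % 52];
        value /= 52;
    }
    out[5] = '\0';
    return 1;
}
```

Note that this is "perfect" only in the sense that short ids and integers map one-to-one; it does not hash arbitrary URLs, so a shortener would still need to store the id-to-URL association somewhere.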
3

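It is impossible to create a perfect hash function if you do not know in advance the set of keys you will be querying. If you do know it, you can use something like gperf or cmph to generate a perfect hash function for you.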

http://cmph.sourceforge.net/
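
To illustrate why the key set has to be known up front, here is a minimal brute-force sketch: it tries seeds until a seeded hash places every key of a fixed set in its own slot. This is not how gperf or cmph work internally and it only scales to a handful of keys. Everything named here (`mix64`, `seeded_hash`, `perfect_slot`, `find_perfect_seed`, `EXAMPLE_URLS` and the URLs themselves) is hypothetical; the mixing step borrows the 64-bit finalizer constants from MurmurHash3.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* hypothetical fixed key set, known before the function is built */
static const char *const EXAMPLE_URLS[5] = {
    "http://example.com/",
    "http://example.com/about",
    "http://example.com/contact",
    "http://example.org/",
    "http://example.net/index.html"
};

/* bijective 64-bit bit mixer (the MurmurHash3 fmix64 finalizer) */
static uint64_t mix64(uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

/* simple seeded string hash: fold each byte into the state, then remix */
static uint64_t seeded_hash(const char *s, uint64_t seed) {
    uint64_t h = mix64(seed + 0x9e3779b97f4a7c15ULL);
    while (*s) h = mix64(h ^ (unsigned char)*s++);
    return h;
}

/* slot of a key in a table with nslots entries, for a given seed */
static size_t perfect_slot(const char *key, uint64_t seed, size_t nslots) {
    return (size_t)(seeded_hash(key, seed) % nslots);
}

/* Try seeds 0 .. max_tries-1 until all nkeys keys land in distinct slots
   of an nkeys-sized table. Returns 1 and stores the seed on success,
   0 if no seed was found or nkeys is outside 1..64. */
static int find_perfect_seed(const char *const *keys, size_t nkeys,
                             uint64_t max_tries, uint64_t *seed_out) {
    unsigned char used[64];
    uint64_t seed;
    size_t i;
    if (nkeys == 0 || nkeys > sizeof used) return 0;
    for (seed = 0; seed < max_tries; seed++) {
        memset(used, 0, nkeys);
        for (i = 0; i < nkeys; i++) {
            size_t slot = perfect_slot(keys[i], seed, nkeys);
            if (used[slot]) break;          /* collision: try the next seed */
            used[slot] = 1;
        }
        if (i == nkeys) { *seed_out = seed; return 1; }
    }
    return 0;
}
```

Calling `find_perfect_seed(EXAMPLE_URLS, 5, 100000, &seed)` and then `perfect_slot(url, seed, 5)` gives each of the five URLs a distinct index in 0..4. A URL outside the set still maps to some slot, so the stored key has to be compared on lookup; and once a key is added the seed generally has to be searched for again.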

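I think a perfect hash function is not what you really want, so it should be enough to use any reasonable hash function out there, such as MurmurHash or Bob Jenkins' hash, together with a hash table implementation such as __gnu_cxx::hash_map or Google's dense_hash_map and sparse_hash_map.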

http://code.google.com/p/google-sparsehash/ http://sites.google.com/site/murmurhash/ http://burtleburtle.net/bob/hash/doobs.html
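
As a rough illustration of the "any reasonable hash function" route, here is a 64-bit string hash applied to a URL. The sketch uses FNV-1a rather than MurmurHash or Jenkins' hash purely because it fits in a few lines; `fnv1a_64` and `url_bucket` are hypothetical names, and the constants are the standard 64-bit FNV offset basis and prime.

```c
#include <stdint.h>
#include <stddef.h>

/* FNV-1a, 64-bit variant: XOR each byte in, then multiply by the FNV prime */
uint64_t fnv1a_64(const char *s) {
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* hypothetical helper: bucket index in a table with nbuckets slots (nbuckets > 0) */
size_t url_bucket(const char *url, size_t nbuckets) {
    return (size_t)(fnv1a_64(url) % nbuckets);
}
```

This is an ordinary hash, not a perfect one: two URLs can share a 64-bit value or a bucket, so the hash table still compares the stored URL on lookup. With 64-bit values, full-hash collisions are unlikely for realistic numbers of URLs.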