2009-07-31 160 views
97

I need to generate a unique ID based on a random value. How can I generate a unique ID in Python?

+3

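Can you be more specific about what kind of unique ID you need? Does it have to be a number, or can it contain letters? Give some examples of the kind of ID. – MitMaro 2009-07-31 02:54:14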

+0

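Possibly relevant: every object has a unique id, available as id(my_object) or id(self). That was sufficient for me, considering that everything in Python is an object and has a numeric id; strings: `id('Hello World')`, classes: `id('Hello World')`, everything has an id. – ThorSummoner 2015-10-01 22:06:29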

+1

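Actually, I ran into problems using id: it seems to have some relation to the variable name, where a variable with the same name gets the same id as the one it just replaced. Probably avoid id unless you have good unit tests and are sure it behaves the way you want. – ThorSummoner 2015-10-01 22:56:30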
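
The behavior described in this comment is consistent with how id() is documented: the value is only guaranteed to be unique among objects that exist at the same time, so a new object may receive the id of one that has just been freed. A minimal sketch of that, where `Token` is a hypothetical placeholder class introduced only for illustration:

```python
class Token:
    """Hypothetical placeholder class used only for this illustration."""


a = Token()
b = Token()
# Both objects are alive at the same time, so their ids must differ.
distinct_while_alive = id(a) != id(b)

old_id = id(a)
del a        # the first object may now be reclaimed
c = Token()  # a new object may or may not reuse the freed id
# Implementation-dependent: do not rely on either outcome.
reused = id(c) == old_id
```

Whether `reused` ends up True depends on the interpreter and its memory allocator, which is why id() is a poor source of long-lived unique IDs.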

Answers

114

Perhaps uuid.uuid4() will do the job. See uuid for more information.
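
A minimal sketch of what that might look like with the standard-library uuid module. The helper name `new_id` and the choice of the hex form are assumptions made for illustration, not part of the answer:

```python
import uuid


def new_id() -> str:
    """Return a random (version 4) UUID as a 32-character hex string."""
    return uuid.uuid4().hex


first = new_id()
second = new_id()
```

Use `str(uuid.uuid4())` instead if the dashed 36-character form is preferred.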

+18

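Take note: the library underlying this module is buggy and tends to fork a process that keeps FDs open. That is fixed in newer versions, but most people probably don't have those yet, so I generally avoid this module. It caused me major headaches with listening sockets... – 2009-07-31 03:14:59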

+2

@Glenn: Any more details on which versions are buggy? I'm using this in production code (and about to roll out more uses in a newer release). Now I'm scared! – 2010-11-06 07:38:23

+1

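@Matthew: I don't know whether it has since been fixed, but the uuid backend that uses uuidlib forked without closing FDs, so the TCP sockets I had open at the time would never be closed and I couldn't reopen the port later. I had to kill `uuidd` manually as root. I worked around this by setting `uuid._uuid_generate_time` and `uuid._uuid_generate_random` to None, so the `uuid` module never used the native implementation. (That should really be an option anyway; having a daemon started just to generate V4 random UUIDs is completely unnecessary.) – 2010-11-06 11:22:01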

-7
import time 
def new_id(): 
    time.sleep(0.000001) 
    return time.time() 

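On my system, time.time() seems to offer 6 significant digits after the decimal point. With a brief sleep it should be guaranteed unique, with at least a moderate amount of randomness in the last two or three digits.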

You could hash it as well if you're worried.
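
A rough sketch of the hashing idea, assuming SHA-256 from hashlib (the answer does not name a hash function) and a hypothetical helper name `new_hashed_id`:

```python
import hashlib
import time


def new_hashed_id() -> str:
    """Hash a sleep-separated timestamp; SHA-256 is an arbitrary choice here."""
    time.sleep(0.000001)       # same brief sleep as in the answer
    stamp = repr(time.time())  # text form of the float timestamp
    return hashlib.sha256(stamp.encode("ascii")).hexdigest()


hashed = new_hashed_id()
```

Hashing only obscures the timestamp. It does not add uniqueness: two calls that see the same clock value still produce the same digest.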

15

Unique and random are mutually exclusive. Perhaps you want this?

import random
def uniqueid():
    seed = random.getrandbits(32)
    while True:
        yield seed
        seed += 1

Usage:

import itertools  # needed for islice below

unique_sequence = uniqueid()
id1 = next(unique_sequence)
id2 = next(unique_sequence)
id3 = next(unique_sequence)
ids = list(itertools.islice(unique_sequence, 1000))

No two returned ids are the same (unique), and the sequence is based on a randomized seed value.

5
import time
import random
import socket
import hashlib

def guid(*args):
    """
    Generates a universally unique ID.
    Any arguments only create more randomness.
    """
    # Python 3: int replaces Python 2's long and the L literal suffix
    t = int(time.time() * 1000)
    r = int(random.random() * 100000000000000000)
    try:
        a = socket.gethostbyname(socket.gethostname())
    except OSError:
        # if we can't get a network address, just imagine one
        a = random.random() * 100000000000000000
    data = str(t) + ' ' + str(r) + ' ' + str(a) + ' ' + str(args)
    # hashlib needs bytes in Python 3, so encode the string first
    data = hashlib.md5(data.encode('utf-8')).hexdigest()

    return data
4

Here you can find an implementation:

import random
from datetime import datetime

def __uniqueid__():
    """ 
     generate unique id with length 17 to 21. 
     ensure uniqueness even with daylight savings events (clocks adjusted one-hour backward). 

     if you generate 1 million ids per second during 100 years, you will generate
     2**25 (approx sec per year) * 10**6 (1 million id per sec) * 100 (years) = about 3.4 * 10**15 unique ids.

     with 17 digits (radix 16) id, you can represent 16**17 = 295147905179352825856 ids (around 2.9 * 10**20).
     In fact, as we need far less than that, we agree that the format used to represent id (seed + timestamp reversed)
     does not cover all numbers that could be represented with 35 digits (radix 16).

     if you generate 1 million ids per second with this algorithm, it will increase the seed by less than 2**12 per hour,
     so if a DST change occurs and clocks go back one hour, we need to be able to generate unique ids twice for the same period.
     the seed range must be at least 1 to 2**13. if we want to ensure uniqueness for two hours (100% contingency), we need
     a seed range of 1 to 2**14. that's what we have with this algorithm. You have to increment seed_range_bits if you
     move your machine by airplane to another time zone or if you have deep pockets and use a computer that can generate
     more than 1 million ids per second.

     one word about predictability : This algorithm is absolutely NOT designed to generate unpredictable unique ids.
     you can add a sha-1 or sha-256 digest step at the end of this algorithm, but you will lose guaranteed uniqueness and enter the world of collision probability.
     hash algorithms ensure that the same id generated here always gives the same hash, but for two different ids (a pair of ids), it is
     possible to get the same hash with a very small probability. You would be better off with a bijective function that maps a
     35-digit (or longer) number to a 35-digit (or longer) number, based on a block cipher and a secret key. read papers on breaking PRNG algorithms
     in order to be convinced that problems could occur as soon as you use the random library :)

     1 million id per second ?... on a Intel(R) Core(TM)2 CPU 6400 @ 2.13GHz, you get : 

     >>> timeit.timeit(uniqueid,number=40000) 
     1.0114529132843018 

     an average of 40000 id/second 
    """ 
    mynow=datetime.now 
    sft=datetime.strftime 
    # store the old datetime each time in order to check whether we generate during the same microsecond (very fast machine !)
    # or whether a daylight savings event occurs (when clocks are adjusted backward) [rarely detected at this level]
    old_time=mynow() # fake init - on a very fast machine it could increase your seed to seed + 1... but we have our contingency :)
    # manage seed 
    seed_range_bits=14 # max range for seed 
    seed_max_value=2**seed_range_bits - 1 # seed could not exceed 2**nbbits - 1 
    # get random seed 
    seed=random.getrandbits(seed_range_bits) 
    current_seed=str(seed) 
    # producing new ids 
    while True: 
     # get current time 
     current_time=mynow() 
     if current_time <= old_time: 
      # previous id generated in the same microsecond or Daylight saving time event occurs (when clocks are adjusted backward) 
      seed = max(1,(seed + 1) % seed_max_value) 
      current_seed=str(seed) 
     # generate new id (concatenate seed and timestamp as numbers) 
     #newid=hex(int(''.join([sft(current_time,'%f%S%M%H%d%m%Y'),current_seed])))[2:-1] 
     newid=int(''.join([sft(current_time,'%f%S%M%H%d%m%Y'),current_seed])) 
     # save current time 
     old_time=current_time 
     # return a new id 
     yield newid 

""" you get a new id for each call of uniqueid() """ 
uniqueid=__uniqueid__().__next__  # Python 3: the generator's __next__ replaces Python 2's .next

import unittest 
class UniqueIdTest(unittest.TestCase): 
    def testGen(self): 
     for _ in range(3): 
      m=[uniqueid() for _ in range(10)] 
      self.assertEqual(len(m),len(set(m)),"duplicates found !") 

Hope it helps!

2

Maybe this works for you:

str(uuid.uuid4().fields[-1])[:5] 
2

This works very fast, but it does not generate random values; the values are monotonically increasing (for a given thread).

import threading 

_uid = threading.local() 
def genuid():
    if getattr(_uid, "uid", None) is None:
        _uid.tid = threading.current_thread().ident
        _uid.uid = 0
    _uid.uid += 1
    return (_uid.tid, _uid.uid)

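It is thread safe, and working with tuples may have benefits over strings (shorter, if anything). If you do not need thread safety, feel free to remove the threading bits (instead of threading.local, use object() and remove tid altogether).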
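
A sketch of what the non-threaded variant might look like. Note that a bare `object()` instance does not accept new attributes, so this sketch swaps in `itertools.count` instead; the name `genuid_single` is hypothetical:

```python
import itertools

# A module-level counter; next() on it yields 1, 2, 3, ...
_counter = itertools.count(1)


def genuid_single() -> int:
    """Hypothetical single-threaded variant: a bare increasing integer."""
    return next(_counter)


ids = [genuid_single() for _ in range(3)]
```

With the thread id removed, the result is just an increasing integer rather than a tuple.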

Hope that helps.