2016-03-05 53 views
1

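Match only the leftmost wildcard in a domain name - Python

I'm trying to write a regular expression for the leftmost wildcard in a domain name. So far I have this: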

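import re
from socket import socket
from sys import argv
from urlparse import urlparse  # Python 2; this lives in urllib.parse on Python 3

from OpenSSL import SSL  # pyOpenSSL

o = urlparse(argv[1])
host_name = o.netloc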
context = SSL.Context(SSL.TLSv1_METHOD) # Use TLS Method 
context.set_options(SSL.OP_NO_SSLv2) # Don't accept SSLv2 
context.set_verify(SSL.VERIFY_PEER | SSL.VERIFY_FAIL_IF_NO_PEER_CERT, 
        callback) 
# context.load_verify_locations(ca_file, ca_path) 

sock = socket() 
ssl_sock = SSL.Connection(context, sock) 
ssl_sock.connect((host_name, 443)) 
ssl_sock.set_connect_state() 
ssl_sock.set_tlsext_host_name(host_name) 
ssl_sock.do_handshake() 

cert = ssl_sock.get_peer_certificate() 
common_name = cert.get_subject().commonName.decode() 
print "Common Name: ", common_name 
print "Cert number: ", cert.get_serial_number() 
regex = common_name.replace('.', r'\.').replace('*',r'.*') + '$' 
if re.match(regex, host_name): 
    print "matches" 
else: 
    print "invalid" 

# output: 
Common Name: *.example.com 
Cert number: 63694395280496902491340707875731768741 

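However, the regular expression matches not only *.example.com, but also *.*.* or www.*.com. In addition, https://wrong.host.example.com/ should not be allowed to match. How can I make sure it only matches the leftmost label?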
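
The over-matching can be reproduced without any network connection. The snippet below is only a sketch in Python 3 syntax: `naive_pattern` is a hypothetical helper name that wraps the same `replace` chain as the code above, and the host names are illustrative. Because `*` is translated to `.*`, which also matches dots, the wildcard can swallow several labels at once.

```python
import re

def naive_pattern(common_name):
    # Same translation as in the question: '.' is escaped, '*' becomes '.*'.
    return common_name.replace('.', r'\.').replace('*', r'.*') + '$'

regex = naive_pattern('*.example.com')  # the pattern is r'.*\.example\.com$'

for host in ('www.example.com', 'wrong.host.example.com', 'example.com'):
    # '.*' is free to match "wrong.host", so the second host is accepted too.
    print(host, bool(re.match(regex, host)))
```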

+0

What would be an example of a correct match? – Alexander

+0

*.example.com is a valid, correct match – cybertextron

Answers

0

You can use urlparse and split instead of a regular expression.

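from urlparse import urlparse
.
.
common_name = cert.get_subject().commonName.decode()
# commonName is a bare pattern such as "*.example.com" with no scheme, so it is
# prefixed with "//" here; otherwise urlparse() leaves netloc empty.
domain = urlparse('//' + common_name).netloc
host = domain.split('.', 1)[0]  # leftmost label, e.g. "*"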
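
The split idea can be taken one step further by comparing the two names label by label. This is a minimal sketch in Python 3 syntax, not a complete certificate check: `wildcard_matches` is a hypothetical helper, and it assumes that `*` is only honoured when it is the entire leftmost label and that the bare domain (`example.com`) does not match `*.example.com`.

```python
def wildcard_matches(pattern, host_name):
    """Return True if host_name matches pattern, where '*' may only
    stand for the whole leftmost label (hypothetical helper)."""
    pattern_labels = pattern.lower().split('.')
    host_labels = host_name.lower().split('.')

    # A wildcard covers exactly one label, so the label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False

    # Reject patterns with a wildcard anywhere but the leftmost label.
    if any('*' in label for label in pattern_labels[1:]):
        return False

    if pattern_labels[0] == '*':
        # The leftmost host label must be non-empty; the rest must be equal.
        return bool(host_labels[0]) and pattern_labels[1:] == host_labels[1:]

    # No wildcard: plain case-insensitive comparison.
    return pattern_labels == host_labels


print(wildcard_matches('*.example.com', 'www.example.com'))
print(wildcard_matches('*.example.com', 'wrong.host.example.com'))
```

With these assumptions a partial wildcard such as `w*.example.com` is treated as a literal label and therefore never matches a real host name.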
0

You can try this regular expression:

r'(?:^|\s)(\w+\.)?example\.com(?:$|\s)' 

Full example:

sock = socket() 
ssl_sock = SSL.Connection(context, sock) 
ssl_sock.connect((host_name, 443)) 
ssl_sock.set_connect_state() 
ssl_sock.set_tlsext_host_name(host_name) 
ssl_sock.do_handshake() 

cert = ssl_sock.get_peer_certificate() 
common_name = cert.get_subject().commonName.decode() 
print "Common Name: ", common_name 
print "Cert number: ", cert.get_serial_number() 

# Drop the leading "*." from the common name and escape the rest literally.
rxString = r'(?:^|\s)(\w+\.)?' + re.escape(common_name[2:]) + r'(?:$|\s)'
regex = re.compile(rxString) 

if regex.match(host_name): 
    print "matches" 
else: 
    print "invalid" 

Input:

url     
------------------- 
www.example.com  
example.com   
hello.example.com 
foo.bar.example.com 
*.*.*    
www.*.com   

Output:

url                 | result
------------------- | -------
www.example.com     | matches
example.com         | matches
hello.example.com   | matches
foo.bar.example.com | invalid
*.*.*               | invalid
www.*.com           | invalid
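
For comparison, here is a slightly stricter variant of the same regex idea, written as a sketch in Python 3 syntax. `compile_wildcard` is a hypothetical helper, not part of the answer above. It assumes that only a leading `*.` counts as a wildcard, that the wildcard must stand for exactly one label (so, unlike the table above, the bare `example.com` is rejected), and that labels may contain hyphens, which `\w` does not cover.

```python
import re

def compile_wildcard(common_name):
    # Hypothetical helper: only a leading "*." is treated as a wildcard.
    if common_name.startswith('*.'):
        # Exactly one label of letters, digits, or hyphens, then the
        # remainder of the name escaped so that its dots are literal.
        source = r'[A-Za-z0-9-]+\.' + re.escape(common_name[2:])
    else:
        source = re.escape(common_name)
    # match() anchors at the start; \Z anchors at the very end.
    return re.compile(source + r'\Z', re.IGNORECASE)


regex = compile_wildcard('*.example.com')
for host in ('www.example.com', 'foo.bar.example.com', 'example.com'):
    print(host, 'matches' if regex.match(host) else 'invalid')
```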
+0

Saleem, how do I build a regex like yours? My current one is "regex = common_name.replace('.', r'\.').replace('*', r'.*') + '$'", so how would I do something similar? – cybertextron

+0

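@philippe It's simple. I don't think you need the replace calls any more. Just replace "re.match(regex, host_name)" with "p.match(host_name)", assuming you copied the first two lines of my solution – Saleem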