Parse an HTTP request Authorization header with Python

I need to take a header like this:

Authorization: Digest qop="chap", 
    realm="[email protected]", 
    username="Foobear", 
    response="6629fae49393a05397450978507c4ef1", 
    cnonce="5ccc069c403ebaf9f0171e9517f40e41"

and parse it into this using Python:

{'protocol':'Digest', 
    'qop':'chap', 
    'realm':'[email protected]', 
    'username':'Foobear', 
    'response':'6629fae49393a05397450978507c4ef1', 
    'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}

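Is there a library that does this, or something I could look at for inspiration?

I'm doing this on Google App Engine, and I'm not sure whether the Pyparsing library is available there, but if it is the best solution I could probably bundle it with my app.

Currently I'm creating my own MyHeaderParser object and using it with reduce() on the header string. It works, but it is very fragile.

Brilliant solution by Nadia below: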

import re 

reg = re.compile('(\w+)[=] ?"?(\w+)"?') 

s = """Digest 
realm="stackoverflow.com", username="kixx" 
""" 

print str(dict(reg.findall(s)))

2009-08-28 Kris Walker

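So far this solution has proven to be not only super clean, but also very robust. Although it is not the most "by the book" implementation of the RFC, I have yet to build a test case that returns invalid values. However, I'm only using it to parse the Authorization header; none of the other headers I'm interested in need parsing, so it may not be a good solution as a general HTTP header parser. – 2009-09-04 11:35:52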

A little regex:

import re 
reg=re.compile('(\w+)[:=] ?"?(\w+)"?') 

>>>dict(reg.findall(headers)) 

{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}

2009-08-28 21:40:19

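Wow, I love Python. "Authorization:" is not actually part of the header string, so I did this instead:

#!/usr/bin/env python
import re

def mymain():
    reg = re.compile('(\w+)[=] ?"?(\w+)"?')
    s = """Digest realm="fireworksproject.com", username="Kristoffer" """
    print str(dict(reg.findall(s)))

if __name__ == '__main__':
    mymain()

I don't get the "Digest" protocol declaration, but I don't need it anyway. Essentially 3 lines of code... Brilliant! – 2009-08-28 21:56:59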

I think it would be more explicit to use a raw string or \\. – 2009-08-28 22:04:05

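If you find this and use it, make sure to add another question mark inside '"?(\w+)"' so it becomes '"?(\w+)?"'. That way, if you pass something as "", it returns the parameter and the value is undefined. And if you really want Digest: '/(\w+)(?:(?:"?(\w+)"?[=]))?/'. Check whether '=' exists in the match: if so, it's a key:value pair, otherwise it's something else. – Nijikokun 2013-04-03 00:06:47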
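The effect of the extra question mark can be checked with a short sketch. It is written for Python 3 with raw strings; the names `strict`, `lenient` and `header`, and the `opaque=""` field, are made up for illustration and do not come from the thread.

```python
import re

# Pattern from the answer above, written as a raw string.
strict = re.compile(r'(\w+)[:=] ?"?(\w+)"?')
# Variant from the comment: the value group is optional, so a
# parameter with an empty quoted value still yields its key.
lenient = re.compile(r'(\w+)[:=] ?"?(\w+)?"?')

# Hypothetical header value with one empty parameter.
header = 'Digest username="Foobear", opaque="", qop="chap"'

print(dict(strict.findall(header)))   # "opaque" is dropped entirely
print(dict(lenient.findall(header)))  # "opaque" maps to an empty string
```

With the stricter pattern the empty parameter disappears without any error, which is easy to miss when a later step expects the key to be present.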

If those components will always be there, in that order, then a regex will do the trick:

test = '''Authorization: Digest qop="chap", realm="[email protected]", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"''' 

import re 

re_auth = re.compile(r""" 
    Authorization:\s*(?P<protocol>[^ ]+)\s+ 
    qop="(?P<qop>[^"]+)",\s+ 
    realm="(?P<realm>[^"]+)",\s+ 
    username="(?P<username>[^"]+)",\s+ 
    response="(?P<response>[^"]+)",\s+ 
    cnonce="(?P<cnonce>[^"]+)" 
    """, re.VERBOSE) 

m = re_auth.match(test) 
print m.groupdict()

Produces:

{ 'username': 'Foobear', 
    'protocol': 'Digest', 
    'qop': 'chap', 
    'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 
    'realm': '[email protected]', 
    'response': '6629fae49393a05397450978507c4ef1' 
}

2009-08-28 21:36:41

As far as I can see, this solution produces correct results. – 2009-09-04 11:59:19

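I would suggest finding a proper library for parsing HTTP headers; unfortunately I can't recall any. :(

In the meantime, check the snippet below (it should mostly work):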

input= """ 
Authorization: Digest qop="chap", 
    realm="[email protected]", 
    username="Foob,ear", 
    response="6629fae49393a05397450978507c4ef1", 
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" 
""" 

field, sep, value = input.partition(":") 
if field.endswith('Authorization'): 
    protocol, sep, opts_str = value.strip().partition(" ") 

    opts = {} 
    for opt in opts_str.split(",\n"): 
        key, value = opt.strip().split('=') 
        key = key.strip(" ") 
        value = value.strip(' "') 
        opts[key] = value 

    opts['protocol'] = protocol 

    print opts

2009-08-28 21:38:11

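If your response comes in a single string that never varies and has as many lines as there are expressions to match, you can split it on newlines into an array called authentication_array and use regexps: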

import re

# authentication_array is the header split into one line per field, as described above
pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce'] 
i = 0 
parsed_dict = {} 

for line in authentication_array: 
    pattern = "(" + pattern_array[i] + ")" + "=(\".*\")"  # build a matching pattern 
    match = re.search(re.compile(pattern), line)  # make the match 
    if match: 
        parsed_dict[match.group(1)] = match.group(2) 
    i += 1

2009-08-28 21:38:47 Pinochle

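Your original idea of using PyParsing would be the best approach. What you have implicitly asked for is something that requires a grammar. A regular expression or a simple parsing routine is always going to be brittle, and that sounds like exactly what you are trying to avoid.

It appears that getting pyparsing onto Google App Engine is easy: How do I get PyParsing set up on the Google App Engine?

So I would go with that, and then implement full support for the HTTP authentication/authorization headers from RFC 2617.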
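To illustrate what "requires a grammar" means in practice, here is a minimal Python 3 sketch of a slightly more grammar-aware parser that uses only the standard library. It is not a complete RFC 2617 implementation: the function name `parse_authorization`, the restricted character set for parameter names, and the sample input are all assumptions made for this example. It expects the header value without the leading "Authorization:" field name.

```python
import re

# One parameter: name "=" (quoted-string | bare value), then "," or end.
# Simplified from the auth-param grammar in RFC 2617; names are limited to
# letters, digits, "_" and "-" (an assumption made to keep the sketch short).
_PARAM = re.compile(r'''
    \s*([A-Za-z0-9_-]+)          # parameter name
    \s*=\s*
    (?:"((?:[^"\\]|\\.)*)"       # quoted value, backslash escapes allowed
      |([^\s,"]+))               # or an unquoted value
    \s*(?:,|$)                   # separator or end of input
''', re.VERBOSE)


def parse_authorization(value):
    """Parse an Authorization header value into a dict (hypothetical helper)."""
    parts = value.strip().split(None, 1)
    if not parts:
        raise ValueError('empty header value')
    params = {'protocol': parts[0]}
    rest = parts[1] if len(parts) > 1 else ''
    pos = 0
    while pos < len(rest):
        m = _PARAM.match(rest, pos)
        if m is None:
            raise ValueError('malformed parameter at offset %d' % pos)
        name, quoted, bare = m.groups()
        # Undo backslash escapes when the value was a quoted string.
        params[name] = bare if quoted is None else re.sub(r'\\(.)', r'\1', quoted)
        pos = m.end()
    return params


# Hypothetical input with a comma and escaped quotes inside quoted values.
sample = 'Digest username="Foob,ear", realm="a \\"b\\" c", qop=auth-int'
print(parse_authorization(sample))
```

Unlike the short regexes in the other answers, this version raises ValueError on input it cannot account for instead of silently skipping it, which is the main practical benefit of working from a grammar.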

2009-08-28 21:42:40

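I decided to take this approach and tried to implement a fully compliant parser for the Authorization header using the RFC spec. The task turned out to be much more daunting than I anticipated. Your choice of the simple regex, while not rigorously correct, is probably the best pragmatic solution. I'll report back here if I eventually end up with a fully functional header parser. – 2009-08-29 16:27:01

Yes, it would be nice to see something more rigorously correct. – 2009-09-04 12:01:49

You can also use urllib2, as CherryPy does.

Here is the snippet: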

input= """ 
Authorization: Digest qop="chap", 
    realm="[email protected]", 
    username="Foobear", 
    response="6629fae49393a05397450978507c4ef1", 
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" 
""" 
import urllib2 
field, sep, value = input.partition("Authorization: Digest ") 
if value: 
    items = urllib2.parse_http_list(value) 
    opts = urllib2.parse_keqv_list(items) 
    opts['protocol'] = 'Digest' 
    print opts

It outputs:

{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': '[email protected]', 'response': '6629fae49393a05397450978507c4ef1'}
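The snippet above targets Python 2. In Python 3 the same two helpers live in `urllib.request`, so a rough equivalent might look like the sketch below. The shortened header string and the variable names are assumptions for illustration; the comma inside the username is there to show that quoted commas are handled.

```python
from urllib.request import parse_http_list, parse_keqv_list

# Shortened, hypothetical header; note the comma inside the quoted username.
header = ('Authorization: Digest qop="chap", '
          'username="Foob,ear", '
          'response="6629fae49393a05397450978507c4ef1", '
          'cnonce="5ccc069c403ebaf9f0171e9517f40e41"')

# Split off the field name, then the scheme ("Digest").
name, _, value = header.partition(':')
scheme, _, params = value.strip().partition(' ')

items = parse_http_list(params)   # splits on commas outside quotes
opts = parse_keqv_list(items)     # key=value pairs, surrounding quotes removed
opts['protocol'] = scheme
print(opts)
```

Taking the scheme from the header instead of hard-coding "Digest" is a small change from the original snippet, so the same code would also accept other schemes that use comma-separated key=value parameters.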

2009-08-28 22:11:31

Here is my pyparsing attempt:

text = """Authorization: Digest qop="chap", 
    realm="[email protected]",  
    username="Foobear",  
    response="6629fae49393a05397450978507c4ef1",  
    cnonce="5ccc069c403ebaf9f0171e9517f40e41" """ 

from pyparsing import * 

AUTH = Keyword("Authorization") 
ident = Word(alphas,alphanums) 
EQ = Suppress("=") 
quotedString.setParseAction(removeQuotes) 

valueDict = Dict(delimitedList(Group(ident + EQ + quotedString))) 
authentry = AUTH + ":" + ident("protocol") + valueDict 

print authentry.parseString(text).dump()

It prints:

['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', '[email protected]'], 
['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'], 
['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']] 
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41 
- protocol: Digest 
- qop: chap 
- realm: [email protected] 
- response: 6629fae49393a05397450978507c4ef1 
- username: Foobear

I'm not familiar with the RFC, but I hope this gets you rolling.

2009-09-04 09:40:06 PaulMcG

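This solution uses pyparsing, as I originally had in mind, and as far as I can tell it produces good results. – 2009-09-04 12:00:35

The HTTP digest Authorization header field is a bit of an odd beast. Its format is similar to that of RFC 2616's Cache-Control and Content-Type header fields, but just different enough to be incompatible. If you are still looking for a library that is a little smarter and more readable than the regex, you could try removing the "Authorization: Digest" part with str.split() and parsing the rest with Werkzeug's http module. (Werkzeug can be installed on App Engine.)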

2010-05-14 00:13:46

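Thanks a lot. I can replace that regex with this. It seems more robust. – 2010-05-14 18:26:46

Nadia's regex only matches alphanumeric characters in a parameter value. That means it fails to parse at least two fields, namely uri and qop. According to RFC 2617, the uri field is a duplicate of the string in the request line (i.e. the first line of the HTTP request). And qop fails to parse correctly when its value is "auth-int", because of the non-alphanumeric '-'.

This modified regex allows the URI (or any other value) to contain anything except ' ' (space), '"' (quote) or ',' (comma). That is probably more permissive than it needs to be, but it shouldn't cause any problems with correctly formed HTTP requests.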

reg = re.compile('(\w+)[:=] ?"?([^" ,]+)"?')
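A quick side-by-side comparison shows the difference on the two fields mentioned. This is a Python 3 sketch; the sample header and the names `original` and `modified` are assumptions for illustration.

```python
import re

original = re.compile(r'(\w+)[:=] ?"?(\w+)"?')       # alphanumeric values only
modified = re.compile(r'(\w+)[:=] ?"?([^" ,]+)"?')   # anything but space, quote, comma

# Hypothetical header value containing a uri and qop=auth-int.
header = 'Digest username="Foobear", uri="/dir/index.html", qop=auth-int'

print(dict(original.findall(header)))  # uri is missing, qop is cut off at the "-"
print(dict(modified.findall(header)))  # both fields come through intact
```

Note that the modified pattern still cannot represent quoted values that contain spaces or commas, so it remains a pragmatic shortcut and not a full parser.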

Bonus tip: From there, it is fairly straightforward to convert the example code in RFC 2617 to Python. Using Python's md5 API, "MD5Init()" becomes "m = md5.new()", "MD5Update()" becomes "m.update()" and "MD5Final()" becomes "m.digest()".
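In Python 3 the old md5 module is gone and hashlib takes its place. The sketch below computes the request digest for qop="auth" along the lines of RFC 2617; the helper names `md5_hex` and `digest_response` are made up for this example, and the input values are the ones from the worked example in the RFC, not from a real request.

```python
import hashlib


def md5_hex(text):
    """Return the lowercase hex MD5 digest of a string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop):
    """Compute the request digest for qop="auth" (hypothetical helper)."""
    ha1 = md5_hex('%s:%s:%s' % (username, realm, password))
    ha2 = md5_hex('%s:%s' % (method, uri))
    return md5_hex(':'.join([ha1, nonce, nc, cnonce, qop, ha2]))


# Values from the worked example in RFC 2617.
response = digest_response(
    username='Mufasa', realm='testrealm@host.com', password='Circle Of Life',
    method='GET', uri='/dir/index.html',
    nonce='dcd98b7102dd2f0e8b11d0f600bfb0c093',
    nc='00000001', cnonce='0a4f113b', qop='auth')
print(response)
```

The hex form (hexdigest()) is used here because the header carries the digest as 32 hex characters, whereas digest() returns raw bytes. A server would compare this value with the response parameter parsed from the Authorization header.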

2011-09-13 15:09:58
