2013-10-23 28 views

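Finding carriage return/line feed combinations with C++

I'm having trouble parsing some HTTP headers in C++. Right now I want to find the carriage return/line feed combination that ends each HTTP header entry. I'm doing this with str.find(), like so: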

string hdr; //filled with the header data 
int line_end_pos = hdr.find("\r\n"); //also tried "\\r\\n", same results 

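Even though I know the header contains carriage return/line feed combinations, find() keeps returning -1. What am I missing here?

Edit:

The library I'm using provides several different functions for displaying the data. A sample of the header data looks like this in string format: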

GET /p/libcrafter/ HTTP/1.1 
Host: code.google.com 
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en,en-us;q=0.5 
Accept-Encoding: gzip, deflate 
DNT: 1 
Cookie: PREF=ID=ad8fd3ab4b0bd3c9:U=e1bd88556eeb2dce:FF=0:TM=1382531357:LM=1382531841:S=Pbh-JiokGeVbsSh-; NID=67=olK2k5sUZ95mRApV77s7CfXscytJSfmVuyubiSCMotOdBBvijqrTwyyifLQZbZA_SCTVQXqTEoE6hqaqVJkRpqoY2RPDFBPghbe5czX6QxKw7lBdOaP6-IpzGXYMWl6Q; OGPC=4061029-5:; __utma=247248150.2068354019.1382532826.1382532826.1382532826.1; __utmb=247248150.10.10.1382532826; __utmc=247248150; __utmz=247248150.1382532826.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none) 
Connection: keep-alive 
Cache-Control: max-age=0 

It looks like this in "hexdump" format:

47455420 2F702F6C 69626372 61667465 GET /p/libcrafte 00000000 
    722F2048 5454502F 312E310D 0A486F73 r/ HTTP/1.1..Hos 00000010 
    743A2063 6F64652E 676F6F67 6C652E63 t: code.google.c 00000020 
    6F6D0D0A 55736572 2D416765 6E743A20 om..User-Agent: 00000030 
    4D6F7A69 6C6C612F 352E3020 28583131 Mozilla/5.0 (X11 00000040 
    3B205562 756E7475 3B204C69 6E757820 ; Ubuntu; Linux 00000050 
    7838365F 36343B20 72763A32 342E3029 x86_64; rv:24.0) 00000060 
    20476563 6B6F2F32 30313030 31303120 Gecko/20100101 00000070 
    46697265 666F782F 32342E30 0D0A4163 Firefox/24.0..Ac 00000080 
    63657074 3A207465 78742F68 746D6C2C cept: text/html, 00000090 
    6170706C 69636174 696F6E2F 7868746D application/xhtm 000000A0 
    6C2B786D 6C2C6170 706C6963 6174696F l+xml,applicatio 000000B0 
    6E2F786D 6C3B713D 302E392C 2A2F2A3B n/xml;q=0.9,*/*; 000000C0 
    713D302E 380D0A41 63636570 742D4C61 q=0.8..Accept-La 000000D0 
    6E677561 67653A20 656E2C65 6E2D7573 nguage: en,en-us 000000E0 
    3B713D30 2E350D0A 41636365 70742D45 ;q=0.5..Accept-E 000000F0 
    6E636F64 696E673A 20677A69 702C2064 ncoding: gzip, d 00000100 
    65666C61 74650D0A 444E543A 20310D0A eflate..DNT: 1.. 00000110 
    436F6F6B 69653A20 50524546 3D49443D Cookie: PREF=ID= 00000120 
    61643866 64336162 34623062 64336339 ad8fd3ab4b0bd3c9 00000130 
    3A553D65 31626438 38353536 65656232 :U=e1bd88556eeb2 00000140 
    6463653A 46463D30 3A544D3D 31333832 dce:FF=0:TM=1382 00000150 
    35333133 35373A4C 4D3D3133 38323533 531357:LM=138253 00000160 
    31383431 3A533D50 62682D4A 696F6B47 1841:S=Pbh-JiokG 00000170 
    65566273 53682D3B 204E4944 3D36373D eVbsSh-; NID=67= 00000180 
    6F6C4B32 6B357355 5A39356D 52417056 olK2k5sUZ95mRApV 00000190 
    37377337 43665873 6379744A 53666D56 77s7CfXscytJSfmV 000001A0 
    75797562 6953434D 6F744F64 42427669 uyubiSCMotOdBBvi 000001B0 
    6A717254 77797969 664C515A 625A415F jqrTwyyifLQZbZA_ 000001C0 
    53435456 51587154 456F4536 68716171 SCTVQXqTEoE6hqaq 000001D0 
    564A6B52 70716F59 32525044 46425067 VJkRpqoY2RPDFBPg 000001E0 
    68626535 637A5836 51784B77 376C4264 hbe5czX6QxKw7lBd 000001F0 
    4F615036 2D49707A 4758594D 576C3651 OaP6-IpzGXYMWl6Q 00000200 
    3B204F47 50433D34 30363130 32392D35 ; OGPC=4061029-5 00000210 
    3A3B205F 5F75746D 613D3234 37323438 :; __utma=247248 00000220 
    3135302E 32303638 33353430 31392E31 150.2068354019.1 00000230 
    33383235 33323832 362E3133 38323533 382532826.138253 00000240 
    32383236 2E313338 32353332 3832362E 2826.1382532826. 00000250 
    313B205F 5F75746D 623D3234 37323438 1; __utmb=247248 00000260 
    3135302E 31302E31 302E3133 38323533 150.10.10.138253 00000270 
    32383236 3B205F5F 75746D63 3D323437 2826; __utmc=247 00000280 
    32343831 35303B20 5F5F7574 6D7A3D32 248150; __utmz=2 00000290 
    34373234 38313530 2E313338 32353332 47248150.1382532 000002A0 
    3832362E 312E312E 75746D63 73723D28 826.1.1.utmcsr=( 000002B0 
    64697265 6374297C 75746D63 636E3D28 direct)|utmccn=( 000002C0 
    64697265 6374297C 75746D63 6D643D28 direct)|utmcmd=( 000002D0 
    6E6F6E65 290D0A43 6F6E6E65 6374696F none)..Connectio 000002E0 
    6E3A206B 6565702D 616C6976 650D0A43 n: keep-alive..C 000002F0 
    61636865 2D436F6E 74726F6C 3A206D61 ache-Control: ma 00000300 
    782D6167 653D300D 0A0D0A    x-age=0....  00000310 

Finally, it looks like this as a "raw string":

\x47\x45\x54\x20\x2f\x70\x2f\x6c\x69\x62\x63\x72\x61\x66\x74\x65\x72\x2f\x20\x48 
\x54\x54\x50\x2f\x31\x2e\x31\xd\xa\x48\x6f\x73\x74\x3a\x20\x63\x6f\x64\x65\x2e\x67 
\x6f\x6f\x67\x6c\x65\x2e\x63\x6f\x6d\xd\xa\x55\x73\x65\x72\x2d\x41\x67\x65\x6e\x74 
\x3a\x20\x4d\x6f\x7a\x69\x6c\x6c\x61\x2f\x35\x2e\x30\x20\x28\x58\x31\x31\x3b\x20\x55 
\x62\x75\x6e\x74\x75\x3b\x20\x4c\x69\x6e\x75\x78\x20\x78\x38\x36\x5f\x36\x34\x3b\x20 
\x72\x76\x3a\x32\x34\x2e\x30\x29\x20\x47\x65\x63\x6b\x6f\x2f\x32\x30\x31\x30\x30\x31 
\x30\x31\x20\x46\x69\x72\x65\x66\x6f\x78\x2f\x32\x34\x2e\x30\xd\xa\x41\x63\x63\x65\x70 
\x74\x3a\x20\x74\x65\x78\x74\x2f\x68\x74\x6d\x6c\x2c\x61\x70\x70\x6c\x69\x63\x61\x74 
\x69\x6f\x6e\x2f\x78\x68\x74\x6d\x6c\x2b\x78\x6d\x6c\x2c\x61\x70\x70\x6c\x69\x63\x61 
\x74\x69\x6f\x6e\x2f\x78\x6d\x6c\x3b\x71\x3d\x30\x2e\x39\x2c\x2a\x2f\x2a\x3b\x71\x3d 
\x30\x2e\x38\xd\xa\x41\x63\x63\x65\x70\x74\x2d\x4c\x61\x6e\x67\x75\x61\x67\x65\x3a\x20 
\x65\x6e\x2c\x65\x6e\x2d\x75\x73\x3b\x71\x3d\x30\x2e\x35\xd\xa\x41\x63\x63\x65\x70\x74 
\x2d\x45\x6e\x63\x6f\x64\x69\x6e\x67\x3a\x20\x67\x7a\x69\x70\x2c\x20\x64\x65\x66\x6c\x61 
\x74\x65\xd\xa\x44\x4e\x54\x3a\x20\x31\xd\xa\x43\x6f\x6f\x6b\x69\x65\x3a\x20\x50\x52 
\x45\x46\x3d\x49\x44\x3d\x61\x64\x38\x66\x64\x33\x61\x62\x34\x62\x30\x62\x64\x33\x63 
\x39\x3a\x55\x3d\x65\x31\x62\x64\x38\x38\x35\x35\x36\x65\x65\x62\x32\x64\x63\x65\x3a 
\x46\x46\x3d\x30\x3a\x54\x4d\x3d\x31\x33\x38\x32\x35\x33\x31\x33\x35\x37\x3a\x4c\x4d 
\x3d\x31\x33\x38\x32\x35\x33\x31\x38\x34\x31\x3a\x53\x3d\x50\x62\x68\x2d\x4a\x69\x6f 
\x6b\x47\x65\x56\x62\x73\x53\x68\x2d\x3b\x20\x4e\x49\x44\x3d\x36\x37\x3d\x6f\x6c\x4b 
\x32\x6b\x35\x73\x55\x5a\x39\x35\x6d\x52\x41\x70\x56\x37\x37\x73\x37\x43\x66\x58\x73 
\x63\x79\x74\x4a\x53\x66\x6d\x56\x75\x79\x75\x62\x69\x53\x43\x4d\x6f\x74\x4f\x64\x42 
\x42\x76\x69\x6a\x71\x72\x54\x77\x79\x79\x69\x66\x4c\x51\x5a\x62\x5a\x41\x5f\x53\x43 
\x54\x56\x51\x58\x71\x54\x45\x6f\x45\x36\x68\x71\x61\x71\x56\x4a\x6b\x52\x70\x71\x6f 
\x59\x32\x52\x50\x44\x46\x42\x50\x67\x68\x62\x65\x35\x63\x7a\x58\x36\x51\x78\x4b\x77 
\x37\x6c\x42\x64\x4f\x61\x50\x36\x2d\x49\x70\x7a\x47\x58\x59\x4d\x57\x6c\x36\x51\x3b 
\x20\x4f\x47\x50\x43\x3d\x34\x30\x36\x31\x30\x32\x39\x2d\x35\x3a\x3b\x20\x5f\x5f\x75 
\x74\x6d\x61\x3d\x32\x34\x37\x32\x34\x38\x31\x35\x30\x2e\x32\x30\x36\x38\x33\x35\x34 
\x30\x31\x39\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x33\x38\x32\x35\x33 
\x32\x38\x32\x36\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x3b\x20\x5f\x5f 
\x75\x74\x6d\x62\x3d\x32\x34\x37\x32\x34\x38\x31\x35\x30\x2e\x31\x30\x2e\x31\x30\x2e 
\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x3b\x20\x5f\x5f\x75\x74\x6d\x63\x3d\x32\x34 
\x37\x32\x34\x38\x31\x35\x30\x3b\x20\x5f\x5f\x75\x74\x6d\x7a\x3d\x32\x34\x37\x32\x34 
\x38\x31\x35\x30\x2e\x31\x33\x38\x32\x35\x33\x32\x38\x32\x36\x2e\x31\x2e\x31\x2e\x75 
\x74\x6d\x63\x73\x72\x3d\x28\x64\x69\x72\x65\x63\x74\x29\x7c\x75\x74\x6d\x63\x63\x6e 
\x3d\x28\x64\x69\x72\x65\x63\x74\x29\x7c\x75\x74\x6d\x63\x6d\x64\x3d\x28\x6e\x6f\x6e 
\x65\x29\xd\xa\x43\x6f\x6e\x6e\x65\x63\x74\x69\x6f\x6e\x3a\x20\x6b\x65\x65\x70\x2d\x61 
\x6c\x69\x76\x65\xd\xa\x43\x61\x63\x68\x65\x2d\x43\x6f\x6e\x74\x72\x6f\x6c\x3a\x20\x6d 
\x61\x78\x2d\x61\x67\x65\x3d\x30\xd\xa\xd\xa 

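As you can see, in the hex dump output the lines end with 0D 0A, and in the raw string format they end with \xd and \xa. My question still stands: how can I find these end-of-line characters while handling the data as a string (or can't I)?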

That looks correct. Have you verified (in a debugger or a log) whether the string actually contains \r\n? – Dweeberly

There shouldn't be a preceding \0. I tried adding one just to see, calling hdr.find("\0\r\n"), but still got the same unexpected result. – amoeba

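How did you initialize hdr? Did you use an input method that maps carriage return/line feed pairs to a single newline? Also note that '\r\n' is not completely portable, although it probably works on most implementations. See http://stackoverflow.com/questions/1279779/what-is-the-difference-between-r-and-n/9549183#9549183 –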

Answer

The output of the following program is 35:

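#include <iostream> 
#include <string> 
using namespace std; 

int main() 
{ 
    string hdr = "Date: Wed, 23 Oct 2013 02:20:30 GMT\r\nServer: Apache\r\n"; 
    // find() returns string::npos when nothing matches, so keep the result unsigned 
    string::size_type line_end_pos = hdr.find("\r\n"); 
    cout << line_end_pos; // index of the first "\r\n" 
} 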

If we then modify this code so that it becomes:

#include <iostream> 
#include <fstream> 
#include <string> 
using namespace std; 

int main() 
{ 
    string hdr = "Date: Wed, 23 Oct 2013 02:20:30 GMT\r\nServer: Apache\r\n"; 

    string::size_type line_end_pos = hdr.find("\r\n"); 
    cout << line_end_pos; 

    // Opened in text mode (no binary flag), so the runtime may translate '\n' 
    fstream output; 
    output.open("test.txt", std::fstream::out); 

    output << hdr; 
    output.close(); 
} 

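We get a file with the contents of hdr. Viewing it in a hex editor shows that some transformation has taken place on the way out. Between "GMT" and "Server" we expect to see two characters, 0x0D and 0x0A. However, test.txt actually has 3 characters there: 0x0D, 0x0D, 0x0A. The text-mode stream has expanded each 0x0A into 0x0D 0x0A, so while the input string is 53 bytes (characters) long, the file is 55 bytes long.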

If we bitwise-OR the flag std::fstream::binary with std::fstream::out:

output.open("test.txt", std::fstream::out | std::fstream::binary);

then the output is an identical copy of the string held in hdr, i.e. 53 bytes long, with a single 0x0d, 0x0a pair at the end of each line.
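
To make the text-mode versus binary-mode difference easier to check, here is a minimal sketch of a binary round trip. The helper names write_binary and read_binary are made up for this example and are not part of any library. The idea is simply that with the binary flag set on both ends, the bytes read back should be exactly the bytes written, whatever the platform's line-ending convention is.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: write `data` to `path` with no end-of-line translation.
// Returns true if the stream is still in a good state after writing.
bool write_binary(const std::string& path, const std::string& data)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}

// Hypothetical helper: read the whole file back, again in binary mode.
std::string read_binary(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling write_binary("test.txt", hdr) and then comparing read_binary("test.txt") against hdr should show equal strings of the same length; if the lengths differ, something in between is translating line endings.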

Edit: It's also worth pointing out that UNIX-based and Windows-based systems have different line-ending conventions. I wrote this code under Windows.

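Sooooo, I'd suggest you save a copy of the header to a file and examine it with a hex editor. Until you do that, or use a debugger, you won't know what the problem is. I generally find it safest to treat text input as binary input, because then there is no translation of end-of-line characters.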
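
If saving to a file and opening a hex editor is inconvenient, a rough alternative is to dump the bytes of the string from inside the program. The function below is only a sketch (to_hex is a name invented here), but it shows exactly which bytes the std::string holds at the moment find() is called.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: render every byte of `data` as two uppercase hex
// digits, separated by single spaces, e.g. "1\r\n" becomes "31 0D 0A".
std::string to_hex(const std::string& data)
{
    std::ostringstream os;
    os << std::hex << std::uppercase << std::setfill('0');
    for (std::string::size_type i = 0; i < data.size(); ++i) {
        if (i != 0) {
            os << ' ';
        }
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        os << std::setw(2)
           << static_cast<unsigned>(static_cast<unsigned char>(data[i]));
    }
    return os.str();
}
```

Printing to_hex(hdr) right before the find() call should make it obvious whether the string really contains 0D 0A, only 0A, or something else entirely (for example the literal characters of an escaped "raw string").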

Edit 2: When you run this, do you get a result of 27? If so, I'm afraid I'm out of ideas for now. I'll give your problem some more thought in the morning.

#include <iostream> 
#include <string> 

using namespace std; 

int main() 
{ 
    // First 48 bytes of the captured header, copied from the hex dump 
    char rawData[] = 
    { 
     0x47,0x45,0x54,0x20, 0x2F,0x70,0x2F,0x6C, 0x69,0x62,0x63,0x72, 0x61,0x66,0x74,0x65, 
     0x72,0x2F,0x20,0x48, 0x54,0x54,0x50,0x2F, 0x31,0x2E,0x31,0x0D, 0x0A,0x48,0x6F,0x73, 
     0x74,0x3A,0x20,0x63, 0x6F,0x64,0x65,0x2E, 0x67,0x6F,0x6F,0x67, 0x6C,0x65,0x2E,0x63 
    }; 
    // rawData is not null-terminated, so pass its length explicitly 
    string hdr(rawData, sizeof(rawData)); 
    string::size_type newLinePos = hdr.find("\r\n"); 
    cout << newLinePos; 
} 
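
For completeness, here is a sketch of how the header lines could be split once find() does locate the terminators. It assumes hdr holds the real bytes of the header (actual 0x0D 0x0A pairs, not an escaped representation), and split_header_lines is a name made up for this example. Note that find() returns std::string::npos on failure, which typically shows up as -1 when it is stored in an int, so the sketch keeps the result in size_type and compares against npos.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split an HTTP header block into its lines.
// Stops at the empty line that ends the header block, or when no further
// "\r\n" terminator is found.
std::vector<std::string> split_header_lines(const std::string& hdr)
{
    const std::string crlf = "\r\n";
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = hdr.find(crlf, start);
        if (end == std::string::npos) {
            break;  // no more terminators
        }
        if (end == start) {
            break;  // empty line: end of the header block
        }
        lines.push_back(hdr.substr(start, end - start));
        start = end + crlf.size();
    }
    return lines;
}
```

If this returns an empty vector for data that clearly has several lines, that is a strong hint the string does not contain the bytes it appears to contain.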

I looked at the file in hex here, and every line ends with the 0a and 0d combination (I'm on a Linux machine). – amoeba

See my edited post above. – amoeba

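@amoeba - Thanks for the extra data; hopefully it will make the diagnosis easier. I'm still missing it (the problem), just as you are. I've added a new code snippet based on your data. (I doubt it will help, to be honest) – enhzflep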