How do I know when the HTTP headers part ends?

The server returns HTTP headers followed by a binary file, like this:

HTTP/1.1 200 OK 
Date: Thu, 28 Jun 2012 22:11:14 GMT 
Server: Apache/2.2.3 (Red Hat) 
Set-Cookie: JSESSIONID=blabla; Path=/ 
Pragma: no-cache 
Cache-Control: must-revalidate, no-store 
Expires: Thu, 01 Jan 1970 00:00:00 GMT 
Content-disposition: inline; filename="foo.pdf" 
Content-Length: 6231119 
Connection: close 
Content-Type: application/pdf 

%PDF-1.6 
%âãÏÓ 
5989 0 obj 
<</Linearized 1/L 6231119/O 5992/E 371504/N 1498/T 6111290/H [ 55176 6052]>> 
endobj 

xref 
5989 2744 
0000000016 00000 n 
0000061228 00000 n 
0000061378 00000 n

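I want to copy only the binary file. But how do I know when the headers part ends? I tried checking whether the line contains \r\n\r\n, but it looks like that convention does not apply to server responses, only to clients. This gives: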

Content-disposition: inline; filename="foo.pdf" 
Content-Length: 6231119 
Connection: close 
Content-Type: application/pdf 

%PDF-1.6 
%âãÏÓ 
5989 0 obj 
<</Linearized 1/L 6231119/O 5992/E 371504/N 1498/T 6111290/H [ 55176 6052]>> 
endobj 

xref 
5989 2744 
0000000016 00000 n

Here is the C code:

while((readed = recv(sock, buffer, 128, 0)) > 0) { 

    if(isnheader == 0 && strstr(buffer, "\r\n\r\n") != NULL) 
     isnheader = 1; 

     if(isnheader) 
      fwrite(buffer, 1, readed, fp); 
}

UPDATE:

I put a continue inside my if statement:

if(isnheader == 0 && strstr(buffer, "\r\n\r\n") != NULL) { 
    isnheader = 1; 
    continue; 
}

OK, it works as expected. But as @Alnitak mentioned, this is not safe.

Source

2012-06-28 Jack

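You are not parsing the input correctly. There are a few things you are doing wrong:

Your code appears to assume that the buffer will contain at most one line of header data. However, recv() does not operate on "lines" of data but on blocks of binary data. If you tell it the buffer is 128 bytes long, it will try to fill the buffer with 128 bytes of data if that much is available, even if those 128 bytes span several "lines".
Your code does not account for the possibility that the header break "\r\n\r\n" may be pulled into your buffer by two different recv() calls, which would prevent your code from recognizing the header break.
If the header break is found (which could happen if the headers are just the right size), you end up writing the last header lines, their terminating "\r\n", and the header break itself ("\r\n\r\n") into your copy of the binary data.

I wrote a quick function that should find the end of the HTTP headers and write the rest of the server's response to a file stream: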

#include <stdio.h>
#include <string.h>
#include <strings.h>   // index()
#include <unistd.h>    // read(), ssize_t

void parse_http_headers(int s, FILE * fp)
{
    int     isnheader = 0;  // becomes 1 once the header break has been seen
    ssize_t readed;
    size_t  len = 0;        // bytes of unprocessed header data in buffer
    char    buffer[1024];
    char  * eol;            // end of line
    char  * bol;            // beginning of line

    // read next chunk from socket, appending after any unprocessed data
    while((readed = read(s, &buffer[len], (sizeof(buffer) - 1 - len))) > 0)
    {
        // headers are done: write the data straight to the FILE stream
        if (isnheader != 0)
        {
            fwrite(buffer, 1, (size_t)readed, fp);
            continue;
        }

        // calculate combined length of unprocessed data and new data
        len += (size_t)readed;

        // NUL terminate buffer for string functions
        buffer[len] = '\0';

        // process each line in buffer looking for header break; the buffer
        // always starts at the beginning of a line
        bol = buffer;
        while(isnheader == 0)
        {
            // an empty line ("\r\n" or "\n") marks the end of the headers
            if (!(strncmp(bol, "\r\n", 2)))
            {
                isnheader = 1;
                bol += 2;
            }
            else if (bol[0] == '\n')
            {
                isnheader = 1;
                bol += 1;
            }
            // otherwise move to the line after the next newline, if any
            else if ((eol = index(bol, '\n')) != NULL)
                bol = eol + 1;
            else
                break;
        }

        // calculate the amount of data remaining in the buffer after bol
        len -= (size_t)(bol - buffer);

        if (isnheader != 0)
        {
            // everything after the header break is body data
            if (len > 0)
                fwrite(bol, 1, len, fp);
            len = 0;
        }
        else
        {
            // shift the incomplete last line to the beginning of buffer
            memmove(buffer, bol, len);

            // a single header line that fills the whole buffer cannot be
            // handled by this function, so give up
            if (len == sizeof(buffer) - 1)
                return;
        }
    }
}

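I have not tested the code, so it may contain simple errors, but it should point you in the right direction for parsing string data from a buffer in C.

Source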

2012-06-29 05:05:40

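Thank you very much for your answer. I tried it, but it saved 0 bytes. :( – Jack

@Jack I just finished testing the example in a test program. I made some adjustments to shift the buffered data properly, switched to using read() and index(), and updated the code to work with both '\r\n' and '\n' line terminators. –

Thank you very much. – Jack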

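The headers and the body are supposed to be separated by \r\n\r\n (section 4.1 of RFC 2616). However, some servers may omit the \r and send only \n line endings, particularly if they fail to sanitize CGI-supplied headers to ensure that they include the \r.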
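
One way to tolerate both kinds of line ending is to look for an empty line rather than for the literal four-byte sequence. The following is only a sketch of that idea, not code from this thread: the names eoh_state and eoh_scan are made up for illustration, and the scanner assumes the caller keeps the state variable alive between recv() calls, starting from EOH_IN_LINE.

```c
#include <assert.h>
#include <stddef.h>

/* Scanner state; it must survive between recv() calls. */
enum eoh_state {
    EOH_IN_LINE,     /* somewhere inside a line */
    EOH_LINE_START,  /* the previous byte was '\n' */
    EOH_AFTER_CR     /* saw '\r' right at the start of a line */
};

/*
 * Scans n bytes for an empty line, accepting both CRLF and bare LF line
 * endings. Returns the index of the first body byte within buf (which may
 * equal n), or -1 if the headers do not end inside this chunk.
 */
long eoh_scan(enum eoh_state *st, const char *buf, size_t n)
{
    assert(st != NULL && buf != NULL);

    for (size_t i = 0; i < n; i++) {
        char c = buf[i];
        if (c == '\n') {
            if (*st != EOH_IN_LINE)
                return (long)(i + 1);   /* empty line: headers are over */
            *st = EOH_LINE_START;
        } else if (c == '\r' && *st == EOH_LINE_START) {
            *st = EOH_AFTER_CR;
        } else {
            *st = EOH_IN_LINE;
        }
    }
    return -1;
}
```

In a receive loop, a return value off >= 0 would mean that buffer + off is where the binary data starts in the current chunk; every later chunk can then be written out whole. Because the scanner looks at one byte at a time and never calls a string function, it does not need the buffer to be NUL-terminated.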

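You also need to consider how you are reading in chunks: the delimiter might straddle your 128-byte blocks, which would prevent the strstr call from working.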
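
To illustrate the straddling problem, one possible approach is to remember the last three bytes already seen and search them together with each new chunk, so a delimiter split across two recv() calls is still found. This is a rough sketch under the assumption that only the strict "\r\n\r\n" delimiter matters; struct tail, find_break and CHUNK are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 128   /* chunk size used in the question */

/* The last (up to 3) bytes of the data seen so far. */
struct tail {
    char   bytes[3];
    size_t len;
};

/*
 * Searches for "\r\n\r\n" in the saved tail followed by the new chunk
 * (n <= CHUNK). Returns the offset of the first body byte within `chunk`
 * (which may equal n), or -1 after updating the tail for the next call.
 */
long find_break(struct tail *t, const char *chunk, size_t n)
{
    char   window[3 + CHUNK];
    size_t wlen;

    assert(t != NULL && chunk != NULL && n <= CHUNK && t->len <= 3);

    /* Build the search window: old tail first, then the new bytes. */
    memcpy(window, t->bytes, t->len);
    memcpy(window + t->len, chunk, n);
    wlen = t->len + n;

    /* memcmp is used instead of strstr, so no NUL terminator is needed. */
    for (size_t i = 0; i + 4 <= wlen; i++) {
        if (memcmp(window + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4 - t->len);
    }

    /* Keep the last 3 bytes of the window for the next chunk. */
    t->len = wlen < 3 ? wlen : 3;
    memcpy(t->bytes, window + wlen - t->len, t->len);
    return -1;
}
```

The tail starts out empty (len set to 0). Once find_break returns off >= 0, the rest of that chunk, starting at chunk + off, is body data and the search can stop.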

Source

2012-06-28 22:56:17 Alnitak

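Thanks for your answer! But I still don't know how to filter it out in C code. You got me thinking about it, mainly because strstr() is not a good way to check for it, and about the size: I chose 128 because it is usually the length of a line in some files. – Jack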
