我們的應用使用帶有連接和選擇操作(C代碼)非阻塞插座的使用。 pusedo代碼如下:TCP連接在高負載下隨機失敗
unsigned int ConnectToServer(struct sockaddr_in *pSelfAddr,struct sockaddr_in *pDestAddr)
{
int sktConnect = -1;
sktConnect = socket(AF_INET,SOCK_STREAM,0);
if(sktConnect == INVALID_SOCKET)
return -1;
fcntl(sktConnect,F_SETFL,fcntl(sktConnect,F_GETFL) | O_NONBLOCK);
if(pSelfAddr != 0)
{
if(bind(sktConnect,(const struct sockaddr*)(void *)pSelfAddr,sizeof(*pSelfAddr)) != 0)
{
closesocket(sktConnect);
return -1;
}
}
errno = 0;
int nRc = connect(sktConnect,(const struct sockaddr*)(void *)pDestAddr, sizeof(*pDestAddr));
if(nrC != -1)
{
return sktConnect;
}
if(errno != EINPROGRESS)
{
int savedError = errno;
closesocket(sktConnect);
return -1;
}
fd_set scanSet;
FD_ZERO(&scanSet);
FD_SET(sktConnect,&scanSet);
struct timeval waitTime;
waitTime.tv_sec = 2;
waitTime.tv_usec = 0;
int tmp;
tmp = select(sktConnect +1, (fd_set*)0, &scanSet, (fd_set*)0,&waitTime);
if(tmp == -1 || !FD_ISSET(sktConnect,&scanSet))
{
int savedErrorNo = errno;
writeLog("Connect %s failed after select, cause %d, error %s",inet_ntoa(pDestAddr->sin_addr),savedErrorNo,strerror(savedErrorNo));
closesocket(sktConnect);
return -1;
}
. . . . .}
有80個這樣的節點,並且應用程序以循環方式連接到它的所有節點。 在這個階段,一些節點都無法連接(API - 連接+選擇),錯誤號115
在下面的日誌(tcpdump的輸出)成功的情況下,我們可以看到 ( SYN,SYN + ACK,ACK),但沒有連SYN的條目存在於tcpdump的日誌失敗 節點。
tcpdump的日誌:
387937 2012-07-05 07:45:30.646514 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [SYN] Seq=0 Ack=0 Win=5792 Len=0 MSS=1460 TSV=1414450402 TSER=912308224 WS=8
387947 2012-07-05 07:45:30.780762 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=912309754 TSER=1414450402 WS=8
387948 2012-07-05 07:45:30.780773 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=1 Ack=1 Win=5888 Len=0 TSV=1414450435 TSER=912309754
All the above three events indicate the success information.
387949 2012-07-05 07:45:30.782652 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [PSH, ACK] Seq=1 Ack=1 Win=5888 Len=320 TSV=1414450436 TSER=912309754
387967 2012-07-05 07:45:30.915615 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [ACK] Seq=1 Ack=321 Win=6912 Len=0 TSV=912309788 TSER=1414450436
388011 2012-07-05 07:45:31.362712 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [PSH, ACK] Seq=321 Ack=1 Win=5888 Len=320 TSV=1414450581 TSER=912309788
388055 2012-07-05 07:45:31.495558 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [ACK] Seq=1 Ack=641 Win=7936 Len=0 TSV=912309933 TSER=1414450581
388080 2012-07-05 07:45:31.702336 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [PSH, ACK] Seq=1 Ack=641 Win=7936 Len=712 TSV=912309985 TSER=1414450581
388081 2012-07-05 07:45:31.702350 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=641 Ack=713 Win=7424 Len=0 TSV=1414450666 TSER=912309985
388142 2012-07-05 07:45:32.185612 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [PSH, ACK] Seq=713 Ack=641 Win=7936 Len=320 TSV=912310106 TSER=1414450666
388143 2012-07-05 07:45:32.185629 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=641 Ack=1033 Win=8704 Len=0 TSV=1414450786 TSER=912310106
388169 2012-07-05 07:45:32.362622 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [PSH, ACK] Seq=641 Ack=1033 Win=8704 Len=320 TSV=1414450831 TSER=912310106
388212 2012-07-05 07:45:32.494833 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [ACK] Seq=1033 Ack=961 Win=9216 Len=0 TSV=912310183 TSER=1414450831
388219 2012-07-05 07:45:32.501613 10.137.165.136 10.18.92.173 TCP 8441 > 33728 [PSH, ACK] Seq=1033 Ack=961 Win=9216 Len=356 TSV=912310185 TSER=1414450831
388220 2012-07-05 07:45:32.501624 10.18.92.173 10.137.165.136 TCP 33728 > 8441 [ACK] Seq=961 Ack=1389 Win=10240 Len=0 TSV=1414450865 TSER=912310185
應用程序日誌上連接通知錯誤 - 對應於的tcpdump的前3個表
[5258: 2012-07-05 07:45:30]Connect [10.137.165.136 <- 10.18.92.173] success.
[5258: 2012-07-05 07:45:32]Connect 10.137.165.137 fail after select, cause:115, error Operation now in progress. Check whether remote machine exist and the network is normal or not.
[5258: 2012-07-05 07:45:32]Connect to server([10.137.165.137 <- 10.18.92.173], port=8441) Failed!
成功日誌(即連接API +選擇)。和故障日誌,其中有在tcpdump的任何事件
我的問題是:當客戶端發起「連接」 API失敗的情況下, 我不能夠看到tcpdump的任何事件在客戶端(甚至 初始SYN)。什麼可能是這種隨機性的原因。
當你退出'tcpdump'時,你看到'內核丟棄的數據包'或'接口丟棄的數據包'的非零值? – 2012-07-05 15:34:46