Unix Network Programming A Case Study: Crawler Design. Siyong Liang 11/24/2010. Review on TCP/IP Stack. Directory. Elementary Sockets I/O Models and Crawler Design. Elementary Sockets TCP Connection From a Programmer’s Perspective. A general TCP connection based on fork (1).
Unix Network Programming. A Case Study: Crawler Design. Siyong Liang, 11/24/2010
Directory • Elementary Sockets • I/O Models and Crawler Design
Elementary Sockets: TCP Connection From a Programmer’s Perspective
TCP send buffer • Data is not discarded until it has been ACKed by the peer
Client
#include "unp.h"    /* Stevens' header: Socket, Connect, err_quit, SA, SERV_PORT */

int
main(int argc, char **argv)
{
    int sockfd;
    struct sockaddr_in servaddr;

    if (argc != 2)
        err_quit("usage: tcpcli <IPaddress>");

    sockfd = Socket(AF_INET, SOCK_STREAM, 0);

    /* Fill in the server's IPv4 address and port */
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(SERV_PORT);
    if (inet_pton(AF_INET, argv[1], &servaddr.sin_addr) != 1)
        err_quit("invalid IP address: %s", argv[1]);

    Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));

    str_cli(stdin, sockfd);    /* do it all */

    exit(0);
}
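Client cont.
void
str_cli(FILE *fp, int sockfd)
{
    char sendline[MAXLINE], recvline[MAXLINE];

    /* Blocks on fp first, then on the socket, one line at a time */
    while (Fgets(sendline, MAXLINE, fp) != NULL) {
        Writen(sockfd, sendline, strlen(sendline));

        if (Readline(sockfd, recvline, MAXLINE) == 0)
            err_quit("str_cli: server terminated prematurely");

        Fputs(recvline, stdout);
    }
}
Problem arises: while the client is blocked in Fgets waiting for input, it cannot notice that the server has closed the connection until the next line is entered.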
Server side: str_echo
void
str_echo(int sockfd)
{
    ssize_t n;
    char buf[MAXLINE];

again:
    /* Echo everything back until the client closes the connection */
    while ( (n = read(sockfd, buf, MAXLINE)) > 0)
        Writen(sockfd, buf, n);

    if (n < 0 && errno == EINTR)
        goto again;    /* interrupted by a signal: retry */
    else if (n < 0)
        err_sys("str_echo: read error");
}
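The capitalized calls in the slides above (Socket, Connect, Writen, Readline) are the wrapper functions from the book's unp.h, not system calls. As a rough illustration of why Writen exists, the sketch below loops until every byte has been handed to the kernel, retrying short writes and EINTR. The name writen_sketch is hypothetical and this is not the book's exact code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the unp.h writen(): write exactly n bytes.
 * Returns n on success, -1 on error (errno is set by write). */
ssize_t writen_sketch(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (w <= 0)
            return -1;          /* real error (or no progress) */
        p += w;                 /* short write: advance and loop */
        left -= (size_t)w;
    }
    return (ssize_t)n;
}
```

On a blocking socket a single write can return fewer bytes than requested when the send buffer fills, which is why the loop is needed.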
Goals of Crawler Design • Fully utilize CPU time • Avoid being blocked • Concurrency • Handle multiple connections simultaneously
Utilizing CPU Time • How to make use of the CPU while • connect is in progress • waiting to read or write
I/O Multiplexing + Nonblocking
int select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timeval *timeout);
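A minimal sketch of how select might be called to wait on a single descriptor with a timeout, so the caller is never blocked indefinitely. The helper name wait_readable and the millisecond timeout parameter are assumptions made for illustration.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable (data or EOF), 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readset;
    struct timeval tv;

    FD_ZERO(&readset);
    FD_SET(fd, &readset);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* First argument is the highest descriptor number plus one */
    int n = select(fd + 1, &readset, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &readset)) ? 1 : 0;
}
```

A crawler would normally put many sockets into the same fd_set and pass the largest descriptor plus one; the single-descriptor form only keeps the example short.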
Ready to read • Data in the receive buffer >= low-water mark • Read half closed (FIN received) • Listening socket: number of completed connections is nonzero • RST timing problem • Error pending (errno)
Ready to write • Free space in the send buffer >= low-water mark, and either • the socket is connected, or • the socket requires no connection (UDP) • Write half closed (write -> SIGPIPE) • Nonblocking connect completed or failed • Error pending
Crawler Design: Avoid Blocking • Initiate several connections at the same time • Connecting state • EINPROGRESS • writable • Read data • Parse and grab the next target • Initiate another connection • Can we do each step concurrently?
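The connecting step above can be sketched as follows, assuming IPv4 and one socket at a time. The function names start_connect and finish_connect are hypothetical; a real crawler would start many connects and wait on all of them in a single select call instead of one by one.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: begin a nonblocking connect.
 * Returns the socket, or -1 on error. *in_progress is set to 1 when
 * connect returned EINPROGRESS, 0 when it completed immediately. */
int start_connect(const char *ip, unsigned short port, int *in_progress)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        *in_progress = 0;       /* connected at once (possible on loopback) */
        return fd;
    }
    if (errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    *in_progress = 1;           /* handshake continues in the background */
    return fd;
}

/* Hypothetical helper: wait until the socket is writable, then read
 * SO_ERROR to learn whether the connect succeeded.
 * Returns 0 on success, -1 on failure or timeout (errno is set). */
int finish_connect(int fd, int timeout_ms)
{
    fd_set wset;
    struct timeval tv;
    int err = 0;
    socklen_t len = sizeof(err);

    FD_ZERO(&wset);
    FD_SET(fd, &wset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int n = select(fd + 1, NULL, &wset, NULL, &tv);
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (n < 0)
        return -1;

    /* Writable means completed OR failed; SO_ERROR tells which */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

Between start_connect and finish_connect the process is free to parse pages already fetched or to start further connections, which is the point of the nonblocking design.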
AIO Structure
struct aiocb {
    int aio_fildes;                 // File Descriptor
    int aio_lio_opcode;             // Valid only for lio_listio (r/w/nop)
    volatile void *aio_buf;         // Data Buffer
    size_t aio_nbytes;              // Number of Bytes in Data Buffer
    struct sigevent aio_sigevent;   // Notification Structure
    /* Internal fields */
    ...
};
AIO notifications • Asynchronous notification with signals • Asynchronous notification with callbacks