Unix Network Programming A Case Study: Crawler Design

Siyong Liang, 11/24/2010

Presentation Transcript


  1. Unix Network Programming. A Case Study: Crawler Design. Siyong Liang, 11/24/2010

  2. Review on TCP/IP Stack

  3. Directory • Elementary Sockets • I/O Models and Crawler Design

  4. Elementary Sockets: TCP Connection From a Programmer’s Perspective

  5. A general TCP connection based on fork (1)

  6. A general TCP connection based on fork (2)

  7. A general TCP connection based on fork (3)
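The three fork slides survive only as titles, so here is a minimal sketch of what a fork-based connection handler commonly looks like: the parent accepts a connection, a forked child serves it, and the parent closes its copy of the connected descriptor. The names `serve_one_forked` and `echo_until_eof` are hypothetical and the echo behaviour is an assumption; this is not the author's code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: echo everything read from connfd until the peer closes. */
static void echo_until_eof(int connfd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(connfd, buf, sizeof(buf))) > 0) {
        ssize_t off = 0;
        while (off < n) {                       /* handle short writes */
            ssize_t w = write(connfd, buf + off, (size_t)(n - off));
            if (w <= 0)
                return;
            off += w;
        }
    }
}

/* Accept one connection on listenfd and hand it to a forked child.
 * Returns the child's pid in the parent, or -1 on failure. */
pid_t serve_one_forked(int listenfd)
{
    int connfd = accept(listenfd, NULL, NULL);
    if (connfd < 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {                 /* child: serve the connection */
        close(listenfd);            /* child does not accept new clients */
        echo_until_eof(connfd);
        close(connfd);
        _exit(0);
    }

    close(connfd);                  /* parent: the child owns the connection */
    return pid;                     /* -1 here if fork() itself failed */
}
```

A real server would call this in a loop and reap finished children with `waitpid`, typically from a SIGCHLD handler.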

  8. Mapping a Typical TCP Program onto the State Transition Diagram

  9. I/O Buffer

  10. TCP send buffer: data is not discarded until it has been ACKed

  11. UDP send buffer

  12. Client

      /* Capitalized calls (Socket, Connect) are error-checking wrappers,
       * as provided by unp.h in Stevens' "Unix Network Programming". */
      #include "unp.h"

      int
      main(int argc, char **argv)
      {
          int sockfd;
          struct sockaddr_in servaddr;

          if (argc != 2)
              err_quit("usage: tcpcli <IPaddress>");

          sockfd = Socket(AF_INET, SOCK_STREAM, 0);

          /* Fill in the server's address and port */
          bzero(&servaddr, sizeof(servaddr));
          servaddr.sin_family = AF_INET;
          servaddr.sin_port = htons(SERV_PORT);
          inet_pton(AF_INET, argv[1], &servaddr.sin_addr);

          Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));

          str_cli(stdin, sockfd);    /* do it all */

          exit(0);
      }

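  13. Client cont.

      void
      str_cli(FILE *fp, int sockfd)
      {
          char sendline[MAXLINE], recvline[MAXLINE];

          /* Read a line from fp, send it, then wait for the echoed line */
          while (Fgets(sendline, MAXLINE, fp) != NULL) {
              Writen(sockfd, sendline, strlen(sendline));

              if (Readline(sockfd, recvline, MAXLINE) == 0)
                  err_quit("str_cli: server terminated prematurely");

              Fputs(recvline, stdout);
          }
      }

      Problem arises: each call blocks on a single descriptor, so while the client waits in Fgets it cannot notice events on the socket (for example, the server closing the connection).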

  14. Client cont. (2)

      /* Echo loop run on the peer side: copy whatever arrives back to the sender */
      void
      str_echo(int sockfd)
      {
          ssize_t n;
          char buf[MAXLINE];

      again:
          while ( (n = read(sockfd, buf, MAXLINE)) > 0)
              Writen(sockfd, buf, n);

          if (n < 0 && errno == EINTR)
              goto again;              /* interrupted by a signal: retry */
          else if (n < 0)
              err_sys("str_echo: read error");
      }
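The slides call `Writen` without showing it. In Stevens' book it is a wrapper around a loop that keeps calling `write` until every byte has been sent. A minimal sketch of such a loop follows; the name `writen_all` is hypothetical, chosen to avoid suggesting this is the book's exact source.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly n bytes to fd, retrying after short writes and EINTR.
 * Returns n on success, -1 on error. */
ssize_t writen_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w <= 0) {
            if (w < 0 && errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;              /* real error */
        }
        left -= (size_t)w;          /* short write: advance and continue */
        p += w;
    }
    return (ssize_t)n;
}
```

The loop matters on sockets because a single `write` may accept fewer bytes than requested when the send buffer is nearly full.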

  15. Goal of Crawler Design • Fully utilize the CPU time • Avoid being blocked • Concurrency • Handle multiple connections simultaneously

  16. Utilize CPU time • How to make use of the CPU time otherwise spent • in connect • waiting to read or write

  17. I/O Models(1)

  18. I/O Models(2)

  19. I/O Models(3)

  20. I/O Models(4)

  21. I/O Models(4)

  22. Crossing

  23. I/O Multiplexing + Nonblocking

      int select(int maxfdp1,
                 fd_set *readset,
                 fd_set *writeset,
                 fd_set *exceptset,
                 const struct timeval *timeout);
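As a small illustration of the prototype above, the following sketch waits for one descriptor to become readable with a timeout. The helper name `wait_readable` and the millisecond parameter are assumptions made for the example.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR). */
int wait_readable(int fd, int timeout_ms)
{
    fd_set readset;
    struct timeval tv;

    FD_ZERO(&readset);
    FD_SET(fd, &readset);           /* watch only this descriptor */

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* maxfdp1 is the highest watched descriptor plus one */
    int n = select(fd + 1, &readset, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return FD_ISSET(fd, &readset) ? 1 : 0;
}
```

A crawler would put many sockets into the same `fd_set` and test each with `FD_ISSET` after `select` returns, rather than waiting on one descriptor at a time.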

  24. Ready to read • Data in the receive buffer >= low-water mark • Read half closed (FIN received) • Listening socket with a nonzero number of completed connections • RST timing problem • Error pending (errno)
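The first two conditions can be told apart only by reading: a readable socket yields either data or a return value of 0 once the peer has closed. The sketch below polls once with a zero timeout and classifies the result; `poll_read` and the `read_state` enum are hypothetical names for this example.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum read_state { RS_NOT_READY, RS_DATA, RS_EOF, RS_ERROR };

/* Check once, without blocking, whether fd is readable and why.
 * On RS_DATA, *nread holds the number of bytes placed in buf. */
enum read_state poll_read(int fd, char *buf, size_t len, ssize_t *nread)
{
    fd_set rset;
    struct timeval tv = {0, 0};     /* zero timeout: return immediately */

    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    int n = select(fd + 1, &rset, NULL, NULL, &tv);
    if (n < 0)
        return RS_ERROR;
    if (n == 0)
        return RS_NOT_READY;

    *nread = recv(fd, buf, len, 0);
    if (*nread > 0)
        return RS_DATA;             /* data was waiting in the receive buffer */
    if (*nread == 0)
        return RS_EOF;              /* peer closed its sending side */
    return RS_ERROR;                /* pending error surfaced through errno */
}
```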

  25. Ready to write • Free space in the send buffer >= low-water mark, and either • Socket connected, or • Socket requires no connection (UDP) • Write half closed (write -> SIGPIPE) • Non-blocking connect completed or failed • Error pending
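The "write -> SIGPIPE" case is easy to trip over, because the default action of SIGPIPE terminates the process. One common approach, sketched here under the assumption that ignoring the signal process-wide is acceptable, is to ignore SIGPIPE and treat `EPIPE` as an ordinary error. The names `is_writable_now` and `send_checked` are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if select() reports fd writable right now, 0 if not, -1 on error. */
int is_writable_now(int fd)
{
    fd_set wset;
    struct timeval tv = {0, 0};     /* zero timeout: just poll */

    FD_ZERO(&wset);
    FD_SET(fd, &wset);

    int n = select(fd + 1, NULL, &wset, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;
}

/* Send len bytes on a socket. Returns 0 on success or the errno value.
 * SIGPIPE is ignored process-wide so a closed peer shows up as EPIPE. */
int send_checked(int fd, const void *buf, size_t len)
{
    signal(SIGPIPE, SIG_IGN);
    if (send(fd, buf, len, 0) < 0)
        return errno;
    return 0;
}
```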

  26. Crawler Design: Avoid blocking • Initiate several connections at the same time • Connecting state • EINPROGRESS • writable • Read data • Parse and grab the next target • Initiate another connection • Can we do each step concurrently?
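The "EINPROGRESS, then writable" sequence can be sketched as two steps: start a connect on a non-blocking socket, then wait for writability and read `SO_ERROR` to learn whether it succeeded. The function names and the IPv4-only, dotted-quad address argument are assumptions for this example, not the author's crawler code.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Start a non-blocking connect to ip:port.
 * Returns a socket that is connected or still connecting, or -1 on failure. */
int start_connect(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    /* EINPROGRESS means the handshake continues in the background */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to timeout_ms for the connect to finish.
 * Returns 0 on success, ETIMEDOUT on timeout, or another errno value. */
int finish_connect(int fd, int timeout_ms)
{
    fd_set wset;
    struct timeval tv;
    int err = 0;
    socklen_t len = sizeof(err);

    FD_ZERO(&wset);
    FD_SET(fd, &wset);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int n = select(fd + 1, NULL, &wset, NULL, &tv);
    if (n < 0)
        return errno;
    if (n == 0)
        return ETIMEDOUT;

    /* Writable means completed OR failed; SO_ERROR tells which */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return errno;
    return err;
}
```

A crawler could call `start_connect` for a batch of hosts first and only then multiplex over all the resulting sockets, so the handshakes overlap instead of running one after another.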

  27. Review on Select

  28. Asynchronous I/O

  29. Asynchronous I/O API

  30. AIO Structure

      struct aiocb {
          int aio_fildes;                 // File descriptor
          int aio_lio_opcode;             // Valid only for lio_listio (r/w/nop)
          volatile void *aio_buf;         // Data buffer
          size_t aio_nbytes;              // Number of bytes in data buffer
          struct sigevent aio_sigevent;   // Notification structure
          /* Internal fields */
          ...
      };
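To show how the fields above are filled in, here is a minimal sketch that reads the start of a file with POSIX AIO and waits for completion with `aio_suspend`. The helper name `aio_read_file` is hypothetical. It also sets `aio_offset`, a real `aiocb` field that the slide's excerpt does not list, and requests no notification (`SIGEV_NONE`).

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from the start of path using POSIX AIO.
 * Returns the number of bytes read, or -1 on error. */
ssize_t aio_read_file(const char *path, void *buf, size_t len)
{
    struct aiocb cb;
    const struct aiocb *pending[1];
    ssize_t n;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;                         /* file descriptor */
    cb.aio_buf = buf;                           /* data buffer */
    cb.aio_nbytes = len;                        /* bytes requested */
    cb.aio_offset = 0;                          /* file offset (not on the slide) */
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* no signal or callback */

    if (aio_read(&cb) < 0) {                    /* queue the request and return */
        close(fd);
        return -1;
    }

    pending[0] = &cb;
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(pending, 1, NULL);          /* sleep until it completes */

    n = aio_return(&cb);                        /* final result of the read */
    close(fd);
    return n;
}
```

Between `aio_read` and `aio_suspend` the caller is free to do other work, which is the point of the model. On older glibc versions the program may need to be linked with `-lrt`.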

  31. AIO notifications • Asynchronous notification with signals • Asynchronous notification with callbacks

  32. Crawler Design

  33. Thanks
