Gnutella Crawler
Group 1 (Calvin Ching, Levi Stoddard, Kelvin Tsui)
Overall System Design
• master-slave architecture
• communication:
  • master-slave: NIO
  • master-master: blocking IO
• leader election:
  • bully election algorithm
  • based on version number and host name
• replication:
  • primary master replicates data to 5 backups
  • performs write to disk
    • every 10 seconds
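The election rule above can be sketched as follows. This is a minimal illustration, not the group's implementation: the `Master` class, the `run_election` function, and the ordering (higher version number wins, host name breaks ties) are assumptions, since the slide only says the election is based on version number and host name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Master:
    host: str          # hypothetical host name
    version: int       # hypothetical data version number
    alive: bool = True

    def priority(self):
        # Assumed ordering: higher version wins, host name breaks ties.
        return (self.version, self.host)


def run_election(initiator, masters):
    """Bully election, seen from the master that starts it.

    The initiator challenges every live master with a higher priority.
    If none exists, the initiator wins. Otherwise the election is handed
    to the strongest responder, which repeats the same step.
    """
    higher = [m for m in masters
              if m.alive and m.priority() > initiator.priority()]
    if not higher:
        return initiator
    return run_election(max(higher, key=Master.priority), masters)
```

For example, with three masters where the one holding the highest version is down, the election settles on the live master with the next-highest priority, whichever master starts it.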
Handling Master Node Failures
• redirection message
  • sent from a backup master when a slave contacts it for nodes to crawl or to submit
  • occurs when a slave fails to contact the primary master
• a new election begins to ensure that the correct primary master is chosen
• whenever a master node comes back online, a new election begins
  • a restarted node may become the primary master or a backup master
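The redirection step above might look like this from the slave's side. This is an in-memory sketch under assumptions: the `MasterNode` class, the `"REDIRECT"` and `"WORK"` message names, and the hop limit are all hypothetical, and real masters would be reached over the network rather than through a dictionary.

```python
class MasterNode:
    def __init__(self, host, primary_host):
        self.host = host
        self.primary_host = primary_host  # who this node believes is primary
        self.queue = []                   # peers waiting to be crawled

    def request_work(self):
        # A backup master does not hand out work; it points to the primary.
        if self.host != self.primary_host:
            return ("REDIRECT", self.primary_host)
        return ("WORK", self.queue.pop(0) if self.queue else None)


def fetch_work(masters, start_host, max_hops=3):
    """Follow redirects until a master hands out work (or says it has none)."""
    host = start_host
    for _ in range(max_hops):
        kind, payload = masters[host].request_work()
        if kind == "WORK":
            return host, payload
        host = payload  # redirected: try the master we were pointed to
    raise RuntimeError("too many redirects; an election may be in progress")
```

The hop limit guards against the case where masters disagree about the primary while an election is still running.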
Crawler Performance
• 6 masters, 84 slaves (119 attempted)
• 30 minutes (including deployment time)
• master memory usage: under ~140 MB
• Number of Peers Crawled/Discovered:
  • 55905 crawled successfully
  • 851811 discovered but not crawled
  • 111082 other (failed)
  • 1018798 total
• Rate of Crawling/Discovery (peers/second):
  • 31.06 crawled successfully
  • 473.22 discovered but not crawled
  • 565.99 overall
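The rates above follow from dividing each count by the run length. The short check below assumes the run lasted exactly 30 minutes (1800 seconds); the variable names are illustrative.

```python
DURATION_S = 30 * 60  # assumed: exactly 30 minutes, deployment included

counts = {
    "crawled": 55905,
    "discovered_not_crawled": 851811,
    "failed": 111082,
}

total = sum(counts.values())
rates = {name: n / DURATION_S for name, n in counts.items()}
overall_rate = total / DURATION_S

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} peers/s")
print(f"overall: {overall_rate:.2f} peers/s")
```

The three counts sum to the stated total of 1018798, and each computed rate agrees with the slide's figure to within 0.01 peers/second.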
Issues We Encountered
• On certain nodes, specific ports were already in use by other processes
  • e.g. election port = 4444 at earth.cs.brown.edu
• The amount of data sent from slaves to the master had to be capped
  • insufficient buffer?
  • network bottlenecks?
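One possible way to cope with the port conflict above is to try a short list of candidate ports and use the first one that binds. This is a sketch of that idea, not what the group did: the `bind_first_available` helper and the candidate list are hypothetical.

```python
import socket


def bind_first_available(ports, host="127.0.0.1"):
    """Return a listening TCP socket bound to the first free candidate port."""
    for port in ports:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port is taken (or otherwise unusable): try the next candidate.
            sock.close()
            continue
        sock.listen()
        return sock
    raise OSError("no candidate port available")
```

A master could call `bind_first_available([4444, 4445, 4446])` for its election socket. The other masters would then need some way to learn which port was chosen, which this sketch does not address.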