Application Technology Workshop : P2P and GRID: 28/01/2004 P2P Simulation and Reality Sam Joseph Strategic Software Division Graduate School of Information Science and Technology, University of Tokyo Sam Joseph Laboratory for Interactive Learning Technology (LILT), Department of Information and Computer Sciences, University of Hawai'i at Manoa
Personal Profile • Founder of NeuroGrid project: http://www.neurogrid.net • Sub-editor on the P2PJournal: http://www.p2pjournal.com • MetaData subgroup leader for P2P research groups: http://www.irtf.org/charters/p2prg.html
Talk Contents • What is a Simulation? • Why Simulate P2P? • Simulation Methodology • P2P Simulation Issues • The Dangers of Simulation • Real P2P Systems • Types of Simulator • An Extendable Simulator
What is a simulation? • A simulation is “an attempt to model a system in order to study it scientifically” (Law & Kelton, 2000) • Real-world complexity often prevents direct mathematical analysis of the model • Thus a numerical approach or simulation is required • This requires an abstraction of the real system, since otherwise we would just be building the real system • The central question is which abstractions to make, as one can accidentally abstract away essential details • For example, is peer heterogeneity required in the simulation?
Why Simulate? • Testing scalability to large numbers of peers requires … large numbers of peers • Thus one motivation to simulate comes from the expense of running the real system • Testing solutions to malicious peers requires … malicious peers • And introducing malicious peers into a real system is somewhat socially irresponsible • However, the crucial question is: are simulation studies relevant to real P2P systems?
Simulation Methodology • All too often simulation "studies" involve building a model and using the results of a single run to obtain the "answer". (Law & Kelton, 2000) • This pattern is replicated across P2P simulation studies • Drawing valid and credible conclusions requires: • Careful assessment of assumptions • Appropriate probability distributions of starting parameters • Subjecting results to the appropriate statistical analysis
P2P Simulation Issues 1 • Content Model • 1. Representational complexity • document is represented by hash X • document is in category X and no other • document is related to “whales” and “dolphins” • document “defines” “whales” and “has illustrations of” “dolphins” • 2. Vocabulary • whether users map fundamental concepts onto the same terms, e.g. I say “whale” and you say “kujira”, but we both mean marine mammal • 3. Fundamental concepts • agreement about fundamental concepts; you say this marine mammal is food and I say it is sentient • content-centric or user-centric? • 4. Dishonesty • e.g. you say this is a “revolutionary product” and I say this is “unsolicited junk”
P2P Simulation Issues 2 • Content Model • Each content issue subject to dynamic evolutionary processes where users change opinions and strategies over time • More on content modeling in P2P networks in Joseph & Hoshiai (2003) • Network state serialization • Allows stopping and starting • Danger of biasing statistical analysis • Network Markup Language (NML) • Visualization, unit-testing • Visualization greatly aids debugging • Unit-testing particularly important in extendible framework
P2P Simulation Issues 3 • Parameter Distributions • Starting topology, content & query distributions and churn rates • Determine from real system where available • Lv et al. (2002) showed different macro-behaviour depending on whether topology was constructed using a Zipfian model or using real-world data • Results Analysis • Run multiple simulations starting with different selections from the same input probability distributions • Present results indicating confidence intervals • Or repeat assessment of confidence intervals, after sets of additional simulations, until the specified precision is achieved
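The results-analysis steps above (several runs drawn from the same input distributions, confidence intervals, and further batches of runs until a target precision is reached) can be sketched roughly as follows. This is an illustrative sketch only and not code from any of the simulators discussed: `fakeRun` is a hypothetical stand-in for one simulation run, the batch size and limits are arbitrary, and the interval uses the normal critical value 1.96 as an approximation (with few runs a Student-t value would be more appropriate).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.LongToDoubleFunction;

class ReplicationAnalysis {

    /** Sample mean of the per-run results. */
    static double mean(List<Double> xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    /** Half-width of an approximate 95% confidence interval (normal approximation). */
    static double halfWidth95(List<Double> xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        double sampleVariance = ss / (xs.size() - 1);
        return 1.96 * Math.sqrt(sampleVariance / xs.size());
    }

    /**
     * Runs the simulation with seeds 0, 1, 2, ... in batches (batch >= 2) until
     * the half-width drops to targetHalfWidth or maxRuns is reached.
     */
    static List<Double> replicate(LongToDoubleFunction runOnce, int batch,
                                  int maxRuns, double targetHalfWidth) {
        List<Double> results = new ArrayList<>();
        long seed = 0;
        do {
            for (int i = 0; i < batch && results.size() < maxRuns; i++) {
                results.add(runOnce.applyAsDouble(seed++));
            }
        } while (results.size() < maxRuns
                && halfWidth95(results) > targetHalfWidth);
        return results;
    }

    /** Hypothetical stand-in for one simulation run (e.g. mean hops per query). */
    static double fakeRun(long seed) {
        return 4.0 + new Random(seed).nextGaussian();
    }

    public static void main(String[] args) {
        List<Double> r = replicate(ReplicationAnalysis::fakeRun, 10, 1000, 0.2);
        System.out.printf("runs=%d mean=%.3f +/- %.3f%n",
                r.size(), mean(r), halfWidth95(r));
    }
}
```

Each run gets its own seed, so every replication starts from a different selection from the same input distributions, and the loop only stops adding batches once the interval is narrow enough or the run budget is exhausted.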
Dangers of Simulation • Case study: Query Message Combination Protocol (QMCP) • QMCP is a Gnutella Protocol modification to combine multiple queries, which could lead to more efficient use of bandwidth (based on a 2001 study) • However, network protocols change frequently: do older results about the Gnet still apply? • Failing to consider lower network levels may leave you suggesting redundant things • e.g. replicating a Nagle Algorithm in the overlay when it already exists in TCP/IP
Real P2P Systems • Saroiu et al. (2001) Gnutella/Napster study: • Significant heterogeneity: bandwidth, latency and availability vary across 3 to 5 orders of magnitude • Peers deliberately misreport information if there is an incentive to do so • Clip2 showed Gnet follows a power law – Saroiu et al. show resistance to random failure, but fragmentation under directed attack • Ripeanu et al. (2002) show Gnutella diverging from a power law network • Ge et al. (2002): the unregulated and transitory nature of P2P systems makes it difficult to evaluate assumptions in a real system
Types of Simulator • Hierarchy of approaches • Numerical Model • SimP2 (Kant & Iyer, 2003) • Queuing Model (Ge et al., 2002?) • Flow-based simulation • Narses (Baker & Giuli, 2002) • Event-based simulation • NeuroGrid (Joseph, 2003) • QueryCycle (Schlosser et al., 2002) • Packet-based simulation • PLP2P (He et al., 2003) • NS-2 • Real system
NeuroGrid Simulator • Abstract Classes • Keyword • Document • Message • Node • Network • MessageHandler • Extending the above classes allows us to create different P2P networks • Gnutella • Freenet • NeuroGrid • Pastry • Action Event framework
Action Event framework [Figure: actions placed on a timeline of timesteps 0 to 9, shown in three stages.] • Executing an action causes two actions to be inserted at timestep 3 • Executing an action causes one action to be inserted at timestep 4 and another at timestep 8 • Executing an action causes two more actions to be inserted at timestep 8
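The idea of actions that insert further actions at later timesteps can be sketched as a small event loop. This is a hedged illustration, not the NeuroGrid simulator's own code: the `Action` interface, the `Scheduler` class and the chain of actions in `demo` are hypothetical names and choices, arranged only to loosely mirror the timesteps mentioned on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class ActionEventSketch {

    /** Hypothetical action: executing it may schedule further actions. */
    interface Action {
        void execute(int now, Scheduler scheduler);
    }

    /** Keeps pending actions ordered by timestep; ties run in insertion order. */
    static class Scheduler {
        private static final class Entry {
            final int time;
            final long seq;
            final Action action;
            Entry(int time, long seq, Action action) {
                this.time = time;
                this.seq = seq;
                this.action = action;
            }
        }

        private final PriorityQueue<Entry> queue = new PriorityQueue<>(
                (a, b) -> a.time != b.time ? Integer.compare(a.time, b.time)
                                           : Long.compare(a.seq, b.seq));
        private long nextSeq = 0;
        /** Timesteps at which actions were executed, in execution order. */
        final List<Integer> executedAt = new ArrayList<>();

        void schedule(int time, Action action) {
            queue.add(new Entry(time, nextSeq++, action));
        }

        /** Executes actions in time order until none remain. */
        void run() {
            while (!queue.isEmpty()) {
                Entry e = queue.poll();
                executedAt.add(e.time);
                e.action.execute(e.time, this);
            }
        }
    }

    /** One action at timestep 0 triggers a chain of insertions at 3, 4 and 8. */
    static List<Integer> demo() {
        Scheduler s = new Scheduler();
        Action idle = (now, sch) -> { };
        Action atFour = (now, sch) -> { sch.schedule(8, idle); sch.schedule(8, idle); };
        Action atThree = (now, sch) -> { sch.schedule(4, atFour); sch.schedule(8, idle); };
        Action start = (now, sch) -> { sch.schedule(3, atThree); sch.schedule(3, idle); };
        s.schedule(0, start);
        s.run();
        return s.executedAt;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because only timesteps that actually hold an action are visited, idle timesteps cost nothing, which is the usual attraction of an event-based design over stepping through every tick.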
Conclusion • P2P systems are characterized by many of the annoying real-life complexities that prevent simple analysis and simulation • For example • high turnover of peers • download & connection failures • large numbers of stochastically behaving peers • Simplifications used for tractable simulations can lead to unrealistic behaviour • Effective use of simulation studies requires a lot of work, but not as much as full implementation?
Questions? sam@neurogrid.com
Gnutella Search • Gnutella uses broadcast search • The spread of the messages is limited by TTL and GUID • TTL: Time To Live - the number of hops before a message is expired • GUID: Globally Unique Identifier - allows nodes to identify loops [Figure: query G067 is broadcast from N001 across nodes N001 to N007, each node listing the GUIDs of its documents; the TTL starts at 3 and drops by one per hop, forwarding stops at TTL=0 ("Stop"), nodes that receive the query a second time answer "Seen it" (loop), and a node holding G067 is marked "Match".]
Abstract Class Extension • Extending the abstract classes implements p2p functions [Figure: Keyword, Document, Message and Node, each shown with an ID Hashtable, are extended by SimpleKeyword, SimpleDocument, SimpleMessage and SimpleNode respectively.] • E.g. the Message abstract class contains Document and Keyword array variables • SimpleMessage implements a second constructor, which is used when nodes forward messages:

    public SimpleMessage(Message p_message) throws Exception {
        if (p_message == null) throw new Exception("Message is null");
        o_message_ID = p_message.getMessageID();  // GUID is carried over
        o_TTL = p_message.getTTL() - 1;           // TTL decrement
        o_keywords = p_message.getKeywords();
        o_document = p_message.getDocument();
        // etc …
    }
processMessage() • The Node abstract class has an abstract processMessage method:

    public abstract void processMessage(Message p_message, boolean p_start) throws Exception;

• The Node abstract class has a GUID Hashtable:

    protected Hashtable o_seenGUIDs = new Hashtable(10);

• SimpleNode implements this method:

    public void processMessage(Message p_message, boolean p_start) throws Exception {
        if (p_message == null) throw new Exception("p_message is null");
        // Message seen? If its ID is already in the Hashtable, ignore it
        String x_previous = (String)(o_seenGUIDs.get(p_message.getMessageID()));
        if (x_previous != null) return;
        // Message ID of incoming Message goes into Hashtable
        o_seenGUIDs.put(p_message.getMessageID(), p_message.getMessageID());
        // etc …
Forwarding Messages • When a message is forwarded, a new message object is created through the SimpleMessage constructor, which ensures the GUID is maintained and the TTL decremented • Also in the SimpleNode processMessage implementation:

    Enumeration x_enum = o_conn_list.elements();
    while (x_enum.hasMoreElements()) {
        x_temp_node = (Node)(x_enum.nextElement());
        // Create new message (same GUID, TTL - 1)
        x_new_message = new SimpleMessage(p_message);
        x_new_message.setPreviousLocation(this);
        o_sending_message.put(x_temp_node, x_temp_node);
        // Forward to the next node
        x_temp_node.addMessageToInbox(x_new_message);
    }
    // etc …
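Putting the pieces from the last few slides together, a TTL- and GUID-limited broadcast can be sketched in a few self-contained lines. This is a simplified illustration under stated assumptions, not the simulator's code: `Msg`, `Peer`, `Delivery` and the four-node `demoNetwork` are hypothetical, a single queue stands in for the per-node inboxes, and a node receiving a message with TTL 0 is assumed to check its own documents but not forward.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FloodSketch {

    static class Msg {
        final String guid;
        final int ttl;
        final String keyword;
        Msg(String guid, int ttl, String keyword) {
            this.guid = guid;
            this.ttl = ttl;
            this.keyword = keyword;
        }
        /** Copy used when forwarding: same GUID, TTL reduced by one. */
        Msg forwarded() {
            return new Msg(guid, ttl - 1, keyword);
        }
    }

    static class Peer {
        final String name;
        final Set<String> documents = new HashSet<>();
        final List<Peer> neighbours = new ArrayList<>();
        final Set<String> seenGuids = new HashSet<>();
        Peer(String name) { this.name = name; }
    }

    private static class Delivery {
        final Peer to;
        final Msg msg;
        Delivery(Peer to, Msg msg) { this.to = to; this.msg = msg; }
    }

    /** Floods the query from origin; returns names of matching peers in discovery order. */
    static List<String> flood(Peer origin, Msg query) {
        List<String> matches = new ArrayList<>();
        Deque<Delivery> pending = new ArrayDeque<>();
        pending.add(new Delivery(origin, query));
        while (!pending.isEmpty()) {
            Delivery d = pending.poll();
            Peer p = d.to;
            if (!p.seenGuids.add(d.msg.guid)) continue;       // seen it: loop detected
            if (p.documents.contains(d.msg.keyword)) matches.add(p.name);
            if (d.msg.ttl <= 0) continue;                     // expired: stop forwarding
            for (Peer n : p.neighbours) {
                pending.add(new Delivery(n, d.msg.forwarded()));
            }
        }
        return matches;
    }

    static void link(Peer a, Peer b) {
        a.neighbours.add(b);
        b.neighbours.add(a);
    }

    /** Hypothetical network: N001-N002, N001-N003, N002-N003 (a loop), N003-N004. */
    static Peer demoNetwork() {
        Peer n1 = new Peer("N001"), n2 = new Peer("N002");
        Peer n3 = new Peer("N003"), n4 = new Peer("N004");
        link(n1, n2);
        link(n1, n3);
        link(n2, n3);
        link(n3, n4);
        n4.documents.add("whales");
        return n1;
    }

    public static void main(String[] args) {
        Peer origin = demoNetwork();
        System.out.println(flood(origin, new Msg("G1", 2, "whales")));
        System.out.println(flood(origin, new Msg("G2", 1, "whales")));
    }
}
```

With a TTL of 2 the query reaches N004 two hops away, while a TTL of 1 expires before getting there; the loop between N001, N002 and N003 is cut by the seen-GUID check rather than by the TTL.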
NeuroGrid Search • NeuroGrid nodes learn data location and forward accordingly • Human networking analogy [Figure: query G067 from N001 in a network of nodes N001 to N007; each node lists the GUIDs of its documents and a knowledge base (KB) mapping keywords A, B and C to nodes (e.g. A – N003, B – N002, C – N003, with NXX placeholders elsewhere); the query is forwarded according to the KB entries, with the TTL falling from 3 to 0, until a "Match".]
NeuroGrid Nodes • NeuroGrid nodes have MultiHashtables that associate a single key with a Vector of objects:

    // MultiHashtable used to store which documents are in this node (key = keyword)
    protected MultiHashtable o_contents = new MultiHashtable();
    // MultiHashtable used to store information about documents in other nodes (key = keyword)
    protected MultiHashtable o_knowledge = new MultiHashtable();

• On a successful search, processMessage updates the knowledge base of the node that generated the query:

    // The keywords in the incoming message
    for (int i = 0; i < x_keywords.length; i++) {
        x_docs = (Vector)(o_contents.get(x_keywords[i]));
        if (x_docs != null) {
            // Document with that keyword present?
            if (x_docs.contains(p_message.getDocument())) {
                if (Network.o_learning) {
                    // Update original Node KB
                    x_start_node = p_message.getStart();
                    x_start_node.addConnection(this);
                    x_start_node.addKnowledge(this, x_keywords);
                }
                break; // stop checking once we find a node
            }
        }
    }
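The learning step can be illustrated with a small self-contained sketch. It is an assumption-laden illustration rather than the NeuroGrid implementation: `MultiMap` is a hypothetical stand-in for MultiHashtable, `Peer`, `targetsFor` and `handleQuery` are invented names, and the fallback of forwarding to all connections when nothing is known about a keyword is an assumption, not something stated on the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KnowledgeSketch {

    /** Minimal stand-in for a MultiHashtable: one key maps to a list of values. */
    static class MultiMap<K, V> {
        private final Map<K, List<V>> map = new HashMap<>();

        void put(K key, V value) {
            List<V> values = map.computeIfAbsent(key, k -> new ArrayList<>());
            if (!values.contains(value)) values.add(value);
        }

        List<V> get(K key) {
            List<V> values = map.get(key);
            return values == null ? new ArrayList<V>() : values;
        }
    }

    static class Peer {
        final String name;
        /** keyword -> documents held locally */
        final MultiMap<String, String> contents = new MultiMap<>();
        /** keyword -> peers believed to hold matching documents */
        final MultiMap<String, Peer> knowledge = new MultiMap<>();
        final List<Peer> connections = new ArrayList<>();

        Peer(String name) { this.name = name; }

        void addKnowledge(Peer holder, String[] keywords) {
            for (String k : keywords) knowledge.put(k, holder);
        }

        /** Peers to forward to: those known for the keyword, else all connections (assumed fallback). */
        List<Peer> targetsFor(String keyword) {
            List<Peer> known = knowledge.get(keyword);
            return known.isEmpty() ? connections : known;
        }

        /** On a local match, teach the originating peer where these keywords can be found. */
        boolean handleQuery(Peer origin, String[] keywords, String document) {
            for (String k : keywords) {
                if (contents.get(k).contains(document)) {
                    if (!origin.connections.contains(this)) origin.connections.add(this);
                    origin.addKnowledge(this, keywords);
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Peer origin = new Peer("N001");
        Peer other = new Peer("N002");
        Peer holder = new Peer("N005");
        origin.connections.add(other);
        holder.contents.put("whales", "doc1");

        System.out.println(origin.targetsFor("whales").get(0).name);
        holder.handleQuery(origin, new String[] {"whales"}, "doc1");
        System.out.println(origin.targetsFor("whales").get(0).name);
    }
}
```

Before the match the originating peer can only guess among its connections; afterwards the same keyword is routed straight to the peer that answered, which is the "human networking" behaviour the slide alludes to.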
Freenet Search • Freenet aggressively caches data while performing a serial search • Routing uses document hashes [Figure: query G067 from N001 is passed serially from node to node in a network of nodes N001 to N007, with TTL values from 20 down to 10 shown along the path; each node lists the GUIDs of its documents and a KB mapping document keys K002 to K004 to nodes (NXXX placeholders elsewhere); a node that has already handled the query answers "Seen it", and the legend marks the matching node.]
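A serial search with caching on the return path can be sketched as a depth-first request. This is a heavily simplified illustration and not Freenet's actual routing code: keys are plain integers with closeness measured by absolute difference, the `visited` set passed along with the request stands in for each node's record of seen GUIDs, and `Peer`, `request` and the four-node `demo` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SerialSearchSketch {

    static class Peer {
        final String name;
        /** key -> data held or cached at this peer */
        final Map<Integer, String> store = new HashMap<>();
        /** known key -> peer to ask for keys close to it */
        final Map<Integer, Peer> routing = new HashMap<>();

        Peer(String name) { this.name = name; }

        /** Tries one neighbour at a time, closest known key first; returns the data or null. */
        String request(int key, int ttl, Set<Peer> visited) {
            if (!visited.add(this)) return null;   // seen it
            String local = store.get(key);
            if (local != null) return local;       // match
            if (ttl <= 0) return null;             // expired

            List<Integer> known = new ArrayList<>(routing.keySet());
            known.sort((x, y) -> {
                int c = Integer.compare(Math.abs(x - key), Math.abs(y - key));
                return c != 0 ? c : Integer.compare(x, y);
            });
            for (int k : known) {
                String found = routing.get(k).request(key, ttl - 1, visited);
                if (found != null) {
                    store.put(key, found);         // cache on the way back
                    return found;
                }
            }
            return null;                           // every route failed: backtrack
        }
    }

    /** Hypothetical network: A knows B (key 10) and C (key 50); B knows D (key 12). */
    static Peer[] demo() {
        Peer a = new Peer("A"), b = new Peer("B"), c = new Peer("C"), d = new Peer("D");
        a.routing.put(10, b);
        a.routing.put(50, c);
        b.routing.put(12, d);
        c.store.put(50, "fifty");
        d.store.put(11, "eleven");
        return new Peer[] {a, b, c, d};
    }

    public static void main(String[] args) {
        Peer[] p = demo();
        System.out.println(p[0].request(11, 20, new HashSet<>()));
        System.out.println(p[1].store.containsKey(11));
        System.out.println(p[0].request(48, 20, new HashSet<>()));
    }
}
```

In this sketch a successful request leaves a copy of the data at every peer on the path back to the requester, so a repeat request for the same key is answered locally; a failed request simply backtracks through the remaining routes and leaves no trace.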