U-P2P: A Peer-to-peer System for Description and Discovery of Resource-sharing Communities • Aloke Mukherjee, Carleton University • August 28, 2003
Peer-to-peer File-sharing • Exploit storage capability of the edge • Balance load • Robustness to failure • Weaknesses: Search and Communities
Search Problem • Lack of structured metadata • Filenames, Keyword matching • Opaque identifiers • Support for popular formats • Ignoring structured metadata • Implicit indicators • Collaborative filtering
Community Problem • Not simple to create a community for sharing a new file format • Current state • Different protocols/apps (gnutella, fasttrack, jxtasearch) • Inadequate metadata (filename matching, limited schemas) • Ad-hoc attempts aimed at specific domains • Scattered and isolated – there is no easy way to discover communities
Improving Search • Standard metadata layer • Explicit structured metadata • All resources are XML files • XML Schema used to describe format (e.g. MP3, design pattern)
Schema instantiates resource
Schema:
<schema>
  <element name="designpattern">
    <complexType>
      <sequence>
        <element name="name" type="string"/>
        <element name="author" type="string"/>
        <element name="context" type="string"/>
        <element name="problem" type="string"/>
        <element name="design" type="string"/>
        <element name="diagram" type="anyURI"/>
      </sequence>
    </complexType>
  </element>
</schema>
Resource:
<designpattern>
  <name>singleton</name>
  <author>gang of four</author>
  <context>when creating a new class…</context>
  <problem>ensure a class only has…</problem>
  <design>make the class itself responsible…</design>
  <diagram>http://example.com/singleton.jpg</diagram>
</designpattern>
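The relationship between the schema and the resource can be illustrated with a minimal Python sketch. It is not U-P2P's validation code: a real deployment would use a full XML Schema validator, whereas this sketch only checks that the resource's root element and child elements match the names declared in the schema, in order. The helper names `declared_fields` and `conforms` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Schema and resource adapted from the slide (straight quotes, closed elements).
SCHEMA = """
<schema>
  <element name="designpattern">
    <complexType>
      <sequence>
        <element name="name" type="string"/>
        <element name="author" type="string"/>
        <element name="context" type="string"/>
        <element name="problem" type="string"/>
        <element name="design" type="string"/>
        <element name="diagram" type="anyURI"/>
      </sequence>
    </complexType>
  </element>
</schema>
"""

RESOURCE = """
<designpattern>
  <name>singleton</name>
  <author>gang of four</author>
  <context>when creating a new class…</context>
  <problem>ensure a class only has…</problem>
  <design>make the class itself responsible…</design>
  <diagram>http://example.com/singleton.jpg</diagram>
</designpattern>
"""


def declared_fields(schema_xml):
    """Return (root element name, list of child field names) from the schema."""
    root_decl = ET.fromstring(schema_xml).find("element")
    fields = [e.get("name") for e in root_decl.iter("element") if e is not root_decl]
    return root_decl.get("name"), fields


def conforms(resource_xml, schema_xml):
    """Name-and-order check only; datatypes such as anyURI are not verified."""
    root_name, fields = declared_fields(schema_xml)
    resource = ET.fromstring(resource_xml)
    return resource.tag == root_name and [child.tag for child in resource] == fields


print(conforms(RESOURCE, SCHEMA))
```

The same check works unchanged for any other format, because the field list is read from the schema rather than hard-coded.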
Automated interface generation [Diagram: the resource XML Schema is transformed by XSLT stylesheets into a resource create form and a resource search form; a resource, which the schema instantiates, is transformed by XSLT into a resource view.]
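The idea of deriving a form from a schema can be sketched in a few lines. U-P2P does this with XSLT stylesheets; the Python below is only a stand-in that walks the schema directly, and the function name `search_form`, the `action` parameter and the HTML layout are assumptions, not taken from the project.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened version of the design pattern schema, for illustration only.
SCHEMA = """
<schema>
  <element name="designpattern">
    <complexType>
      <sequence>
        <element name="name" type="string"/>
        <element name="author" type="string"/>
      </sequence>
    </complexType>
  </element>
</schema>
"""


def search_form(schema_xml, action="search"):
    """Emit one text input per field declared in the schema."""
    root_decl = ET.fromstring(schema_xml).find("element")
    rows = []
    for field in root_decl.iter("element"):
        if field is root_decl:
            continue  # skip the declaration of the resource element itself
        name = escape(field.get("name"), quote=True)
        rows.append(f'  <label>{name} <input type="text" name="{name}"></label>')
    lines = [f'<form action="{escape(action, quote=True)}" method="get">']
    lines += rows
    lines += ['  <input type="submit" value="Search">', "</form>"]
    return "\n".join(lines)


print(search_form(SCHEMA))
```

Because the form is computed from the schema, a community for a new format gets a search form without any format-specific interface code.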
Community Creation and Discovery: What is a Community? • Concrete object with defined tuple of attributes • Simplest form: (format, protocol, …) • Known examples: (mp3, napster) (video, kazaa) • Examples that don't exist: (design patterns, gnutella) (p2p papers, jxtasearch) • Tuple is specified as an XML file
Simplifying Community Creation • User-designed communities • Compose schema to describe format • Compose community XML file
<community>
  <name>designpatterns</name>
  <schema>designpattern.xsd</schema>
  <protocol>gnutella</protocol>
  <display>designpattern.stylesheet</display>
</community>
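To make the "community as tuple" idea concrete, here is a minimal sketch that reads the community XML file into a typed tuple. The `Community` class and the `parse_community` helper are hypothetical names introduced for illustration; only the four element names come from the slide.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

FIELDS = ("name", "schema", "protocol", "display")


@dataclass(frozen=True)
class Community:
    """The community tuple: (name, schema, protocol, display)."""
    name: str
    schema: str
    protocol: str
    display: str


def parse_community(xml_text):
    """Build a Community from a community XML file; reject incomplete files."""
    root = ET.fromstring(xml_text)
    if root.tag != "community":
        raise ValueError(f"expected <community>, got <{root.tag}>")
    values = {field: root.findtext(field) for field in FIELDS}
    missing = [field for field, value in values.items() if value is None]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    return Community(**values)


community = parse_community("""
<community>
  <name>designpatterns</name>
  <schema>designpattern.xsd</schema>
  <protocol>gnutella</protocol>
  <display>designpattern.stylesheet</display>
</community>
""")
print(community)
```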
Community as class [Diagram: the mp3 community drawn as an mp3 class, with individual mp3 files as its instances.]
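Metaclass analogy [Diagram: the community community drawn as a class of classes; the mp3 community (the mp3 class) is one of its instances, and mp3 files are instances of the mp3 class.]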
Community discovery is File discovery • MP3 community shares MP3 files • Community community shares communities [Diagram: the community community holds community files, including the mp3 community, which in turn holds mp3 files.]
Simplifying Community Discovery • A Community for Communities: The Root Community • Communities are files shared in a real community • Root Community includes schema for communities (format, protocol) = (community, centralized db)
Schema for Communities
<schema>
  <element name="community">
    <complexType>
      <sequence>
        <element name="name" type="xsd:string"/>
        <element name="protocol" type="protocolTypes"/>
        <element name="schema" type="xsd:anyURI"/>
        <element name="display" type="xsd:anyURI"/>
      </sequence>
    </complexType>
  </element>
</schema>
The Root Community
<community>
  <name>root community</name>
  <schema>community.xsd</schema>
  <protocol>central-db</protocol>
  <display>community.stylesheet</display>
</community>
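The claim that community discovery is just file discovery can be shown with a small sketch: the same publish and search code serves both the root community (whose resources are community files) and an ordinary community (whose resources are design patterns). `CommunityIndex` is a hypothetical in-memory stand-in, not U-P2P's repository, and substring matching is an assumed search rule.

```python
import xml.etree.ElementTree as ET


class CommunityIndex:
    """In-memory stand-in for the resources shared in one community."""

    def __init__(self):
        self.resources = []

    def publish(self, xml_text):
        self.resources.append(ET.fromstring(xml_text))

    def search(self, field, term):
        """Case-insensitive substring match on one metadata field."""
        term = term.lower()
        return [r for r in self.resources
                if term in (r.findtext(field) or "").lower()]


# The root community shares community descriptors as its resources.
root = CommunityIndex()
root.publish("<community><name>root community</name><schema>community.xsd</schema>"
             "<protocol>central-db</protocol><display>community.stylesheet</display></community>")
root.publish("<community><name>designpatterns</name><schema>designpattern.xsd</schema>"
             "<protocol>gnutella</protocol><display>designpattern.stylesheet</display></community>")

# An ordinary community shares resources of its own format.
patterns = CommunityIndex()
patterns.publish("<designpattern><name>singleton</name>"
                 "<author>gang of four</author></designpattern>")

# Discovering a community and discovering a file are the same operation.
found_community = root.search("name", "design")[0]
found_pattern = patterns.search("author", "gang")[0]
print(found_community.findtext("protocol"), found_pattern.findtext("name"))
```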
What is U-P2P? • A framework that breathes life into these ideas • Explicit metadata search and creation for every Community • Creation of Community tuples • (format, protocol etc…) • Discovery of Community tuples
Technologies • Java • Tomcat Servlet Container • JavaServer Pages (JSP) + Servlets • XSLT (transforms), XPath (queries) • Java components for XML parsing, XSLT and XPath (Xerces, Xalan) • eXist XML Database • Log4j (logging infrastructure), JUnit (unit testing)
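As an illustration of what an XPath metadata query looks like, the sketch below uses the limited XPath subset built into Python's `xml.etree.ElementTree`. U-P2P itself evaluates queries with its Java components and the eXist database, so this is an analogy rather than the project's code. The `<collection>` wrapper element and the second resource are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of published resources; the second entry is made up.
COLLECTION = ET.fromstring("""
<collection>
  <designpattern>
    <name>singleton</name>
    <author>gang of four</author>
  </designpattern>
  <designpattern>
    <name>example-pattern</name>
    <author>someone else</author>
  </designpattern>
</collection>
""")

# The predicate [author='gang of four'] keeps elements that have an <author>
# child whose full text equals the given string.
matches = COLLECTION.findall("./designpattern[author='gang of four']")
names = [m.findtext("name") for m in matches]
print(names)
```

A query over a structured field like this is what filename or keyword matching cannot express: it selects on the author field specifically, not on any occurrence of the words.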
Evaluation and Validation: Areas of Interest • Publish and Search times as Community size increases • Breaking down Publish and Search operations • Community effect • Multiple central servers
Contributions • Standard Metadata Layer • All communities include support for explicit metadata search and creation • User-designed Communities • Users can easily share new formats with full support for metadata • Community for Communities • Prevents fragmented, isolated communities by providing metadata about communities and a standard method for discovering them • Performance and Scalability Gains • Communities can improve performance and scalability vs. systems where resources are undifferentiated
Future Work • Performance improvements • Protocol independence (adapters for Gnutella, Freenet, etc.) • Community-aware Gnutella routing • More Community parameters (security, authentication, etc.)
Future Work continued • Trust metrics (to differentiate between communities, metadata quality) • Community evolution • Inheritance and multiple inheritance for Communities
U-P2P Publications • A. Mukherjee, B. Esfandiari, N. Arthorne, "U-P2P: A Peer-to-peer System for Description and Discovery of Resource-sharing Communities", ICDCS Workshops 2002: 701-705, July 2002. • N. Arthorne, B. Esfandiari, A. Mukherjee, "U-P2P: A Peer-to-peer Framework for Universal Resource Sharing and Discovery", Proceedings of the Freenix Track of Usenix 2003: 29-38, June 2003. • http://u-p2p.sourceforge.net
Repository Design: XML Database • Requirements • Flexibility to store wide variety of formats • Handle powerful queries over all metadata • XML Database better suited than RDBMS • Difficult to map fields to rows and columns • Chose eXist XML database • Open source • Written in Java • Support for XML:DB API
Network Adapter Design • Abstract interface to the peer-to-peer network • Routing search requests, handling results, handling incoming search requests, etc. • Only the hybrid model (Napster model) is implemented • All peers can act as client and/or server
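The adapter idea can be sketched as an abstract interface with one concrete hybrid implementation. This is an illustration under stated assumptions, not U-P2P's Java interface: the names `NetworkAdapter`, `HybridAdapter`, `publish` and `search` are hypothetical, and the central index is modeled as a plain list shared by all peers.

```python
from abc import ABC, abstractmethod


class NetworkAdapter(ABC):
    """Abstract interface that hides the underlying peer-to-peer protocol."""

    @abstractmethod
    def publish(self, resource_id, metadata):
        """Announce a locally stored resource to the network."""

    @abstractmethod
    def search(self, field, term):
        """Return (peer, resource_id) pairs for matching resources."""


class HybridAdapter(NetworkAdapter):
    """Napster-style hybrid model: metadata is indexed centrally,
    while the files themselves stay on the peers that published them."""

    def __init__(self, peer_name, central_index):
        self.peer_name = peer_name
        self.central_index = central_index  # list shared by every peer

    def publish(self, resource_id, metadata):
        self.central_index.append((self.peer_name, resource_id, metadata))

    def search(self, field, term):
        term = term.lower()
        return [(peer, rid) for peer, rid, meta in self.central_index
                if term in meta.get(field, "").lower()]


index = []                      # stands in for the central server
alice = HybridAdapter("alice", index)
bob = HybridAdapter("bob", index)
alice.publish("singleton.xml", {"name": "singleton", "author": "gang of four"})
hits = bob.search("name", "single")
print(hits)
```

A Gnutella or Freenet adapter would implement the same two methods with a different routing strategy, which is the protocol independence the interface is meant to allow.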
Evaluation and Validation: Challenges • Finding large XML collections • Berkeley Drosophila Genome Project: genome annotations • Other sources: DBLP (CS papers), EDGAR (SEC filings), GeneOntology (gene-related concepts) • Transforming DTDs to XML Schema (DTDXS package) • Automation • XML-RPC interface for publish and search
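An XML-RPC interface for publish and search makes it possible to script large test runs. The sketch below shows the general shape using Python's standard `xmlrpc` modules, with requests dispatched in-process instead of over HTTP. The method names, parameters and in-memory store are assumptions and do not reflect U-P2P's actual XML-RPC API; `_marshaled_dispatch` is an internal helper of `SimpleXMLRPCDispatcher`, used here only to avoid opening a socket.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

store = []  # hypothetical repository: (community, metadata) pairs


def publish(community, metadata):
    """Store one resource's metadata; return the repository size."""
    store.append((community, metadata))
    return len(store)


def search(community, field, term):
    """Substring match on one metadata field within one community."""
    term = term.lower()
    return [meta for comm, meta in store
            if comm == community and term in meta.get(field, "").lower()]


dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_function(publish)
dispatcher.register_function(search)


def call(method, *params):
    """Encode an XML-RPC request, dispatch it, and decode the response."""
    request = xmlrpc.client.dumps(params, methodname=method)
    response = dispatcher._marshaled_dispatch(request)
    (result,), _ = xmlrpc.client.loads(response)
    return result
```

An automated run would call `call("publish", ...)` in a loop over a large XML collection and time each request, then repeat with `call("search", ...)`.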