Incorporating an XML Matching Engine into Distributed Brokering Systems
Shrideep Pallickara, Geoffrey Fox and Marlon Pierce
spallick, gcf@indiana.edu
Community Grid Computing Laboratory, Pervasive Technology Labs, Indiana University
http://www.naradabrokering.org
Talk Outline
• Motivation
• NaradaBrokering Overview
• Organization of XPath Profiles and XML Advertisements
• Optimizations
• Performance Measurements
• Conclusions & Future Work
Motivation
• Interactions between entities are increasingly network-centric.
• As the scale of a system increases, its backbone messaging infrastructure gravitates towards a distributed design.
  • Eliminate single points of failure, bottlenecks, etc.
• Entities that interact through XML-encapsulated messages will specify complex constraints.
  • As message volume increases, constraints will become more fine-grained.
• This provides the underpinnings for routing Web Service invocations.
• The messaging infrastructure forms the substrate on which we build lightweight, location-independent services.
NaradaBrokering: Overview
• Based on a network of cooperating broker nodes
  • Cluster-based architecture allows the system to scale
• Provides a scalable distributed event service
  • Publish/Subscribe model; also JMS compliant
  • P2P interaction support: JXTA and Gnutella (started)
  • Audio/Video apps
  • Federation of Grid systems (just starting)
• Engineering issues
  • Support for multiple network protocols
  • Tunneling through firewalls/proxies
NaradaBrokering: Organization
XPath
• Query language that searches for, locates, and identifies parts of XML documents.
  • Uses a compact, non-XML syntax.
  • Uses path syntax to navigate the hierarchical structure of XML documents.
  • Operates on the abstract, logical structure of XML documents.
• Matching queries to XML documents
  • We say an XPath query matches an XML event if that XML event satisfies the constraint specified in the query.
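The definition of a match above can be sketched in a few lines. This is only an illustration, not the engine used in the talk: it relies on Python's standard `xml.etree.ElementTree`, which supports a limited subset of XPath, and the event layout (`event`, `stock`, `symbol`, `price`) and the helper name `matches` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML event; element and attribute names are illustrative only.
EVENT = """
<event>
  <stock symbol="IBM"><price>82</price></stock>
  <stock symbol="SUNW"><price>4</price></stock>
</event>
"""

def matches(query: str, xml_text: str) -> bool:
    """Return True if the query selects at least one node of the XML event."""
    root = ET.fromstring(xml_text)
    # ElementTree evaluates the path relative to the root element.
    return root.find(query) is not None

print(matches(".//stock[@symbol='IBM']", EVENT))   # True
print(matches(".//stock[@symbol='MSFT']", EVENT))  # False
```

A full XPath processor would accept a much richer query language; the point here is only that a match means the query selects something in the event.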
XPath Profiles and XML Advertisements
• XPath Profile
  • Specification of an XPath constraint that XML events must satisfy prior to being routed to the client.
  • Expresses interest in events conforming to a specific template.
  • Matched against real-time XML events.
• XML Advertisements
  • An advertisement could be a resource that is described in XML.
  • Clients interested in locating resources can use an XPath query to locate them.
  • Discovery
• Matching times increase with
  • the number of profiles/advertisements being maintained
  • the complexity of the matching operation
    • XPath and SQL matching tend to be more expensive.
Organization of Profiles and Routing
• Client profiles are stored hierarchically within the system.
  • A broker maintains client profiles, a cluster-controller maintains broker profiles/advertisements, and so on.
• When an event is received, it is matched against the stored profiles and destinations are computed.
  • A cluster-controller computes broker destinations. A broker computes client destinations.
• Every broker node, when supplied with a set of destinations, computes the best broker-hops to take to reach these destinations.
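The two-level computation described on this slide can be sketched as follows. Everything in the sketch is an assumption made for illustration: the broker and client names, the dictionary layout, and the use of Python's `xml.etree.ElementTree` (a limited XPath subset) in place of a real XPath engine. Computing the best broker-hops is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored profiles. The cluster-controller keeps constraints per
# broker; each broker keeps constraints per attached client.
BROKER_PROFILES = {
    "broker-A": [".//stock"],
    "broker-B": [".//weather"],
}
CLIENT_PROFILES = {
    "broker-A": {
        "client-1": [".//stock[@symbol='IBM']"],
        "client-2": [".//stock[@symbol='SUNW']"],
    },
    "broker-B": {"client-3": [".//weather"]},
}

def matching_names(root, profiles):
    """Names that have at least one stored constraint satisfied by the event."""
    return sorted(
        name
        for name, constraints in profiles.items()
        if any(root.find(c) is not None for c in constraints)
    )

def route(xml_text):
    root = ET.fromstring(xml_text)
    routes = {}
    # Cluster-controller step: compute broker destinations.
    for broker in matching_names(root, BROKER_PROFILES):
        # Broker step: compute client destinations.
        routes[broker] = matching_names(root, CLIENT_PROFILES[broker])
    return routes

print(route('<event><stock symbol="IBM"/></event>'))
# {'broker-A': ['client-1']}
```

Note that broker-B's client profiles are never evaluated for this event, because the controller-level match already ruled that broker out.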
XPath Profile Matching Optimizations
• XPath profiles have the following format: <id> <constraint> <destination>
  • Destination is a 32-bit integer of the form 000….001…00
• The matching process returns a destination list.
  • It starts with an empty list.
  • When there is a match, the destination is added by a simple bitwise OR operation.
  • So if both brokers 000..100… and 000..010… are interested, the destination list would be 000..110…
• Once a destination is added to the computed list, XPath profiles registered to this destination are not considered for subsequent matching against the same XML event.
  • The savings are enormous, especially when there are a large number of profiles.
• Not all nodes are involved in the calculation process.
• Matching costs are amortized over the entire broker network.
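A minimal sketch of the bitwise-OR accumulation and the skip rule described above, under stated assumptions: the `Profile` type, the function name `compute_destinations`, and the sample constraints are hypothetical, and `xml.etree.ElementTree` (a limited XPath subset) stands in for the XPath engine. The `evaluated` counter exists only to make the saving visible.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple

class Profile(NamedTuple):
    profile_id: int
    constraint: str   # XPath constraint (ElementTree subset in this sketch)
    destination: int  # bit mask with a single bit set, one bit per sub-unit

def compute_destinations(profiles: List[Profile], xml_text: str) -> Tuple[int, int]:
    """Return (destination mask, number of constraints actually evaluated)."""
    root = ET.fromstring(xml_text)
    destinations = 0  # the empty destination list
    evaluated = 0
    for p in profiles:
        if destinations & p.destination:
            continue  # destination already in the list: skip this profile
        evaluated += 1
        if root.find(p.constraint) is not None:
            destinations |= p.destination  # add the destination with bitwise OR
    return destinations, evaluated

profiles = [
    Profile(1, ".//stock[@symbol='IBM']", 0b100),
    Profile(2, ".//stock", 0b100),  # same destination as profile 1
    Profile(3, ".//price", 0b010),
    Profile(4, ".//bond", 0b001),
]
event = '<event><stock symbol="IBM"><price>82</price></stock></event>'
mask, evaluated = compute_destinations(profiles, event)
print(bin(mask), evaluated)  # 0b110 3
```

Profile 2 is never evaluated because its destination was already set by profile 1, which is where the saving comes from when many profiles share a destination.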
XML Advertisements and Optimizations
• Organization
  • Advertisements have a destination associated with them too.
  • The organizational scheme is similar to that of profiles.
  • An XPath query issued by a client is matched against stored advertisements.
  • Controllers at different levels return results.
• Optimizations
  • Eliminate locating the same resource from the same unit.
    • A cluster controller would already have returned all resources for its cluster, so there is no need to match advertisements (at the super-cluster controller) registered to that cluster.
  • We could limit the default number of matching advertisements that are returned as a result of the query.
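The two optimizations can be sketched together. This is an illustrative sketch, not the system's implementation: the function name `query_advertisements`, the `answered_mask` and `limit` parameters, the default of 10, and the advertisement layout are all assumptions, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

def query_advertisements(query, adverts, answered_mask=0, limit=10):
    """Match a client query against stored advertisements.

    adverts: list of (cluster_mask, xml_text) pairs.
    answered_mask: bits of clusters whose controllers have already reported.
    limit: maximum number of matching advertisements to return.
    """
    results = []
    for cluster_mask, xml_text in adverts:
        if cluster_mask & answered_mask:
            continue  # that cluster already returned its resources
        if ET.fromstring(xml_text).find(query) is not None:
            results.append(xml_text)
            if len(results) >= limit:
                break  # enough results for this query
    return results

# Hypothetical advertisements, each registered to one cluster bit.
ADS = [
    (0b001, '<resource name="p1"><printer color="yes"/></resource>'),
    (0b010, '<resource name="p2"><printer color="yes"/></resource>'),
    (0b010, '<resource name="p3"><printer color="no"/></resource>'),
    (0b100, '<resource name="p4"><printer color="yes"/></resource>'),
]
QUERY = "./printer[@color='yes']"

print(len(query_advertisements(QUERY, ADS)))                      # 3
print(query_advertisements(QUERY, ADS, answered_mask=0b001, limit=1))
# ['<resource name="p2"><printer color="yes"/></resource>']
```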
Restricting Scope of Matching
• Ensure resources aren’t available beyond a realm
  • Restrict propagation of advertisements/profiles.
  • E.g., a profile/advertisement is not to be sent beyond its cluster.
• ACLs could be included with advertisements
  • Checked to ensure a service is not seen by queries with improper credentials.
• Specifying depth of queries
  • Ensures localized resources.
  • E.g., a client may be interested only in resources advertised by clients within its own super-cluster.
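One way to picture these restrictions is as a visibility check applied before any matching is done. The sketch below is hypothetical throughout: the level numbering, the `Advertisement` fields, and the credential model are assumptions for illustration, not part of the system described in the talk.

```python
from dataclasses import dataclass

# Assumed level numbering: 0 = broker, 1 = cluster, 2 = super-cluster.
BROKER, CLUSTER, SUPER_CLUSTER = 0, 1, 2

@dataclass(frozen=True)
class Advertisement:
    name: str
    max_level: int               # highest level the advertisement may propagate to
    acl: frozenset = frozenset() # empty set means no credential is required

def visible(ad: Advertisement, query_level: int, credentials: set) -> bool:
    """Decide whether a query evaluated at query_level may see the advertisement."""
    if query_level > ad.max_level:
        return False  # advertisement was never propagated this far
    if ad.acl and not (ad.acl & credentials):
        return False  # query lacks a credential listed in the ACL
    return True

printer = Advertisement("printer", max_level=CLUSTER, acl=frozenset({"staff"}))
print(visible(printer, CLUSTER, {"staff"}))        # True
print(visible(printer, SUPER_CLUSTER, {"staff"}))  # False
print(visible(printer, BROKER, {"guest"}))         # False
```

A query depth limit works the same way from the other side: a query restricted to its super-cluster is simply never evaluated at a higher level.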
Experimental results
• Stand-alone process
• Pentium III, 1 GHz, 256 MB RAM; JVM: JRE 1.4
• XPath profiles are evenly distributed over 32 sub-unit destinations.
• Xalan parser
What the numbers mean
• With optimizations, profile matching times vary between 120 and 170 milliseconds for 10,000 profiles.
  • Our conjecture is that in most practical situations performance would be similarly enhanced.
• For advertisements, the costs would vary depending on the number of results requested.
  • The scheme can clearly be used for resource discovery, since these queries don’t have stringent real-time constraints.
• Computing costs are incurred at the controllers.
  • Matching costs are thus amortized over the network.
Conclusions and Future Work
• As far as we know, this is the first system to incorporate both distributed XPath profile and XML advertisement matching.
  • Content is routed to valid destinations.
  • Results demonstrate that the scheme is indeed feasible.
• Future Work
  • Equivalence of XPath queries.
  • Effective organization of “related” advertisements is another entry point for reducing the costs associated with discovery.
    • Advertisements that have related schemas or whose DOMs have similar nodes.
  • Investigate the use of native XML databases such as Xindice and eXist.
Related work
• Publish/Subscribe systems
  • Elvin, Siena, Gryphon
• P2P systems
  • JXTA, Gnutella
• JMS systems
  • Use TextMessage to package XML documents.