Messaging Systems for the Grid Daniel Rodrigues
Summary • Messaging Systems Overview • Monitoring context in the Grid • The MSG – Messaging System for Grids • Fast Forward
Messaging Systems • Before going any further, the philosophy: “The trend in software development is to somehow mimic the real world!” – Daniel Rodrigues • Procedural programming: bureaucracy • Object orientation: world entities and their interactions • Aspects: cut through the mess! • Agents: real people • Messaging systems: communication • It might be sound, image, snail mail, etc.
Messaging Systems • Why use messaging? • For communicating we could use: • File transfer • Shared Databases • Remote Procedure Invocation • Web Services • Mail • CORBA • They do exist; • They have common ideas; • They share implementations; • You might be using more than one to achieve a result that suits your needs! “Now look, you know different people think about life in different ways. Lawyers think life is a big court room; Doctors probably thinks life is like a big operation; Bus drivers think life is...er...a big bus I guess. Who knows what the hell those guys think. Anyway, I've always thought of life as a big football game...” Black Grape, England’s Irie
Messaging Systems • Why use messaging? • Key ideas and benefits: • Loosely coupled distributed communication; • Exceptional interoperability; • Asynchronous; • Reliable; • Configurable persistence (just like your tax collector) • Drawbacks: • More complex programming model (we do like bureaucracy after all) • Sequenced and synchronous interactions are harder to implement • Performance? (maybe FTP could do the trick)
Messaging Systems • Ok, may we finally see a picture? [Figure: four publishers and one consumer connected through the messaging system]
Messaging Systems • That’s all? • Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions • Gregor Hohpe / Bobby Woolf • Core patterns • Some not-so-wild patterns
Messaging Systems • Patterns: Message • Header • Routing information • Description • Body • Data • Ignored by the messaging system • EventMessage, CommandMessage, DocumentMessage, RequestReply • Could be SOAP, JMS, STOMP, etc.
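To make the header/body split concrete, here is a minimal Python sketch. The `Message` class and `route_key` helper are hypothetical illustrations, not the API of any messaging product. The point is that the messaging system only reads the header, while the body is opaque application data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Message:
    """Hypothetical message: header for the messaging system, body for the application."""
    # Header: routing information and description, read by the messaging system.
    headers: Dict[str, str] = field(default_factory=dict)
    # Body: application data, passed through without being inspected.
    body: bytes = b""


def route_key(msg: Message) -> str:
    # Routing decisions only ever look at the header, never at the body.
    return msg.headers["destination"]


msg = Message(
    headers={"destination": "/topic/grid.usage.transfer", "persistent": "true"},
    body=b"transferProtocol: GridFTP\nEOT\n",
)
print(route_key(msg))  # /topic/grid.usage.transfer
```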
Messaging Systems • Patterns: Message Channel • Point-to-Point • Snail Mail • Queues • Publish-Subscribe • Television/radio Broadcast • Topics • DataTypeChannel, InvalidMessageChannel, DeadLetterChannel, ChannelAdapter, MessageBus, MessagingBridge
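The difference between the two channel types can be sketched in a few lines of in-memory Python. The `Queue` and `Topic` classes are hypothetical toys, not JMS or ActiveMQ classes. A queue hands each message to exactly one receiver, while a topic copies each message to every current subscriber.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class Queue:
    """Point-to-point channel (snail mail): each message goes to exactly one receiver."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._pending.append(message)

    def receive(self) -> Optional[str]:
        # Removing the message means no other receiver can get it.
        return self._pending.popleft() if self._pending else None


class Topic:
    """Publish-subscribe channel (broadcast): every subscriber gets a copy."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: str) -> None:
        for deliver in self._subscribers:
            deliver(message)


queue = Queue()
queue.send("job-42")
first, second = queue.receive(), queue.receive()  # second receiver gets nothing

topic = Topic()
inbox_a: List[str] = []
inbox_b: List[str] = []
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("site down")  # both inboxes receive the same message
```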
Messaging Systems • Patterns: Message Endpoint • Publisher • Gets data from the application and creates a message. • Consumer • Extracts data from a message and passes it on to the application. • SelectiveConsumer, CompetingConsumer, DurableSubscriber, MessageDispatcher, TransactionalClient, EventDrivenConsumer. • An endpoint either sends or receives messages, and is channel-specific. (Ears, mouth and eyes are not the same thing.)
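The DurableSubscriber pattern is the one MSG relies on later, so here is an in-memory sketch of the idea. `DurableTopic` is a hypothetical class. A real broker keeps the backlog in its persistent store rather than in a Python dict. A named subscription outlives the consumer's connection, and messages published while the consumer is away are replayed when it reconnects.

```python
from typing import Callable, Dict, List

Callback = Callable[[str], None]


class DurableTopic:
    """In-memory illustration of the DurableSubscriber pattern."""

    def __init__(self) -> None:
        self._backlog: Dict[str, List[str]] = {}  # subscription name -> undelivered messages
        self._online: Dict[str, Callback] = {}    # subscription name -> connected consumer

    def subscribe(self, name: str) -> None:
        # The subscription is registered once and outlives any connection.
        self._backlog.setdefault(name, [])

    def connect(self, name: str, callback: Callback) -> None:
        self._online[name] = callback
        pending, self._backlog[name] = self._backlog[name], []
        for message in pending:  # replay what arrived while the consumer was away
            callback(message)

    def disconnect(self, name: str) -> None:
        self._online.pop(name, None)

    def publish(self, message: str) -> None:
        for name, backlog in self._backlog.items():
            if name in self._online:
                self._online[name](message)
            else:
                backlog.append(message)  # kept until the subscriber reconnects


topic = DurableTopic()
topic.subscribe("archiver")
topic.publish("m1")                          # archiver offline: m1 is kept
received: List[str] = []
topic.connect("archiver", received.append)   # m1 is replayed
topic.publish("m2")                          # delivered immediately
topic.disconnect("archiver")
topic.publish("m3")                          # kept again
topic.connect("archiver", received.append)   # m3 is replayed
```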
Messaging Systems • Other Patterns • Message Routers • A message may be routed to different channels depending on its characteristics; • Simple example: use a wildcard topic! • A subscription to grid.usage.transfer.* receives the messages sent to grid.usage.transfer.<INFRASTRUCTURE> • Message Translators • Translation at different layers (data structure, types, representation, or transport). • e.g. transport protocols: TCP => HTTP => SOAP => JMS • Pipes and Filters • A message may need processing in several steps. • The message passes through a chain of filters, connected by pipes, each performing a different function (e.g., authN, authZ)
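A rough sketch of wildcard routing and a pipes-and-filters chain follows. Assumptions: `*` stands for exactly one dot-separated name, as in ActiveMQ's wildcard syntax. The `authenticate`/`authorize` filters, the `AUTHORIZED` set, and the `user` field are hypothetical placeholders for real authN/authZ steps.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
Filter = Callable[[Message], Optional[Message]]


def matches(pattern: str, topic: str) -> bool:
    """'*' stands for exactly one dot-separated name, as in grid.usage.transfer.*"""
    p, t = pattern.split("."), topic.split(".")
    return len(p) == len(t) and all(a == "*" or a == b for a, b in zip(p, t))


def authenticate(msg: Message) -> Optional[Message]:
    # Hypothetical authN filter: reject messages without an identity.
    return msg if msg.get("user") else None


AUTHORIZED = {"cms001"}  # hypothetical list of allowed publishers


def authorize(msg: Message) -> Optional[Message]:
    # Hypothetical authZ filter: only known users may publish.
    return msg if msg.get("user") in AUTHORIZED else None


def pipeline(msg: Message, filters: List[Filter]) -> Optional[Message]:
    # Pipes and filters: each stage receives the previous stage's output.
    for stage in filters:
        result = stage(msg)
        if result is None:
            return None  # dropped; a real system might use an InvalidMessageChannel
        msg = result
    return msg


def route(msg: Message, subscriptions: Dict[str, List[Message]]) -> None:
    # Message router: deliver to every subscription whose pattern matches the topic.
    for pattern, inbox in subscriptions.items():
        if matches(pattern, msg["topic"]):
            inbox.append(msg)


subs: Dict[str, List[Message]] = {"grid.usage.transfer.*": [], "grid.usage.job.*": []}
incoming = {"topic": "grid.usage.transfer.OSG", "user": "cms001"}
checked = pipeline(incoming, [authenticate, authorize])
if checked is not None:
    route(checked, subs)
```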
Messaging Systems • Isn’t it too complex to implement all this? • Indeed. • But someone has already done most of the work for you: • Commercial solutions: • Tibco Rendezvous, IBM WebSphere MQ, Sun Java Message Service, Microsoft MSMQ, BEA MessageQ, SonicMQ, 29West UME/LBM. • Open-source providers: • Apache ActiveMQ, ObjectWeb JORAM, OpenJMS. • Each suits different problems: • Integration on different platforms; • Latency concerns; • High throughput;
Messaging Systems • Where is it used? • Financial services • exchanges, brokerages, hedge funds; • Insurance companies • Banking industry • Telecoms • Usually embedded in integrated solutions • Enterprise backbones; • WebSphere MQ example (March 2007): • 10,000 customers • 10 billion messages carrying US$1 quadrillion (US$1,000,000,000,000,000) worth of business transactions.
Summary • Messaging Systems Overview • Monitoring context in the Grid • The MSG – Messaging System for Grids • Fast forward
Monitoring Context • How does Message Oriented Middleware fit into the WLCG monitoring context? • The Grid is a complex infrastructure, with many different services deployed in different environments. • We need to monitor the services in order to: • Know when an action to repair is necessary; • Help improve the overall reliability; • Provide stakeholders with current and historical status information. • A vast amount of monitoring data is produced • Local fabric monitoring (e.g., Nagios, LEMON) • Remote monitoring (e.g., SAM)
Monitoring Context • Who is involved (stakeholders)? • Site Administrators • Grid Operators • CIC on Duty • Regional Operation center • WLCG Project management • Virtual Organizations • WLCG Experiments • Monitoring developers + operators
Monitoring Context • High-level model: [Figure: the current monitoring landscape, showing LEMON, Nagios, SAM/SAME, GridView, GridIce, Experiment Dashboard, GOCDB and R-GMA, with links labelled HTTP and LDAP]
Monitoring Context • WLCG Monitoring Working Group: • Initially focused on stakeholder requirements • Distilled them into a set of architectural principles • Proposed some new technologies to help • Reuse of standard commodity components • Used to design a site-local monitoring prototype • An attempt to extend this to a more global view • Knowing that the operations model is changing from central to regional/national/local • Looking at the architectural principles…
Monitoring Context • Reduce time to respond • “Site administrators are closest to the problems, and need to know about them first” • Tell others what you want to know • “If you’re monitoring a site remotely, it’s only polite to give the data to the site” – Chris Brew • Remote systems should feed back information to sites • Don’t impose systems on sites • Cannot dictate a monitoring system
Monitoring Context • No monolithic systems • Different systems should specialize in their areas of expertise • No central bottlenecks • “Local problems detected locally shouldn’t require remote services to work out what the problem is” • Specific Visualization for each stakeholder • All are using same underlying data
Monitoring Context • The starting point is what we have now: • Availability testing framework – SAM/RSV • Job and data reliability monitoring – GridView • Grid topology – GOCDB/Registration DB • Dynamic view of the grid – BDII/CeMon • Accounting – APEL/Gratia • Experiment views – Dashboards • Fabric monitoring – Nagios, LEMON, … • Grid operations tools – CIC Portal • They work together right now • To a certain extent!
Monitoring Context • We need: • Loose coupling of systems • Distributed components • Reliable delivery of messages • Standard methods of communication • Flexibility to add new producers and consumers of the information without having to reconfigure everything • Message Oriented Middleware provides this • And is widely used in similar scenarios
Monitoring context • Reliability and persistence of messaging are built into the broker network. • Mitigates the single points of failure we’ve had with previous solutions
Monitoring context • Not a silver bullet • Still can end up with spaghetti • Tight specification of interaction of components • Message format specifications • Standard metadata schema • Message queue naming schemes • Protocols • System management is key • You get code for free from the messaging system • But you need to write your own management layer • Component co-ordination • Configuration • Message tracing • Debugging
Monitoring context • Conclusion • The monitoring context is highly distributed; • Many components could benefit from gathering common information in a reliable, flexible way; • MOM is a way of leveraging the current underlying infrastructures;
Monitoring Context • A real life working example:
Monitoring Context • [Figure: worked example with an Application, a Messaging System Adaptor, a Transparent Broker Network and a Database archiver component; legend: standard components, standard process. Source: “WLCG Monitoring – some worked examples”]
Summary • Messaging Systems Overview • Monitoring context in the Grid • The MSG – Messaging System for Grids • Fast forward
MSG Overview • An infrastructure providing an easy way to send messages; • Each message has a well defined format adhering to a message class specification • Well defined set of message classes • Three main components: • Apache ActiveMQ broker; • msg-publish-simple; • msg-consume2oracle; • Using file-based SAN persistency; • Publish-Subscribe Channels (Topics) • Durable Subscribers
MSG: Message • Message endpoints on a topic should: • Consumers: expect a well-formatted message • Producers: send a properly formatted message • Message Classes: • Each class has a corresponding specification • One message may contain multiple records • Each record consists of plain-text key-value pairs, terminated by “EOT” • A few fields are mandatory: consumers are expecting them! • Some fields may be sent as a header (for later filtering using selectors)
MSG: Message • Example (two records, followed by the message headers):
transferProtocol: GridFTP
publishingHost: lxfsrc5807.cern.ch
voName: cms
srcHost: lxfsrc5807.cern.ch
destHost: c2fs008.grid.sinica.edu.tw
gridftpStreams: 10
numberBytes: 2684354560
fileName: //castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_3e6
startTime: 20-05-2008T13:17:07.514952Z
endTime: 20-05-2008T13:33:58.156241Z
userName: cms001
EOT
transferProtocol: GridFTP
publishingHost: lxfsrc5807.cern.ch
voName: cms
srcHost: lxfsrc5807.cern.ch
destHost: diskserv-san-20.cr.cnaf.infn.it
gridftpStreams: 3
numberBytes: 2684354560
fileName: //castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_F1
startTime: 20-05-2008T13:17:46.811483Z
endTime: 20-05-2008T13:34:21.227585Z
userName: cms001
EOT
Headers:
destination: /topic/grid.usage.transfer
persistent:true
transferProtocol: GridFTP
msgEncodedTime: 2008-05-21T22:29:57,712Z
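A consumer has to split such a body back into records. The sketch below assumes one `key: value` pair per line, split on the first colon (timestamps contain colons too), with a line reading `EOT` closing each record. `parse_records` is a hypothetical helper, not the actual MSG code, and the sample body is a shortened copy of the records in the example.

```python
from typing import Dict, List


def parse_records(body: str) -> List[Dict[str, str]]:
    """Split a message body into records of 'key: value' lines, each ended by 'EOT'."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line == "EOT":
            records.append(current)
            current = {}
            continue
        # Split on the first colon only: values such as timestamps contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("not a key-value pair: %r" % line)
        current[key.strip()] = value.strip()
    if current:
        raise ValueError("last record is not terminated by EOT")
    return records


body = """\
transferProtocol: GridFTP
srcHost: lxfsrc5807.cern.ch
destHost: c2fs008.grid.sinica.edu.tw
numberBytes: 2684354560
startTime: 20-05-2008T13:17:07.514952Z
EOT
transferProtocol: GridFTP
srcHost: lxfsrc5807.cern.ch
destHost: diskserv-san-20.cr.cnaf.infn.it
numberBytes: 2684354560
startTime: 20-05-2008T13:17:46.811483Z
EOT
"""
records = parse_records(body)
print(len(records), records[1]["destHost"])  # 2 diskserv-san-20.cr.cnaf.infn.it
```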
MSG: Apache ActiveMQ • Powerful open-source message broker • Currently running v4.1 & v5.1 • Message Channels • Publish-Subscribe; • Point-to-Point; • Virtual Destinations, Wildcards, Composite Destinations; • Synchronous / Asynchronous sending. • Wide range of supported protocols and clients • OpenWire for high-performance clients; • STOMP (Simple, or Streaming, Text Oriented Messaging Protocol); • REST, XMPP, AMQP;
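STOMP is simple enough that a frame can be built by hand, which is one reason lightweight scripts can use it. This sketch follows the STOMP frame layout: a command line, `key:value` header lines, an empty line, the body, and a NUL terminator. `stomp_frame` is a hypothetical helper. It does no STOMP 1.1+ header escaping, so header values are assumed to be free of newlines and colons, and the body is assumed to contain no NUL bytes.

```python
from typing import Dict


def stomp_frame(command: str, headers: Dict[str, str], body: str = "") -> bytes:
    """Build a STOMP frame: command line, 'key:value' header lines,
    an empty line, the body, then a NUL byte as terminator."""
    lines = [command] + ["%s:%s" % (key, value) for key, value in headers.items()]
    # Header block, then the empty line, then the body, then the NUL terminator.
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


frame = stomp_frame(
    "SEND",
    {"destination": "/topic/grid.usage.transfer", "persistent": "true"},
    "transferProtocol: GridFTP\nsrcHost: lxfsrc5807.cern.ch\nEOT\n",
)
print(frame.split(b"\n\n", 1)[0].decode())  # shows the command and header lines
```

In practice such bytes would be written to a TCP socket connected to the broker's STOMP connector (61613 is ActiveMQ's usual default port) after a CONNECT frame. A maintained STOMP client library is the safer choice for production use.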
MSG: Apache ActiveMQ • Configurable persistence • JDBC + high-performance journal • File-based message store (since 5.0) • Clustering • Master/Slave failover • Provides high availability • Network of Brokers • Avoids the single point of failure of client/server or hub-and-spoke setups • Store and forward with consumer priority • Increases scalability • Load balancing of consumers and producers • Selectors • Discovery
MSG: msg-publish-simple • Sends messages into the Message Channel • Validates that messages are well formatted against their message class; • Reassembles records according to selected headers; • Very lightweight script • Depends only on Python > 2.3 • Uses Python asyncore • Designed to run anywhere (e.g. worker nodes) • Can use many broker endpoints (will select one that is available) • Uses either STOMP or plain HTTP
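The two publishing-side steps, validation and reassembly, might look roughly like this. This is a sketch only, under stated assumptions. `MANDATORY_FIELDS` is a hypothetical message-class specification, since the real MSG specifications define their own mandatory fields. "Reassembling according to selected headers" is interpreted here as grouping records so that every outgoing message carries a single value for each selected header field. The sketch is written in Python 3, not the Python 2.3 the tool targets.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical message class specification (not the real MSG definition).
MANDATORY_FIELDS = {
    "grid.usage.transfer": ["transferProtocol", "srcHost", "destHost", "numberBytes"],
}


def validate(records: List[Record], message_class: str) -> None:
    """Reject the batch if any record lacks a mandatory field."""
    required = MANDATORY_FIELDS[message_class]
    for index, record in enumerate(records):
        absent = [name for name in required if name not in record]
        if absent:
            raise ValueError("record %d lacks %s" % (index, ", ".join(absent)))


def group_by_headers(records: List[Record],
                     header_keys: List[str]) -> List[Tuple[Dict[str, str], List[Record]]]:
    """Reassemble records into messages whose selected header fields share one value."""
    groups: Dict[Tuple[str, ...], List[Record]] = {}
    for record in records:
        key = tuple(record[name] for name in header_keys)
        groups.setdefault(key, []).append(record)
    # Each group becomes one message: (headers, records), in first-seen order.
    return [(dict(zip(header_keys, key)), members) for key, members in groups.items()]


batch = [
    {"transferProtocol": "GridFTP", "srcHost": "a", "destHost": "b", "numberBytes": "1"},
    {"transferProtocol": "HTTP", "srcHost": "a", "destHost": "c", "numberBytes": "2"},
    {"transferProtocol": "GridFTP", "srcHost": "a", "destHost": "d", "numberBytes": "3"},
]
validate(batch, "grid.usage.transfer")
messages = group_by_headers(batch, ["transferProtocol"])
```

Promoting a field to a header lets consumers filter with broker-side selectors instead of parsing every body.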
MSG: msg-consume2oracle • Consumes messages • Creates a durable subscription; • Can read different message classes on different topics (one durable subscription per topic!) • Publishes into Oracle. • Extracts records from incoming messages; • Inserts records into an Oracle view corresponding to the message class definition. • Only need to worry about the trigger! • Configurable system management • Publishes back client status information • Messages received on a topic; • Records inserted for a given message class; • Very lightweight script • Depends only on Python > 2.3 • and cx_Oracle
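The "one durable subscription per topic" idea can be sketched as the STOMP frames a consumer would send. The header names follow our understanding of ActiveMQ's STOMP conventions: `client-id` on CONNECT, `activemq.subscriptionName` and an optional SQL-92 `selector` on SUBSCRIBE. Check them against the broker version in use. The client id, the subscription-naming rule and the second topic are hypothetical. This is a STOMP 1.0-style CONNECT; STOMP 1.1+ brokers also expect `accept-version` and `host` headers.

```python
from typing import Dict, List, Optional


def stomp_frame(command: str, headers: Dict[str, str], body: str = "") -> bytes:
    # Command line, 'key:value' headers, empty line, body, NUL terminator.
    lines = [command] + ["%s:%s" % (k, v) for k, v in headers.items()]
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


def durable_subscription_frames(client_id: str, topics: List[str],
                                selector: Optional[str] = None) -> List[bytes]:
    """CONNECT once, then one durable SUBSCRIBE per topic (ActiveMQ-style headers)."""
    # The broker ties durable subscriptions to the connection's client-id.
    frames = [stomp_frame("CONNECT", {"client-id": client_id})]
    for topic in topics:
        headers = {
            "destination": topic,
            # Client acknowledgement: ack only after the records are safely stored.
            "ack": "client",
            # Hypothetical naming rule: client id plus the topic's last path element.
            "activemq.subscriptionName": "%s-%s" % (client_id, topic.rsplit("/", 1)[-1]),
        }
        if selector is not None:
            headers["selector"] = selector  # filter on header fields at the broker
        frames.append(stomp_frame("SUBSCRIBE", headers))
    return frames


frames = durable_subscription_frames(
    "consume2oracle", ["/topic/grid.usage.transfer", "/topic/grid.usage.job"])
filtered = durable_subscription_frames(
    "gridftp-only", ["/topic/grid.usage.transfer"], selector="transferProtocol = 'GridFTP'")
```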
MSG: performance • Extensive testing of the broker’s many features under different configurations • Test results are available on the TWiki; here are some: • Broker ran for 6 weeks with no crashes • 50 million messages of various sizes (0 to 10 kB) forwarded to consumers; • 12 million incoming messages from producers; • Up to 40 producers/80 consumers; • Stable under an irregular testing pattern; • Setting persistence limits throughput.
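For a rough sense of scale, assume both counts cover the whole six-week run. Because the test pattern was irregular, peak rates were higher than these averages. The figures then translate into:

$$
T = 6 \times 7 \times 86\,400\ \text{s} = 3\,628\,800\ \text{s}
$$

$$
\bar r_{\text{out}} = \frac{50 \times 10^{6}}{3\,628\,800\ \text{s}} \approx 13.8\ \text{msg/s},\qquad
\bar r_{\text{in}} = \frac{12 \times 10^{6}}{3\,628\,800\ \text{s}} \approx 3.3\ \text{msg/s},\qquad
\frac{\bar r_{\text{out}}}{\bar r_{\text{in}}} = \frac{50}{12} \approx 4.2
$$

On average, each incoming message was forwarded to about four consumers, which would be consistent with publish-subscribe fan-out.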
MSG: performance • Throughput testing
MSG: performance • Testing persistency
MSG: performance • Testing clustering • Fast internal OpenWire!
MSG: results • Flagship: OSG RSV – SAM bridge • Running since January. • Crashed once, because not enough file descriptors were configured. • GridView – GridFTP transfers. • Currently publishing from 27 CMS t1transfer machines; • A validation consumer is currently running in the testbed;
Summary • Messaging Systems Overview • Monitoring context in the Grid • The MSG – Messaging System for Grids • Fast Forward • In the monitoring context.
MSG: results • Migrating to Regions
MSG: results • Messaging based archiving & reporting
Thank you for your attention. Additional Questions?