
High Performance Distributed Computing

Sophie Lemaitre, Monterey, California, July 2007.


Presentation Transcript


  1. High Performance Distributed Computing. Sophie Lemaitre, Monterey, California, July 2007

  2. Database Streams

  3. First Keynote • One of the most interesting talks • Database streams • http://www.cs.berkeley.edu/~franklin/Talks/HPDC07.ppt

  4. Data Stream Processing Approach vs. Traditional Database Approach
     • Traditional Database Approach
       • Diagram labels: Data, Bulk Load, Data Warehouse, Queries, Results, Static Batch Reports
       • Batch ETL & load, query later
       • Poor RT monitoring, no replay
       • DB size affects query response
     • Data Stream Processing Approach (“Upside Down Approach”)
       • Diagram labels: Live Data Streams, Data Stream Processor, Results, Continuous Visibility, Alerts
       • Always-on data analysis & alerts
       • RT Monitor & Replay to optimize
       • Consistent sub-second response

  5. The “Jellybean” Argument
     • Reality: with stream query processing, real-time is cheaper than batch.
       • Minimizes copies & query start-up overhead
       • Takes load off expensive back-end systems
       • Rapid application dev & maintenance
     • Conventional wisdom: “Can I afford real-time?” Do the benefits justify the cost?

  6. Example 2 - Stream/Table Join
     Every 3 seconds, compute the average transaction value of high-volume trades on S&P 500 stocks, over a 5 second “sliding window”:
       SELECT T.symbol, AVG(T.price*T.volume)
       FROM Trades T [RANGE '5 sec' SLIDE '3 sec'], SANDP500 S
       WHERE T.symbol = S.symbol AND T.volume > 5000
       GROUP BY T.symbol
     Callouts: Stream (Trades), Window clause (the bracketed RANGE/SLIDE expression), Table (SANDP500)
     Note: Output is also a Stream
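The window semantics of the query on this slide can be illustrated with a small Python sketch. This is only a batch simulation under stated assumptions, not how a stream engine evaluates the query: the function name `windowed_avg`, the tuple layout of a trade, the sample `SANDP500` set, and the choice that a window ending at time `end` covers `end - range_s <= t < end` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the SANDP500 table: a static set of symbols.
SANDP500 = {"IBM", "MSFT"}


def windowed_avg(trades, until, range_s=5, slide_s=3, min_volume=5000):
    """Evaluate the query at t = slide_s, 2*slide_s, ... up to `until`.

    trades: iterable of (timestamp_s, symbol, price, volume) tuples.
    Assumption: the window ending at `end` covers end - range_s <= t < end.
    Returns a list of (end, {symbol: AVG(price * volume)}) pairs.
    """
    trades = list(trades)
    output = []
    end = slide_s
    while end <= until:
        sums = defaultdict(float)
        counts = defaultdict(int)
        for ts, symbol, price, volume in trades:
            in_window = end - range_s <= ts < end
            # Join with the table and apply the volume filter.
            if in_window and symbol in SANDP500 and volume > min_volume:
                sums[symbol] += price * volume
                counts[symbol] += 1
        output.append((end, {s: sums[s] / counts[s] for s in sums}))
        end += slide_s
    return output
```

Because the range (5 s) is longer than the slide (3 s), consecutive windows overlap by 2 seconds, so a single trade can contribute to two successive outputs.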

  7. Stream Processing + Grid? • On-the-fly stream processing required for high-volume data/event generators. • Real-time event detection for coordination of distributed observations. • Wide-area sensing in environmental macroscopes.

  8. Industry session

  9. Industry session
     • Most interesting session
     • eBay
       • Same talk as at CERN
       • Huge number of transactions to deal with
       • Have to be 100% available
       • Had to build their own database interaction layer at some point to meet their needs
       • Not interested in Grids, because they want to control the whole infrastructure
     • Google
       • Disk crashes are not correlated with temperature
       • High number of disk crashes when disks “burnt out” at the beginning of their life
       • Tony Cass, post C5: “yes, but cooling is important for plugs and fuses”

  10. Scheduling

  11. Scheduling
      • Users' ability to prioritize their jobs is currently very limited
        • “low”, “medium” or “high”
      • Utility functions
        • Economics applied to scheduling
        • Ex: if you go for lunch between 12:00 and 13:00
          • Same satisfaction if the job finishes at 12:01 or 12:55…
      • In the next talk
        • Hypothesis = “jobs are submitted completely randomly”
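The lunch example can be written as a utility function of the job's completion time. The following Python sketch is an illustration only: the names `deadline_utility` and `best_order`, the linear decay after the deadline, and the brute-force single-machine scheduler are assumptions made for the example, not something presented in the talk.

```python
from itertools import permutations


def deadline_utility(finish_minute, deadline, value=1.0, decay_per_minute=0.05):
    """User satisfaction as a function of completion time (minutes since midnight).

    Any finish time up to the deadline is worth the full value, so 12:01 and
    12:55 score the same for a 13:00 deadline. The linear decay afterwards is
    an assumption made for this sketch.
    """
    if finish_minute <= deadline:
        return value
    late = finish_minute - deadline
    return max(0.0, value - decay_per_minute * late)


def best_order(jobs, start_minute):
    """Brute-force the job order on one machine that maximizes total utility.

    jobs: list of (name, runtime_minutes, utility_fn) tuples.
    Returns (total_utility, [names in execution order]).
    """
    best = None
    for order in permutations(jobs):
        clock = start_minute
        total = 0.0
        for _name, runtime, utility in order:
            clock += runtime
            total += utility(clock)
        if best is None or total > best[0]:
            best = (total, [name for name, _, _ in order])
    return best
```

With a three-level priority the scheduler only knows that one job is "higher" than another; with utility functions it can see, for instance, that a job needed only after lunch can safely run behind a more urgent one.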

  12. GridNFS & Direct-pNFS

  13. GridNFS & Direct-pNFS
      • GridNFS
        • “Integrates NFSv4 into the ecology of Grid middleware”
          • Globus GSI support
          • Name space construction and management
          • Fine-grained access control with foreign user support
          • High performance secure file system access
        • Andy Adamson was wondering how to integrate VOMS
          • DPM and dCache are using virtual ids
          • He is considering doing the same…
        • Contact: Andy Adamson (andros@umich.edu)
      • Direct-pNFS
        • Outperforms pNFS, PVFS
          • Especially, very good performance for small I/O
        • Contact: Dean Hildebrand (dhildebz@eecs.umich.edu)

  14. DPM with NFSv4.1 • NFSv4.1 and DPM have similar architectures • Separate metadata server • Direct access to physical files • Easy NFSv4.1 integration
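The shared architecture named on this slide (a separate metadata server plus direct access to the physical files) can be pictured with a toy Python model. All class and method names below are hypothetical and chosen for illustration; this is neither the DPM API nor the NFSv4.1 protocol.

```python
class DataServer:
    """Toy storage node holding physical files (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.files = {}

    def write(self, physical_path, data):
        self.files[physical_path] = data

    def read(self, physical_path):
        return self.files[physical_path]


class MetadataServer:
    """Toy name server: maps logical names to (data server, physical path)."""

    def __init__(self):
        self.namespace = {}

    def register(self, logical_name, server, physical_path):
        self.namespace[logical_name] = (server, physical_path)

    def locate(self, logical_name):
        return self.namespace[logical_name]


def client_read(metadata_server, logical_name):
    """Ask the metadata server where the file is, then read it directly."""
    server, physical_path = metadata_server.locate(logical_name)
    # The file contents never pass through the metadata server.
    return server.read(physical_path)
```

In this model the metadata server is consulted only for the lookup, and the data transfer goes straight between the client and the storage node.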

  15. Environmental concerns

  16. Climate change? • Concerns about climate change • In several talks • A “solar panel computer” • A new plug to save energy lost in heat (Google)
