
WebSphere XD Compute Grid High Performance Architectures


Presentation Transcript


  1. WebSphere XD Compute Grid: High Performance Architectures. Snehal S. Antani, antani@us.ibm.com, WebSphere XD Technical Lead, SOA Technology Practice, IBM Software Services

  2. Overview
  • Key Components of WebSphere XD Compute Grid
    • Job Scheduler [formerly named the Long Running Scheduler (LRS)]
    • Parallel Job Manager (PJM)
    • Grid Endpoints [formerly named Long Running Execution Environment (LREE)]
  • High Performance Architectures and Considerations

  3. XD Compute Grid Components
  • Job Scheduler (JS)
    • The job entry point to XD Compute Grid
    • Job life-cycle management (Submit, Stop, Cancel, etc.) and monitoring
    • Dispatches workload to either the PJM or the GEE
    • Hosts the Job Management Console (JMC)
  • Parallel Job Manager (PJM)
    • Breaks large batch jobs into smaller partitions for parallel execution
    • Provides job life-cycle management (Submit, Stop, Cancel, Restart) for the single logical job and each of its partitions
    • Is *not* a required component in Compute Grid
  • Grid Endpoints (GEE)
    • Execute the actual business logic of the batch job (a minimal job step sketch follows this list)
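To make the GEE programming model a little more concrete, here is a minimal sketch of a batch job step. The interface and constant names (com.ibm.websphere.batch.BatchJobStepInterface, BatchConstants) are recalled from the Compute Grid documentation and should be verified against your installed release; the step name and business logic are invented for illustration.

import java.util.Properties;

// Assumed interface/constant names from the Compute Grid batch programming
// model (com.ibm.websphere.batch.*); verify against your release.
import com.ibm.websphere.batch.BatchConstants;
import com.ibm.websphere.batch.BatchJobStepInterface;

// Illustrative job step: the GEE drives the createJobStep / processJobStep /
// destroyJobStep life cycle, calling processJobStep repeatedly until the
// step reports that it has finished.
public class AccountReconciliationStep implements BatchJobStepInterface {

    private Properties props;
    private int recordsProcessed = 0;

    public void setProperties(Properties props) {
        // Properties supplied for this step in the xJCL job definition
        this.props = props;
    }

    public Properties getProperties() {
        return props;
    }

    public void createJobStep() {
        // Open input/output streams, acquire resources, etc.
    }

    public int processJobStep() {
        // Process one batch of records per invocation (hypothetical logic).
        recordsProcessed++;
        boolean moreWork = recordsProcessed < 1000;
        return moreWork ? BatchConstants.STEP_CONTINUE
                        : BatchConstants.STEP_COMPLETE;
    }

    public int destroyJobStep() {
        // Release resources; return the step's return code.
        return 0;
    }
}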

  4. XD Compute Grid Components (topology diagram): users submit jobs through a load balancer via EJB, web service, JMS, the command line, or the Job Console; the Job Scheduler (JS), running in WebSphere Application Server, dispatches work to the PJM and to GEEs, each hosted in its own WAS server.

  5. Key Influencers for High Performance Compute Grids
  • Proximity to the data
    • Bring the business logic to the data: co-locate on the same platform
    • Bring the data to the business logic: in-memory databases, caching
  • Affinity routing
    • Partitioned data with intelligent routing of work
  • Divide and conquer
    • Highly parallel execution of workloads across the grid (a simple partitioning sketch follows this list)
  • On-demand scalability
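The divide-and-conquer pattern amounts to splitting one logical job into partitions, each scoped to a slice of the input data, such as the record key ranges shown in the later topology slides. The helper below is a hypothetical illustration of that idea only; it is not the Parallel Job Manager API, and the class and method names are invented for this sketch.

import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of divide-and-conquer partitioning: split an
// alphabetic record key range (e.g. 'A'-'Z') into contiguous sub-ranges,
// one per partition. In Compute Grid the PJM performs the equivalent split
// and submits one partition per grid endpoint.
public class KeyRangePartitioner {

    /** A contiguous key range, e.g. A-I, handled by one partition. */
    public static class Partition {
        public final char firstKey;
        public final char lastKey;

        Partition(char firstKey, char lastKey) {
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }

        public String toString() {
            return firstKey + "-" + lastKey;
        }
    }

    /** Split [from, to] into partitionCount roughly equal sub-ranges. */
    public static List<Partition> split(char from, char to, int partitionCount) {
        List<Partition> partitions = new ArrayList<Partition>();
        int span = to - from + 1;
        int chunk = (span + partitionCount - 1) / partitionCount; // ceiling division
        for (char start = from; start <= to; start += chunk) {
            char end = (char) Math.min(start + chunk - 1, to);
            partitions.add(new Partition(start, end));
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Splitting A-Z into 3 partitions yields [A-I, J-R, S-Z],
        // matching the record ranges shown in the topology slides.
        System.out.println(KeyRangePartitioner.split('A', 'Z', 3));
    }
}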

  6. Proximity to the Data: co-location of business logic with data (topology diagram). A single frame hosts the Job Scheduler and two WebSphere Application Server for z/OS instances, each with a controller and multiple GEE servant regions, running alongside DB2 on z/OS.

  7. Proximity to the Data: bring the data to the business logic with caching (topology diagram). The Job Scheduler runs in its own LPAR; each GEE LPAR hosts a Data Grid near-cache alongside the grid endpoint; dedicated DG (Data Grid) server LPARs sit between the near-caches and the database.
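The "Data Grid near-cache" and "DG server" boxes correspond to an ObjectGrid (WebSphere eXtreme Scale) client-side near cache backed by remote grid servers. As a rough sketch of what a GEE component's read path might look like with such a cache, the client calls below are recalled from the ObjectGrid client programming model and should be checked against your release; the endpoint string, grid name, and map name are invented for the example.

import com.ibm.websphere.objectgrid.ClientClusterContext;
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridManager;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;

// Sketch of a GEE component reading reference data through an ObjectGrid
// client with a near cache: hits are served from the local JVM, misses go
// to the remote grid servers, and only then to the database.
public class AccountCache {

    private final ObjectGrid grid;

    public AccountCache(String catalogServiceEndpoints) throws Exception {
        // Connect to the catalog service and obtain a client-side view of
        // the grid ("AccountGrid" is an invented name for this example).
        ObjectGridManager manager = ObjectGridManagerFactory.getObjectGridManager();
        ClientClusterContext context = manager.connect(catalogServiceEndpoints, null, null);
        this.grid = manager.getObjectGrid(context, "AccountGrid");
    }

    public Object lookupAccount(String accountId) throws Exception {
        // get() consults the near cache first; a miss is forwarded to the
        // grid servers that own the key's partition.
        Session session = grid.getSession();
        ObjectMap accounts = session.getMap("accounts");
        return accounts.get(accountId);
    }
}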

  8. Affinity Routing: partitioned data with intelligent routing of work (topology diagram). The Job Scheduler routes records A-M to one WAS z/OS controller and records N-Z to another; within each, servants handle sub-ranges (A-D, E-I, J-M and N-Q, R-T, W-Z) and work against the matching DB2 data sharing partition (records A-M, records N-Z).

  9. Affinity Routing: partitioned data with intelligent routing of work, data grid variant (topology diagram). GEE LPARs with Data Grid near-caches are each assigned a record range (A-I, J-R, S-Z); DG server LPARs are partitioned by record range (A-M, N-Z) in front of the database.

  10. Divide and Conquer: highly parallel grid jobs (topology diagram). The PJM splits a large grid job into partitions and dispatches them across GEE LPARs with Data Grid near-caches, each scoped to a record range (A-I, J-R, S-Z), backed by partitioned DG servers (A-M, N-Z) and the database.

  11. On-Demand Scalability with WebSphere z/OS (topology diagram). zWLM dynamically manages the GEE servant regions behind each WAS z/OS controller, with the Job Scheduler and DB2 on z/OS on the same frame.

  12. On-Demand Scalability with XD Operations Optimization (topology diagram). An On-Demand Router sits between the Job Scheduler and the GEE LPARs (each with a Data Grid near-cache), with DG server LPARs and the database behind them.

  13. Backup

  14. Data access time model (decision-tree diagram). A data access either hits the near-cache or misses it; a near-cache miss is resolved either by the OG (ObjectGrid) cache server or, failing that, by the database.
  Data access time (ms) = (probability of near-cache hit) × (time to retrieve data from near-cache) + (probability of near-cache miss) × (time to retrieve data from other storage)
  Time to retrieve data from other storage (ms) = (probability that data is in the cache server) × (time to retrieve data from the cache server) + (probability that data must be retrieved from the database) × (time to retrieve data from the database)

  15. Data access time model with symbols (decision-tree diagram):
  P1 = probability of a near-cache hit, S1 = time (ms) to retrieve data from the near-cache
  P2 = probability of a near-cache miss, S2 = time (ms) to retrieve data from other storage
  P3 = probability that the data is in the OG server, S3 = time (ms) to retrieve data from the OG server
  P4 = probability that the data must be retrieved from the database, S4 = time (ms) to retrieve data from the database
  Data Access = (Near-Cache Hit) + (Near-Cache Miss)
  Near-Cache Hit = (P1)(S1)
  Near-Cache Miss = (P2) × [ (P3)(S3) + (P4)(S4) ]
  Improve data access time by:
  • Increase P1: increase the cache size (larger heap, etc.); establish request affinity
  • Decrease S1: dynamically add more CPU
  • Decrease S2:
    • Increase P3: increase the size of the cache server; establish query/data affinity
    • Decrease S3: dynamically add more CPU
    • Decrease S4: dynamically add more CPU; reduce network latency to the database
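Putting the two levels together, the expected access time in the symbols above is a standard expected-value calculation over the hit/miss branches:

\[
S_2 = P_3 S_3 + P_4 S_4, \qquad
T_{\text{access}} = P_1 S_1 + P_2 S_2 = P_1 S_1 + P_2 \left( P_3 S_3 + P_4 S_4 \right),
\]
\[
\text{with } P_1 + P_2 = 1 \text{ and } P_3 + P_4 = 1.
\]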

  16. Example calculation. Assume P1 = 30% (near-cache hit), S1 = 1 ms, P2 = 70% (near-cache miss), P3 = 70% (OG server hit), S3 = 10 ms, P4 = 30% (OG server miss, access DB), S4 = 200 ms.
  Time to retrieve from other storage: S2 = (.7)(10) + (.3)(200) = 7 + 60 = 67 ms
  Data access time = (P1)(S1) + (P2)(S2) = (.3)(1) + (.7)(67) = 0.3 + 46.9 = 47.2 ms

  17. Example calculation: effect of increasing the size of the near-cache. Raising the near-cache hit probability to P1 = 60% (so P2 = 40%), with the OG server path unchanged:
  S2 = (.7)(10) + (.3)(200) = 7 + 60 = 67 ms
  Data access time = (.6)(1) + (.4)(67) = 0.6 + 26.8 = 27.4 ms
  Improvement: (47.2 - 27.4) / 47.2 = 42% faster data access

  18. Example calculation: effect of adding more CPU and reducing network latency to the database. Keeping P1 = 30% but cutting the database retrieval time to S4 = 100 ms:
  S2 = (.7)(10) + (.3)(100) = 7 + 30 = 37 ms
  Data access time = (.3)(1) + (.7)(37) = 0.3 + 25.9 = 26.2 ms
  Improvement: (47.2 - 26.2) / 47.2 = 44% faster data access
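The three scenarios above are easy to reproduce programmatically. The short program below (names and structure are mine, purely for illustration) evaluates the access-time model for the baseline, the larger near-cache, and the faster database path, and prints the same 47.2 ms, 27.4 ms, and 26.2 ms figures.

// Evaluates the data access time model from slides 14-18:
//   S2 = P3*S3 + P4*S4
//   T  = P1*S1 + P2*S2
public class DataAccessModel {

    static double accessTime(double p1, double s1,
                             double p3, double s3,
                             double p4, double s4) {
        double p2 = 1.0 - p1;            // probability of a near-cache miss
        double s2 = p3 * s3 + p4 * s4;   // time to resolve a near-cache miss (ms)
        return p1 * s1 + p2 * s2;        // expected data access time (ms)
    }

    public static void main(String[] args) {
        double baseline = accessTime(0.30, 1, 0.70, 10, 0.30, 200);        // 47.2 ms
        double biggerNearCache = accessTime(0.60, 1, 0.70, 10, 0.30, 200); // 27.4 ms
        double fasterDatabase = accessTime(0.30, 1, 0.70, 10, 0.30, 100);  // 26.2 ms

        System.out.printf("baseline:          %.1f ms%n", baseline);
        System.out.printf("bigger near-cache: %.1f ms (%.0f%% better)%n",
                biggerNearCache, 100 * (baseline - biggerNearCache) / baseline);
        System.out.printf("faster DB path:    %.1f ms (%.0f%% better)%n",
                fasterDatabase, 100 * (baseline - fasterDatabase) / baseline);
    }
}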

  19. WebSphere XD Compute Grid: Infrastructure Topology Considerations
