
Operating System Support for improving data locality on CC-NUMA machines

Operating System Support for improving data locality on CC-NUMA machines. CSE597A presentation by V.N. Murali. Why CC-NUMA? It scales as the number of nodes increases and offers transparent access to local and remote memory, at the cost of increased access latency to remote memory.


Presentation Transcript


  1. Operating System Support for improving data locality on CC-NUMA machines • CSE597A presentation by V.N. Murali

  2. Why CC-NUMA? • Scales as the number of nodes increases • Attractive property: transparent access to local and remote memory, at the cost of increased access latency to remote memory • 2 variations: CC-NUMA (Stanford DASH, MIT Alewife, Sequent) and CC-NOW (Sun s3.mp)

  3. OS support • Most important issue: data locality • OS-supported page migration and replication can improve performance by as much as 30%

  4. Issues in Migration/Replication • When should pages be migrated? • When should pages be replicated? • Both are needed to boost performance. • When not to migrate/replicate is also important. • Which system parameter can be used to decide? Ideas?

  5. Differences from S/W shared memory • In S/W DSM, M&R is needed for correctness. On CC-NUMA, M&R is purely an optimization. • In S/W DSM, M&R is triggered by page faults. On CC-NUMA, M&R is triggered by cache misses.

  6. If a workload exhibits good cache locality, it benefits less from M&R. Hence pages must be selected for moving by explicit criteria. • The study is based on the SimOS environment.

  7. Solution • How do we improve data locality? • 3 access patterns: a) primarily accessed by a single process, b) mostly read accesses by many processes, c) both read and write accesses by many processes • Which method has to be applied for a), b), c)?

  8. Costs to be considered • 1) Cost of determining candidate pages for M&R (cost of counting cache misses/TLB misses) • 2) Overhead of M&R (new mappings, allocating a page, flushing the TLB) • 3) Actual data transfer • 4) Memory pressure!

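  9. Decision tree (figure) • Miss rate to page: LOW => do nothing; HIGH => check sharing • Sharing HIGH: check write frequency and memory pressure (HIGH => do nothing; LOW => replicate) • Sharing LOW: check migration rate (HIGH => do nothing; LOW => migrate)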

  10. Key Parameters

  11. Summary of the algorithm • "Hot page": a page whose miss counter for a processor reaches the trigger threshold • If the miss counter for this page on any other processor reaches the sharing threshold, the page is considered for replication; otherwise it is considered for migration. • The page is replicated only if its write counter has not exceeded the write threshold. It is migrated only if its migrate counter has not exceeded the migrate threshold.
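
The decision summarized on slide 11 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Thresholds` and `decide` are hypothetical, the default threshold values are placeholders rather than numbers taken from the slides, and the memory-pressure check from the decision tree is left out.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Placeholder values for illustration; the slides do not give numbers.
    trigger: int = 128   # misses from one processor that make a page "hot"
    sharing: int = 32    # misses from another processor that mark it shared
    write: int = 1       # writes tolerated before replication is refused
    migrate: int = 1     # past migrations tolerated before refusing another


def decide(miss_counts, hot_cpu, write_count, migrate_count, t):
    """Return 'replicate', 'migrate', or 'nothing' for a single page.

    miss_counts maps processor id -> miss counter for this page.
    """
    if miss_counts[hot_cpu] < t.trigger:
        return "nothing"  # the page is not hot for this processor
    shared = any(
        misses >= t.sharing
        for cpu, misses in miss_counts.items()
        if cpu != hot_cpu
    )
    if shared:
        # Shared page: replicate unless it is written too often.
        return "replicate" if write_count <= t.write else "nothing"
    # Unshared page: migrate unless it has already moved too often.
    return "migrate" if migrate_count <= t.migrate else "nothing"
```

For example, with `Thresholds(trigger=4, sharing=2)`, a page with miss counts `{0: 4, 1: 0}` and no writes or earlier migrations yields `"migrate"`, while `{0: 4, 1: 2}` yields `"replicate"`.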

  12. Implementation details • The directory controller maintains the miss counters and generates a low-priority interrupt. • It batches a couple of pages before raising the interrupt. • On a write to a replicated page, the replicas are collapsed to a single page.

  13. IRIX changes • Replication support • Finer grain locking • Page table back mappings

  14. Workloads • Engineering workload: large, sequential and memory intensive; uses a Verilog simulator and Flashlite • Parallel application: Raytrace, a parallel graphics algorithm • Scientific workload: Splash • Decision support database • Multiprogrammed software development: Pmake

  15. Performance analysis • 3 factors: a) user stall time, b) fraction of misses satisfied in local memory, c) kernel overhead • Engineering: large user stall time => best performance gain. Both migration and replication were used successfully. • Raytrace: mostly read-only accesses. Mainly benefits from replication.

  16. Splash: 3 parallel applications (Raytrace, Ocean, Volume rendering). For Ocean, migration is helpful. Raytrace and Volume rendering can benefit from replication. • Database: mostly read accesses, hence replication

  17. Alternative policies • Static policies and dynamic policies • Static: round robin, first touch, post facto (similar to the optimal page replacement algorithm) • Dynamic: migration only, replication only, migration plus replication
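
To make the two simplest static policies concrete, here is a minimal sketch of round-robin and first-touch placement, together with the "fraction of accesses satisfied locally" metric from slide 15. All function names and the `(page, node)` trace format are assumptions made for this example, and the trace entries stand in for cache misses.

```python
from itertools import count


def round_robin_placer(num_nodes):
    """Place each newly seen page on the next node in turn."""
    counter = count()

    def place(page, touching_node):
        return next(counter) % num_nodes

    return place


def first_touch_placer():
    """Place each page on the node that touches it first."""

    def place(page, touching_node):
        return touching_node

    return place


def place_pages(accesses, place):
    """Assign a home node to every page in a list of (page, node) accesses."""
    home = {}
    for page, node in accesses:
        if page not in home:
            home[page] = place(page, node)
    return home


def local_fraction(accesses, home):
    """Fraction of accesses whose page lives on the accessing node."""
    local = sum(1 for page, node in accesses if home[page] == node)
    return local / len(accesses)
```

With the trace `[("a", 1), ("a", 1), ("b", 0), ("b", 0)]` on two nodes, first touch places every page on the node that uses it, so all accesses are local; round robin places "a" on node 0 and "b" on node 1, so none are.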
