Operating System Support for improving data locality on CC-NUMA machines

CSE597A Presentation by V.N. Murali

WHY CC-NUMA?
• Scales as the number of nodes increases.
• Attractive property: transparent access to both local and remote memory, at the cost of higher access latency for remote memory.
• Two variations: CC-NUMA (Stanford DASH, MIT Alewife, Sequent) and CC-NOW (SUN s3.mp).

OS support
• Most important issue: data locality.
• OS-supported page migration and replication improve performance by as much as 30%.

Issues in Migration/Replication
• When should pages be migrated?
• When should pages be replicated?
• Both are needed to boost performance.
• Knowing when not to migrate or replicate is also important.
• Which system parameter can be used to decide? Ideas?

Differences with S/W shared memory
• M&R in software DSM is needed for correctness; on CC-NUMA, M&R is purely an optimization.
• M&R in software DSM is triggered by page faults; on CC-NUMA, M&R is triggered by cache misses.

• If a workload already exhibits good cache locality, it gains little from M&R; hence pages must be chosen selectively for movement.
• The study is based on the SimOS simulation environment.

Solution
• How do we improve data locality?
• Three access patterns:
  a) pages primarily accessed by a single process
  b) pages mostly read by many processes
  c) pages both read and written by many processes
• Which method should be applied to a), b), and c)?

Costs to be considered
• 1) Cost of determining candidate pages for M&R (tracking cache misses/TLB misses)
• 2) Overhead of M&R (allocating a page, creating new mappings, flushing TLBs)
• 3) Actual data transfer
• 4) Memory pressure!
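A rough way to weigh costs 1) to 3) against the benefit, offered as a back-of-the-envelope assumption rather than a formula from the slides: moving a page pays off only if the remote misses it eliminates save more time than the move costs. Here N_future is the (hypothetical) number of future misses the move turns from remote into local, t_remote and t_local are the remote and local miss latencies, and C_detect, C_remap, C_copy stand for items 1), 2) and 3). Memory pressure (item 4) is not a time cost; it limits how many replicas can be created at all, so it is left out of the inequality.

```latex
N_{\text{future}} \cdot \left( t_{\text{remote}} - t_{\text{local}} \right)
  \;>\; C_{\text{detect}} + C_{\text{remap}} + C_{\text{copy}}
```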

Decision tree (figure):
miss rate to page?
  LOW → do nothing
  HIGH → sharing?
    HIGH → write frequency and memory pressure?
      HIGH → do nothing
      LOW → replicate
    LOW → migration rate?
      HIGH → do nothing
      LOW → migrate
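The decision tree on this slide can be written as a small function. This is a minimal sketch that assumes each input has already been classified as HIGH (True) or LOW (False); the function name `decide` and the `Action` enum are illustrative, not taken from the paper or from IRIX.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "do nothing"
    MIGRATE = "migrate"
    REPLICATE = "replicate"


def decide(miss_rate_high: bool, sharing_high: bool,
           write_or_pressure_high: bool, migration_rate_high: bool) -> Action:
    """Walk the slide's decision tree; every input is a pre-classified HIGH/LOW flag."""
    if not miss_rate_high:
        # Few misses to this page: moving it cannot save much.
        return Action.NOTHING
    if sharing_high:
        # Shared page: replicate unless writes or memory pressure make copies too costly.
        return Action.NOTHING if write_or_pressure_high else Action.REPLICATE
    # Mostly one user: migrate unless the page is already bouncing between nodes.
    return Action.NOTHING if migration_rate_high else Action.MIGRATE
```

Read this way, the three access patterns from the Solution slide fall out naturally: a) a page used by one process has low sharing and is migrated, b) a read-mostly shared page is replicated, and c) a read-write shared page has a high write frequency and is left alone.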

Key Parameters

Summary of the algorithm
• "Hot page": a page whose miss counter for some processor reaches the trigger threshold.
• If the page's miss counter on any other processor reaches the sharing threshold, the page is considered for replication; otherwise it is considered for migration.
• A page is replicated only if its write counter has not exceeded the write threshold, and migrated only if its migrate counter has not exceeded the migrate threshold.
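The per-page counters and thresholds in the summary can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, the thresholds are plain integers supplied by the caller, and counters are never reset here (a real policy would need some way to age them, which this sketch assumes away).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Thresholds:
    trigger: int   # misses from one processor that make a page "hot"
    sharing: int   # misses from another processor that mark the page as shared
    write: int     # replication is refused once writes exceed this
    migrate: int   # migration is refused once migrations exceed this


@dataclass
class PageCounters:
    misses: Dict[int, int] = field(default_factory=dict)  # processor id -> miss count
    writes: int = 0
    migrations: int = 0


def classify_hot_page(page: PageCounters, hot_cpu: int, t: Thresholds) -> str:
    """Decide what to do with a page that just became hot on `hot_cpu`."""
    shared = any(count >= t.sharing
                 for cpu, count in page.misses.items() if cpu != hot_cpu)
    if shared:
        return "replicate" if page.writes <= t.write else "nothing"
    return "migrate" if page.migrations <= t.migrate else "nothing"


def record_miss(page: PageCounters, cpu: int, t: Thresholds) -> Optional[str]:
    """Count one cache miss; return a decision only when the page turns hot."""
    page.misses[cpu] = page.misses.get(cpu, 0) + 1
    if page.misses[cpu] == t.trigger:  # fire once, exactly when the threshold is reached
        return classify_hot_page(page, cpu, t)
    return None
```

For example, with `Thresholds(trigger=3, sharing=2, write=1, migrate=2)`, a page missed three times by processor 0 and never by anyone else is proposed for migration, while the same page with two earlier misses from processor 1 is proposed for replication instead.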

Implementation details
• The directory controller maintains the miss counters and generates a low-priority interrupt.
• It batches several hot pages before raising the interrupt.
• A write to a replicated page collapses its replicas back to a single page.
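A toy model of these three mechanisms, assuming a software stand-in for the hardware: `DirectoryController` counts misses per (page, node) and hands a batch of hot pages to a callback that plays the role of the low-priority interrupt, and `ReplicaTable` collapses a page to one copy on a write. The names, the `batch_size` parameter, and the choice to keep the writer's copy are hypothetical details, not the actual hardware or IRIX design.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple


class DirectoryController:
    """Toy model: per-(page, node) miss counters plus batched low-priority interrupts."""

    def __init__(self, trigger: int, batch_size: int,
                 interrupt: Callable[[List[int]], None]) -> None:
        self.trigger = trigger
        self.batch_size = batch_size
        self.interrupt = interrupt  # stands in for the OS interrupt handler
        self.misses: Dict[Tuple[int, int], int] = defaultdict(int)
        self.pending: List[int] = []  # hot pages not yet reported to the OS

    def cache_miss(self, page: int, node: int) -> None:
        self.misses[(page, node)] += 1
        if self.misses[(page, node)] == self.trigger:
            self.pending.append(page)
            if len(self.pending) >= self.batch_size:
                batch, self.pending = self.pending, []
                self.interrupt(batch)  # one interrupt covers the whole batch


class ReplicaTable:
    """Tracks which nodes hold a copy of each replicated page."""

    def __init__(self) -> None:
        self.copies: Dict[int, Set[int]] = {}

    def replicate(self, page: int, home: int, node: int) -> None:
        self.copies.setdefault(page, {home}).add(node)

    def write(self, page: int, writer: int) -> Optional[int]:
        """Collapse a replicated page to a single copy; return the node that keeps it."""
        nodes = self.copies.get(page)
        if not nodes:
            return None  # not replicated: nothing to collapse
        keep = writer if writer in nodes else min(nodes)
        self.copies[page] = {keep}
        return keep
```

Batching amortizes the fixed cost of taking an interrupt over several pages, which matters because the kernel overhead of M&R is one of the costs listed earlier.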

IRIX changes
• Replication support
• Finer-grain locking
• Page table back mappings
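Back mappings matter because migrating or replicating a physical page means updating every page table entry that points at it; without a reverse map the kernel would have to scan all page tables to find them. The sketch below is a hypothetical user-level model (the `BackMap` class and the (pid, virtual page) key are assumptions, not IRIX data structures) showing the idea.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Mapping = Tuple[int, int]  # (process id, virtual page number), hypothetical key


class BackMap:
    """Forward page table plus a reverse map from physical frame to its mappings."""

    def __init__(self) -> None:
        self.page_table: Dict[Mapping, int] = {}              # mapping -> frame
        self.rmap: Dict[int, Set[Mapping]] = defaultdict(set)  # frame -> mappings

    def map(self, pid: int, vpn: int, frame: int) -> None:
        self.page_table[(pid, vpn)] = frame
        self.rmap[frame].add((pid, vpn))

    def move_frame(self, old: int, new: int) -> Set[Mapping]:
        """Retarget every mapping of `old` to `new` (e.g. after migration).

        Returns the affected mappings, i.e. the TLB entries that must be flushed.
        """
        mappings = self.rmap.pop(old, set())
        for m in mappings:
            self.page_table[m] = new
            self.rmap[new].add(m)
        return mappings
```

With the reverse map, migrating a page touches only the mappings that actually reference it, so the cost scales with the number of sharers rather than with the total size of all page tables.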

Workloads
• Engineering workload: large, sequential, memory-intensive jobs; uses a Verilog simulator and Flashlite.
• Parallel application: Raytrace, a parallel graphics algorithm.
• Scientific workload: SPLASH.
• Decision support database.
• Multiprogrammed software-development workload: Pmake.

Performance analysis
• Three factors: a) user stall time, b) fraction of misses satisfied from local memory, c) kernel overhead.
• Engineering: large user stall time, hence the best performance gain; M&R were used successfully.
• Raytrace: mostly read-only accesses, so it benefits mainly from replication.

• Splash: three parallel applications, Raytrace, Ocean, and Volume rendering. Migration helps Ocean; Raytrace and Volume rendering benefit from replication.
• Database: mostly read accesses, hence replication.

Alternative policies
• Static policies and dynamic policies.
• Static: round robin, first touch, post facto (analogous to the optimal page replacement algorithm).
• Dynamic: migration only, replication only, migration-replication.
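The three static policies can be compared on a miss trace using metric b) from the performance analysis, the fraction of misses satisfied from local memory. This is a sketch under stated assumptions: the trace is a hypothetical list of (page, node) misses in time order, round robin is modeled as `page % nodes`, and post facto is read as "place each page on the node that missed on it most over the whole run", which needs the complete trace in advance, just as optimal page replacement needs the future reference string.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Trace = List[Tuple[int, int]]  # (page, node that took the miss), in time order


def round_robin(trace: Trace, nodes: int) -> Dict[int, int]:
    """Spread pages over nodes by page number, ignoring who uses them."""
    return {page: page % nodes for page, _ in trace}


def first_touch(trace: Trace) -> Dict[int, int]:
    """Place each page on the node that touched it first."""
    placement: Dict[int, int] = {}
    for page, node in trace:
        placement.setdefault(page, node)
    return placement


def post_facto(trace: Trace) -> Dict[int, int]:
    """Offline: place each page on the node with the most misses to it."""
    per_page: Dict[int, Counter] = defaultdict(Counter)
    for page, node in trace:
        per_page[page][node] += 1
    return {page: c.most_common(1)[0][0] for page, c in per_page.items()}


def local_fraction(trace: Trace, placement: Dict[int, int]) -> float:
    """Fraction of misses served by the memory of the node that missed."""
    local = sum(1 for page, node in trace if placement[page] == node)
    return local / len(trace)
```

For the toy trace `[(0, 1), (1, 0), (0, 0), (0, 0), (1, 1), (1, 1)]` on two nodes, first touch serves 2 of 6 misses locally because each page's first toucher is not its main user, while post facto serves 4 of 6. A dynamic policy aims to approach the post-facto placement without knowing the trace in advance.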
