Today • Welcome • Virtual memory wrap-up (Memory Hogs), start I/O, midterm review • Reading: Chapter 5 MOS • Part 1: Virtual Memory, Midterm; Break; Part 2: I/O basics • Questions from last time?
Page replacement • Local • Global
Overview • Huge data sets => memory hogs • Insufficient RAM • “out-of-core” applications: data sets larger than physical memory • E.g. scientific visualization • what will happen? • Virtual memory + paging • Resource competition: processes impact each other • LRU penalizes interactive processes … why?
The Problem Why the trend?
Page Replacement Options • Local • would confine a hog's paging to its own frames, but very inefficient: allocation is not according to need • Global • allocates frames on demand, but has no regard for ownership – a memory hog can evict everyone else's pages
Be Smarter • I/O cost is high for out-of-core applications (I/O waits due to PFs) • Pre-fetch pages before needed • get a batch of pages, compute on them, in parallel get next batch • Release pages • Application may know about its memory use • Help the OS • automate in compiler • think: big loops
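Modern Unix systems already expose pre-fetch and release hints to applications through `posix_madvise()`, a real POSIX call. A minimal sketch (the buffer size and the anonymous mapping are arbitrary choices for illustration):

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

/* Sketch: an application hinting its own memory use to the OS.
   WILLNEED ~ pre-fetch a batch of pages; DONTNEED ~ release them. */
int hint_pages(size_t len) {
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;
    /* Pre-fetch: tell the OS we will touch this range soon. */
    int r1 = posix_madvise(buf, len, POSIX_MADV_WILLNEED);
    /* Release: tell the OS we are done with it for now. */
    int r2 = posix_madvise(buf, len, POSIX_MADV_DONTNEED);
    munmap(buf, len);
    return (r1 == 0 && r2 == 0) ? 0 : -1;
}
```

In the paper's scheme the compiler inserts such hints automatically; here they are written by hand.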
Working Together • OS role • provide information on available memory • Compiler role • insert pre-fetch and release hints – the “which”: identify candidate pages • not perfect on the “when” – why? • Augment with a run-time library
Compiler Analysis Example • Ideally with each inner loop iteration: • Pre-fetch a[i+1][j+1] • Release a[i-1][j-1] • Requires compiler be aware of array dimensions and physical memory available!
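The inner-loop pattern above can be sketched as follows. The `prefetch_page`/`release_page` calls are stand-ins for the hints the compiler would insert; here they only count invocations (the 4x4 array and its contents are made up for illustration):

```c
#define N 4

static int prefetched, released;

/* Hypothetical compiler-inserted hints; real ones would talk to the OS. */
static void prefetch_page(int i, int j) { (void)i; (void)j; prefetched++; }
static void release_page(int i, int j)  { (void)i; (void)j; released++; }

long sum_with_hints(void) {
    int a[N][N];
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = i * N + j;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            if (i + 1 < N && j + 1 < N)
                prefetch_page(i + 1, j + 1);   /* fetch ahead of the sweep */
            sum += a[i][j];
            if (i - 1 >= 0 && j - 1 >= 0)
                release_page(i - 1, j - 1);    /* well behind the sweep: done */
        }
    return sum;
}
```

Note the bounds checks: a real compiler must know the array dimensions to emit them, which is exactly the slide's point.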
OS Support • OS maintains a shared page per application to communicate with the runtime library • Current pages in use and which are in memory; how much free memory • Upper limit on pages that can be used per application • Handling prefetches • Discard prefetch if no free pages OR ? • Prefetches not fully validated and not mapped into the TLB: why?
OS Support • Releaser – new system daemon • Identify candidate pages for release – how? • not needed in future or will access other pages that will fill up allowable memory before this one • Prioritized • Leave time for rescue • Victims: Write back dirty pages • Consider process upper limit
OS Support • Setting the upper limit: Upper limit = min(max_rss, current_size + tot_freemem – min_freemem) • max_rss – the per-process limit (take locally) • current_size + tot_freemem – min_freemem – what the process could grab system-wide (take globally) • Not a guarantee, just what's up for grabs
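The upper-limit formula is a one-liner; a sketch with the slide's variable names (the example numbers below are invented):

```c
/* Upper limit = min(max_rss, current_size + tot_freemem - min_freemem).
   max_rss is the local per-process cap; the second term is the global
   amount the process could claim without dropping free memory below
   the system's minimum. */
long upper_limit(long max_rss, long current_size,
                 long tot_freemem, long min_freemem) {
    long global_share = current_size + tot_freemem - min_freemem;
    return max_rss < global_share ? max_rss : global_share;
}
```

E.g. with max_rss = 1000, current_size = 300, tot_freemem = 500, min_freemem = 100, the global term (700) wins; raise max_rss's competitor and the local cap wins instead.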
Compiler support • Infer memory access patterns • Most useful with arrays with static sizes, nested loops • Schedule prefetches • Schedule releases • Assign priority
Runtime System • Buffers releases • Prioritized queues • Higher priority = needed again sooner • Decides how many to release and which ones • Compensate for compiler mistakes
Overall Results • Analyzed five applications • data significantly larger than physical RAM • With pre-fetching • avoided I/O waits in 85% of test cases • reduced additional OS overhead by avoiding page faults • fewer context switches
Results (cont.) • Pre-fetching + releases • Reduced total run time 30% - 60% • Interactive apps could remain in memory
Out-of-core app performance • Why does release help out-of-core apps? • smarter page replacement; less work for the paging daemon
Pros • Performance increase • No changes to existing application • Reduces OS overhead from page faults
Cons • Requires special OS support • Only works for arrays • Doesn’t handle pointer-based structures
Conclusion • Too much data, not enough memory • Application can help out the OS • Compiler inserts data pre-fetch and release • Adaptive run-time system • Reduces thrashing, improves performance, plays nicely with other apps on the system • Why not used in practice?
Midterm Review • Closed book, 1.5 hours – covers material from day 1 through today (Memory Hogs) • threads, processes, scheduling, synchronization, deadlock, and memory management, including research papers • Sample exam from an older class – note: this gives just an idea of the length and depth of questions
Midterm • Mixture of short-answer (book, HW) questions (~40%) • few-sentence answers • Longer analysis or “to do” questions (~60%) • I will ask you to implement/solve some kind of synchronization problem • I will ask you about segmentation
Midterm Stress material covered in class Sample questions: What is the difference between logical and physical concurrency? What is test-and-set-lock? Give an example showing what it is used for. Explain why it achieves atomic execution on a multiprocessor in which threads can run on separate CPUs (and still share variables), while interrupts do not.
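As a concrete reference for the test-and-set-lock question: C11's `atomic_flag_test_and_set` is exactly a TSL instruction exposed to C — it atomically sets the flag and returns its previous value, even when threads run on separate CPUs. A minimal spinlock sketch built on it:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* TSL-based lock: atomic_flag_test_and_set atomically sets the flag
   and returns the OLD value. Returning the old value is what makes
   "was it already locked?" and "lock it" a single indivisible step. */
static atomic_flag lock_word = ATOMIC_FLAG_INIT;

bool try_lock(void) { return !atomic_flag_test_and_set(&lock_word); }
void unlock(void)   { atomic_flag_clear(&lock_word); }
```

Disabling interrupts, by contrast, only stops the *local* CPU from being preempted; another CPU can still interleave accesses to the shared flag, which is why TSL (a bus-level atomic) works on multiprocessors and interrupt disabling does not.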
Midterm Multi-threaded Synchronization. We wish to synchronize a single pair of threads (a server and a client) that each run continuously. The server thread waits and blocks for a request from the client. The client issues a request by placing the request data in a shared variable (x) and then blocks waiting for a return value (ans) stored by the server thread. (a) Show a sketch of the client and server code using semaphores (b) Show a sketch of the client and server code using condition variables
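One possible sketch for part (a), using POSIX semaphores. Two semaphores, both initialized to 0, pair the rendezvous: `request` blocks the server until the client has stored `x`, `reply` blocks the client until the server has stored `ans`. The "server doubles the request" behavior is invented just to make the round trip observable:

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t request, reply;
static int x, ans;              /* shared request data and return value */

static void *server(void *arg) {
    (void)arg;
    sem_wait(&request);         /* block until the client posts a request */
    ans = 2 * x;                /* "serve" the request (illustrative) */
    sem_post(&reply);           /* wake the blocked client */
    return NULL;
}

/* Client side: one request/response round against a fresh server thread. */
int client_call(int req) {
    pthread_t t;
    sem_init(&request, 0, 0);
    sem_init(&reply, 0, 0);
    pthread_create(&t, NULL, server, NULL);
    x = req;                    /* place request data in shared variable */
    sem_post(&request);
    sem_wait(&reply);           /* block until the server stores ans */
    pthread_join(t, NULL);
    return ans;
}
```

Part (b) replaces the two semaphores with a mutex, a condition variable, and explicit state flags, re-checking the condition in a `while` loop after each wakeup.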
CSci 5103 Operating Systems • I/O – Chapter 5
Coverage Lots of nitty gritty hardware issues We will focus on the most interesting aspects of I/O disks – RAID, scheduling stable storage spooling power management
Introduction • I/O Stack • I/O Hardware Devices • block – e.g. disk, character – e.g. terminal, others – e.g. clock (interrupts) • I/O Software • 3 OS layers • 1 user layer – I/O libraries, formatting
I/O Software Stack In Action • Layers of the I/O system and the main functions of each layer
Principles of I/O Hardware • Some typical device, network, and data rates • Key feature? • Myriads of speeds and capacities • Not surprisingly – most of the OS code deals with I/O!
Goals of I/O Software • Device independence • Uniform naming – example? • name should not depend on the device (D:\foo) • Error handling – why is this a challenge? • big issue: MANY failure modes • device, controller, … OS
Goals of I/O Software • Synchronous vs. asynchronous transfers • OS should mask asynchronous nature of devices if required • Buffering • data coming off a device cannot always be stored in final destination • example? • (e.g. network packet)
Device Controllers • I/O devices have two components: • mechanical component • electronic component • Electronic component - device controller • may be able to handle multiple devices • Controller's tasks • convert serial bit stream to block of bytes (deal with I/O bit rate) • perform error correction • buffering
Device Drivers • Invoke the command sequence needed to control a device, e.g. write values into device registers … block until an interrupt occurs • I/O can be initiated by special I/O instructions that read/write I/O “control registers” or I/O ports (serial, parallel, USB, CD, disk, …) • OUT IO_PORT, CPU_REG – writes the contents of CPU_REG into IO_PORT • Drivers are typically dynamically loaded into the OS • Drivers must be reentrant – why? • Today – hot-pluggable systems require more sophisticated drivers – why?
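The register-programming pattern can be sketched without real hardware by simulating the controller's ports as an array. The register layout, command codes, and status bits below are entirely invented for illustration (real port I/O uses privileged `IN`/`OUT` instructions):

```c
/* Simulated device controller: two byte-wide "ports". */
enum { REG_CMD, REG_STATUS, NREGS };
enum { CMD_READ_BLOCK = 1, STATUS_DONE = 2 };

static unsigned char regs[NREGS];          /* stands in for real I/O ports */

static void outb_sim(int port, unsigned char v) { regs[port] = v; }
static unsigned char inb_sim(int port)          { return regs[port]; }

/* Driver side: issue a command, then poll status.
   A real driver would block here until the completion interrupt. */
unsigned char issue_read(void) {
    outb_sim(REG_CMD, CMD_READ_BLOCK);     /* like OUT IO_PORT, CPU_REG */
    regs[REG_STATUS] = STATUS_DONE;        /* device side: work "finishes" */
    return inb_sim(REG_STATUS);
}
```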
Device-Independent I/O Software: Standard Interface • (a) Without a standard driver interface • (b) With a standard driver interface • e.g. read/write block device: my_block_driver (dev#, I/O_type, …) • 1. simplifies OS – all drivers support same interface • 2. kernel routines that the drivers internally use can also be given standard interfaces
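The standard driver interface is typically a table of function pointers that every driver fills in, so the kernel's device-independent layer makes the same call no matter which driver sits behind it. A sketch with invented names (real kernels like Linux use richer tables such as `block_device_operations`):

```c
#include <stddef.h>

/* Standard block-driver interface: every driver provides these entry points. */
struct block_ops {
    int (*read_block)(int dev, int block, void *buf);
    int (*write_block)(int dev, int block, const void *buf);
};

/* One driver's (trivial) implementations. */
static int ram_read(int dev, int block, void *buf)
    { (void)dev; (void)block; (void)buf; return 0; }
static int ram_write(int dev, int block, const void *buf)
    { (void)dev; (void)block; (void)buf; return 0; }

static const struct block_ops ramdisk_ops = { ram_read, ram_write };

/* Device-independent layer: identical call for any driver. */
int dev_read(const struct block_ops *ops, int dev, int block, void *buf) {
    return ops->read_block(dev, block, buf);
}
```

This is point 1 on the slide: the OS never special-cases a driver, it just indirects through the table.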
Spooling • Spooling is a form of asynchronous user-level I/O • Spooling directory • Spooling daemon • To print, put file in spooling directory where daemon will find and print it • Why? • Dedicated devices, e.g. printers, unwise to let users open and hold them …
Disk Arm Scheduling Algorithms • Time required to read/write a disk block determined by 3 factors • Seek time (to locate cylinder) – dominates • Rotational delay (to locate sector) • Actual transfer time • Options: FCFS, SSF, Elevator • Example queue of disk requests: head at cylinder 11, reads pending on 1, 36, 16, 34, 9, and 12 • FCFS problems? • Shortest Seek First (SSF) order: 12, 9, 16, 1, 34, 36 • Problems?
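SSF is easy to state in code: repeatedly serve the pending request closest to the current head position. A sketch that reproduces the slide's example (head at 11; the 64-request cap is an arbitrary simplification):

```c
#include <stdlib.h>

/* Shortest Seek First: always serve the closest pending request. */
static void ssf(int head, const int *req, int n, int *order) {
    int done[64] = {0};                   /* assumes n <= 64 */
    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (done[i]) continue;
            if (best < 0 || abs(req[i] - head) < abs(req[best] - head))
                best = i;
        }
        done[best] = 1;
        head = order[k] = req[best];      /* move the arm there */
    }
}

/* The slide's example: head at 11, requests 1, 36, 16, 34, 9, 12. */
int ssf_slide_example(int k) {
    int req[6] = {1, 36, 16, 34, 9, 12}, order[6];
    ssf(11, req, 6, order);
    return order[k];
}
```

The resulting order is 12, 9, 16, 1, 34, 36, matching the slide. The "problems?" answer: requests far from the middle can starve while nearby requests keep arriving.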
Elevator • Same example: head at 11, reads pending on 1, 36, 16, 34, 9, and 12 • The elevator algorithm for scheduling disk requests • Keep moving in the direction you are going; when nothing is ahead of you, turn around • Perform better or worse than SSF? • usually slightly worse in total seek distance, but no request can starve
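A sketch of the elevator algorithm on the same example, assuming the arm starts moving upward (the 64-request cap is an arbitrary simplification):

```c
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* Elevator: sweep upward from the head, then reverse and sweep down. */
static void elevator(int head, const int *req, int n, int *order) {
    int s[64];                            /* assumes n <= 64 */
    for (int i = 0; i < n; i++) s[i] = req[i];
    qsort(s, n, sizeof(int), cmp_int);
    int k = 0;
    for (int i = 0; i < n; i++)           /* upward sweep... */
        if (s[i] >= head) order[k++] = s[i];
    for (int i = n - 1; i >= 0; i--)      /* ...then back down */
        if (s[i] < head) order[k++] = s[i];
}

/* The slide's example: head at 11, requests 1, 36, 16, 34, 9, 12. */
int elevator_slide_example(int k) {
    int req[6] = {1, 36, 16, 34, 9, 12}, order[6];
    elevator(11, req, 6, order);
    return order[k];
}
```

Here the service order is 12, 16, 34, 36, 9, 1: more total seek than SSF's order on this input, but every request is served within at most two sweeps.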
Disk Issues • Reliability • Performance • latency • bandwidth
Disk Hardware – RAID • RAID levels 0 through 1 – blocks are striped across drives • Backup and parity drives are shaded • Advantage of RAID 0? Advantage of RAID 1? • reads vs. writes
Disk Hardware – RAID • RAID levels 3 through 5 • Backup and parity drives are shaded • Parity advantage? less storage and I/O than full mirroring • RAID 3: small strips – smaller I/Os, higher throughput, drive-synchronization issues • RAID 4: heavy load on the parity drive
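The parity trick is just XOR: the parity strip is the XOR of the data strips, so any single lost strip can be rebuilt by XOR-ing the parity with the survivors. A minimal sketch for a three-data-drive array (byte values are arbitrary):

```c
/* RAID 3/4/5 parity: parity strip = XOR of the data strips. */
unsigned char parity3(unsigned char d0, unsigned char d1, unsigned char d2) {
    return (unsigned char)(d0 ^ d1 ^ d2);
}

/* Rebuild a lost strip: XOR the parity with the surviving strips.
   Works because x ^ x = 0, so the survivors cancel out of the parity. */
unsigned char rebuild(unsigned char p, unsigned char a, unsigned char b) {
    return (unsigned char)(p ^ a ^ b);
}
```

This is also why RAID 4 hammers the parity drive: every small write must update the parity strip, while RAID 5 rotates parity across all drives to spread that load.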
Stable Storage • Disk errors: writes interrupted by a crash, spontaneous bit errors • errors detected by ECC • Operations for stable storage using 2 identical disks (assume a spontaneous error hits at most 1 drive) • Stable writes • write the block to disk 1, read it back, check the ECC; retry up to N times until it works; switch to a spare block if it never does • then repeat the same procedure on disk 2
Stable Storage: reads • Stable reads • read from disk 1; on an ECC error, retry up to N times; if it still fails, read from disk 2 • since both disks cannot be bad at once, the read will succeed
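The stable-read logic can be sketched over two simulated mirror copies. The model (an ECC-good flag per copy, at most one bad copy) mirrors the slide's assumptions and is invented for illustration:

```c
/* Simulated mirror copy: ECC verdict plus the block's data. */
struct disk_copy { int ecc_ok; int data; };

/* Stable read; returns which disk (1 or 2) satisfied it. */
static int stable_read_path(struct disk_copy d1, struct disk_copy d2,
                            int retries) {
    for (int i = 0; i < retries; i++)   /* try disk 1 up to N times */
        if (d1.ecc_ok) return 1;
    return 2;                           /* at most one copy can be bad */
}

/* Demo wrapper: disk1_bad = 1 simulates a persistent ECC error on disk 1. */
int stable_read_demo(int disk1_bad) {
    struct disk_copy d1 = { !disk1_bad, 7 }, d2 = { 1, 7 };
    return stable_read_path(d1, d2, 3);
}
```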
Stable Storage: crash recovery • Spontaneous bit errors (in 1 drive) are no problem • CPU crashes during stable writes: on recovery, compare the two copies of each block; if one has an ECC error, overwrite it from the good copy; if both are valid but differ, copy the disk-1 block onto disk 2
Power Management • Batteries are not following Moore's law • OS must decide what state to put devices in: on, off, hibernate, sleep • Hibernate uses less power than sleep, but resuming from hibernate is more expensive than resuming from sleep • OS can try to reduce power and/or the application can as well – power/latency or power/fidelity tradeoff
Power management: Display • The use of independent zones for backlighting the display • Don’t illuminate windows no longer in active focus
Power Management: Disks • Disk spinning wastes power • Spin-down, but re-spinning has a high latency • Provide more OS info to the application • e.g. OS tells application that it has put disk in hibernation mode, application can delay writes until disk is on • What other simple technique can improve the situation? • Disk caching can help
Power Management: CPU • Voltage scaling • modern CPUs have several clock-speed modes • run slower, consume less power • multicore: larger # of slower processors • IBM Blue Gene • Cutting voltage by two • cuts clock speed by two (linear) • cuts power by four (square)
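The arithmetic behind the slide, in relative units: power scales as V², clock speed as V. Halving the voltage quarters the power but doubles the time a fixed job takes, so the energy for the job is halved — the payoff of running slower. A sketch (the unit normalization is my own):

```c
/* Relative units, full voltage = 1.0.
   Power P ~ V^2 (the slide's "square"); clock f ~ V (the "linear"). */
double rel_power(double v)  { return v * v; }

/* Energy for a fixed job: time ~ 1/f ~ 1/v, so E = P * t = v^2 / v = v. */
double rel_energy(double v) { return rel_power(v) * (1.0 / v); }
```

So at half voltage: power 0.25, runtime 2x, energy 0.5 — useful only when the deadline allows the longer runtime.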