Prelim 2 Topics
• Older stuff:
  • Transistors, gates, combinational circuits
  • Flip-flops, latches, state machines
  • CPU design, pipelining & hazards
  • MIPS assembly, calling conventions
• Newer stuff:
  • Physical and virtual memory, page tables, TLBs
  • Caches, cache-conscious programming, caching issues
  • Privilege levels, syscalls, traps, interrupts, exceptions
  • Buses, programmed I/O, memory-mapped I/O
  • DMA, disks, RAID
  • Synchronization: deadlocks, race conditions, fairness, correctness
  • Synchronization primitives (locks, spinlocks, hardware support)
  • Synchronization abstractions (mutexes, semaphores, monitors & condition variables)
Critical Sections
• Properties:
  • correctness: at most one thread is inside the critical section
  • liveness: if no thread is inside, then a waiting thread should eventually get in
  • fairness: a waiting thread can be cheated only a bounded number of times (or, equal probabilities among waiters) (or, FIFO ordering among waiters) (or, …)
• Suppose we have two threads (with thread_id 0 and 1) using a critical section. Which of the three properties are achieved by each of the following implementations?
CritSec Attempt #1
int turn = 0;

void enter_cs() {
    while (turn == thread_id)
        ;
}

void leave_cs() {
    turn = thread_id;
}
CritSec Attempt #2
int in_crit_sect[2] = {0, 0};

void enter_cs() {
    while (in_crit_sect[other_thread_id])
        ;
    in_crit_sect[thread_id] = 1;
}

void leave_cs() {
    in_crit_sect[thread_id] = 0;
}
CritSec Attempt #3
int want_crit_sect[2] = {0, 0};

void enter_cs() {
    want_crit_sect[thread_id] = 1;
    while (want_crit_sect[other_thread_id])
        ;
}

void leave_cs() {
    want_crit_sect[thread_id] = 0;
}
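One way to check Attempts #2 and #3 is to replay a specific interleaving by hand. The sketch below is a simplified model, assuming each source statement executes atomically and memory is sequentially consistent; `struct model`, `step`, and `run` are hypothetical helpers, and each character of the schedule string names the thread that takes the next step. Under the schedule "0101", Attempt #2 lets both threads into the critical section (correctness fails), while Attempt #3 leaves both threads spinning forever (liveness fails).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: pc 0 = first statement of enter_cs, pc 1 = second, pc 2 = inside. */
struct model {
    int flag[2];   /* in_crit_sect[] for Attempt #2, want_crit_sect[] for Attempt #3 */
    int pc[2];
    int attempt;   /* 2 or 3 */
};

/* Runs one atomic step of thread t; returns false if t could not advance. */
bool step(struct model *m, int t) {
    int other = 1 - t;
    if (m->pc[t] == 2)
        return false;                      /* already in the critical section */
    if (m->attempt == 2) {
        if (m->pc[t] == 0) {               /* while (in_crit_sect[other]) ; */
            if (m->flag[other])
                return false;
            m->pc[t] = 1;
        } else {                           /* in_crit_sect[t] = 1; */
            m->flag[t] = 1;
            m->pc[t] = 2;
        }
    } else {
        if (m->pc[t] == 0) {               /* want_crit_sect[t] = 1; */
            m->flag[t] = 1;
            m->pc[t] = 1;
        } else {                           /* while (want_crit_sect[other]) ; */
            if (m->flag[other])
                return false;
            m->pc[t] = 2;
        }
    }
    return true;
}

/* Applies a schedule such as "0101": each character picks the thread that steps next. */
struct model run(int attempt, const char *schedule) {
    struct model m = { {0, 0}, {0, 0}, attempt };
    for (const char *s = schedule; *s != '\0'; s++)
        step(&m, *s - '0');
    return m;
}
```

Trying other schedules in `run` is a quick way to see which properties each attempt keeps.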
CritSec Attempt #4
int want_crit_sect[2] = {0, 0};
int turn = 0;

void enter_cs() {
    want_crit_sect[thread_id] = 1;
    turn = thread_id;
    while (want_crit_sect[other_thread_id] && turn == thread_id)
        ;
}

void leave_cs() {
    want_crit_sect[thread_id] = 0;
}
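Attempt #4 is Peterson's algorithm. With plain `int` variables it is not safe on modern hardware, because the compiler and CPU may reorder the store to `want_crit_sect` with the later loads. A minimal sketch, assuming C11 `<stdatomic.h>` and POSIX threads: the default sequentially consistent `atomic_store`/`atomic_load` restore the ordering the algorithm relies on. Passing `thread_id` as a parameter, along with `worker` and `run_peterson_demo`, are illustrative additions not in the original.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared state from Attempt #4; seq_cst atomics keep the flag store
   ordered before the loads in the spin loop. */
static atomic_int want_crit_sect[2];
static atomic_int turn;
static long counter;   /* plain variable, protected only by the critical section */

static void enter_cs(int thread_id) {
    int other_thread_id = 1 - thread_id;
    atomic_store(&want_crit_sect[thread_id], 1);
    atomic_store(&turn, thread_id);
    /* Wait while the other thread wants in and we were the last to write turn. */
    while (atomic_load(&want_crit_sect[other_thread_id]) &&
           atomic_load(&turn) == thread_id)
        ;
}

static void leave_cs(int thread_id) {
    atomic_store(&want_crit_sect[thread_id], 0);
}

struct worker_arg {
    int thread_id;
    long iters;
};

static void *worker(void *p) {
    struct worker_arg *a = p;
    for (long i = 0; i < a->iters; i++) {
        enter_cs(a->thread_id);
        counter++;   /* would lose updates without mutual exclusion */
        leave_cs(a->thread_id);
    }
    return NULL;
}

/* Two threads each increment counter iters times; returns the final count. */
long run_peterson_demo(long iters) {
    pthread_t t[2];
    struct worker_arg args[2] = { {0, iters}, {1, iters} };
    counter = 0;
    atomic_store(&want_crit_sect[0], 0);
    atomic_store(&want_crit_sect[1], 0);
    atomic_store(&turn, 0);
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &args[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

If mutual exclusion holds, the final count is exactly twice `iters`. Older toolchains may need `-pthread` when linking.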
Semaphores
The following code is meant to exchange the items at the top of two stacks:
- it must appear to be a single atomic operation
- if either stack is empty, it should do nothing
- if the stacks are one and the same, it should do nothing
- multiple swaps should proceed simultaneously so long as they are using different stacks
How is this code broken? Fix it using semaphores.
void atomic_swap(Stack *q1, Stack *q2) {
    Item *item1;
    Item *item2;
    P(q1->lock);
    item1 = pop(q1);
    if (item1 != NULL) {
        P(q2->lock);
        item2 = pop(q2);
        if (item2 != NULL) {
            push(q2, item1);
            push(q1, item2);
            V(q2->lock);
            V(q1->lock);
        }
    }
}
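A possible repair, sketched with POSIX semaphores (`sem_wait` standing in for P and `sem_post` for V). The original never releases either lock when a stack is empty, loses `item1` when only `q2` is empty, deadlocks on itself when `q1 == q2`, and can deadlock when two threads run `atomic_swap(a, b)` and `atomic_swap(b, a)` concurrently. The sketch checks for the same stack first, acquires the two locks in address order, puts `item1` back if the second stack is empty, and releases both locks on every path. The `Stack` and `Item` layouts, `stack_init`, `push`, and `pop` are hypothetical stand-ins for whatever the original assumed.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical item and stack types; each stack owns a binary semaphore used as its lock. */
typedef struct { int value; } Item;

typedef struct {
    Item *items[16];
    int top;        /* number of items currently on the stack */
    sem_t lock;     /* initialized to 1 */
} Stack;

static void stack_init(Stack *s) {
    s->top = 0;
    sem_init(&s->lock, 0, 1);
}

static void push(Stack *s, Item *it) {
    assert(s->top < 16);
    s->items[s->top++] = it;
}

static Item *pop(Stack *s) {
    return s->top > 0 ? s->items[--s->top] : NULL;
}

void atomic_swap(Stack *q1, Stack *q2) {
    if (q1 == q2)
        return;   /* same stack: nothing to do, and P() twice on one lock would self-deadlock */

    /* Lock in a global (address) order so swap(a, b) and swap(b, a) cannot deadlock. */
    Stack *first = (uintptr_t)q1 < (uintptr_t)q2 ? q1 : q2;
    Stack *second = (first == q1) ? q2 : q1;
    sem_wait(&first->lock);
    sem_wait(&second->lock);

    Item *item1 = pop(q1);
    if (item1 != NULL) {
        Item *item2 = pop(q2);
        if (item2 != NULL) {
            push(q2, item1);
            push(q1, item2);
        } else {
            push(q1, item1);   /* q2 was empty: put item1 back */
        }
    }

    /* Release both locks on every path. */
    sem_post(&second->lock);
    sem_post(&first->lock);
}
```

Holding both locks for the whole exchange is what makes the swap appear atomic, while swaps on disjoint pairs of stacks never contend.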
Synchronization
There are two printers. Write a function called print_anywhere(char *doc) that prints a document on either one of the two printers. Your solution must guarantee:
- safety: each printer has at most one process trying to print to it at a time
- liveness: a process trying to print will not wait if a printer is free
- fairness: a process trying to print will eventually be allowed to print
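A minimal sketch, assuming POSIX semaphores and a pthread mutex: a counting semaphore initialized to 2 tracks idle printers, so a caller blocks only when both are busy, and a mutex protects the small `busy` table used to choose which printer to take. `print_on`, `printers_init`, and `jobs_printed` are hypothetical. POSIX does not promise FIFO wakeup order for `sem_wait`, so the fairness requirement holds only if the underlying semaphore is fair (for example, one that queues waiters in FIFO order).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

#define NUM_PRINTERS 2

static sem_t printers_free;             /* number of idle printers */
static pthread_mutex_t busy_lock = PTHREAD_MUTEX_INITIALIZER;
static bool busy[NUM_PRINTERS];         /* guarded by busy_lock */
static int jobs_printed[NUM_PRINTERS];  /* demo bookkeeping only */

/* Hypothetical device call; stands in for the real driver. */
static void print_on(int printer, char *doc) {
    (void)doc;
    jobs_printed[printer]++;
}

void printers_init(void) {
    sem_init(&printers_free, 0, NUM_PRINTERS);
}

void print_anywhere(char *doc) {
    sem_wait(&printers_free);           /* blocks only if every printer is busy */

    pthread_mutex_lock(&busy_lock);
    int p = 0;
    while (busy[p])                     /* the semaphore guarantees a free one exists */
        p++;
    busy[p] = true;
    pthread_mutex_unlock(&busy_lock);

    print_on(p, doc);                   /* outside the mutex, so both printers can run at once */

    pthread_mutex_lock(&busy_lock);
    busy[p] = false;
    pthread_mutex_unlock(&busy_lock);
    sem_post(&printers_free);
}
```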
Monitors
Does this work? Can we fix it?
struct cond {
    struct sema *s;
};

void wait(struct cond *c) {
    c->s->P();
}

void signal(struct cond *c) {
    c->s->V();
}
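The semaphore-only version has two problems: if `wait` is called while holding the monitor lock, it blocks without releasing that lock, so no signaller can ever enter; and a semaphore remembers a `V()` issued when nobody is waiting, whereas a condition-variable `signal` with no waiter should have no effect. A sketch of one fix, assuming the monitor lock is itself a POSIX semaphore passed in by the caller and Mesa-style semantics (the woken thread re-checks its condition). The `waiters` counter is an addition, and `cond_wait`/`cond_signal` are renamed to avoid clashing with the C library's `wait` and `signal`.

```c
#include <assert.h>
#include <semaphore.h>

/* Condition variable for a monitor whose lock is a semaphore. */
struct cond {
    sem_t s;       /* waiters block here; starts at 0 */
    int waiters;   /* threads blocked in cond_wait; guarded by the monitor lock */
};

void cond_init(struct cond *c) {
    sem_init(&c->s, 0, 0);
    c->waiters = 0;
}

/* Caller must hold monitor_lock. */
void cond_wait(struct cond *c, sem_t *monitor_lock) {
    c->waiters++;
    sem_post(monitor_lock);   /* release the monitor so a signaller can enter */
    sem_wait(&c->s);          /* a post that arrives first is remembered, so no lost wakeup */
    sem_wait(monitor_lock);   /* reacquire; caller should re-check its condition */
}

/* Caller must hold the monitor lock. A signal with no waiters does nothing. */
void cond_signal(struct cond *c) {
    if (c->waiters > 0) {
        c->waiters--;
        sem_post(&c->s);
    }
}
```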
Synchronization Primitives
Suppose we have a kernel that does not provide any direct support for synchronization primitives. What are the consequences of this limitation?
Disks & DMA
A disk controller with enough memory can perform read-ahead, reading blocks on the current track into its memory before the CPU can ask for them. Should it also do write-behind?
Disks
Give a nearly worst-case algorithm for erasing all data on a hard disk with 3 platters, 1000 cylinders, and 20 sectors per track. Give a best-case algorithm.
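Most of the difference between the two orders is arm movement. A rough sketch, assuming 6 recording surfaces (both sides of the 3 platters) and counting only seek distance in cylinders (rotational delay and head switches are ignored): the best case sweeps the cylinders once in order and erases every sector under all heads before moving on; the nearly worst case erases one sector per request while bouncing between opposite ends of the disk (cylinder 0, 999, 1, 998, ...). A truly bad order would also time each request just after its sector has passed under the head.

```c
#include <assert.h>
#include <stdlib.h>

#define CYLINDERS 1000
#define HEADS 6       /* assumption: both surfaces of 3 platters */
#define SECTORS 20

/* Best case: sweep cylinders in order; all HEADS * SECTORS sectors of a
   cylinder are erased before the arm moves. Returns total seek distance. */
long best_case_seek_distance(void) {
    long dist = 0;
    int cur = 0;
    for (int c = 0; c < CYLINDERS; c++) {
        dist += abs(c - cur);
        cur = c;
        /* erase every sector on every head of cylinder c here */
    }
    return dist;
}

/* Nearly worst case: one sector per request, bouncing between
   cylinders i and CYLINDERS-1-i. Returns total seek distance. */
long worst_case_seek_distance(void) {
    long dist = 0;
    int cur = 0;
    for (int h = 0; h < HEADS; h++)
        for (int s = 0; s < SECTORS; s++)
            for (int i = 0; i < CYLINDERS / 2; i++) {
                int targets[2] = { i, CYLINDERS - 1 - i };
                for (int t = 0; t < 2; t++) {
                    dist += abs(targets[t] - cur);
                    cur = targets[t];
                    /* erase sector s under head h on cylinder targets[t] here */
                }
            }
    return dist;
}
```

Under these assumptions the sweep moves the arm 999 cylinders in total, while the bouncing order moves it 59,999,500 cylinders to erase the same 120,000 sectors.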
Memory & MMU #1
If a machine has 4GB of actual RAM, does it still make sense to implement virtual memory? What impact, if any, does it have on the implementation (simplicity, performance, features, etc.)?
* assume 32-bit physical addresses
Memory & MMU #2
If a machine has no hard disk, does it still make sense to implement virtual memory? What impact, if any, does it have on the implementation (simplicity, performance, features, etc.)?
* a so-called "thin client" that fetches all programs and data from the network
Memory & MMU #3
If a machine has no support for asynchronous traps, does it still make sense to implement virtual memory? What impact, if any, does it have on the implementation (simplicity, performance, features, etc.)?
* The Motorola M68000 was just such a processor, but the M68010 added asynchronous trap handling.
Memory & MMU #4
If a machine's MMU has no support for page reference bits or page modification bits, how can we simulate the functionality?
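One standard approach is to let protection faults stand in for the missing bits. The sketch models a hypothetical page-table entry with the bits the MMU checks (`hw_valid`, `hw_writable`) and bits only the OS keeps (`soft_valid`, `soft_writable`, `referenced`, `dirty`). To clear the reference bit, the OS marks the page invalid; to catch the first write, it maps the page read-only. The resulting fault sets the simulated bit and restores access, so only the first touch after each reset pays for a trap. `access_page` simulates one memory access in place of real hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page-table entry. */
struct pte {
    bool hw_valid, hw_writable;      /* what the MMU actually enforces */
    bool soft_valid, soft_writable;  /* what the OS says the page allows */
    bool referenced, dirty;          /* simulated bits, kept in software */
};

/* Reset the reference bit (e.g. during a clock sweep): unmap so the next touch traps. */
void clear_referenced(struct pte *p) {
    p->referenced = false;
    p->hw_valid = false;
}

/* After writing the page back to disk: mark clean and re-protect against writes. */
void mark_clean(struct pte *p) {
    p->dirty = false;
    p->hw_writable = false;
}

/* Fault handler. Returns true if the fault existed only to update the
   simulated bits, so the access can be retried. */
bool handle_fault(struct pte *p, bool is_write) {
    if (!p->soft_valid || (is_write && !p->soft_writable))
        return false;                /* a genuine page fault or protection error */
    p->referenced = true;
    p->hw_valid = true;
    if (is_write) {
        p->dirty = true;
        p->hw_writable = true;
    }
    return true;
}

/* Simulates one access: the MMU allows it or traps into handle_fault. */
bool access_page(struct pte *p, bool is_write) {
    bool allowed = p->hw_valid && (!is_write || p->hw_writable);
    return allowed || handle_fault(p, is_write);
}
```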
Memory Layout
It is nice to have applications be loadable at arbitrary places in memory (e.g., libraries, or your corewars programs). How can we achieve this?
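Two common answers are load-time relocation and position-independent code (PC-relative addressing, or indirection through a table of addresses). A sketch of the first, assuming a hypothetical image format that lists which 32-bit words hold absolute addresses assembled for base address 0: the loader copies the image and adds the actual load address to each listed word. `struct image` and `load_and_relocate` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical object image, assembled as if loaded at address 0. */
struct image {
    const uint32_t *words;
    size_t nwords;
    const size_t *relocs;   /* indices of words that hold absolute addresses */
    size_t nrelocs;
};

/* Copy the image into dest (which lives at address load_base) and patch
   every absolute address by adding load_base. */
void load_and_relocate(const struct image *img, uint32_t *dest, uint32_t load_base) {
    memcpy(dest, img->words, img->nwords * sizeof *dest);
    for (size_t i = 0; i < img->nrelocs; i++)
        dest[img->relocs[i]] += load_base;
}
```

Position-independent code avoids this patching pass, at the cost of extra address arithmetic or an indirection at run time.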
Privilege Levels
Normally the system call interface is accessed by means of executing a special instruction (e.g., syscall). We could instead make the system automatically transition to privileged mode on any jump to a kernel-only memory page. Is this feasible? Is this a good idea?
MMU-conscious programming
Assume the virtual memory system uses twenty 1KB pages, with LRU replacement.
int A[4096][4096], B[4096][4096], C[4096][4096];

for (int i = 0; i < 4096; i++)
    for (int j = 0; j < 4096; j++) {
        int sum = 0;
        for (int k = 0; k < 4096; k++)
            sum = sum + A[i][k] * B[k][j];
        C[i][j] = sum;
    }

Approx. how many page faults will there be during a single iteration of the j loop? Can we improve this?
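Reasoning it through, assuming 4-byte `int` and row-major arrays (so a 1KB page holds 256 ints and one row spans 16 pages): a single j iteration walks row i of A (16 pages, each reused for 256 consecutive accesses), touches a different page for every `B[k][j]` (4096 pages), and touches one page of C. With only 20 frames nothing from the previous iteration is still resident when it is needed, so the count is roughly 16 + 4096 + 1 = 4113. The small LRU simulator below checks that arithmetic; it additionally assumes A, B, and C are page-aligned and laid out back to back, and `touch`, `addr`, and `faults_one_j_iteration` are hypothetical helpers. It also models one common improvement: storing B transposed (`BT[j][k] = B[k][j]`) so the inner loop reads both operands along rows.

```c
#include <assert.h>
#include <stdint.h>

#define N 4096
#define PAGE_SIZE 1024
#define FRAMES 20
#define ELEM 4   /* assumes 4-byte int */

static long frame[FRAMES];     /* page held by each frame, -1 = empty */
static long last_use[FRAMES];  /* time of last access, for LRU */
static long now, faults;

static void lru_reset(void) {
    for (int i = 0; i < FRAMES; i++) { frame[i] = -1; last_use[i] = -1; }
    now = 0;
    faults = 0;
}

/* One memory access: hit, or fault and replace the least recently used frame. */
static void touch(uint64_t byte_addr) {
    long page = (long)(byte_addr / PAGE_SIZE);
    int victim = 0;
    now++;
    for (int i = 0; i < FRAMES; i++) {
        if (frame[i] == page) { last_use[i] = now; return; }
        if (last_use[i] < last_use[victim]) victim = i;
    }
    faults++;
    frame[victim] = page;
    last_use[victim] = now;
}

/* Byte address of element [row][col] of array 0 (A), 1 (B or BT), or 2 (C). */
static uint64_t addr(int array, int row, int col) {
    return ((uint64_t)array * N * N + (uint64_t)row * N + (uint64_t)col) * ELEM;
}

/* Page faults for one j iteration (row i), starting with empty frames. */
long faults_one_j_iteration(int i, int j, int transposed) {
    lru_reset();
    for (int k = 0; k < N; k++) {
        touch(addr(0, i, k));                               /* A[i][k] */
        touch(transposed ? addr(1, j, k) : addr(1, k, j));  /* BT[j][k] or B[k][j] */
    }
    touch(addr(2, i, j));                                   /* C[i][j] */
    return faults;
}
```

With the transposed layout the same iteration takes about 16 + 16 + 1 = 33 faults. Loop interchange or blocking (tiling) are alternative fixes.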
Interrupts & Polling
Devices can be managed using interrupts or polling. Under what circumstances would it be better to use interrupts? Polling?
Caches
(a) For a 1MB, 8-way associative cache, with 128 byte blocks, find the number of bits needed for the cache tag, index, and offset.
(b) Consider a direct mapped cache, with 32 lines, and a 21 bit tag. What is the block or line size (in bytes), and the capacity of the cache?
(c) For a 256 byte cache with a 28 bit tag and 4 word blocks, find the number of bits in the index and then compute the associativity.
(d) In a 32-bit address space, 8 MB of data map to a single set/index in a cache. If the cache can hold a total of 8192 data words (not bytes) and has 64 byte blocks, what is the capacity, the number of sets, and the associativity?
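All four parts use the same relations, sketched here assuming 32-bit byte addresses and 4-byte words: offset bits = log2(block size), sets = capacity / (block size * ways), index bits = log2(sets), and tag bits = address bits - index bits - offset bits. `split_address` and `log2_exact` are hypothetical helpers; parts (b) through (d) run the same relations backward from the given tag width or set mapping.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of an exact power of two. */
static unsigned log2_exact(uint64_t x) {
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1)) == 0);   /* cache sizes here are powers of two */
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

struct cache_bits {
    unsigned tag, index, offset;
};

/* capacity and block_size in bytes; ways = associativity (1 = direct mapped). */
struct cache_bits split_address(unsigned addr_bits, uint64_t capacity,
                                uint64_t block_size, uint64_t ways) {
    struct cache_bits b;
    uint64_t sets = capacity / (block_size * ways);
    b.offset = log2_exact(block_size);
    b.index = log2_exact(sets);
    b.tag = addr_bits - b.index - b.offset;
    return b;
}
```

For part (a), `split_address(32, 1u << 20, 128, 8)` yields a 15-bit tag, a 10-bit index, and a 7-bit offset.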
Cache Misses (according to Wikipedia)
• Compulsory misses are those misses caused by the first reference to a datum. Cache size and associativity make no difference to the number of compulsory misses. …
• Capacity misses are those misses that occur regardless of associativity or block size, solely due to the finite size of the cache….
• Conflict misses are those misses that could have been avoided, had the cache not evicted an entry earlier.
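One common way to make these definitions mechanical is the "3C" classification: a miss is compulsory if the block was never referenced before, capacity if a fully associative LRU cache of the same size would also have missed, and conflict otherwise. The sketch below applies that rule to a tiny direct-mapped cache; `LINES`, `MAX_BLOCKS`, and the `classifier` type are illustrative assumptions, and references are given as block numbers rather than byte addresses.

```c
#include <assert.h>
#include <stdbool.h>

#define LINES 4          /* tiny cache for illustration */
#define MAX_BLOCKS 64    /* block numbers in the trace must be below this */

enum miss_kind { HIT, COMPULSORY, CAPACITY, CONFLICT };

struct classifier {
    long dm[LINES];          /* direct-mapped cache under study; -1 = empty line */
    long fa[LINES];          /* fully associative LRU cache, fa[0] = most recent */
    bool seen[MAX_BLOCKS];   /* blocks referenced before */
};

void classifier_init(struct classifier *c) {
    for (int i = 0; i < LINES; i++) { c->dm[i] = -1; c->fa[i] = -1; }
    for (int b = 0; b < MAX_BLOCKS; b++) c->seen[b] = false;
}

/* Access the fully associative LRU cache; returns true on a hit. */
static bool fa_access(struct classifier *c, long block) {
    int pos = LINES - 1;     /* on a miss, the LRU entry falls off the end */
    bool hit = false;
    for (int i = 0; i < LINES; i++)
        if (c->fa[i] == block) { pos = i; hit = true; break; }
    for (int i = pos; i > 0; i--)   /* shift down and move block to the front */
        c->fa[i] = c->fa[i - 1];
    c->fa[0] = block;
    return hit;
}

enum miss_kind classify(struct classifier *c, long block) {
    assert(block >= 0 && block < MAX_BLOCKS);
    bool fa_hit = fa_access(c, block);
    bool first = !c->seen[block];
    c->seen[block] = true;
    int line = (int)(block % LINES);
    if (c->dm[line] == block)
        return HIT;
    c->dm[line] = block;
    if (first)
        return COMPULSORY;
    if (!fa_hit)
        return CAPACITY;   /* even a fully associative cache this size missed */
    return CONFLICT;       /* only the direct-mapped placement caused this miss */
}
```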