COSC 1306 COMPUTER LITERACY FOR SCIENCE MAJORS Jehan-François Pâris (jfparis@uh.edu) COSC 1306—COMPUTER SCIENCE AND PROGRAMMING: COMPUTER ORGANIZATION
Module Overview We will focus on the main challenges of computer architecture Managing the I/O hierarchy Caching, multiprogramming, virtual memory Speeding up the CPU Pipelined and multicore architectures Protecting user computations and data Memory protection, privileged instructions
The memory hierarchy (I) • CPU registers • Main memory (RAM) • Secondary storage (disks) • Mass storage (often offline)
CPU registers • Inside the processor itself • Some can be accessed by our programs • Others cannot • Can be read or written in one processor cycle • If processor speed is 2 GHz • 2,000,000,000 cycles per second • 2 cycles per nanosecond
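The cycle arithmetic above can be checked directly; a minimal sketch using the 2 GHz figure from the slide:

```python
# A 2 GHz clock completes 2 * 10**9 cycles per second,
# i.e. 2 cycles every nanosecond (the numbers from the slide).
clock_hz = 2_000_000_000                     # 2 GHz
cycles_per_second = clock_hz
cycles_per_nanosecond = clock_hz / 1_000_000_000

print(cycles_per_second)        # 2000000000
print(cycles_per_nanosecond)    # 2.0
```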
Main memory (I) • Byte accessible • Each group of 8 bits has an address • Dynamic random access memory (DRAM) • Slower but much cheaper than static RAM • Contents must be refreshed every 64 ms • Otherwise its contents are lost: • DRAM is volatile
Main memory (II) Memory is organized as a sequence of 8-bit bytes Each byte has an address A byte can hold one character Roman alphabet with accents [Figure: a grid of byte cells at addresses 0 through 15]
Main memory (III) Groups of four bytes starting at addresses that are multiples of 4 form words Better suited to hold numbers Also have half-words, double words, quad words [Figure: words starting at addresses 0, 4, 8, 12]
Accessing main memory contents (I) • When we look for some item, our search criteria can include the location of the item • The book on the table • The student behind you, … • More often our main search criterion is some attribute of the item • The color of a folder • The title or the authors of a book • The name of an individual
Accessing main memory contents (II) • Computers always access their memory by location • The byte at address 4095 • The word at location 512 • States the address of the first byte in the word (bytes 512, 513, 514, 515) • Why? • It is the fastest way for them to access an item
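Access by location can be sketched with a `bytearray` standing in for main memory; the 4 KB size and the addresses are the illustrative values from the slide:

```python
# Memory is accessed by location, never by content: indexing by
# address is a single constant-time step, just like "the byte at
# address 4095" or "the word at location 512".
memory = bytearray(4096)      # a toy 4 KB "main memory", all zeros

memory[4095] = 0x41           # write the byte at address 4095
print(memory[4095])           # 65

# The 4-byte word "at location 512" is the bytes at 512..515:
word = memory[512:516]
print(len(word))              # 4
```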
An analogy (I) • Some research libraries have a closed-stack policy • Only library employees can access the stacks • Patrons wanting to get an item fill a form containing a call number specifying the location of the item • Could be Library of Congress classification if the stacks are organized that way.
An analogy (II) • The procedure followed by the employee fetching the book is fairly simple • Go to the location specified by the book's call number • Check whether the book is there • Bring it to the patron
An analogy (III) • The memory operates in an even simpler manner • Always fetch the contents of the addressed bytes • Whether they hold meaningful data or junk
Disk drives (I) • Sole part of computer architecture with moving parts: • Data stored on circular tracks of a disk • Spinning speed between 5,400 and 15,000 rotations per minute • Accessed through a read/write head
Disk drives (II) [Figure: a disk drive, showing the platter, arm, servo, and read/write (R/W) head]
Disk drives (III) • Data can be accessed by blocks of 4 KB, 8 KB, … • Depends on disk partition parameters • User selectable • To access a disk block • Read/write head must be over the right track • Seek time • Data to be accessed must pass under the head • Rotational latency
Estimating the rotational latency • On average, half a disk rotation • If the disk spins at 15,000 rpm • 250 rotations per second • Half a rotation corresponds to 2 ms • Most desktops have disks that spin at 7,200 rpm • Most notebooks have disks that spin at 5,400 or 7,200 rpm
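The estimate above generalizes to any spindle speed; a minimal sketch (the helper name is made up for illustration):

```python
# Average rotational latency = time for half a rotation.
def avg_rotational_latency_ms(rpm):
    rotations_per_second = rpm / 60
    full_rotation_ms = 1000 / rotations_per_second
    return full_rotation_ms / 2

print(avg_rotational_latency_ms(15000))   # 2.0   (high-end server drive)
print(avg_rotational_latency_ms(7200))    # ~4.17 (typical desktop drive)
print(avg_rotational_latency_ms(5400))    # ~5.56 (typical notebook drive)
```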
Accessing disk contents • Each block on a disk has a unique address • Normally a single number • Logical block addressing (LBA) • Older PCs used a different scheme
The memory hierarchy (III) • To make sense of these numbers, let us consider an analogy
The two gaps (I) • Gap between CPU and main memory speeds: • Will add intermediary levels • L1, L2, and L3 caches • Will store contents of most recently accessed memory addresses • Most likely to be needed in the future • Purely hardware solution • Software does not see it
Major issues • Huge gaps between • CPU speeds and SDRAM access times • SDRAM access times and disk access times • Both problems have very different solutions • Gap between CPU speeds and SDRAM access times handled by hardware • Gap between SDRAM access times and disk access times handled by combination of software and hardware
Why? • Having hardware handle an issue • Complicates hardware design • Offers a very fast solution • Standard approach for very frequent actions • Letting software handle an issue • Cheaper • Has a much higher overhead • Standard approach for less frequent actions
Will the problem go away? • It will become worse • RAM access times are not improving as fast as CPU power • Disk access times are limited by rotational speed of disk drive
What are the solutions? • To bridge the CPU/DRAM gap: • Interposing between the CPU and the DRAM smaller, faster memories that cache the data that the CPU currently needs • Cache memories • Managed by the hardware and invisible to the software (OS included)
What are the solutions? • To bridge the DRAM/disk drive gap: • Storing in main memory the data blocks that are currently accessed (I/O buffer) • Managing memory space and disk space as a single resource (Virtual memory) • I/O buffer and virtual memory are managed by the OS and invisible to the user processes
Why do these solutions work? • Locality principle: • Spatial locality:at any time a process only accesses asmall portion of its address space • Temporal locality:this subset does not change too frequently
The true memory hierarchy • CPU registers • L1, L2 and L3 caches • Main memory (RAM) • Secondary storage (disks) • Mass storage (often offline)
The technology • Caches use faster static RAM (SRAM) • (D flip-flops) • Can have • Separate caches for instructions and data • Great for pipelining • A unified cache
Basic principles • Assume we want to store in a faster memory 2n words that are currently accessed by the CPU • Can be instructions or data or even both • When the CPU needs to fetch an instruction or load a word into a register • It first looks into the cache • Can have a hit or a miss
Cache hits • Occur when the requested word is found in the cache • Cache avoided a memory access • CPU can proceed
Cache misses • Occur when the requested word is not found in the cache • Will need to access the main memory • Will bring the new word into the cache • Must make space for it by expelling one of the cache entries • Need to decide which one
Cache design challenges • Cache contains a small subset of memory addresses • Must find a very fast access mechanism • No linear search, no binary search • Would like to have an associative memory • Can search by content all memory entries in parallel • Like human brains do
An associative memory [Figure: searching memories by content for "ice cream" retrieves "My last ice cream" and "Other ice cream moment" but not "COSC 1306 program" or "Finding a parking spot"]
An analogy (I) • Let us go back to our closed-stack library example • Librarians have noted that some books get asked for again and again • They want to put them closer to the circulation desk • This would result in much faster service • The problem is how to locate these books • They will not be at the right location!
An analogy (II) • The librarians come up with a great solution • They put behind the circulation desk shelves with 100 book slots numbered from 00 to 99 • Each slot is a home for the most recently requested book whose call number's last two digits match the slot number • 3141593 can only go in slot 93 • 1234567 can only go in slot 67
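The librarians' scheme is simply indexing by the last two digits, i.e. taking the call number modulo 100; a one-line sketch:

```python
# Each call number maps to exactly one of the 100 slots:
# the slot is the last two digits, i.e. the number modulo 100.
def slot_for(call_number):
    return call_number % 100

print(slot_for(3141593))   # 93
print(slot_for(1234567))   # 67
print(slot_for(4444493))   # 93 -- same slot as 3141593: a collision
```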
An analogy (III) [Dialogue] Patron: "The call number of the book I need is 3141593." Librarian: "Let me see if it's in bin 93."
An analogy (IV) • To let the librarian do her job, each slot must contain either • Nothing, or • A book and its call number • There are many books whose call numbers end in 93 or 67 or any two given digits
An analogy (V) [Dialogue] Patron: "Could I get this time the book whose call number is 4444493?" Librarian: "Sure."
An analogy (VI) • This time the librarian will • Go to bin 93 • Find that it contains a book with a different call number • She will • Bring that book back to the stacks • Fetch the new book
A very basic cache • Has 2n entries • Each entry contains • A word (4 bytes) • Its memory address • Sole way to identify the word • A bit indicating whether the cache entry contains something useful
A very basic cache (I) [Figure: a cache with eight entries indexed 000 through 111; each entry holds a valid bit (Y/N), a tag (the word's RAM address), and the word contents] Actual caches are much bigger
Multiword cache [Figure: a multiword cache; each entry holds a valid bit (Y/N), a tag, and several consecutive words of contents]
Set-associative caches (I) • Can be seen as 2, 4, or 8 caches attached together • Reduces collisions
Back to our library example • What if two books whose call numbers have the same last two digits are often asked for on the same day? • Say, 3141593 and 4444493 • The best solution is to • Keep the number of book slots equal to 100 • Store more than one book with the same last two digits in the same slot
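The "more than one book per slot" idea is exactly a set-associative cache: each index selects a small set of entries, all of which are checked. A minimal 2-way sketch, assuming least-recently-used eviction within a set (the class name and LRU choice are illustrative, not from the slides):

```python
from collections import OrderedDict

# Each of the 100 "slots" (sets) can now hold up to 2 entries,
# so 3141593 and 4444493 no longer evict each other.
class TwoWayCache:
    def __init__(self, n_sets):
        self.sets = [OrderedDict() for _ in range(n_sets)]
        self.n_sets = n_sets

    def access(self, address, memory):
        s = self.sets[address % self.n_sets]
        if address in s:
            s.move_to_end(address)        # mark as most recently used
            return s[address], "hit"
        if len(s) >= 2:                   # set full: evict the LRU entry
            s.popitem(last=False)
        s[address] = memory[address]
        return s[address], "miss"

memory = {addr: addr * 10 for addr in range(10_000_000)}  # toy RAM
cache = TwoWayCache(n_sets=100)

cache.access(3141593, memory)             # miss, goes into set 93
cache.access(4444493, memory)             # miss, same set, both fit
print(cache.access(3141593, memory))      # (31415930, 'hit') -- no collision
```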