CS 270: Computer Organization
Course Overview
Instructor: Professor Stephen P. Carl
Quote of the Day
“640 Kbytes [of main memory] ought to be enough for anybody.”
• Bill Gates, 1981
Course Perspective
Most systems courses are builder-centric; this course is programmer-centric:
• How you can become a more effective programmer by knowing more about the underlying system
• Enables you to:
  • Write programs that are more reliable and efficient
  • Incorporate features that require hooks into the OS
    • E.g., concurrency, signal handlers (in CS 428)
• Not just a course for dedicated hackers (but it might just bring out the hidden hacker in you)
• You won’t see most of this material in any other course
Course Components
• Lectures: higher-level concepts
• Labs and assignments: the heart of the course
  • 1 to 3 weeks each
  • Provide in-depth understanding of an aspect of systems
  • Programming and problem solving
• Exams (2 + final)
  • Test your understanding of concepts and mathematical principles
Timeliness
• Grace days: 4 for the course
  • Cover scheduling crunches, out-of-town trips, illnesses, and minor setbacks
  • Save them until late in the term!
• Lateness penalties
  • Once your grace days are used up, you are penalized 10% per day
  • Handins are typically shut off 4 days after the due date
• Catastrophic events
  • Major illness, death in the family, …
  • Work with your professor on a plan for getting back on track
• Advice: once you start running late, it’s really hard to catch up
Cheating
• What is cheating?
  • Sharing code: copying, retyping, looking at, or supplying a copy of a file
  • Coaching: helping a friend write a lab, line by line
  • Copying code from a previous course or from elsewhere on the Web
    • You may only use code we supply, or code from the CS:APP website
• What is NOT cheating?
  • Explaining how to use systems or tools
  • Helping others with high-level design issues
• Penalty for cheating: the case is remanded to the Honor Council
• Detection of cheating: we do check, and our tools for doing so are much better than you might think
Policies: Attendance and Grading
• Presence in lectures: strongly advised (cut warning after 2 unexcused absences)
• Presence in design/help sessions: voluntary
• Exams: weighted 15, 15, 15 (final)
• Labs: completed/not completed
• Guaranteed: > 90%: A; > 80%: B; > 70%: C
Introduction: Themes and Concepts
• A bird’s-eye view of computer system organization
• Processors and technological progress
• Abstraction vs. reality
• Internal data representations: it’s all just numbers
• Why knowing assembly language is a Good Thing
• Computer storage is large but not infinite
• Asymptotic complexity is only part of the story
• Computers compute, but they also communicate
Five Main Components of Any Computing System
[Figure: a Sun Blade computer decomposed into its five components: input devices (keyboard, mouse), output devices (display, printer), memory (where programs and data live when running), disk (where programs and data live when not running), and the processor, split into control (the “brain”) and datapath (the “brawn”).]
Five Main Components of Computer Systems
• Datapath: performs operations on signals moving through the CPU
• Control circuitry: routes signals into, through, and out of the CPU
• Memory
  • Volatile (RAM)
  • Permanent storage (hard drives, DVD-ROM, etc.)
• Input devices: mouse, keyboard, etc.
• Output devices: monitor, printers, etc.
March of Progress: Moore’s Law
• Moore’s Law: the number of transistors on a single integrated circuit, also called chip density, doubles every 18 months
• Applies to any kind of semiconductor chip: memory (RAM), microprocessors, GPUs, etc.
• Trend first described by Intel co-founder Gordon Moore in 1965
The March of Progress: DRAM Capacity
Size in bits of a single-chip Dynamic Random Access Memory (DRAM):

  year   size (Mbit)
  1980   0.0625
  1983   0.25
  1986   1
  1989   4
  1992   16
  1996   64
  1998   128
  2000   256
  2002   512

• Now growing about 1.4X/yr, or 2X every 2 years
• Roughly 8000X since 1980!
The March of Progress: Microprocessor Complexity
Transistor counts of representative processors:
• Intel 8088: < 50,000 (1979)
• Intel 80486: 1 million (1989)
• Sparc Ultra: 5.2 million
• PentiumPro: 5.5 million (1995)
• PowerPC 620: 6.9 million
• Alpha 21264: 15 million
• Athlon (K7): 22 million
• Itanium 2: 41 million
The March of Progress: Processor Performance
[Figure: processor performance plotted by year, improving roughly 1.54X per year; the most recent point is the Intel P4 at 2000 MHz (Fall 2001).]
The March of Progress: Dramatic Changes
• Memory: DRAM capacity grew 64X in the last decade
• Processor: speed grew 100X in the last decade
• Disk: capacity has doubled every year since 1997, 250X in the last decade
The March of Progress: Dramatic Changes
What will the state-of-the-art PC be when you graduate?
• Processor clock speed: 5000 MegaHertz (5.0 GigaHertz)
• Memory capacity: 4000 MegaBytes (4.0 GigaBytes)
• Disk capacity: 2000 GigaBytes (2.0 TeraBytes)
Time to learn some new units!
• Mega => Giga
• Giga => Tera
• Tera => Peta
• Peta => Exa
• Exa => Zetta
• Zetta => Yotta (= 10^24)
Computing Systems as Abstractions
• Most CS courses emphasize abstraction
  • Abstract data types (CS 157/257)
  • Asymptotic analysis (CS 320)
• Computer systems are organized according to layers of abstraction
• In general, abstraction helps engineers of all sorts manage complexity
• Abstraction helps insulate programmers from differences between hardware platforms
Our Theme: Abstraction Is Important, But Don’t Forget Reality
• Abstractions have limits
  • Especially in the presence of bugs
  • Sometimes you just need to understand the details of the underlying implementation
• Useful outcomes
  • Become a more effective programmer!
    • Find and eliminate bugs efficiently
    • Understand and tune for program performance
  • Preparation for later “systems” classes in CS
    • Operating Systems, Networking
Reality #1: Ints are not Integers, Floats are not Reals
• Example 1: Is x² ≥ 0?
  • Floats: Yes!
  • Ints:
    • 40000 * 40000 --> 1600000000
    • 50000 * 50000 --> ??
• Example 2: Is (x + y) + z = x + (y + z)?
  • Unsigned & Signed Ints: Yes!
  • Floats:
    • (1e20 + -1e20) + 3.14 --> 3.14
    • 1e20 + (-1e20 + 3.14) --> ??
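Both surprises are easy to reproduce in a few lines of C (an illustrative sketch, not code from the slides; it assumes a typical machine with 32-bit int and IEEE-754 double):

#include <stdio.h>

int main(void) {
    int x = 50000;
    /* The true product 2,500,000,000 exceeds INT_MAX, so this signed
       multiplication overflows (undefined behavior; it typically wraps
       to a negative value on two's-complement machines). */
    printf("50000 * 50000 = %d\n", x * x);

    /* Floating-point addition is not associative: adding 3.14 to 1e20
       first loses it to rounding before the cancellation occurs. */
    printf("(1e20 + -1e20) + 3.14 = %g\n", (1e20 + -1e20) + 3.14);
    printf("1e20 + (-1e20 + 3.14) = %g\n", 1e20 + (-1e20 + 3.14));
    return 0;
}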
Code Security Example
• Similar to code found in FreeBSD’s implementation of getpeername
• There are legions of smart people trying to find vulnerabilities in programs

/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
    /* Byte count len is minimum of buffer size and maxlen */
    int len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}
Typical Usage
(copy_from_kernel as above)

#define MSIZE 528

void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, MSIZE);
    printf("%s\n", mybuf);
}
Malicious Usage
(copy_from_kernel as above)

#define MSIZE 528

void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, -MSIZE);
    . . .
}

The problem: when maxlen is negative, KSIZE < maxlen is false, so len = maxlen. But memcpy takes its length as an unsigned size_t, so the negative len converts to a huge positive count, and the call copies kernel memory far past kbuf into user space.
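One way to close the hole is to reject negative lengths up front and compare in size_t, the type memcpy actually takes (a sketch of a fix, not the actual FreeBSD patch):

#include <string.h>

#define KSIZE 1024
char kbuf[KSIZE];

/* Safer variant: refuse bogus requests, then compute the minimum
   in size_t so no signed/unsigned surprise can occur. */
int copy_from_kernel_safe(void *user_dest, int maxlen) {
    if (maxlen < 0)
        return -1;                       /* reject negative lengths */
    size_t len = (size_t)maxlen < (size_t)KSIZE
               ? (size_t)maxlen : (size_t)KSIZE;
    memcpy(user_dest, kbuf, len);
    return (int)len;
}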
Computer Arithmetic
• Arithmetic operations have important mathematical properties
• We cannot assume all the usual mathematical properties, due to the finiteness of representations
  • Integer operations satisfy “ring” properties: commutativity, associativity, distributivity
  • Floating point operations satisfy “ordering” properties: monotonicity, values of signs
• Observation
  • Need to understand which abstractions apply in which contexts
  • Important issues for compiler writers and serious application programmers
Great (Grim?) Reality #2: You Need to Know Assembly
• Chances are, you’ll never write a program in assembly
  • Compilers are much better and more patient than you are
• But understanding assembly is key to understanding the machine-level execution model
  • When bugs happen, the high-level language model breaks down
  • Tuning program performance
    • Which optimizations are or are not done by the compiler?
    • What are the sources of program inefficiency?
  • Implementing system software
    • The compiler has machine code as its target
    • Operating systems must manage process state
  • Creating / fighting malware
    • x86 assembly is the language of choice!
Assembly Code Example
• Time Stamp Counter
  • Special 64-bit register in Intel-compatible machines
  • Incremented every clock cycle
  • Read with the rdtsc instruction
• Application: measure the time (in clock cycles) required by a procedure:

double t;
start_counter();
P();    /* function to be timed */
t = get_counter();
printf("P required %f clock cycles\n", t);
Code to Read Counter
• Add a small amount of assembly code using GCC’s asm facility
• This inserts assembly code into the machine code generated by the compiler:

static unsigned cyc_hi = 0;
static unsigned cyc_lo = 0;

/* Set *hi and *lo to the high and low order bits of the cycle counter. */
void access_counter(unsigned *hi, unsigned *lo) {
    asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
        : "=r" (*hi), "=r" (*lo)
        :
        : "%edx", "%eax");
}
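The earlier slide calls start_counter and get_counter without listing them; a minimal sketch consistent with access_counter above (an assumption about their bodies, not code from the slides):

/* Record the current value of the cycle counter in cyc_hi/cyc_lo. */
void start_counter(void) {
    access_counter(&cyc_hi, &cyc_lo);
}

/* Return the number of cycles since the last call to start_counter. */
double get_counter(void) {
    unsigned ncyc_hi, ncyc_lo;
    access_counter(&ncyc_hi, &ncyc_lo);
    /* Assemble the 64-bit counts in doubles: hi * 2^32 + lo */
    double now   = (double)ncyc_hi * 4294967296.0 + ncyc_lo;
    double start = (double)cyc_hi  * 4294967296.0 + cyc_lo;
    return now - start;
}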
Grim Reality #3: “Random Access Memory” Is a Non-Physical Abstraction
• Memory is not unbounded
  • It must be allocated and managed
  • Many applications are memory dominated
• Bugs due to memory referencing errors are especially pernicious
  • Effects are distant in both time (an error may show up long after the erroneous instruction executes) and space (the effect may be outside the bounds of the executing program)
• Memory performance is not uniform
  • Cache and virtual memory effects can greatly affect program performance
  • Adapting a program to the characteristics of the memory system can lead to major speed improvements
Example of a Memory Referencing Bug

double fun(int i) {
    volatile double d[1] = {3.14};
    volatile long int a[2];
    a[i] = 1073741824;    /* Possibly out of bounds */
    return d[0];
}

fun(0) --> 3.14
fun(1) --> 3.14
fun(2) --> 3.1399998664856
fun(3) --> 2.00000061035156
fun(4) --> 3.14, then segmentation fault
Memory Referencing Bug: Explanation
Given fun from the previous slide, the stack frame places a, d, and the saved state adjacently, so a[i] for i > 1 writes past the end of a:

  Location accessed by fun(i)   Contents
  4                             Saved state
  3                             MSB of d[0]
  2                             LSB of d[0]
  1                             a[1]
  0                             a[0]

fun(2) and fun(3) overwrite the low- and high-order halves of d[0], silently corrupting its value, while fun(4) overwrites the saved state and triggers a segmentation fault when fun returns.
Memory Referencing Errors
• Unlike Java, C and C++ do not protect memory at all
  • Out-of-bounds array references
  • Invalid pointer values
  • Abuses of malloc/free (e.g., memory leaks)
• This usually leads to nasty bugs
  • Whether or not a bug has any effect depends on the system and compiler
  • Action at a distance
    • The corrupted object is logically unrelated to the one being accessed
    • The effect of a bug may first be observed long after it is generated
• How can I deal with this?
  • Program in Scheme, Java, or Python
  • Understand what possible interactions may occur
  • Use tools that detect referencing errors (see the example below)
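For example (our suggestion; no specific tool is named on the slide), Valgrind's memcheck detects invalid heap accesses, uses of uninitialized values, and leaks at run time, though stack-array overruns like the fun example are largely invisible to it:

cc -g -o myprog myprog.c     # build with debug info for readable reports
valgrind ./myprog            # memcheck is Valgrind's default tool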
Memory System Performance Example
• Hierarchical memory organization
• Performance depends on access patterns
  • Including how we step through multi-dimensional arrays

void copyij(int src[2048][2048], int dst[2048][2048]) {
    int i, j;
    /* access in row-major order */
    for (i = 0; i < 2048; i++)
        for (j = 0; j < 2048; j++)
            dst[i][j] = src[i][j];
}

void copyji(int src[2048][2048], int dst[2048][2048]) {
    int i, j;
    /* access in column-major order */
    for (j = 0; j < 2048; j++)
        for (i = 0; i < 2048; i++)
            dst[i][j] = src[i][j];
}

copyji is 21 times slower on a Pentium 4
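A small driver makes the gap easy to reproduce (our sketch, not from the slides; it assumes copyij and copyji above are compiled in the same file and uses the standard clock() timer):

#include <stdio.h>
#include <time.h>

#define N 2048
void copyij(int src[N][N], int dst[N][N]);
void copyji(int src[N][N], int dst[N][N]);

static int src[N][N], dst[N][N];

/* Time 10 repetitions of a copy routine with the C standard clock(). */
static double time_copy(void (*copy)(int (*)[N], int (*)[N])) {
    clock_t t0 = clock();
    for (int r = 0; r < 10; r++)
        copy(src, dst);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    printf("copyij: %.3f s\n", time_copy(copyij));
    printf("copyji: %.3f s\n", time_copy(copyji));
    return 0;
}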
The Memory Mountain
[Figure: read throughput (MB/s, 0 to 1200) as a function of working set size (2 KB to 8 MB) and stride (s1 to s15 words), measured on a 550 MHz Pentium III Xeon with a 16 KB on-chip L1 d-cache, a 16 KB on-chip L1 i-cache, and a 512 KB off-chip unified L2 cache. The surface forms distinct plateaus for L1, L2, and main memory.]
Reality #4: There’s More to System Performance than Asymptotic Complexity
• Constant factors matter too!
  • Even an exact operation count does not predict performance
  • You can easily see a 10:1 performance range depending on how the code was written
• Must optimize at multiple levels: algorithm, data representations, procedures, and loops
• Must understand the system to optimize performance
  • How programs are compiled and executed
  • How to measure program performance and identify bottlenecks
  • How to improve performance without destroying code modularity and generality
Example: Matrix Multiplication
• Standard desktop computer, vendor compiler, optimization flags enabled
• Both implementations have exactly the same operation count (2n³)
• What is going on?
[Figure: performance of the best hand-tuned code (K. Goto) versus the naive triple loop; the tuned code is about 160x faster.]
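For concreteness, the triple-loop version being compared is, in outline, the textbook algorithm (a sketch; the slides show only the plot):

/* Naive triple-loop matrix multiply: C = C + A * B for n x n
   row-major matrices. Performs exactly 2*n*n*n flops. */
void mmm(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}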
MMM Plot: Analysis
• Multiple threads: 4x
• Vector instructions: 4x
• Memory hierarchy and other optimizations: 20x
• Each speedup comes from taking advantage of increasingly complex system resources
• Effect: fewer register spills and fewer L1/L2 cache and TLB misses
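To illustrate the memory-hierarchy item above, here is a cache-blocked variant of the triple loop (a sketch, not Goto's actual kernel; the tile size BLK is an assumption to be tuned per machine):

/* Cache-blocked matrix multiply: work on BLK x BLK tiles so each
   tile of A, B, and C stays cache-resident while it is reused. */
#define BLK 32    /* tile size; tune to the target cache (assumption) */

void mmm_blocked(int n, const double *A, const double *B, double *C) {
    for (int ii = 0; ii < n; ii += BLK)
        for (int jj = 0; jj < n; jj += BLK)
            for (int kk = 0; kk < n; kk += BLK)
                for (int i = ii; i < ii + BLK && i < n; i++)
                    for (int j = jj; j < jj + BLK && j < n; j++)
                        for (int k = kk; k < kk + BLK && k < n; k++)
                            C[i*n + j] += A[i*n + k] * B[k*n + j];
}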
Reality #5: Computers Do More than Just Execute Programs
• They need to get data in and out
  • The I/O system is critical to program reliability and performance
• They communicate with each other over networks
  • Many system-level issues arise in the presence of a network
    • Concurrent operations by autonomous processes
    • Coping with unreliable media
    • Cross-platform compatibility
    • Complex performance issues
Slide Acknowledgements
• Slides to accompany the textbook are due to Bryant and O’Hallaron
• Technology Trends slides are due to Dr. XXX Carle of UC Berkeley
• Some graphs are from Patterson and Hennessy, Computer Organization and Design, 4th ed.