Computer Architecture CS 154 Where software and hardware finally meet. Dr. Franklin. What is Computer Architecture?. Program software. Architecture. Write compilers. Design assembly language. Design processor. Optimize layout, circuits, etc. Design transistor technology.
Computer Architecture CS 154: Where software and hardware finally meet. Dr. Franklin
What is Computer Architecture? Program software Architecture Write compilers This class!! Design assembly language Design processor Optimize layout, circuits, etc Design transistor technology
Coming together – the basics • What do high-level instructions get compiled down to? • How do you build a basic machine? Dispelling the Magic
Hardware optimization • What do high-level instructions get compiled down to? • How do you build a basic machine? • How do architects specialize the hardware to run programs quickly? Dispelling the Magic Exploiting the Nature of Programs
Software optimization • What do high-level instructions get compiled down to? • How do you build a basic machine? • How do architects specialize the hardware to run programs quickly? • How do programmers optimize programs to run quickly? Dispelling the Magic Exploiting the Nature of Programs Exploiting the Hardware
CS 154 Topics • How do you build a basic machine? • How do architects specialize the hardware to run programs quickly? • How do programmers optimize programs to run quickly?
Architecture • Must understand software • Programs have certain characteristics • Optimize the design to take advantage of those characteristics • Must understand hardware • Hardware design complexity • Ease of programming • Performance • Power
Technology Trends: Memory Capacity (Single-Chip DRAM) • Now 1.4X/yr, or 2X every 2 years. • 8000X since 1980!
Technology Trends: Microprocessor Complexity • Itanium 2: 41 million • Athlon (K7): 22 million • Alpha 21264: 15 million • PentiumPro: 5.5 million • PowerPC 620: 6.9 million • Alpha 21164: 9.3 million • Sparc Ultra: 5.2 million • Moore’s Law: 2X transistors/chip every 1.5 years
Technology Trends: Processor Performance • [Figure: performance measure vs. year, growing 1.5X/yr; labeled point: Intel P4, 2000 MHz (Fall 2001)] • This curve has now flattened out, which is why we are seeing multicore
Technology Trends Summary • Technology trend: 2X every 2.0 years in memory size; every 1.0 year in disk capacity; every 1.5 years in processor complexity (Moore’s Law) • More processors per chip each generation
The Architecture Walls Memory Wall – Processor speed kept increasing, memory did not as quickly, so processor is often idle waiting for memory ILP Wall – There are not enough independent instructions for the processor to get real work done when one instruction needs to wait for another (or memory or whatever) Power Wall – Solving the above two walls requires too much power, and we don’t have cooling technology to dissipate that much heat.
Beginning of the multi-core era • Multi-core chips • Place multiple processors on a single die • Because • They can communicate very quickly • Much higher potential throughput • Less power per area than accelerating single thread • But • You need parallel programs (or multiple programs) to exploit
The next frontier • GPU – Graphics processing unit • Specialized hardware for graphics • Optimized to run the same thing on many pieces of data (e.g. pixels) • Why? • They are mature technology, driven by gaming • Low power parallel processing • Barrier • Limited programming model • Not appropriate for a lot of programs (e.g. servers)
Performance • Not an absolute • Depends on application characteristics • Graphics • General-Purpose desktop • Scientific apps • Servers • Rapidly changing technology • DRAM speed, chip density, etc. • This is the focus of our class
What is Computer Architecture? Program software Why do I care?!? I’m 3 levels above. Architecture Write compilers This class!! Design assembly language Design processor Optimize layout, circuits, etc Design transistor technology
But I’m CS • Why do I have to learn about hardware? (I hear you ask) • Hardware is optimized to take advantage of particular program characteristics • If your software is different, it can get atrocious performance • You must understand general architecture to program for it. • In an ideal world, compilers would do this for you. (We live in the real world)
Which is faster?
• Version 1: R1 = A[5]; B[6] = R1; R3 = R0 + R2; R5 = R4 - R3; R7 = R0 + R6; C[7] = R7
• Version 2: R1 = A[5]; R3 = R0 + R2; R7 = R0 + R6; B[6] = R1; R5 = R4 - R3; C[7] = R7
Which is faster in C/Java?
• Version 1: for(i=0;i<n;i++) for(j=0;j<n;j++) A[j][i] = i*j+7;
• Version 2: for(i=0;i<n;i++) for(j=0;j<n;j++) A[i][j] = i*j+7;
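The two loop nests on this slide can be written out as complete C functions. This is a minimal sketch, assuming a fixed size `N` and the function names `fill_strided` and `fill_contiguous`, none of which come from the slides. Both functions produce identical array contents because `i*j` is symmetric; only the order of memory accesses differs. C stores 2-D arrays row by row, so the second version writes adjacent addresses in its inner loop, which is usually the cache-friendlier pattern. Actual timings depend on the machine and the array size.

```c
#include <stddef.h>

#define N 64  /* illustrative size, not from the slides */

/* Inner loop varies the first index: consecutive writes are N ints apart. */
void fill_strided(int A[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            A[j][i] = (int)(i * j) + 7;
}

/* Inner loop varies the last index: consecutive writes are adjacent in memory. */
void fill_contiguous(int A[N][N]) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            A[i][j] = (int)(i * j) + 7;
}
```

Timing both functions with a large `N` is one way to see the effect on a given machine.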
What data structure should I use? • Array or linked structure? • Does it change often? • yes – then linked nodes • Does it get searched often? • yes – then array
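The trade-off above can be sketched in C. The names `struct node`, `list_insert_after`, and `array_find` are hypothetical and chosen only for illustration. Inserting into a linked structure rewires two pointers and moves no data, while a sorted array keeps its elements contiguous and can be searched by halving the range.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Insert fresh right after pos: constant time, no elements are shifted. */
void list_insert_after(struct node *pos, struct node *fresh) {
    fresh->next = pos->next;
    pos->next = fresh;
}

/* Binary search over a sorted array of n ints.
   Returns the index of key, or -1 if key is absent. */
long array_find(const int *a, size_t n, int key) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && a[lo] == key) ? (long)lo : -1;
}
```

Inserting into the middle of the sorted array would instead require shifting every later element, which is why frequent changes favor linked nodes.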
General Class Info • When, where and who • Website: http://www.cs.ucsb.edu/~franklin/154/154.html • Professor: Diana Franklin, franklin@cs • TA: Michael, Nadav, Shivapriya • Office Hours: • Franklin: MTWR, 3:30-4:30, – HFH 1115 • TA:
Grading Policy • Grading • Labs: 0-5% (0.5% for each attended) • Projects: 25-30% • Quizzes: 10% • Midterms: 25% • Final: 35% • Plagiarism • You may discuss the design of programming assignments • You may not show or look at any other group’s code • Come to office hours!!! • Look at example code from class!!! • Plagiarism will result in an F in the class and reporting to Judicial Affairs for further action.
Curve Individual tests and assignments are not curved Curving only occurs at the end to offset grading that is too harsh
Projects • 2 or 3 students per group • Discussions focus on skills for project • Projects build on each other • Don’t get behind – you have fair warning • The expectation is that everyone completes all projects properly (as opposed to in the past, where you could get one bad grade and have others not depend on it)
Discussion group • Piazza • join this week • Announcements will be made here • Do not post code or partial solutions EVER, even to ask for help as to what is wrong • Post those privately!
Exams • 2 MiniExams – 1 side of 1 page notes • 2 Midterms – 2 sides of 1 page notes • 1 Final – 2 sides of 2 pages of notes • If your weighted average on exams is < 60% (straight scale), and is well below the class average, you may receive an F
Learning a new ISA: Learn the syntax and semantics of: • Arithmetic operations • Control operations • Memory operations
High-Level MIPS • Arithmetic: All computation occurs in registers • Branches: Two-step process – calculate then branch • Memory: Move data between registers (for computation) and memory (huge)
MIPS Registers – 32 registers Page 140, Figure 3.13
Arithmetic “R-Format” • Two input registers – rs & rt • One output register - rd
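To make the three register fields concrete, here is a small sketch that packs an R-format instruction into a 32-bit word. The slide names only rs, rt, and rd; the field widths (6-bit opcode of zero, three 5-bit registers, 5-bit shamt, 6-bit funct) are taken from the standard MIPS32 encoding, and the function name `encode_r` is hypothetical.

```c
#include <stdint.h>

/* Pack an R-format instruction: opcode (0) | rs | rt | rd | shamt | funct. */
uint32_t encode_r(uint32_t rs, uint32_t rt, uint32_t rd,
                  uint32_t shamt, uint32_t funct) {
    return ((rs    & 0x1Fu) << 21) |   /* first input register  */
           ((rt    & 0x1Fu) << 16) |   /* second input register */
           ((rd    & 0x1Fu) << 11) |   /* output register       */
           ((shamt & 0x1Fu) << 6)  |   /* shift amount          */
            (funct & 0x3Fu);           /* selects the operation */
}
```

For example, `add $t0, $t1, $t2` uses registers 8, 9, and 10 with funct 0x20, so `encode_r(9, 10, 8, 0, 0x20)` yields `0x012A4020`.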
Arithmetic “I-format” • One input register – • One hard-coded constant - • One output register -
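A matching sketch for I-format, under the same caveats: the layout used here (6-bit opcode, 5-bit rs as the input register, 5-bit rt as the output register, 16-bit immediate as the constant) is the standard MIPS32 encoding rather than something stated on the slide, and `encode_i` is a hypothetical name.

```c
#include <stdint.h>

/* Pack an I-format instruction: opcode | rs | rt | 16-bit immediate. */
uint32_t encode_i(uint32_t opcode, uint32_t rs, uint32_t rt, int16_t imm) {
    return ((opcode & 0x3Fu) << 26) |
           ((rs     & 0x1Fu) << 21) |   /* input register           */
           ((rt     & 0x1Fu) << 16) |   /* output register          */
            (uint32_t)(uint16_t)imm;    /* constant, two's complement */
}
```

For example, `addi $t0, $t1, 5` has opcode 8, so `encode_i(8, 9, 8, 5)` yields `0x21280005`.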
Branches • goto loop • if (i < 100) goto loop
Load/Store Instructions • Displacement addressing mode • Register indirect is Displacement with 0 offset • lw = load word (4 bytes)
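Displacement addressing can be modeled in a few lines of C. This is a toy sketch, not a real simulator: the `mem` array, its size, and the names `load_word` and `store_word` are assumptions, and there are no bounds or alignment checks. The effective address is the base register value plus a signed 16-bit offset; with an offset of 0 it reduces to register indirect.

```c
#include <stdint.h>
#include <string.h>

/* Toy byte-addressed memory; the size is arbitrary. */
static uint8_t mem[64];

/* lw rt, offset(base): read the 4-byte word at address base + offset. */
int32_t load_word(uint32_t base, int16_t offset) {
    uint32_t addr = base + (uint32_t)(int32_t)offset;
    int32_t value;
    memcpy(&value, &mem[addr], sizeof value);
    return value;
}

/* sw rt, offset(base): write a 4-byte word to address base + offset. */
void store_word(uint32_t base, int16_t offset, int32_t value) {
    uint32_t addr = base + (uint32_t)(int32_t)offset;
    memcpy(&mem[addr], &value, sizeof value);
}
```

Storing with `store_word(8, 4, 1234)` and reading back with `load_word(12, 0)` touch the same word, since 8 + 4 and 12 + 0 give the same address.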
Let’s do a code example int sum = 0; for(i=0;i<n;i++) sum += A[i]; • Split apart the parts of the for loop • Translate the regular code • Insert branches • Translate memory operations
int sum = 0; for(i=0;i<n;i++) sum += A[i]; • int sum = 0; • i = 0; • if !(i < n) -> skip loop • sum += A[i] • i++ • if (i < n) -> loop again
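The split-apart steps can be written as legal C using `goto`, which mirrors how branches will look in assembly. This is a sketch: the function name `sum_array` and the labels are chosen for illustration.

```c
/* Same behavior as: for (i = 0; i < n; i++) sum += A[i]; */
int sum_array(const int *A, int n) {
    int sum = 0;
    int i = 0;
    if (!(i < n)) goto skiploop;   /* skip the loop when n <= 0 */
loop:
    sum += A[i];
    i++;
    if (i < n) goto loop;          /* loop again */
skiploop:
    return sum;
}
```

Each line now corresponds to at most a few machine instructions, which makes the later translation mostly mechanical.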
int sum = 0; • i = 0; • if !(i < n) -> skip loop • sum += A[i] • i++ • if (i < n) -> loop again
• $t0 -> sum, $t1 -> i
• assume &A[0] is in $a0, n is in $a1
          addi $t0, $0, 0         # int sum = 0
          add  $t1, $0, $0        # i = 0
          slt  $t2, $t1, $a1      # $t2 = (i < n)
          beq  $t2, $0, skiploop  # if !(i < n) -> skip loop
loop:     sll  $t2, $t1, 2        # $t2 = i * 4 (byte offset)
          add  $t3, $t2, $a0      # $t3 = &A[i]
          lw   $t2, 0($t3)        # $t2 = A[i]
          add  $t0, $t0, $t2      # sum += A[i]
          addi $t1, $t1, 1        # i++
          slt  $t2, $t1, $a1      # $t2 = (i < n)
          bne  $t2, $0, loop      # if (i < n) -> loop again
skiploop:
sum += A[i] • load A[i] • add it to sum • sll $t2, $t1, 2 • add $t3, $t2, $a0 • lw $t2, 0 ($t3)
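The three instructions compute a byte address and then load from it. The same arithmetic can be spelled out in C. This is a sketch: `element_address` and `load_element` are hypothetical names, and `int32_t` is used so that each element is exactly 4 bytes, matching the shift by 2.

```c
#include <stdint.h>

/* Byte address of A[i] for 4-byte elements. */
uintptr_t element_address(const int32_t *A, uint32_t i) {
    uintptr_t byte_offset = (uintptr_t)i << 2;  /* sll $t2, $t1, 2   */
    return (uintptr_t)A + byte_offset;          /* add $t3, $t2, $a0 */
}

/* Load the word at that address. */
int32_t load_element(const int32_t *A, uint32_t i) {
    return *(const int32_t *)element_address(A, i);  /* lw $t2, 0($t3) */
}
```

In ordinary C the compiler generates this scaling automatically for `A[i]`; writing it by hand only shows what the `sll` and `add` are doing.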