
2014-4-29 John Lazzaro (not a prof - “John” is always OK)

Presentation Transcript


  1. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 26 -- Midterm II Review Session 2014-4-29 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love Play:

  2. Today - Midterm II Review Session Study Tips HW 2, problem by problem (if there is time) HKN

  3. CS152 Midterm II May 1st, 2014 # Points Name: SSID: “All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet.” Signature: Please write clearly, and put your name on each page. Please abide by word limits. Good luck! Eric Love John Lazzaro

  4. What does it cover? Lectures 9 onward. Focus will be on problems that require you to do a task (write a small program, trace through execution, etc.) that demonstrates that you understand a concept. [...] No transistor-level questions (DRAM and SRAM cells, etc.). Time for a quick walk-through ...

  5. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 9 -- Memory 2014-2-18 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  6. Latency is not the same as bandwidth! What if we want all of the 16384 bits? In the row access time (55 ns) we can do 22 transfers at 400 MT/s. With a 16-bit chip bus: 22 x 16 = 352 bits << 16384. Now the row access time looks fast! Thus, push to faster DRAM interfaces. [Figure: a 13-bit row address input drives a 1-of-8192 decoder over an array of 8192 rows x 16384 columns (134 217 728 usable bits; the tester found good bits in a bigger array). The sense amps deliver 16384 bits; the requested bits are selected and sent off the chip.]
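
The slide's arithmetic can be checked in a few lines. This is a minimal sketch that only restates the slide's numbers (55 ns, 400 MT/s, 16-bit bus, 8192 x 16384 array), assuming one 16-bit word moves per transfer; the variable names are illustrative.

```python
# Numbers from the slide; assumes one BUS_BITS-wide word per transfer.
ROW_ACCESS_NS = 55          # row access time
TRANSFER_RATE_MT_S = 400    # chip interface, mega-transfers per second
BUS_BITS = 16               # chip data bus width
ROWS, COLS = 8192, 16384    # array geometry; COLS bits come out of the sense amps

ROW_ADDRESS_BITS = (ROWS - 1).bit_length()                 # bits needed to pick 1 of 8192 rows
usable_bits = ROWS * COLS                                  # total bits in the array
transfers_per_row_access = ROW_ACCESS_NS * TRANSFER_RATE_MT_S // 1000   # ns * MT/s / 1000
bits_per_row_access = transfers_per_row_access * BUS_BITS  # what leaves the chip in 55 ns
# Time to ship the whole open row over the bus: COLS / BUS_BITS transfers, 2.5 ns each.
row_drain_ns = (COLS // BUS_BITS) * 1000 / TRANSFER_RATE_MT_S

if __name__ == "__main__":
    print(ROW_ADDRESS_BITS, usable_bits, transfers_per_row_access,
          bits_per_row_access, row_drain_ns)
```

Draining one full row takes about 2.5 us, roughly 47 times the 55 ns row access, which is why the slide concludes that the row access "looks fast" and the interface is the bottleneck.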

  7. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 10 -- Cache I 2014-2-20 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  8. Latency: A closer look Read latency: Time to return first byte of a random access Architect’s latency toolkit: (1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth. (2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.
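
Tool (2) can be made concrete with a small timing model. This is a sketch under simple assumptions (the first request issues at cycle 0, a pipelined memory accepts one request per cycle, an unpipelined one waits for each request to return); `last_return_cycle` is a hypothetical helper, not something from the lecture.

```python
def last_return_cycle(num_requests: int, latency: int, pipelined: bool) -> int:
    """Cycle at which the last of num_requests reads returns, for a memory with
    `latency` cycles of latency and the first request issued at cycle 0."""
    if num_requests <= 0:
        return 0
    if pipelined:
        # Issue one request per cycle; request i (0-based) returns at cycle i + latency.
        return (num_requests - 1) + latency
    # Unpipelined: each request is issued only after the previous one returns.
    return num_requests * latency


if __name__ == "__main__":
    print(last_return_cycle(100, 10, pipelined=False))
    print(last_return_cycle(100, 10, pipelined=True))
```

Latency per request is unchanged (still `latency` cycles), but for long request streams the pipelined memory approaches one result per cycle.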

  9. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 11 -- Cache II 2014-2-25 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  10. Issue #4: When to write to lower level ... Related issue: Do writes to blocks not in the cache get put in the cache (“write-allocate”) or not?
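
One way to see how the write policy and the write-allocate choice interact is to count memory traffic. The sketch below assumes a hypothetical `TinyCache` that never evicts (so write-back traffic only appears at `flush`) and counts whole-block reads and writes; it is an illustration, not a model of any particular cache.

```python
from dataclasses import dataclass, field


@dataclass
class TinyCache:
    """Hypothetical cache that only tracks which blocks are present and dirty.
    No capacity limit, so blocks are never evicted (flush() stands in for eviction)."""
    write_back: bool
    write_allocate: bool
    lines: dict = field(default_factory=dict)   # block -> dirty flag
    mem_reads: int = 0
    mem_writes: int = 0

    def read(self, block):
        if block not in self.lines:
            self.mem_reads += 1            # miss: fetch the block from the lower level
            self.lines[block] = False

    def write(self, block):
        if block not in self.lines:
            if not self.write_allocate:
                self.mem_writes += 1       # no-write-allocate: write goes straight down
                return
            self.mem_reads += 1            # write-allocate: fetch the block, then write it
            self.lines[block] = False
        if self.write_back:
            self.lines[block] = True       # defer the lower-level write; just mark dirty
        else:
            self.mem_writes += 1           # write-through: every write goes down

    def flush(self):
        for block, dirty in self.lines.items():
            if dirty:
                self.mem_writes += 1       # write back each dirty block once
                self.lines[block] = False
```

Ten writes to one block cost one fetch and one eventual write-back under write-back + write-allocate, but ten memory writes under write-through.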

  11. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 12 -- Virtual Memory 2014-2-27 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  12. [Figure: a virtual address (page number + offset) is translated into a physical address (frame number + offset) through the TLB and the page table.] The TLB caches page table entries (TLB entries are tagged with an ASID). In this example, physical and virtual pages must be the same size! Page table entries with V=0 mark pages that either reside on disk or have not yet been allocated; the OS handles V=0 (a “page fault”). MIPS handles TLB misses in software (random replacement). Other machines use hardware.
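
The translation path on this slide can be sketched as follows, assuming 4 KB pages, a dict standing in for the TLB keyed by (ASID, page number), and a dict page table mapping page number to (V, frame). All names are hypothetical; a real MIPS TLB miss would trap to a software handler instead of the inline page-table lookup shown here.

```python
PAGE_BITS = 12   # assumption: 4 KB pages; virtual and physical pages are the same size


class PageFault(Exception):
    """Raised when the page table entry has V=0; the OS would handle it."""


def translate(vaddr, asid, tlb, page_table):
    """tlb: {(asid, vpn): frame}; page_table: {vpn: (valid, frame)} for this ASID."""
    vpn = vaddr >> PAGE_BITS
    offset = vaddr & ((1 << PAGE_BITS) - 1)   # offset passes through untranslated
    key = (asid, vpn)                         # TLB entries are tagged with the ASID
    if key not in tlb:                        # TLB miss: consult the page table
        valid, frame = page_table.get(vpn, (False, None))
        if not valid:
            raise PageFault(hex(vaddr))       # page on disk or never allocated
        tlb[key] = frame                      # refill; replacement policy omitted
    return (tlb[key] << PAGE_BITS) | offset
```

Because the offset is copied unchanged, the sketch only works when virtual and physical pages are the same size, matching the slide's remark.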

  13. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 13 - Synchronization 2014-3-4 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  14. Non-blocking consumer synchronization. Another atomic read-modify-write instruction:
      Compare&Swap(Rt, Rs, m):
          if (Rt == M[m]) then M[m] = Rs; Rs = Rt;   /* do swap */
          else                                       /* do not swap */
      Assuming sequential consistency (MEMBARs not shown) ...
      try:  LW   R3, head(R0)             ; Load queue head into R3
      spin: LW   R4, tail(R0)             ; Load queue tail into R4
            BEQ  R4, R3, spin             ; If queue empty, wait
            LW   R5, 0(R3)                ; Read x from queue into R5
            ADDI R6, R3, 4                ; Shift head by one word
            Compare&Swap R3, R6, head(R0) ; Try to update head
            BNE  R3, R6, try              ; If not success, try again
      If R3 != R6, another thread got here first, so we must try again. If the thread is swapped out before Compare&Swap, there is no latency problem; this code only “holds” the lock for one instruction!
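
The consumer loop can be mirrored in Python to check that concurrent consumers each take every item exactly once. This is a sketch, not the lecture's code: Python has no compare-and-swap instruction, so a hypothetical `Memory.compare_and_swap` uses a lock to stand in for the single atomic instruction, and `HEAD`, `TAIL`, and `demo` are illustrative names. The GIL gives the sequentially consistent behavior the slide assumes.

```python
import threading

HEAD, TAIL = "head", "tail"   # hypothetical symbolic addresses of the queue pointers


class Memory:
    """Shared word memory; the lock makes compare_and_swap atomic, standing in
    for the single hardware instruction on the slide."""
    def __init__(self):
        self.m = {}
        self._atomic = threading.Lock()

    def compare_and_swap(self, rt, rs, addr):
        # if (Rt == M[m]) then M[m] = Rs; Rs = Rt  (swap)  else no swap.
        # Returns the new value of register Rs.
        with self._atomic:
            if self.m[addr] == rt:
                self.m[addr] = rs
                return rt
            return rs


def consume(mem):
    while True:                                  # try:
        r3 = mem.m[HEAD]                         # LW R3, head(R0)
        while mem.m[TAIL] == r3:                 # spin: queue empty, wait
            pass
        r5 = mem.m[r3]                           # LW R5, 0(R3): read x
        r6 = r3 + 4                              # ADDI R6, R3, 4
        r6 = mem.compare_and_swap(r3, r6, HEAD)  # Compare&Swap R3, R6, head(R0)
        if r3 == r6:                             # BNE R3, R6, try
            return r5


def demo(n_items=100, n_threads=4, base=1000):
    """Pre-fill the queue, then let n_threads consumers drain it concurrently."""
    mem = Memory()
    for i in range(n_items):
        mem.m[base + 4 * i] = i
    mem.m[HEAD], mem.m[TAIL] = base, base + 4 * n_items
    results, lock = [], threading.Lock()

    def worker(k):
        for _ in range(k):
            x = consume(mem)
            with lock:
                results.append(x)

    per = n_items // n_threads                   # assumes n_items divisible by n_threads
    threads = [threading.Thread(target=worker, args=(per,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

A consumer that loses the race sees its Rs come back unchanged (R6 = R3 + 4), so R3 != R6 and it retries from `try`, exactly as the slide describes.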

  15. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 14 - Cache Design and Coherence 2014-3-6 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  16. Writes from 10,000 feet ... for write-thru L1 caches [figure: CPU0 and CPU1 share a memory bus]: 1. The writing CPU takes control of the bus. 2. The address to be written is invalidated in all other caches, so reads will no longer hit in cache and get stale data. 3. The write is sent to main memory. To a first order, reads will “just work” if write-thru caches implement this policy: reads will cache miss and retrieve the new value from main memory. A “two-state” protocol (cache lines are “valid” or “invalid”).
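
The three write steps can be sketched as a tiny two-state (valid/invalid) simulation. Assumptions, all hypothetical: a line is "valid" exactly when its address is a key in that CPU's dict, writes are serialized by the caller (standing in for bus ownership), and the writer updates its own copy only if it already holds one.

```python
class WriteThroughBus:
    """Two-state snooping sketch for write-through L1 caches sharing one bus."""
    def __init__(self, n_cpus):
        self.memory = {}
        self.caches = [{} for _ in range(n_cpus)]   # addr -> value; present == valid

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                       # invalid: miss, fetch from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        # 1. The writing CPU takes control of the bus (calls are serialized here).
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(addr, None)               # 2. invalidate in all other caches
        self.memory[addr] = value                   # 3. send the write to main memory
        if addr in self.caches[cpu]:
            self.caches[cpu][addr] = value          # keep the writer's valid copy current
```

After CPU0 writes, CPU1's next read misses and picks up the new value from main memory, which is the "reads will just work" argument on the slide.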

  17. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 15 -- Advanced CPUs 2014-3-11 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  18. Split pipelines: a write-after-write hazard. The pipeline splits after the RF stage, feeding functional units with different latencies. WAW hazard: DIV R1, R2, R3 followed by SUB R1, R2, R3. If the long-latency DIV and the short-latency SUB are sent to parallel pipes, SUB may finish first. Solution: SUB detects the R1 clash in the decode stage and stalls, via a pipe-write scoreboard.
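
A pipe-write scoreboard can be sketched as a table of outstanding writes. This minimal version assumes in-order issue of one instruction per cycle, checks only the WAW case from the slide (RAW checks on sources are omitted), and conservatively stalls until the earlier write completes; `issue_cycles` is a hypothetical helper.

```python
def issue_cycles(program, latency):
    """program: list of (op, dest) in program order; latency: op -> cycles.
    Returns the cycle each instruction leaves decode."""
    pending = {}      # scoreboard: register -> cycle its outstanding write completes
    cycle = 0
    issued = []
    for op, dest in program:
        cycle = max(cycle, pending.get(dest, 0))   # destination clash: stall in decode
        issued.append(cycle)
        pending[dest] = cycle + latency[op]        # record the new outstanding write
        cycle += 1
    return issued


if __name__ == "__main__":
    lat = {"DIV": 10, "SUB": 1}
    print(issue_cycles([("DIV", "R1"), ("SUB", "R1")], lat))  # SUB waits for DIV
    print(issue_cycles([("DIV", "R1"), ("SUB", "R4")], lat))  # no clash, no stall
```

With the stall, SUB's write to R1 lands after DIV's, so R1 ends up holding the SUB result, as program order requires.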

  19. Superscalar R machine. [Figure: PC and sequencer, instruction memory, and instruction issue logic feed two parallel IF (Fetch), ID (Decode), EX (ALU), MEM, WB pipelines; the register file has four read ports (rs1-rs4 / rd1-rd4) and two write ports (ws1/wd1/WE1, ws2/wd2/WE2); the MEM stage accesses a data memory.]
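
The instruction issue logic in this figure has to decide whether two fetched instructions can go down the two pipes together. The rule below is a hypothetical pairing rule, not the lecture's: it respects the four read ports and refuses to pair when the second instruction reads (RAW) or overwrites (WAW) the first one's destination.

```python
def can_dual_issue(first, second):
    """first, second: (dest or None, [srcs]) in program order.
    Hypothetical pairing rule for a 2-wide in-order machine whose register
    file has 4 read ports and 2 write ports."""
    (d1, srcs1), (d2, srcs2) = first, second
    if len(srcs1) + len(srcs2) > 4:   # structural: only 4 read ports
        return False
    if d1 is None:                    # first writes no register (e.g. a store)
        return True
    # RAW inside the pair: second would read first's result before WB.
    # WAW inside the pair: both would write the same register in the same cycle.
    return d1 not in srcs2 and d1 != d2
```

When the check fails, a simple in-order design would issue the first instruction alone and retry the second next cycle.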

  20. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 17 -- Networks, Routers, Google 2014-3-20 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  21. Six key parameters scale across the dimensions of “by one server”, “by 80-server rack”, and “by array”. To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because the array network is under-provisioned. Exception: disk latency is roughly scale-independent.

  22. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 18 -- Dynamic Scheduling I Thanks to Krste Asanovic ... 2014-4-1 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  23. Given an endless supply of registers ... Rename “architected registers” (Ri, Fi) to new “physical registers” (PRi, PFi) on each write. Original code (fragment): ADDI R1,R0,64 ... F4,0(R1) ... Renaming R1 → PR01 and F0 → PF00 in the first iteration:
              ADDI PR01, PR00, 64
              LD   PF00, 0(PR01)
              ADDD PF04, PF00, PF02
              SD   PF04, 0(PR01)
              SUBI PR11, PR01, 8
              BEQZ PR11, ENDLOOP
      ITER2:  LD   PF10, 0(PR11)
              ADDD PF14, PF10, PF02
              SD   PF14, 0(PR11)
              SUBI PR21, PR11, 8
              BEQZ PR21, ENDLOOP
      ITER3:  LD   PF20, 0(PR21)
              [...]
      What was gained? An instruction may execute once all of its source registers have been written.
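
Renaming "on each write" can be sketched with a map table and a counter of fresh registers. Assumptions (hypothetical): 32 architected registers per file, so PR0-PR31 and PF0-PF31 initially hold R0-R31 and F0-F31 and new names start at 32; the naming therefore differs from the slide's PR01/PR11 scheme, and the BEQZ branches are omitted.

```python
from itertools import count

NUM_ARCH = 32   # assumption: 32 architected R and 32 architected F registers


def rename(program):
    """program: list of (op, dest or None, [srcs]) using names like "R1", "F0".
    Returns the program with every write given a fresh physical register."""
    table = {}                                            # architected -> physical
    fresh = {"R": count(NUM_ARCH), "F": count(NUM_ARCH)}  # next unused number per file
    out = []
    for op, dest, srcs in program:
        # Sources read the current mapping; before any write, Ri is PRi and Fi is PFi.
        psrcs = [table.get(s, "P" + s) for s in srcs]
        pdest = None
        if dest is not None:
            pdest = f"P{dest[0]}{next(fresh[dest[0]])}"   # new register on each write
            table[dest] = pdest
        out.append((op, pdest, psrcs))
    return out


if __name__ == "__main__":
    body = [("LD", "F0", ["R1"]), ("ADDD", "F4", ["F0", "F2"]),
            ("SD", None, ["F4", "R1"]), ("SUBI", "R1", ["R1"])]
    for instr in rename([("ADDI", "R1", ["R0"])] + body + body):
        print(instr)
```

After renaming, the second iteration's LD names a different FP destination than the first iteration's, so it no longer has to wait behind the first iteration's ADDD and SD that reuse F0 and F4.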

  24. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 19 -- Dynamic Scheduling II 2014-4-3 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  25. Rename stage close-up. Input: 4 instructions specifying architected registers. In one clock cycle, the stage (1) allocates new physical registers for destinations, (2) looks up physical register numbers for sources, and (3) handles rename dependences within the 4 issuing instructions! Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued. Time-stamped, for mis-speculation recovery.
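
Step (3) is the subtle one: a source written by an earlier instruction in the same group must get that instruction's new physical register, not the stale map-table entry. The sketch below assumes every instruction has one destination and two sources (as on the slide), a dict map table, and a list free list; `rename_group` is hypothetical, and it describes sequentially what hardware does with parallel comparators. Time-stamping for recovery is omitted.

```python
def rename_group(group, table, free_list):
    """group: up to 4 (dest, src1, src2) tuples of architected names.
    table: architected name -> physical number (updated in place).
    Returns one (pdest, psrc1, psrc2) per instruction: 12 numbers for 4 instructions."""
    assert len(group) <= 4
    pdests = [free_list.pop(0) for _ in group]         # (1) allocate destinations
    out = []
    for i, (dest, *srcs) in enumerate(group):
        psrcs = []
        for s in srcs:
            # (3) latest earlier writer of s inside this group, if any
            writer = next((j for j in range(i - 1, -1, -1) if group[j][0] == s), None)
            # (2) otherwise read the map table
            psrcs.append(pdests[writer] if writer is not None else table[s])
        out.append((pdests[i], *psrcs))
    for (dest, *_), p in zip(group, pdests):
        table[dest] = p                                # last writer in the group wins
    return out
```

For example, if the first instruction writes R1 and the second reads R1, the second gets the first's freshly allocated register even though the map table has not been updated yet.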

  26. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 20 -- Dynamic Scheduling III 2014-4-8 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  27. Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops. Translations for these ops are cast into logic gates, often over several pipeline cycles. Micro-op translation example ...
          ADC m32, r32      // for a simple m32 address mode
      Becomes:
          LD  T1, 0(EBX)    // EBX register points to m32
          ADD T1, T1, CF    // CF is carry flag from EFLAGS
          ADD T1, T1, r32   // Add the specified register
          ST  0(EBX), T1    // Store result back to m32
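
The four micro-ops can be executed directly to see that they compute ADC's result. A sketch under stated assumptions: memory and registers are Python dicts, `adc_m32_r32` is a hypothetical name, values wrap at 32 bits, and, like the slide, it does not model the EFLAGS updates a real ADC performs.

```python
MASK32 = 0xFFFF_FFFF


def adc_m32_r32(mem, regs, r32):
    """Run the slide's micro-op sequence for ADC m32, r32 (m32 addressed by EBX).
    Flag updates are not modeled, matching the slide."""
    t1 = mem[regs["EBX"]]                 # LD  T1, 0(EBX)
    t1 = (t1 + regs["CF"]) & MASK32       # ADD T1, T1, CF
    t1 = (t1 + regs[r32]) & MASK32        # ADD T1, T1, r32
    mem[regs["EBX"]] = t1                 # ST  0(EBX), T1


if __name__ == "__main__":
    mem = {0x1000: 5}
    regs = {"EBX": 0x1000, "CF": 1, "EAX": 10}
    adc_m32_r32(mem, regs, "EAX")
    print(mem[0x1000])
```

The temporary T1 is not an architected register, so it is exactly the kind of name the rename stage can map to a fresh physical register.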

  28. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 21 -- Dataflow 2014-4-10 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  29. Dataflow stages of the Alpha 21264. Idea: Write dataflow programs that reference physical registers, to execute on this machine. Input: instructions that reference physical registers. Scoreboard: tracks writes to physical registers.
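
Treating the back end as a dataflow machine can be sketched with a scoreboard set: an instruction fires as soon as all its physical source registers have been written. Assumptions (hypothetical): every operation takes one cycle, issue width is unlimited, and `dataflow_schedule` is an illustrative helper; the example instructions are the first two loop iterations after renaming.

```python
def dataflow_schedule(instrs, initially_written):
    """instrs: list of (dest, [srcs]) on physical registers.
    Returns the cycle in which each instruction fires."""
    written = set(initially_written)   # scoreboard: physical registers already written
    fire = [None] * len(instrs)
    cycle = 0
    while None in fire:
        ready = [i for i, (_, srcs) in enumerate(instrs)
                 if fire[i] is None and all(s in written for s in srcs)]
        if not ready:
            raise ValueError("some source register is never written")
        for i in ready:
            fire[i] = cycle
        for i in ready:
            written.add(instrs[i][0])  # result visible from the next cycle on
        cycle += 1
    return fire


if __name__ == "__main__":
    prog = [("PF00", ["PR01"]), ("PF04", ["PF00", "PF02"]),   # iteration 1: LD, ADDD
            ("PF10", ["PR11"]), ("PF14", ["PF10", "PF02"])]   # iteration 2: LD, ADDD
    print(dataflow_schedule(prog, {"PR01", "PR11", "PF02"}))
```

The two iterations' loads fire in the same cycle, which is the overlap that renaming to physical registers makes possible.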

  30. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 22 -- GPU + SIMD + Vectors I 2014-4-15 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  31. Or, part of a math opcode. Pure data move opcode.

  32. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 23 -- GPU + SIMD + Vectors II 2014-4-17 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  33. Assume a MacBook Air ... 1366 x 768 screen ... We are all zoomed in on Google Maps. The top pyramid image is 4K x 4K ... Idea: Keep only a 1366 x 768 window of top images in RAM ... Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!

  34. Zoom all the way in ... The bottom stack image shows the smallest part of the 1-square-mile patch of the Earth of any stack image. Graphics hardware displays the bottom stack image, which fills the MacBook Air display. Hardware interpolation of stack levels. [Figure axes: units of pixels, units of miles, units of square miles.]

  35. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 24 -- Voxel Processing 2014-4-22 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  36. After processing ... Interesting to computer architects because n^3 grows so quickly! A 3-D matrix of cubes, in object space (X, Y, Z). An 8-bit density value is stored for each cube (0 = “air”). 256^3 = 16 MB = 10-inch cube (for 1 mm voxels). 0.125 mm voxels? 8 GB.
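
The n^3 arithmetic on this slide is easy to reproduce. A minimal sketch, assuming one byte per voxel (the slide's 8-bit density) and a cube 256 mm on a side; `voxel_bytes` is a hypothetical helper.

```python
MM_PER_INCH = 25.4


def voxel_bytes(side_mm: float, voxel_mm: float, bytes_per_voxel: int = 1) -> int:
    """Memory for a cube side_mm on a side, split into cubic voxels of edge voxel_mm."""
    n = round(side_mm / voxel_mm)      # voxels along each axis
    return n ** 3 * bytes_per_voxel    # cubic growth in memory


if __name__ == "__main__":
    print(voxel_bytes(256, 1) / 2**20, "MB")       # 1 mm voxels
    print(voxel_bytes(256, 0.125) / 2**30, "GB")   # 0.125 mm voxels
    print(256 / MM_PER_INCH, "inches per side")
```

Shrinking the voxel edge by 8x multiplies memory by 8^3 = 512, which is the jump from 16 MB to 8 GB.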

  37. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 25 -- Digital Imaging 2014-4-24 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  38. Camera interface to the outside world: a serial port to control the camera, a simple power hookup, an 8-bit Dout port, and a 54 MHz clock. Output modes: 1280 x 1024 @ 15 fps or 640 x 512 @ 30 fps, YCrCb 4:2:2.
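
A quick bandwidth check shows both modes fit through the 8-bit port. Assumptions: YCrCb 4:2:2 with 8-bit samples averages 2 bytes per pixel (a Y sample per pixel, one Cr and one Cb per pixel pair), and the port moves at most one byte per 54 MHz clock; the names are illustrative.

```python
BYTES_PER_PIXEL_YCRCB_422 = 2     # Y per pixel + Cr, Cb per pixel pair = 16 bits/pixel
PORT_BYTES_PER_SEC = 54_000_000   # assumption: one byte per clock on the 8-bit Dout port


def video_bytes_per_sec(width: int, height: int, fps: int) -> int:
    """Raw 4:2:2 video data rate in bytes per second."""
    return width * height * fps * BYTES_PER_PIXEL_YCRCB_422


if __name__ == "__main__":
    for w, h, fps in [(1280, 1024, 15), (640, 512, 30)]:
        rate = video_bytes_per_sec(w, h, fps)
        print(w, h, fps, rate, rate / PORT_BYTES_PER_SEC)
```

The full-resolution mode uses about 73% of the port's peak rate; halving each dimension while doubling the frame rate halves the data rate.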

  39. AWARE-2: Array of 98 phone camera modules (14 M-pixel) 1.3 G-pixel camera @ 3 frames/sec

  40. On Thursday Mid-term II ... Ground rules ...

  41. Mid-term: How to do well ... A problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you’re starting out behind. Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you. There will not be “you can only get it if you do the reading” problems ... but the reading helps you understand how to think through the problem.

  42. Mid-term: There may be math ... You may need to do simple algebra and calculus, add a few numbers by hand, etc. No memorization: if we ask about Amdahl’s Law, we will show the lecture slide that defines it. Understanding is needed: a problem may require you to apply an equation to a design, etc. You cannot use electronic devices ... more administrative info after we do some content.

  43. When is it? Where is it? Ground rules. 9:30 AM sharp, Thursday, May 1st, 306 Soda. Every-other-seat seating, except for the front rows, where every seat is permitted. No blue books needed: we will be handing out a paper test. Pencil is preferred. Pencils down @ 10:55 AM, so we can collect papers before the next class comes in.

  44. When is it? Where is it? Ground rules. No use of calculators, smartphones, laptops, etc ... during the exam. Closed-book, closed-notes. Just pencils, erasers. No consulting with students. Restroom breaks are OK, but you’ll still need to hand in your exam @ 10:55. Questions are reserved for serious concerns about a bug in the question.

  45. Today - Midterm II Review Session Study Tips HW 2, problem by problem (if there is time) HKN

  46. On Thursday Mid-term II ... See you there!