1. Computer Architecture
Davies Muche
&
Mike Li Luo
CS521 Spring 2003
2. What is a digital computer? A digital computer is a machine composed of the following three basic components:
- Input/Output
- Central Processing Unit (CPU)
- Memory
3. Early Computers
As early as the 1600s, calculating machines that could perform arithmetic operations had been built, but none had the three basic components of a digital computer.
In 1823, Charles Babbage undertook the design of the Difference Engine.
The machine was to solve 6th-degree polynomials to 20-digit accuracy.
4. It put the concepts of mechanical control and mechanical calculation together into a machine that had the basic parts of a digital computer.
He was given £17,000 to construct the machine, but the project was abandoned, uncompleted, in 1842.
In 1856, Babbage conceived the idea of the Analytical Engine. (After his death his son Henry tried to build it but never succeeded.)
In 1854, Georg Scheutz built a working difference engine based on Babbage's design. (This machine printed mathematical, astronomical, and actuarial tables with unprecedented accuracy, and was used by the British and American governments.)
6. However, in 1834 Charles Babbage developed a hypothetical program to solve simultaneous equations on the Analytical Engine.
7. The John von Neumann architecture (1940s) consists of five major components.
8. A refinement of the von Neumann model, the system bus model has a CPU (ALU and control), memory, and an input/output unit.
10. The CPU
The CPU (central processing unit) is an older term for the processor or microprocessor: the central unit in a computer containing the logic circuitry that performs the instructions of the computer's programs.
NOTABLE TYPES
- RISC (Reduced Instruction Set Computer): introduced in the mid-1980s; requires few transistors; capable of executing only a very limited set of instructions.
- CISC (Complex Instruction Set Computer): complex CPUs that had ever-larger sets of instructions.
11. RISC or CISC: "The Great Controversy". RISC proponents argue that RISC machines are both cheaper and faster, and are therefore the machines of the future.
Skeptics note that by making the hardware simpler, RISC architectures put a greater burden on the software. They argue that this is not worth the trouble because conventional microprocessors are becoming increasingly fast and cheap anyway.
The TRUTH!
CISC and RISC implementations are becoming more and more alike. Many of today's RISC chips support as many instructions as yesterday's CISC chips. And today's CISC chips use many techniques formerly associated with RISC chips.
13. What you need to know about a CPU
Processing speed
- The clock frequency is one measure of how fast a computer is. (However, the length of time to carry out an operation depends not only on how fast the processor cycles, but also on how many cycles are required to perform a given operation.)
Voltage requirement
- The transistors (electronic switches) in the CPU require some voltage to trigger them.
- In the pre-486DX66 days, everything ran at 5 volts.
- As chips got faster and power became a concern, designers dropped the chip voltage down to 3.3 volts (external voltage) and a 2.9 V or 2.5 V core voltage.
14. More on voltage requirements... Power consumption equates largely with heat generation, which is a primary enemy in achieving increased performance. Newer processors are larger and faster, and keeping them cool can be a major concern.
Reducing power usage is a primary objective for the designers of notebook computers, since they run on batteries with a limited life. (They also are more sensitive to heat problems since their components are crammed into such a small space).
Designers compensate by using lower-power semiconductor processes and by shrinking the circuit size and die size. Newer processors reduce voltage levels even more by using what is called a dual-voltage, or split-rail, design.
15. More on the dual-voltage design... A split-rail processor uses two different voltages.
The external or I/O voltage is higher, typically 3.3V for compatibility with the other chips on the motherboard.
The internal or core voltage is lower: usually 2.5 to 2.9 volts. This design allows these lower-voltage CPUs to be used without requiring wholesale changes to motherboards, chipsets, etc.
16. Power consumption versus speed of some processors (figure)
17. MEMORY
Computers have hierarchies of memories that may be classified according to function, capacity, and response time.
- Function
"Reads" transfer information from the memory; "writes" transfer information to the memory.
  - Random Access Memory (RAM) performs both reads and writes.
  - Read-Only Memory (ROM) contains information stored at the time of manufacture that can only be read.
  - Programmable Read-Only Memory (PROM) is ROM that can be written once at some point after manufacture.
- Capacity
bit = smallest unit of memory (value of 0 or 1);
byte = 8 bits.
In modern computers, the total memory may range from, say, 16 MB in a small personal computer to several GB (gigabytes) in a large supercomputer.
18. More on memory... Memory response
Memory response is characterized by two different measures:
-Access Time (also termed response time or latency) defines how quickly the memory can respond to a read or write request.
-Memory Cycle Time refers to the minimum period between two successive requests of the memory.
-Access times vary from about 80 ns (ns = nanosecond = 10^(-9) seconds) for chips in small personal computers to about 10 ns or less for the fastest chips in caches and buffers. For various reasons, the memory cycle time is longer than the access time of the memory chips (i.e., the length of time between successive requests is more than the 80 ns access time of the chips in a small personal computer).
20. The I/O BUS
A computer transfers data from disk to CPU, from CPU to memory, from memory to the display adapter, and so on.
To avoid having separate circuits between every pair of devices, the bus is used.
Definition:
The bus is simply a common set of wires that connects all the computer's devices and chips together.
21. Different functions for different wires of the bus
Some of these wires are used to transmit data.
Some send housekeeping signals, like the clock pulse. Some transmit a number (the "address") that identifies a particular device or memory location.
Use of the address
The computer's chips and devices watch the address wires and respond when their identifying number (address) is transmitted; only then do they transfer data. A sketch of the idea follows.
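A minimal C sketch of this address-matching idea, with illustrative device names and addresses (it models the behavior, not any real bus protocol):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: every device sees the same address wires, and
   only the one whose assigned address matches takes part in the
   transfer. Addresses and names here are for illustration only. */
typedef struct {
    uint32_t address;
    const char *name;
} device_t;

void bus_cycle(const device_t *dev, int n, uint32_t addr, uint8_t data) {
    for (int i = 0; i < n; i++) {
        if (dev[i].address == addr) {   /* address match: this device responds */
            printf("%s accepts data 0x%02x\n", dev[i].name, (unsigned)data);
            return;
        }
    }
    printf("no device at address 0x%08x\n", (unsigned)addr);
}

int main(void) {
    device_t devices[] = { {0x3F8, "serial port"}, {0x060, "keyboard"} };
    bus_cycle(devices, 2, 0x3F8, 0x41);  /* only the serial port responds */
    return 0;
}
```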
Problem!
Starting with machines that used the 386 CPU, CPUs and memory ran faster than the I/O devices.
Solution
- Separate the CPU and memory from all the I/O. Today, memory is added only by plugging it into special sockets on the main computer board.
22. Bus speeds
Either multiple buses with different speeds or a single bus supporting different speeds is used.
In a modern PC, there may be half a dozen different bus areas.
There is certainly a "CPU area" that still contains the CPU, memory, and basic control logic.
There is a "high-speed I/O device" area that is either a VESA Local Bus (VLB) or a PCI bus.
23. Some bus standards
The ISA (Industry Standard Architecture) bus
In 1987, IBM introduced the new Micro Channel Architecture (MCA) bus
The other vendors developed an extension of the older ISA interface called EISA
VESA Local Bus (VLB), which became popular at the start of 1993
24. More bus standards... The PCI bus was developed by Intel.
PCI is a 64-bit interface in a 32-bit package.
The PCI bus runs at 33 MHz and can transfer 32 bits of data (four bytes) every clock tick.
That sounds like a 32-bit bus! However, a clock tick at 33 MHz is about 30 nanoseconds, while memory has a speed of only 70 nanoseconds: when the CPU fetches data from RAM, it has to wait at least three clock ticks for the data. By transferring data on every clock tick, the PCI bus can deliver through a 32-bit interface the same throughput that other parts of the machine deliver through a 64-bit path.
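The slide's arithmetic can be made concrete in a few lines of C; the 33 MHz clock and 4 bytes per tick come straight from the text above, and the 132 MB/s peak figure follows from them:

```c
#include <stdio.h>

int main(void) {
    double clock_hz = 33e6;       /* PCI clock: 33 MHz (from the slide)  */
    double bytes_per_tick = 4.0;  /* 32 bits = 4 bytes per clock tick    */

    printf("clock tick: %.1f ns\n", 1e9 / clock_hz);    /* about 30.3 ns */
    printf("peak rate:  %.0f MB/s\n",
           clock_hz * bytes_per_tick / 1e6);            /* 132 MB/s      */
    return 0;
}
```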
25. Things to know about the I/O bus. Buses transfer information between parts of a computer. Smaller computers have a single bus; more advanced computers have complex interconnection strategies.
Things to know about the bus
Transaction = Unit of communication on bus.
Bus Master = The module controlling the bus at a particular time.
Arbitration Protocol = Set of signals exchanged to decide which of two competing modules will control a bus at a particular time.
Communication Protocol = Algorithm used to transfer data on the bus.
Asynchronous Protocol = Communication algorithm that can begin at any time; requires overhead to notify receivers that transfer is about to begin.
26. Things to know about the bus, continued... Synchronous Protocol = communication algorithm that can begin only at well-known times defined by a global clock.
Transfer Time = time for data to be transferred over the bus in a single transaction.
Bandwidth = Data transfer capacity of bus; usually expressed in bits per second (bps). Sometimes termed throughput.
Bandwidth and Transfer Time measure related things, but bandwidth takes into account required overheads and is usually a more useful measure of the speed of the bus.
27. Supercomputer Architecture
Background
Architecture
Approaches
Trends
Challenges
28. What is parallel computing?
The use of multiple computers or processors working together on a common task.
Each processor works on its section of the problem.
Processors are allowed to exchange information with other processors.
29. Why parallel computing?
Limits of a single computer:
Available memory
Performance
Parallel computing allows us to:
Solve problems that don't fit on a single computer
Solve problems that can't be solved in a reasonable time
30. The first supercomputer
The first supercomputer, the Cray-1, appeared in 1976.
It had a speed of tens of megaflops (one megaflop equals a million floating-point operations per second) and a memory capacity of 4 megabytes.
Contributions came from Los Alamos National Laboratory and Seymour Cray.
That is less than the average speed of a PC today.
31. Growing speed
The performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
From machines performing tens of floating-point operations per second, the parallel computers of the mid-1990s reached tens of billions of operations per second.
32. Pipeline
Pipeline: start performing an operation on one piece of data while finishing the same operation on another piece of data.
An operation consists of multiple stages.
After a set of operands completes a particular stage, it moves into the next stage.
Then another set of operands can move into the stage that was just vacated.
It is like doing three loads of laundry with one washer and one dryer: while one load is drying, the next can be washing at the same time.
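A toy C simulation of this overlap, assuming a hypothetical three-stage operation (the stage names are illustrative):

```c
#include <stdio.h>

/* Toy pipeline illustration: three operations, each with three stages.
   Without a pipeline, 3 ops x 3 stages = 9 time steps; with a pipeline,
   a new operation enters a stage as soon as the previous one vacates
   it, so the three operations finish in 5 steps. */
int main(void) {
    const char *stages[] = {"fetch", "execute", "write"};
    int n_ops = 3, n_stages = 3;

    /* In time step t, operation i occupies stage (t - i), if it exists. */
    for (int t = 0; t < n_ops + n_stages - 1; t++) {
        printf("step %d:", t + 1);
        for (int i = 0; i < n_ops; i++) {
            int s = t - i;
            if (s >= 0 && s < n_stages)
                printf("  op%d:%s", i + 1, stages[s]);
        }
        printf("\n");
    }
    return 0;
}
```

Running it shows the three operations completing in five steps instead of nine.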
33. Superpipeline
Superpipeline: perform multiple pipelined operations at the same time
So, a superpipeline is a collection of multiple pipelines that can operate simultaneously.
In other words, several different operations can execute simultaneously, and each of these operations can be broken into stages, each of which is filled all the time.
So you can get multiple operations per CPU cycle.
For example, an IBM POWER4 can have over 200 different operations "in flight" at the same time.
34. Sample superpipeline design (figure)
35. Drawbacks of the pipeline architecture: pipeline hazards (a sketch in C follows this list)
Structural hazards: attempt to use the same resource in two different ways at the same time
e.g., multiple memory accesses, multiple register writes
solutions: multiple memories, stretching the pipeline
Control hazards: attempt to make a decision before the condition is evaluated
e.g., any conditional branch
solutions: prediction, delayed branch
Data hazards: attempt to use an item before it is ready
solutions: forwarding/bypassing
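A hedged C analogue of the data and control hazards above; in real hardware these are machine instructions, but the dependences have the same shape:

```c
#include <stdio.h>

int main(void) {
    int r2 = 7, r3 = 5, r5 = 2;

    /* Data hazard: the second statement consumes r1 immediately after
       it is produced. In a pipeline, the subtraction reaches its
       execute stage before r1 has been written back to the register
       file; forwarding/bypassing feeds the ALU result directly to the
       next instruction instead of stalling. */
    int r1 = r2 + r3;   /* like: ADD r1, r2, r3 */
    int r4 = r1 - r5;   /* like: SUB r4, r1, r5  (depends on r1) */

    /* Control hazard: until the comparison is evaluated, the pipeline
       cannot know which instruction to fetch next; branch prediction
       or a delayed branch fills the gap. */
    if (r4 > 0)
        printf("taken: r4 = %d\n", r4);
    else
        printf("not taken\n");
    return 0;
}
```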
36. Memory
In a shared-memory system, there is one large virtual memory, and all processors have equal access to data and instructions in this memory.
37. Memory, continued...
In a distributed-memory system, each processor has a local memory that is not accessible from any other processor.
38. The difference between the two kinds of memory
It is a software issue, not just a hardware one.
The difference determines how the different parts of a parallel program will communicate:
shared memory with semaphores, etc., or distributed memory with message passing (a minimal sketch follows).
All problems run efficiently on distributed memory, BUT shared-memory software is easier to develop.
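The slides do not name a message-passing library, but MPI is the standard one; a minimal sketch of two processes, each with its own memory, exchanging data only through an explicit message (compile with an MPI toolchain, e.g. mpicc, and run with mpirun -np 2):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;  /* exists only in process 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* the only way the data reaches process 1 is by a message */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```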
39. Cache Coherency (figure)
40. Styles of parallel computing (hardware architecture)
SISD: single instruction stream, single data stream
SIMD: single instruction stream, multiple data streams
MISD: multiple instruction streams, single data stream
MIMD: multiple instruction streams, multiple data streams
41. SISD: Single Instruction, Single Data
The traditional single-processor machine.
42. SIMD: Single Instruction, Multiple Data
A single operation acts on a number of variables at once.
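A sketch in C of the idea: the loop below is the scalar analogue of what a SIMD machine would issue as a single instruction over all eight elements:

```c
#include <stdio.h>

/* SIMD idea: one operation applied to many data elements. On a SIMD
   machine (or with vector instructions) the additions below would be
   a single instruction; this loop is the scalar analogue. */
int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    for (int i = 0; i < 8; i++)   /* conceptually: c = a + b, one instruction */
        c[i] = a[i] + b[i];

    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```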
43. MISD: Multiple Instruction, Single Data
Shared-memory machines (SMP): multiple processors accessing the same memory space.
44. MIMD: Multiple Instruction, Multiple Data (simplest form: program-controlled message passing)
Distributed-memory machines: Beowulf clusters, Networks of Workstations (NOWs). Multiple programs (often the same program, but possibly at different stages of execution) run on different machines, each with its own memory space, linked by a high-speed network (e.g., a gigaswitch) or a medium-speed one (e.g., a 100 Mb switch). This typically requires explicit message-passing calls.
45. Two parallel-processing approaches
SMP: symmetric multiprocessing
SMP is the processing of programs by multiple processors that share a common operating system and memory.
MPP: massively parallel processing
MPP is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory.
46. Current trend: OpenMP
OpenMP is an open standard for providing parallelization mechanisms on shared-memory multiprocessors.
It targets C/C++ and Fortran, several of the most commonly used languages for writing parallel programs.
It is based on a thread paradigm (see the sketch below).
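A minimal OpenMP sketch in C, one of the languages the slide names, assuming a compiler with OpenMP support (e.g. gcc -fopenmp):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000;
    double x[1000], sum = 0.0;

    /* The pragma asks the OpenMP runtime to split the loop iterations
       across a team of threads sharing the same memory; the reduction
       clause safely combines each thread's partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        x[i] = 0.5 * i;
        sum += x[i];
    }
    printf("sum = %.1f, max threads = %d\n", sum, omp_get_max_threads());
    return 0;
}
```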
47. OpenMP execution model (figure)
48. New trend: clustering
The widest definition:
Any number of computers communicating at any distance.
The common definition:
A relatively small number of computers (<1000) communicating at a relatively small distance (within the same room) and used as a single, shared computing resource.
49. Comparison
Programming:
A program written for cluster parallelism can run on an SMP right away.
A program written for an SMP can NOT run on a cluster right away.
Scalability:
Clusters are scalable.
SMPs are NOT scalable above a small number of processors.
50. Comparison, continued...
One big advantage of SMPs is the single system image:
Easier administration and support.
But, a single point of failure.
Cluster computing can be used for load balancing as well as for high availability. Load balancing is dividing the amount of work that a computer has to do between two or more computers so that more work gets done in the same amount of time and, in general, all users get served faster. Load balancing can be implemented with hardware, software, or a combination of both. Typically, load balancing is the main reason for computer server clustering (a toy sketch follows).
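A toy round-robin dispatcher in C showing the division of work; real load balancers also weigh each machine's current load, and the server names here are hypothetical:

```c
#include <stdio.h>

/* Toy load-balancing sketch: a dispatcher deals incoming requests
   round-robin across the machines of a cluster so each one handles a
   share of the total work. */
int main(void) {
    const char *servers[] = {"node-a", "node-b", "node-c"};
    int n_servers = 3;

    for (int request = 0; request < 7; request++)
        printf("request %d -> %s\n", request, servers[request % n_servers]);
    return 0;
}
```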
51. General highlights from the TOP500
The Earth Simulator, built by NEC, remains the unchallenged #1.
100 systems have peak performance above 1 TFlop/s, up from 70 systems six months ago.
PC clusters are now present at all levels of performance.
IBM still leads the list in installed performance, ahead of HP and NEC.
Hewlett-Packard stays slightly ahead of IBM in the number of systems installed (HP 137 and IBM 131).
52. The NEC Earth Simulator (5120 processors), Japan (figure)
53. Basic idea/components: environmental research
The Earth Simulator consists of 640 supercomputers connected by a high-speed network (data-transfer speed: 12.3 GB/s). Each supercomputer (one node) contains eight vector processors, each with a peak performance of 8 GFlops, and a high-speed memory of 16 GB. The total number of processors is 5120 (8 x 640), which translates to a total of approximately 40 TFlops peak performance, and the total main memory is 10 TB.
54. The Hewlett-Packard SuperDome supercomputer (figure)
55. Terms you need to know
flops: acronym for floating-point operations per second; a unit of measurement of computer performance. For example, 15 Mflops equals 15 million floating-point arithmetic operations per second.
LINPACK: a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems.
Rmax = 431.70 (maximal LINPACK performance achieved); Rpeak = 672.00 (theoretical peak).
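Those two numbers imply the machine's LINPACK efficiency, the fraction of theoretical peak actually achieved; a two-line check in C (units as listed by the TOP500):

```c
#include <stdio.h>

int main(void) {
    double rmax = 431.70;   /* achieved LINPACK performance (from the slide) */
    double rpeak = 672.00;  /* theoretical peak (from the slide) */
    printf("efficiency = %.1f%%\n", 100.0 * rmax / rpeak);  /* about 64.2%  */
    return 0;
}
```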
56. Challenges
Faster algorithms
Good data locality
Low communication requirement
Efficient software
High level problem solving environment
Changes of architecture
57. References
Power consumption of processors: http://www.macinfo.de/hardware/strom.html
Under the hood: http://www.kids-online.net/learn/clicknov/details/cpu.html
The Difference Engine and Charles Babbage: http://www.cbi.umn.edu/exhibits/cb.html
John von Neumann: http://ei.cs.vt.edu/~history/VonNeumann.html
I/O: http://sophia.dtp.fmph.uniba.sk/pchardware/bus.html
CPU and memory: http://csep1.phy.ornl.gov/guidry/phys594/lectures/lectures.html
Memory: http://www.howstuffworks.com/computer-memory.htm
General idea: http://www.ccs.uky.edu/~douglas/Classes/cs521-s02/index.html , http://www.ccs.uky.edu/~douglas/Classes/cs521-s01/index.html , http://www.ccs.uky.edu/~douglas/Classes/cs521-s00/index.html
Voltage: http://www.hardwarecentral.com/hardwarecentral/tutorials/19/1/
CSEP: http://www.ccs.uky.edu/csep/csep.html
TOP500: http://www.top500.org
Cray Co.: http://www.cray.com/company/h_systems.html
Definitions of terms: http://www.whatis.com
58. Thank You!