1. Computer Architecture
Davies Muche
&
Mike Li Luo
CS521 Spring 2003
2. What is a digital computer? A digital computer is a machine composed of the following three basic components:
- Input/Output
- Central Processing Unit (CPU)
- Memory
3. Early Computers
As early as the 1600s, calculating machines that could perform arithmetic operations had been built, but none had the three basic components of a digital computer.
In 1823, Charles Babbage undertook the design of the Difference Engine.
The machine was to solve 6th-degree polynomials to 20-digit accuracy.
4. It put the concepts of mechanical control and mechanical calculation together into a machine that had the basic parts of a digital computer.
He was given £17,000 to construct the machine, but the project was abandoned, uncompleted, in 1842.
In 1856, Babbage conceived the idea of the Analytical Engine. (After his death his son Henry tried to build it but never succeeded.)
In 1854, Georg Scheutz built a working difference engine based on Babbage's design. (This machine printed mathematical, astronomical, and actuarial tables with unprecedented accuracy, and was used by the British and American governments.)
6. However, in 1834 Charles Babbage developed a hypothetical program to solve simultaneous equations on the Analytical Engine.
7. The John von Neumann architecture (1940s) consists of five major components.
8. A refinement of the von Neumann model, the system bus model has a CPU (ALU and control), memory, and an input/output unit.
10. The CPU
The CPU (central processing unit) is an older term for the processor or microprocessor: the central unit in a computer containing the logic circuitry that performs the instructions of the computer's programs.
NOTABLE TYPES
- RISC (Reduced Instruction Set Computer): introduced in the mid-1980s; requires few transistors; capable of executing only a very limited set of instructions.
- CISC (Complex Instruction Set Computer): complex CPUs that had ever-larger sets of instructions.
11. RISC or CISC: "The Great Controversy". RISC proponents argue that RISC machines are both cheaper and faster, and are therefore the machines of the future.
Skeptics note that by making the hardware simpler, RISC architectures put a greater burden on the software. They argue that this is not worth the trouble because conventional microprocessors are becoming increasingly fast and cheap anyway.
The TRUTH!
CISC and RISC implementations are becoming more and more alike. Many of today's RISC chips support as many instructions as yesterday's CISC chips. And today's CISC chips use many techniques formerly associated with RISC chips.
13. What you need to know about a CPU
Processing speed
- The clock frequency is one measure of how fast a computer is. (However, the length of time to carry out an operation depends not only on how fast the processor cycles, but also on how many cycles are required to perform a given operation.)
Voltage requirement
- The transistors (electronic switches) in the CPU require some voltage to trigger them.
- In the pre-486DX66 days, everything ran at 5 volts.
- As chips got faster and power became a concern, designers dropped the chip voltage down to 3.3 volts (external voltage) and a 2.9 V or 2.5 V core voltage.
14. More on voltage requirements... Power consumption equates largely with heat generation, which is a primary enemy in achieving increased performance. Newer processors are larger and faster, and keeping them cool can be a major concern.
Reducing power usage is a primary objective for the designers of notebook computers, since they run on batteries with a limited life. (They also are more sensitive to heat problems since their components are crammed into such a small space).
Designers compensate by using lower-power semiconductor processes and by shrinking the circuit size and die size. Newer processors reduce voltage levels even more by using what is called a dual-voltage, or split-rail, design.
15. More on the dual-voltage design... A split-rail processor uses two different voltages.
The external or I/O voltage is higher, typically 3.3V for compatibility with the other chips on the motherboard.
The internal or core voltage is lower: usually 2.5 to 2.9 volts. This design allows these lower-voltage CPUs to be used without requiring wholesale changes to motherboards, chipsets, etc.
16. Power consumption versus speed of some processors (figure)
17. MEMORY
Computers have hierarchies of memories that may be classified according to function, capacity, and response time.
- Function
"Reads" transfer information from the memory; "writes" transfer information to the memory.
  - Random Access Memory (RAM) performs both reads and writes.
  - Read-Only Memory (ROM) contains information stored at the time of manufacture that can only be read.
  - Programmable Read-Only Memory (PROM) is ROM that can be written once at some point after manufacture.
- Capacity
bit = smallest unit of memory (value of 0 or 1);
byte = 8 bits.
In modern computers, the total memory may range from, say, 16 MB in a small personal computer to several GB (gigabytes) in a large supercomputer.
18. More on memory... Memory response
Memory response is characterized by two different measures:
-Access Time (also termed response time or latency) defines how quickly the memory can respond to a read or write request.
-Memory Cycle Time refers to the minimum period between two successive requests of the memory.
-Access times vary from about 80 ns (ns = nanosecond = 10^(-9) seconds) for chips in small personal computers to about 10 ns or less for the fastest chips in caches and buffers. For various reasons, the memory cycle time is longer than the access time of the memory chips (i.e., the length of time between successive requests is more than the 80 ns access time of the chips in a small personal computer).
20. The I/O BUS
A computer transfers data from disk to CPU, from CPU to memory, from memory to the display adapter, and so on.
To avoid having separate circuits between every pair of devices, the bus is used.
Definition:
The bus is simply a common set of wires that connects all the computer's devices and chips together.
21. Different functions for different wires of the bus
Some of these wires are used to transmit data.
Some send housekeeping signals, like the clock pulse. Some transmit a number (the "address") that identifies a particular device or memory location.
Use of the address
The computer's chips and devices watch the address wires and respond when their identifying number (address) is transmitted; only then do they transfer data. A sketch of the idea follows.
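A minimal C sketch of this address-matching idea, with illustrative device names and addresses (it models the behavior, not any real bus protocol):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch: every device sees the same address wires, and
   only the one whose assigned address matches takes part in the
   transfer. Addresses and names here are for illustration only. */
typedef struct {
    uint32_t address;
    const char *name;
} device_t;

void bus_cycle(const device_t *dev, int n, uint32_t addr, uint8_t data) {
    for (int i = 0; i < n; i++) {
        if (dev[i].address == addr) {   /* address match: this device responds */
            printf("%s accepts data 0x%02x\n", dev[i].name, (unsigned)data);
            return;
        }
    }
    printf("no device at address 0x%08x\n", (unsigned)addr);
}

int main(void) {
    device_t devices[] = { {0x3F8, "serial port"}, {0x060, "keyboard"} };
    bus_cycle(devices, 2, 0x3F8, 0x41);  /* only the serial port responds */
    return 0;
}
```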
Problem!
Starting with machines that used the 386 CPU, CPUs and memory ran faster than the I/O devices.
Solution
- Separate the CPU and memory from all the I/O. Today, memory is added only by plugging it into special sockets on the main computer board.
22. Bus speeds
Either multiple buses with different speeds or a single bus supporting different speeds is used.
In a modern PC, there may be half a dozen different bus areas.
There is certainly a "CPU area" that still contains the CPU, memory, and basic control logic.
There is a "high-speed I/O device" area that is either a VESA Local Bus (VLB) or a PCI bus.
23. Some bus standards
The ISA (Industry Standard Architecture) bus
In 1987, IBM introduced the new Micro Channel Architecture (MCA) bus
The other vendors developed an extension of the older ISA interface called EISA
VESA Local Bus (VLB), which became popular at the start of 1993
24. More bus standards... The PCI bus was developed by Intel.
PCI is a 64-bit interface in a 32-bit package.
The PCI bus runs at 33 MHz and can transfer 32 bits of data (four bytes) every clock tick.
That sounds like a 32-bit bus! However, a clock tick at 33 MHz is about 30 nanoseconds, while memory has a speed of only 70 nanoseconds: when the CPU fetches data from RAM, it has to wait at least three clock ticks for the data. By transferring data on every clock tick, the PCI bus can deliver through a 32-bit interface the same throughput that other parts of the machine deliver through a 64-bit path.
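The slide's arithmetic can be made concrete in a few lines of C; the 33 MHz clock and 4 bytes per tick come straight from the text above, and the 132 MB/s peak figure follows from them:

```c
#include <stdio.h>

int main(void) {
    double clock_hz = 33e6;       /* PCI clock: 33 MHz (from the slide)  */
    double bytes_per_tick = 4.0;  /* 32 bits = 4 bytes per clock tick    */

    printf("clock tick: %.1f ns\n", 1e9 / clock_hz);    /* about 30.3 ns */
    printf("peak rate:  %.0f MB/s\n",
           clock_hz * bytes_per_tick / 1e6);            /* 132 MB/s      */
    return 0;
}
```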
25. Things to know about the I/O bus. Buses transfer information between parts of a computer. Smaller computers have a single bus; more advanced computers have complex interconnection strategies.
Things to know about the bus
Transaction = Unit of communication on bus.
Bus Master = The module controlling the bus at a particular time.
Arbitration Protocol = Set of signals exchanged to decide which of two competing modules will control a bus at a particular time.
Communication Protocol = Algorithm used to transfer data on the bus.
Asynchronous Protocol = Communication algorithm that can begin at any time; requires overhead to notify receivers that transfer is about to begin.
26. Things to know about the bus, continued... Synchronous Protocol = communication algorithm that can begin only at well-known times defined by a global clock.
Transfer Time = time for data to be transferred over the bus in a single transaction.
Bandwidth = Data transfer capacity of bus; usually expressed in bits per second (bps). Sometimes termed throughput.
Bandwidth and Transfer Time measure related things, but bandwidth takes into account required overheads and is usually a more useful measure of the speed of the bus.
27. Supercomputer Architecture
Background
Architecture
Approaches
Trends
Challenges
28. What is parallel computing?
The use of multiple computers or processors working together on a common task.
Each processor works on its section of the problem.
Processors are allowed to exchange information with other processors.
29. Why parallel computing?
Limits of a single computer:
Available memory
Performance
Parallel computing allows us to:
Solve problems that don't fit on a single computer
Solve problems that can't be solved in a reasonable time
30. The first supercomputer
The first supercomputer, the Cray-1, appeared in 1976.
It had a speed of tens of megaflops (one megaflop equals a million floating-point operations per second) and a memory capacity of 4 megabytes.
Contributions came from Los Alamos National Laboratory and Seymour Cray.
That is less than the average speed of a PC today.
31. Growing speed
The performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
From machines performing tens of floating-point operations per second, the parallel computers of the mid-1990s reached tens of billions of operations per second.
32. Pipeline
Pipeline: start performing an operation on one piece of data while finishing the same operation on another piece of data.
An operation consists of multiple stages.
After a set of operands completes a particular stage, it moves into the next stage.
Then another set of operands can move into the stage that was just vacated.
It is like doing three loads of laundry with one washer and one dryer: while one load is drying, the next can be washing at the same time.
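A toy C simulation of this overlap, assuming a hypothetical three-stage operation (the stage names are illustrative):

```c
#include <stdio.h>

/* Toy pipeline illustration: three operations, each with three stages.
   Without a pipeline, 3 ops x 3 stages = 9 time steps; with a pipeline,
   a new operation enters a stage as soon as the previous one vacates
   it, so the three operations finish in 5 steps. */
int main(void) {
    const char *stages[] = {"fetch", "execute", "write"};
    int n_ops = 3, n_stages = 3;

    /* In time step t, operation i occupies stage (t - i), if it exists. */
    for (int t = 0; t < n_ops + n_stages - 1; t++) {
        printf("step %d:", t + 1);
        for (int i = 0; i < n_ops; i++) {
            int s = t - i;
            if (s >= 0 && s < n_stages)
                printf("  op%d:%s", i + 1, stages[s]);
        }
        printf("\n");
    }
    return 0;
}
```

Running it shows the three operations completing in five steps instead of nine.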
33. Superpipeline
Superpipeline: perform multiple pipelined operations at the same time
So, a superpipeline is a collection of multiple pipelines that can operate simultaneously.
In other words, several different operations can execute simultaneously, and each of these operations can be broken into stages, each of which is filled all the time.
So you can get multiple operations per CPU cycle.
For example, an IBM POWER4 can have over 200 different operations "in flight" at the same time.
34. Sample superpipeline design (figure)
35. Drawbacks of the pipeline architecture: pipeline hazards (a sketch in C follows this list)
Structural hazards: attempt to use the same resource in two different ways at the same time
e.g., multiple memory accesses, multiple register writes
solutions: multiple memories, stretching the pipeline
Control hazards: attempt to make a decision before the condition is evaluated
e.g., any conditional branch
solutions: prediction, delayed branch
Data hazards: attempt to use an item before it is ready
solutions: forwarding/bypassing
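A hedged C analogue of the data and control hazards above; in real hardware these are machine instructions, but the dependences have the same shape:

```c
#include <stdio.h>

int main(void) {
    int r2 = 7, r3 = 5, r5 = 2;

    /* Data hazard: the second statement consumes r1 immediately after
       it is produced. In a pipeline, the subtraction reaches its
       execute stage before r1 has been written back to the register
       file; forwarding/bypassing feeds the ALU result directly to the
       next instruction instead of stalling. */
    int r1 = r2 + r3;   /* like: ADD r1, r2, r3 */
    int r4 = r1 - r5;   /* like: SUB r4, r1, r5  (depends on r1) */

    /* Control hazard: until the comparison is evaluated, the pipeline
       cannot know which instruction to fetch next; branch prediction
       or a delayed branch fills the gap. */
    if (r4 > 0)
        printf("taken: r4 = %d\n", r4);
    else
        printf("not taken\n");
    return 0;
}
```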
36. Memory
In a shared-memory system, there is one large virtual memory, and all processors have equal access to data and instructions in this memory.
37. Memory, continued...
In a distributed-memory system, each processor has a local memory that is not accessible from any other processor.
38. The difference between the two kinds of memory
It is a software issue, not just a hardware one.
The difference determines how the different parts of a parallel program will communicate:
shared memory with semaphores, etc., or distributed memory with message passing (a minimal sketch follows).
All problems run efficiently on distributed memory, BUT shared-memory software is easier to develop.
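The slides do not name a message-passing library, but MPI is the standard one; a minimal sketch of two processes, each with its own memory, exchanging data only through an explicit message (compile with an MPI toolchain, e.g. mpicc, and run with mpirun -np 2):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;  /* exists only in process 0's local memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* the only way the data reaches process 1 is by a message */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```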
39. Cache Coherency (figure)
40. Styles of parallel computing (hardware architecture)
SISD: single instruction stream, single data stream
SIMD: single instruction stream, multiple data streams
MISD: multiple instruction streams, single data stream
MIMD: multiple instruction streams, multiple data streams
41. SISD: Single Instruction, Single Data
The traditional single-processor machine.
42. SIMD: Single Instruction, Multiple Data
A single operation acts on a number of variables at once.
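A sketch in C of the idea: the loop below is the scalar analogue of what a SIMD machine would issue as a single instruction over all eight elements:

```c
#include <stdio.h>

/* SIMD idea: one operation applied to many data elements. On a SIMD
   machine (or with vector instructions) the additions below would be
   a single instruction; this loop is the scalar analogue. */
int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    for (int i = 0; i < 8; i++)   /* conceptually: c = a + b, one instruction */
        c[i] = a[i] + b[i];

    for (int i = 0; i < 8; i++)
        printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```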
43. MISD: Multiple Instruction, Single Data
Shared-memory machines (SMP): multiple processors accessing the same memory space.
44. MIMD: Multiple Instruction, Multiple Data (simplest form: program-controlled message passing)
Distributed-memory machines: Beowulf clusters, Networks of Workstations (NOWs). Multiple programs (often the same program, but possibly at different stages of execution) run on different machines, each with its own memory space, linked by a high-speed network (e.g., a gigaswitch) or a medium-speed one (e.g., a 100 Mb switch). This typically requires explicit message-passing calls.
45. Two parallel-processing approaches
SMP: symmetric multiprocessing
SMP is the processing of programs by multiple processors that share a common operating system and memory.
MPP: massively parallel processing
MPP is the coordinated processing of a program by multiple processors that work on different parts of the program, with each processor using its own operating system and memory.
46. Current trend: OpenMP
OpenMP is an open standard for providing parallelization mechanisms on shared-memory multiprocessors.
It targets C/C++ and Fortran, several of the most commonly used languages for writing parallel programs.
It is based on a thread paradigm (see the sketch below).
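A minimal OpenMP sketch in C, one of the languages the slide names, assuming a compiler with OpenMP support (e.g. gcc -fopenmp):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000;
    double x[1000], sum = 0.0;

    /* The pragma asks the OpenMP runtime to split the loop iterations
       across a team of threads sharing the same memory; the reduction
       clause safely combines each thread's partial sum. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        x[i] = 0.5 * i;
        sum += x[i];
    }
    printf("sum = %.1f, max threads = %d\n", sum, omp_get_max_threads());
    return 0;
}
```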
47. OpenMP execution model (figure)
48. New trend: clustering
The widest definition:
Any number of computers communicating at any distance.
The common definition:
A relatively small number of computers (<1000) communicating at a relatively small distance (within the same room) and used as a single, shared computing resource.
49. Comparison
Programming:
A program written for cluster parallelism can run on an SMP right away.
A program written for an SMP can NOT run on a cluster right away.
Scalability:
Clusters are scalable.
SMPs are NOT scalable above a small number of processors.
50. Comparison, continued...
One big advantage of SMPs is the single system image:
Easier administration and support.
But, a single point of failure.
Cluster computing can be used for load balancing as well as for high availability. Load balancing is dividing the amount of work that a computer has to do between two or more computers so that more work gets done in the same amount of time and, in general, all users get served faster. Load balancing can be implemented with hardware, software, or a combination of both. Typically, load balancing is the main reason for computer server clustering (a toy sketch follows).
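A toy round-robin dispatcher in C showing the division of work; real load balancers also weigh each machine's current load, and the server names here are hypothetical:

```c
#include <stdio.h>

/* Toy load-balancing sketch: a dispatcher deals incoming requests
   round-robin across the machines of a cluster so each one handles a
   share of the total work. */
int main(void) {
    const char *servers[] = {"node-a", "node-b", "node-c"};
    int n_servers = 3;

    for (int request = 0; request < 7; request++)
        printf("request %d -> %s\n", request, servers[request % n_servers]);
    return 0;
}
```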
51. General highlights from the TOP500
The Earth Simulator, built by NEC, remains the unchallenged #1.
100 systems have peak performance above 1 TFlop/s, up from 70 systems six months ago.
PC clusters are now present at all levels of performance.
IBM still leads the list in installed performance, ahead of HP and NEC.
Hewlett-Packard stays slightly ahead of IBM in the number of systems installed (HP 137 and IBM 131).
52. The NEC Earth Simulator (5120 processors), Japan (figure)
53. Basic idea/components: environmental research
The Earth Simulator consists of 640 supercomputers connected by a high-speed network (data-transfer speed: 12.3 GB/s). Each supercomputer (one node) contains eight vector processors, each with a peak performance of 8 GFlops, and a high-speed memory of 16 GB. The total number of processors is 5120 (8 x 640), which translates to a total of approximately 40 TFlops peak performance, and the total main memory is 10 TB.
54. The Hewlett-Packard SuperDome supercomputer (figure)
55. Terms you need to know
flops: acronym for floating-point operations per second; a unit of measurement of computer performance. For example, 15 Mflops equals 15 million floating-point arithmetic operations per second.
LINPACK: a collection of Fortran subroutines that analyze and solve linear equations and linear least-squares problems.
Rmax = 431.70 (maximal LINPACK performance achieved); Rpeak = 672.00 (theoretical peak).
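Those two numbers imply the machine's LINPACK efficiency, the fraction of theoretical peak actually achieved; a two-line check in C (units as listed by the TOP500):

```c
#include <stdio.h>

int main(void) {
    double rmax = 431.70;   /* achieved LINPACK performance (from the slide) */
    double rpeak = 672.00;  /* theoretical peak (from the slide) */
    printf("efficiency = %.1f%%\n", 100.0 * rmax / rpeak);  /* about 64.2%  */
    return 0;
}
```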
56. Challenges
Faster algorithms
Good data locality
Low communication requirement
Efficient software
High level problem solving environment
Changes of architecture
57. References
Power consumption of processors: http://www.macinfo.de/hardware/strom.html
Under the hood: http://www.kids-online.net/learn/clicknov/details/cpu.html
The Difference Engine and Charles Babbage: http://www.cbi.umn.edu/exhibits/cb.html
John von Neumann: http://ei.cs.vt.edu/~history/VonNeumann.html
I/O: http://sophia.dtp.fmph.uniba.sk/pchardware/bus.html
CPU and memory: http://csep1.phy.ornl.gov/guidry/phys594/lectures/lectures.html
Memory: http://www.howstuffworks.com/computer-memory.htm
General idea: http://www.ccs.uky.edu/~douglas/Classes/cs521-s02/index.html , http://www.ccs.uky.edu/~douglas/Classes/cs521-s01/index.html , http://www.ccs.uky.edu/~douglas/Classes/cs521-s00/index.html
Voltage: http://www.hardwarecentral.com/hardwarecentral/tutorials/19/1/
CSEP: http://www.ccs.uky.edu/csep/csep.html
TOP500: http://www.top500.org
Cray Co.: http://www.cray.com/company/h_systems.html
Definitions of terms: http://www.whatis.com
58. Thank You!