Tomasulo’s Algorithm and IBM 360 Srivathsan Soundararajan
What we have seen till now!!! • Single-cycle datapath. • Multi-cycle datapath. • N-stage Pipelined datapath.
Tomasulo's algorithm • A hardware algorithm for controlling the execution of multiple functional units with varying latencies in a pipelined CPU micro-architecture. • A general mechanism for register forwarding and hazard detection. • The key idea is to virtually execute each instruction in a single cycle. • Out-of-order execution of instructions.
So what is so special??? • Let instructions behind a stalled instruction proceed. • Decode instructions and check for structural hazards. • Wait until there is no data hazard, then read operands.
Three Stages • Issue – if a reservation station is free (i.e. no structural hazard), control issues the instruction and sends the operands (renaming registers). • Execution – if both operands are ready, execute. If not, watch the common data bus (CDB) for the result. • Write result – if the CDB is available, write the result on it to all waiting units and mark the reservation station available.
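The three stages above can be sketched as a small cycle-by-cycle simulation. This is a minimal illustration and not the 360/91 design. The latencies, the station names (`add0`, `mul0`, ...), the one-issue-per-cycle rule, and the "oldest instruction gets the CDB" arbitration are all assumptions made for the example.

```python
import operator

LATENCY = {"add": 2, "mul": 4}           # assumed cycle counts, for illustration only
FUNC = {"add": operator.add, "mul": operator.mul}

def simulate(program, regs, stations_per_op=2):
    """Run (op, dest, src1, src2) tuples; return final registers and write-back order."""
    regs = dict(regs)
    reg_tag = {}      # register name -> tag of the station that will produce it
    busy = {}         # station tag -> state of the instruction held there
    pending = list(program)
    finished = []     # destination registers, in the order results were written
    seq = 0
    while pending or busy:
        # Write result: at most one finished station gets the common data bus.
        done = [t for t, s in busy.items() if s["left"] == 0]
        if done:
            tag = min(done, key=lambda t: busy[t]["seq"])   # assumed: oldest first
            st = busy.pop(tag)                               # station is free again
            value = FUNC[st["op"]](*st["vals"])
            for other in busy.values():                      # waiting stations capture it
                other["vals"] = [value if v == ("tag", tag) else v
                                 for v in other["vals"]]
            for reg, t in list(reg_tag.items()):             # registers still waiting on it
                if t == tag:
                    regs[reg] = value
                    del reg_tag[reg]
            finished.append(st["dest"])
        # Execute: stations holding both operand values count down their latency.
        for st in busy.values():
            if st["left"] > 0 and not any(isinstance(v, tuple) for v in st["vals"]):
                st["left"] -= 1
        # Issue: in program order, only if a station of the right kind is free.
        if pending:
            op, dest, s1, s2 = pending[0]
            free = [f"{op}{i}" for i in range(stations_per_op)
                    if f"{op}{i}" not in busy]
            if free:
                pending.pop(0)
                vals = [("tag", reg_tag[s]) if s in reg_tag else regs[s]
                        for s in (s1, s2)]
                busy[free[0]] = {"op": op, "dest": dest, "vals": vals,
                                 "left": LATENCY[op], "seq": seq}
                seq += 1
                reg_tag[dest] = free[0]   # rename: dest now refers to this station
    return regs, finished

program = [
    ("mul", "F0", "F2", "F4"),    # long-latency multiply
    ("add", "F6", "F0", "F8"),    # depends on the multiply, must wait
    ("add", "F10", "F8", "F8"),   # independent, may finish before the add above
]
initial = {"F0": 0, "F2": 2, "F4": 3, "F6": 0, "F8": 1, "F10": 0}
final_regs, order = simulate(program, initial)
```

In this run the third instruction writes `F10` before the second writes `F6`, because the second is waiting on the multiply. That is the "let instructions behind a stall proceed" behaviour in miniature.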
Virtual result – a promissory note • All registers are modified so that they can either hold a true result or a virtual result. • When an instruction is issued, a virtual result is placed in the instruction's destination register. • A functional unit is assigned to compute the real result. • The virtual result is replaced by the real result when the functional unit has completed its computation.
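A minimal sketch of the promissory-note idea, assuming a register is modelled as a value plus an optional tag naming the unit that owes the result. The class names and unit names (`MUL1`, `ADD1`, `ADD2`) are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Register:
    value: float = 0.0
    tag: Optional[str] = None   # set while the register holds a virtual result

class RegisterFile:
    def __init__(self, names):
        self.regs = {n: Register() for n in names}

    def issue(self, dest, unit):
        # Place the promissory note: 'unit' will deliver the real value later.
        self.regs[dest].tag = unit

    def complete(self, unit, result):
        # Redeem the note, but only in registers that still wait on this unit.
        for reg in self.regs.values():
            if reg.tag == unit:
                reg.value = result
                reg.tag = None

    def read(self, name):
        reg = self.regs[name]
        if reg.tag is not None:
            return ("tag", reg.tag)
        return ("value", reg.value)

rf = RegisterFile(["F0", "F2"])
rf.issue("F0", "MUL1")
before = rf.read("F0")          # virtual result: the tag of the producing unit
rf.complete("MUL1", 6.0)
after = rf.read("F0")           # real result has replaced the note

# Two writers to the same register: only the most recent note is honoured.
rf.issue("F2", "ADD1")
rf.issue("F2", "ADD2")
rf.complete("ADD1", 1.0)
still_waiting = rf.read("F2")
```

Because a later `issue` overwrites the tag, a late result from an older instruction cannot clobber the register. This is how the scheme avoids write-after-write hazards.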
The concept • Each instruction, as it arrives, fetches its operands from a special register file. • Each register in this file holds either an actual value, or a “tag” indicating the reservation station that will produce the register value when it completes. • The instruction and its operands (either values or tags) are stored in a reservation station (RS). • The RS watches the results returning from the execution pipelines, and when a result's tag matches one of its operands, it records the value in place of the tag.
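The reservation-station side of this can be sketched as follows. It is an assumption-laden illustration: operands are modelled as either a float (actual value) or a string (tag), and the `snoop` method name and the station names are invented for the example.

```python
from dataclasses import dataclass
from typing import Union

Operand = Union[float, str]   # float = actual value, str = tag of the producing RS

@dataclass
class ReservationStation:
    name: str
    op: str
    src1: Operand
    src2: Operand

    def ready(self) -> bool:
        # The instruction may execute once neither operand is still a tag.
        return not isinstance(self.src1, str) and not isinstance(self.src2, str)

    def snoop(self, tag: str, value: float) -> None:
        # Watch a returning result; record the value in place of a matching tag.
        if self.src1 == tag:
            self.src1 = value
        if self.src2 == tag:
            self.src2 = value

# ADD1 was issued while its first operand was still being computed by MUL1.
rs = ReservationStation("ADD1", "add", "MUL1", 2.0)
ready_at_issue = rs.ready()

rs.snoop("MUL2", 9.0)          # unrelated result: tag does not match, nothing changes
ready_after_other = rs.ready()

rs.snoop("MUL1", 6.0)          # matching result: the tag is replaced by the value
ready_after_match = rs.ready()
```

Once `ready()` is true, the station holds everything the functional unit needs, so the instruction never has to go back to the register file.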
A good example • http://www.cs.umd.edu/users/saltz/cmsc411-s97/tomasulo.htm
Why Tomasulo’s Algorithm • Hazard detection. • Dynamic Scheduling (i.e. Hardware reorganizes instructions)
IBM 360 • The IBM 360/91 introduced many new concepts, including dynamic detection of memory hazards, generalized forwarding, and reservation stations. • This approach is usually called Tomasulo’s algorithm.
Installation of the IBM 360/91 in the Columbia Computer Center machine room in February or March 1969. Photo: AIS archive.
IBM 360 • The 360/91 was introduced in 1966 by a team led by Michael Flynn. • The internal organization of the 360/91 shares many features with the Pentium III and Pentium 4, as well as several other microprocessors. • One major difference is that the 360/91 had no branch prediction and hence no speculation. • Another major difference is that it had no commit unit, so instructions updated the registers as soon as they finished execution. Out-of-order instruction commit led to imprecise interrupts, which proved unpopular and led to the commit units found in dynamically scheduled pipelined processors since that time.
IBM 360 • Although the 360/91 was not a success, its key ideas were resurrected later and exist in some form in the majority of microprocessors, such as the Pentium II and PowerPC 604. • It ran under Operating System/360, a powerful programming package of approximately 1.5 million instructions that enabled the system to operate with virtually no manual intervention.
IBM 360 • Within the central processing unit (CPU), there were five highly autonomous execution units which allowed the machine to overlap operations and process many instructions simultaneously. • The five units were processor storage, storage bus control, instruction processor, fixed-point processor and floating-point processor. Not only could these units operate concurrently, they could also perform several functions at the same time.
Some uses • The IBM 360 family of computers ranged from the Model 20 minicomputer (which typically had 24 KB of memory) to the Model 91 supercomputer, which was built for the North American missile defense system.
References • http://www.d.umn.edu/~gshute/cs2521/arch/tomasulo.html • http://euler.ecs.umass.edu/arch/parts/Part6-tomasulo.pdf • http://www.cs.indiana.edu/classes/p415-sjoh/readings/smv/CadenceSMV-docs/smv/tutorial/node36.html#tomasulo1 • http://www.beagle-ears.com/lars/engineer/comphist/ibm360.htm • http://www.eecs.ucf.edu/~lboloni/Teaching/EEL5708_2006/slides/Tomasulo.ppt#268,9,Three Stages of Tomasulo Algorithm • http://www.cs.unc.edu/~montek/teaching/fall-03/lectures/lecture-11.ppt#723,4,Tomasulo: Organization • http://www.ece.cmu.edu/~jhoe/distribution/2005/741/proj3.pdf • http://www.columbia.edu/acis/history/36091.html