The SHARC Super Harvard Architecture Computer Clare Smtih SHARC Presentation
The SHARC • Developed by Analog Devices. • Optimized for demanding DSP and imaging applications. • 32-bit floating point, with 40-bit extended floating-point capabilities. • Large on-chip memory. • Ideal for scalable multi-processing applications.
Harvard Architecture • Program memory can store data. • Able to simultaneously read or write data at one location and get instructions from another place in memory. • 2 buses • Data memory bus. • Program bus. • Either two separate memories or a single dual-port memory.
Super Harvard Architecture • Many processors employ a Harvard Architecture by having two separate memories or caches integrated into the processor chip. • The SHARC is unique in that its internal memory is capable of holding a large program as well as a large amount of data. This is what makes it SUPER!!!
DSP • Digital Signal Processor. • Requires high-speed, low-overhead data movement and rapid computations. • Usually has a small on-board ROM, RAM and a single-cycle multiply. • Designed to run single-line, serial-in, serial-out signal processing applications very fast.
DSP Computations • The inner product of two vectors is a common computation for determining energy or correlation. • The following C code is an example: for (n = 0; n < length; n++) result += x[n] * y[n]; • The processor that spends the least time on each pass through this loop will have the best performance.
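The loop on this slide can be wrapped into a complete function. This is a minimal sketch: the function name `inner_product` and the choice of `float` are assumptions, since the slide gives only the loop itself.

```c
#include <assert.h>
#include <stddef.h>

/* Inner product of two equal-length vectors.
   Each pass through the loop is one multiply and one accumulate. */
float inner_product(const float *x, const float *y, size_t length)
{
    assert(x != NULL && y != NULL);
    float result = 0.0f;  /* the running total must start at zero */
    for (size_t n = 0; n < length; n++)
        result += x[n] * y[n];
    return result;
}
```

Calling it with the same vector for both arguments gives the signal energy; calling it with two different vectors gives their correlation at zero lag.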
SHARC DSP • The SHARC incorporates features aimed at optimizing such loops. • High-Speed Floating Point Capability • Extended Floating Point • These features are DSP specific, meaning that performance may be less than optimal when they are applied to a non-DSP application.
Floating Point and Extended Floating Point • The SHARC supports floating-point, extended floating-point and fixed-point (non-floating-point) arithmetic. • No additional clock cycles for floating-point computations. • Data is automatically truncated or zero padded when moved between 32-bit memory and internal registers. • Not accurate enough for scientific algorithms, but gives an excellent signal-to-noise ratio.
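The truncate and zero-pad behaviour can be sketched with plain integer shifts. This assumes that a 40-bit extended word is a 32-bit word followed by 8 extra low-order mantissa bits, and that it is held in the low 40 bits of a `uint64_t`. The function names are hypothetical, and the layout should be checked against the Analog Devices documentation.

```c
#include <assert.h>
#include <stdint.h>

#define EXT40_MASK 0xFFFFFFFFFFULL  /* low 40 bits of a 64-bit container */

/* Register to 32-bit memory: keep the upper 32 bits, so the
   8 extra low-order mantissa bits are truncated. */
uint32_t ext40_to_mem32(uint64_t ext40)
{
    assert((ext40 & ~EXT40_MASK) == 0);  /* value must fit in 40 bits */
    return (uint32_t)(ext40 >> 8);
}

/* 32-bit memory to register: the word fills the upper 32 bits
   and the low 8 bits are zero padded. */
uint64_t mem32_to_ext40(uint32_t word)
{
    return (uint64_t)word << 8;
}
```

A round trip through 32-bit memory therefore loses whatever was in the low 8 bits, which is the precision cost of storing extended results in 32-bit words.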
SHARC’s Internal Memory • Makes SHARC unique. • Size • Allows many complex functions to be performed on-chip, eliminating the need to move data between internal and external memory. • Memory size is significantly larger than most other high speed computational devices. • Dual-block, Dual-port • Optimizes the Harvard Architecture by allowing the fetch of instructions while performing data memory accesses.
Multiply and Accumulate Instructions on the SHARC • Like most DSPs, the SHARC is able to compute a product and add the product to a running total in a single clock cycle. • The SHARC’s super instruction can multiply and accumulate while adding, subtracting, or averaging data in two other registers. • These instructions give the SHARC its 120 megaflop rating.
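One way to read the 120 megaflop figure is as clock rate times floating-point operations issued per cycle. The sketch below assumes a 40 MHz clock and three operations per cycle (a multiply alongside an add and a subtract). Neither number appears on the slides, and the type and function names are hypothetical.

```c
#include <assert.h>

/* Peak MFLOPS = clock rate in MHz * floating-point operations per cycle. */
unsigned peak_mflops(unsigned clock_mhz, unsigned flops_per_cycle)
{
    assert(flops_per_cycle > 0);
    return clock_mhz * flops_per_cycle;
}

/* Results of one modelled "super instruction". */
typedef struct {
    float product;
    float sum;
    float difference;
} parallel_ops;

/* C model of a multiply issued together with an add and a subtract
   on two other operands. On the DSP these would share one cycle;
   in C they are simply three expressions. */
parallel_ops multiply_with_add_sub(float a, float b, float c, float d)
{
    parallel_ops r = { a * b, c + d, c - d };
    return r;
}
```

Under those assumptions `peak_mflops(40, 3)` gives 120, matching the rating quoted on the slide.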
Zero Overhead Looping on the SHARC • A single instruction outside the loop performs loop set-up, informing the SHARC that there is a loop approaching. • The instruction also includes the iteration count and termination condition. • This causes the pipeline to remain full during loop execution and also allows the termination condition to be tested in parallel.
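The benefit can be illustrated with a simple cycle-count model. This is a sketch only: the per-iteration overhead and set-up costs are hypothetical parameters, not SHARC figures.

```c
#include <assert.h>

/* Conventional loop: every iteration pays for the body plus the
   counter update, compare and branch (overhead_cycles). */
unsigned long software_loop_cycles(unsigned long iterations,
                                   unsigned body_cycles,
                                   unsigned overhead_cycles)
{
    assert(body_cycles > 0);
    return iterations * (body_cycles + overhead_cycles);
}

/* Zero-overhead loop: one set-up instruction before the loop,
   after which only the body repeats. */
unsigned long hardware_loop_cycles(unsigned long iterations,
                                   unsigned body_cycles,
                                   unsigned setup_cycles)
{
    assert(body_cycles > 0);
    return setup_cycles + iterations * body_cycles;
}
```

For a one-cycle body repeated 1000 times, an assumed three cycles of loop overhead gives 4000 cycles in the conventional model against 1001 with a one-cycle set-up, so the saving matters most when the loop body is short.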
DAGs on the SHARC • Data Address Generators are integer computation units that manage the index registers used to address memory. • Allows the SHARC to fetch a value and update the index value. • If the updated value exceeds a limit, the DAG adjusts the index so that it wraps. • This occurs in the same clock cycle as the read or write.
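The fetch, update and wrap sequence can be modelled in C as follows. The struct layout and names are hypothetical, and what the DAG does in one clock cycle takes several statements here.

```c
#include <assert.h>
#include <stddef.h>

/* Software model of one address generator. */
typedef struct {
    size_t index;   /* current position in the buffer */
    size_t modify;  /* step added after each access */
    size_t length;  /* limit at which the index wraps */
} dag_t;

/* Return the current index, then advance it and wrap if it
   has reached or passed the limit. */
size_t dag_post_modify(dag_t *dag)
{
    assert(dag->length > 0);
    assert(dag->index < dag->length && dag->modify <= dag->length);
    size_t current = dag->index;
    dag->index += dag->modify;
    if (dag->index >= dag->length)
        dag->index -= dag->length;  /* one subtraction is enough here */
    return current;
}

/* Read through the address generator. */
float dag_read(const float *buffer, dag_t *dag)
{
    return buffer[dag_post_modify(dag)];
}
```

Because the step is never larger than the buffer length, a single subtraction is enough to bring the index back into range.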
DAG Capabilities • Circular Buffering • Rather than actually moving data in and out of a vector, circular buffers are used. • By updating the index modulo the buffer length, the oldest entry can be conveniently replaced by the newest entry. • Bit Reverse Addressing • The bit pattern of a vector index is reversed. • Done automatically by the SHARC. • Required for the Fast Fourier Transform (FFT), which is often critical to DSP applications.
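Both capabilities are easy to express in software, which shows what the DAG hardware saves. The sketch below uses hypothetical function names and does in several statements what the slide says the SHARC does automatically.

```c
#include <assert.h>

/* Circular buffering: overwrite the oldest sample with the newest
   and advance the head modulo the buffer length. No data is shifted. */
void circular_push(float *buffer, unsigned length, unsigned *head, float sample)
{
    assert(length > 0 && *head < length);
    buffer[*head] = sample;
    *head = (*head + 1u) % length;
}

/* Bit-reverse addressing: reverse the low `bits` bits of `index`.
   With 3 bits, index 001 becomes 100. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    assert(bits > 0 && bits <= 16);
    unsigned reversed = 0;
    for (unsigned i = 0; i < bits; i++) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}
```

An 8-point FFT uses 3-bit reversal, so the element at index 1 is exchanged with the one at index 4, and index 3 with index 6.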
SHARC DSP • What makes the SHARC unique? • It also has some features not directly related to optimizing numeric computations. • Pipelining • Handling Branches • Why has this not emerged sooner? • Technology has only recently become available to make it economical to integrate general single computing devices.
SHARC’s Pipeline • 3 stages • Instruction Fetch • Decode • Execution • Takes three clock cycles for an instruction to propagate through the pipeline. • The processor execution speed is one instruction per clock cycle even though each instruction requires three clock cycles.
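The two timing claims on this slide fit together once the pipeline fill time is counted. As a sketch, assuming an ideal pipeline with no stalls or branches (the function name is hypothetical):

```c
#include <assert.h>

/* Cycles to run `instructions` through an ideal pipeline:
   (stages - 1) cycles to fill it, then one instruction
   completes on every cycle. */
unsigned long pipeline_cycles(unsigned long instructions, unsigned stages)
{
    assert(stages > 0);
    if (instructions == 0)
        return 0;
    return instructions + (stages - 1u);
}
```

With three stages, one instruction takes 3 cycles but 100 instructions take only 102, which approaches one instruction per clock cycle.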
SHARC’s Handling of Branches: Delayed Branching • When a branch instruction is encountered, the two instructions which have already been loaded and decoded are executed before the branch takes effect. • This keeps the pipeline full and avoids junking those two instructions and reloading the pipeline. • Beneficial in situations such as loops of only a few instructions, where the ratio of wasted clock cycles to instructions is significant.
SHARC’s Handling of Branches: Non-delayed Branching • Traditional branching. • If the instructions cannot be reordered to use delayed branching, non-delayed branching saves space. • Uses only one word of storage. • However, it takes three cycles because the pipeline gets reloaded.
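The trade-off between the two branch types can be estimated with a small cycle model. The three-cycle cost of a non-delayed branch comes from the slide. The one-cycle cost of a delayed branch is an assumption that both following instructions do useful work, and the names are hypothetical.

```c
#include <assert.h>

#define NON_DELAYED_BRANCH_CYCLES 3u  /* from the slide: pipeline is reloaded */
#define DELAYED_BRANCH_CYCLES     1u  /* assumption: both slots do useful work */

/* Cycles for a loop closed by a branch: each iteration runs the
   body instructions at one per cycle, then pays for the branch. */
unsigned long loop_cycles(unsigned long iterations,
                          unsigned body_instructions,
                          unsigned branch_cycles)
{
    assert(branch_cycles >= 1);
    return iterations * (body_instructions + branch_cycles);
}
```

For 100 iterations of a four-instruction body this gives 700 cycles with non-delayed branches and 500 with delayed ones, so the shorter the loop, the larger the share of time the branch accounts for.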
Multi-processing • SHARC is uniquely equipped for multi-processing. • Link ports provide very powerful multi-processing capabilities. • Two main program models depending on the application. • Adapts well to different multi-processing architectures.
Multi-processing: SHARC Links • SHARC has 6 link ports that can transport data at rates up to 40 Mbytes/sec. • Links designed for point-to-point connections. • Data can be transmitted in either direction but not both simultaneously.
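The link figures translate into simple bandwidth arithmetic. This sketch assumes that 1 Mbyte means 10^6 bytes, that 40 Mbytes/sec is the peak rate of each link rather than a shared total, and that there is no protocol overhead. The names are hypothetical.

```c
#include <assert.h>

#define LINK_PORTS          6u   /* from the slide */
#define LINK_MBYTES_PER_SEC 40u  /* from the slide, taken as per-link peak */

/* Peak aggregate rate when several links run at once. */
unsigned aggregate_link_mbytes_per_sec(unsigned ports_in_use)
{
    assert(ports_in_use <= LINK_PORTS);
    return ports_in_use * LINK_MBYTES_PER_SEC;
}

/* Microseconds to move `bytes` over one link at the peak rate,
   rounded up. 40 Mbytes/sec is 40 bytes per microsecond. */
unsigned long link_transfer_us(unsigned long bytes)
{
    return (bytes + LINK_MBYTES_PER_SEC - 1u) / LINK_MBYTES_PER_SEC;
}
```

Since a link carries data in only one direction at a time, a two-way exchange over a single link costs the sum of the two transfer times.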
Multi-processing Program Model: MIMD • Multiple instruction, multiple data. • Good for applications that require multiple instruction threads to execute concurrently. • Processors operate individually. • Each processor executes different code. • Typically used for image reconstruction and multi-channel DSP.
Multi-processing Program Model: SIMD • Single instruction, multiple data. • Works best when all processors execute identical instruction sequences. • Does not require overhead for inter-processor synchronization. • Typically used for synthetic aperture radar and automatic target recognition.
Multi-processing Architectures: Cluster Design • Groups of up to 6 in a cluster. • Most common for joining multiple SHARCs. • All processors, global I/O and global memory connected to a common “Cluster bus.” • Each SHARC can “drive” the bus.
Multi-processing Architectures: Mesh Design • All SHARCs are joined by their link ports and are connected to a common bus. • In SIMD mode one single master SHARC drives the bus. • In MIMD mode mesh architecture cannot function if the data is larger than the available on-chip memory. • Advantageous scalability over a wider range of applications.
Summary of what makes the SHARC Super • It performs excellently for DSP applications. • Employs a Harvard Architecture with very large on-chip memory. • Respectable Megaflop rating. • Its multiprocessing capabilities.
How optimal is the SHARC for non-DSP Applications? • It is obviously geared for DSP applications. • While it may fare better than other processors, it is still behind those which are designed specifically for non-DSP applications.
Sources • www.alacron.com/news/tp_mimd_simd.htm • www.analog.com • www.cs.seas.gwu.edu/~cs339/cs339-lecture2.pdf • www.ixthos.aa.psiweb.com/technical/notes_articles/articles