
The SHARC



  1. The SHARC: Super Harvard Architecture Computer
     Clare Smith, SHARC Presentation

  2. The SHARC
     • Developed by Analog Devices.
     • Optimized for demanding DSP and imaging applications.
     • 32-bit floating point, with 40-bit extended floating-point capability.
     • Large on-chip memory.
     • Ideal for scalable multi-processing applications.

  3. Harvard Architecture
     • Program memory can store data.
     • Able to read or write data at one memory location while simultaneously fetching an instruction from another.
     • Two buses: a data memory bus and a program memory bus.
     • Implemented with either two separate memories or a single dual-port memory.

  4. Super Harvard Architecture
     • Many processors employ a Harvard architecture by integrating two separate memories or caches into the processor chip.
     • The SHARC is unique in that its internal memory is capable of holding a large program as well as a large amount of data. This is what makes it SUPER!

  5. DSP
     • Digital Signal Processor.
     • Requires high-speed, low-overhead data movement and rapid computation.
     • Usually has small on-board ROM and RAM and a single-cycle multiplier.
     • Designed to run single-line, serial-in, serial-out signal processing applications very fast.

  6. DSP Computations
     • The inner product of two vectors is a common computation for determining energy or correlation.
     • The following C code is an example: for (n = 0; n < length; n++) result += x[n] * y[n];
     • The processor that executes this loop in the least instruction time will have the best performance.
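The inner-product loop from slide 6 can be made self-contained so it runs on an ordinary host machine. The function name `dot_product` and the `float` element type are assumptions. The slide does not say how `result` is initialized, so this sketch starts it at zero. Each iteration is one multiply and one add, which is the pair a DSP tries to complete in a single cycle.

```c
#include <stddef.h>

/* Inner product of two vectors: the multiply-accumulate loop from the slide.
   dot_product is an illustrative name, not a library function. */
float dot_product(const float *x, const float *y, size_t length)
{
    float result = 0.0f;                 /* running total starts at zero */
    for (size_t n = 0; n < length; n++)
        result += x[n] * y[n];           /* one multiply and one add per element */
    return result;
}
```

Passing the same vector twice, as in `dot_product(x, x, length)`, gives the energy of `x` (its sum of squares).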

  7. SHARC DSP
     • The SHARC incorporates features aimed at optimizing such loops, including high-speed floating-point capability and extended floating-point precision.
     • These features are DSP-specific, so a non-DSP application may not benefit from them as much.

  8. Floating Point and Extended Floating Point
     • The SHARC supports 32-bit floating-point, 40-bit extended floating-point and fixed-point formats.
     • Floating-point computations take no additional clock cycles compared with fixed-point ones.
     • Data is automatically zero-padded when loaded from 32-bit memory into the 40-bit internal registers, and truncated when stored back.
     • Not accurate enough for some scientific algorithms, but provides an excellent signal-to-noise ratio.
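The load and store behaviour described on slide 8 can be modeled on a host machine by holding a 40-bit register value in the low 40 bits of a `uint64_t`. This sketch makes two assumptions. First, the 32-bit value occupies the upper 32 of the 40 bits, so the 8 extra bits extend the mantissa at the least-significant end. Second, it ignores rounding modes. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model: a 40-bit register held in the low 40 bits of a uint64_t. */
typedef uint64_t Reg40;

/* Load: 32-bit memory word -> 40-bit register, zero-padding the 8 low bits. */
Reg40 load_float(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);      /* reinterpret the IEEE single bits */
    return (Reg40)bits << 8;                 /* 8 zero bits appended on the right */
}

/* Store: 40-bit register -> 32-bit memory word, truncating the 8 low bits. */
float store_float(Reg40 reg)
{
    uint32_t bits = (uint32_t)((reg >> 8) & 0xFFFFFFFFu);  /* drop extra precision */
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Storing a register whose low 8 bits are non-zero returns the same `float` as if those bits were zero. That is the truncation the slide describes.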

  9. SHARC’s Internal Memory
     • Makes the SHARC unique.
     • Size: allows many complex functions to be performed on-chip, eliminating the need to move data between internal and external memory. The memory is significantly larger than that of most other high-speed computational devices.
     • Dual-block, dual-port: optimizes the Harvard architecture by allowing instructions to be fetched while data memory accesses are performed.

  10. Multiply and Accumulate Instructions on the SHARC
     • Like most DSPs, the SHARC can compute a product and add it to a running total in a single clock cycle.
     • The SHARC’s "super" instruction can multiply and accumulate while also adding, subtracting, or averaging data in two other registers.
     • These instructions give the SHARC its 120 megaflop (MFLOPS) rating.
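The 120 megaflop figure on slide 10 is a peak rate: floating-point operations completed per cycle multiplied by the clock rate. Assume three floating-point operations per cycle (one multiply in parallel with an add and a subtract) and a 40 MHz instruction rate. These are figures commonly quoted for the original ADSP-21060 and are not stated in this presentation. On that basis the arithmetic works out as:

$$
3\ \frac{\text{FLOP}}{\text{cycle}} \times 40 \times 10^{6}\ \frac{\text{cycles}}{\text{s}} = 120 \times 10^{6}\ \frac{\text{FLOP}}{\text{s}} = 120\ \text{MFLOPS}
$$

Sustained throughput on real code is lower whenever an instruction cannot keep all three operations busy.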

  11. Zero-Overhead Looping on the SHARC
     • A single instruction outside the loop performs the loop set-up, informing the SHARC that a loop is approaching.
     • The instruction also specifies the iteration count and the termination condition.
     • As a result, the pipeline remains full during loop execution, and the termination condition is tested in parallel with the loop body.
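A toy C model can illustrate the idea on slide 11. It uses a hypothetical `LoopController` that stands in for the program sequencer. The start address, end address and count are loaded once, playing the role of the set-up instruction. The decision to jump back is made while computing the next fetch address, so the loop body contains no decrement, compare or branch instructions. This is an illustration of the concept under those assumptions, not the SHARC's actual sequencer logic.

```c
#include <stddef.h>

/* Hypothetical stand-in for the hardware loop logic in the sequencer. */
typedef struct {
    int start;   /* address of the first instruction in the loop body */
    int end;     /* address of the last instruction in the loop body */
    int count;   /* iterations still to run, including the current one */
} LoopController;

/* Compute the next fetch address. The end-of-loop test happens here,
   alongside the fetch, instead of in an instruction inside the body. */
int next_pc(LoopController *lc, int pc)
{
    if (pc == lc->end && lc->count > 1) {
        lc->count--;
        return lc->start;            /* loop back without a branch instruction */
    }
    return pc + 1;                   /* fall through (also exits the loop) */
}

/* Run a program of `length` instructions, recording each executed address
   in `out`. Returns the number of instructions executed. */
size_t trace(LoopController *lc, int length, int *out, size_t max)
{
    size_t n = 0;
    int pc = 0;
    while (pc < length && n < max) {
        out[n++] = pc;
        pc = next_pc(lc, pc);
    }
    return n;
}
```

Consider a 4-instruction program with set-up at address 0, a body at addresses 1 to 2 and a count of 3. The executed sequence is 0, 1, 2, 1, 2, 1, 2, 3. Every fetched instruction is useful work, and nothing inside the body manages the loop.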

  12. DAGs on the SHARC
     • Data Address Generators (DAGs) are integer computation units that manage the index registers used to address memory.
     • They allow the SHARC to fetch a value and update the index value at the same time.
     • If the updated index exceeds a limit, the DAG adjusts it so that it wraps around.
     • This occurs in the same clock cycle as the read or write.
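The post-modify-and-wrap behaviour on slide 12 can be sketched in C. The field names follow the SHARC DAG register roles: I is the index, M the modify value, B the base and L the length, with L = 0 meaning no wrapping. The struct and function themselves are hypothetical. The sketch also assumes the modify step is no larger in magnitude than the buffer length.

```c
/* Hypothetical model of one DAG address-register set. */
typedef struct {
    int i;  /* index: address used by the current access */
    int m;  /* modify: amount added to the index after the access */
    int b;  /* base: first address of the circular buffer */
    int l;  /* length of the circular buffer (0 = linear addressing) */
} Dag;

/* Return the address for this access, then post-modify the index.
   On the SHARC both happen in the same cycle as the memory access. */
int dag_next_address(Dag *d)
{
    int address = d->i;
    d->i += d->m;
    if (d->l != 0) {
        if (d->i >= d->b + d->l)
            d->i -= d->l;            /* wrapped past the end of the buffer */
        else if (d->i < d->b)
            d->i += d->l;            /* wrapped below the start (negative modify) */
    }
    return address;
}
```

With base 100, length 4 and modify 1, successive calls yield 100, 101, 102, 103, 100. The index never leaves the buffer, and no separate compare-and-reset code is needed.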

  13. DAG Capabilities
     • Circular buffering: rather than actually moving data in and out of a vector, a circular buffer is used. By updating the index modulo the buffer length, the oldest entry can conveniently be replaced by the newest one.
     • Bit-reverse addressing: the bit pattern of a vector index is reversed. The SHARC does this automatically. It is required for the Fast Fourier Transform (FFT), which is often critical to DSP applications.
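Both addressing modes from slide 13 can be sketched in portable C. Here the modulo and bit-reversal arithmetic that the DAGs perform for free must be written out explicitly. The FIR filter is an assumed example use of a circular delay line, since the slide names no particular algorithm. `TAPS` and all function names are hypothetical. `bit_reverse_permute` is only the reordering step of a radix-2 FFT, not a complete FFT.

```c
#include <stddef.h>

#define TAPS 4

/* Circular delay line: the newest sample overwrites the oldest one. */
typedef struct {
    float delay[TAPS];
    size_t head;                       /* index of the newest sample */
} DelayLine;

/* Push one input sample and return one FIR output, y = sum h[k] * x[n-k]. */
float fir_step(DelayLine *d, const float h[TAPS], float sample)
{
    d->head = (d->head + 1) % TAPS;    /* advance modulo TAPS to the oldest slot */
    d->delay[d->head] = sample;        /* replace oldest with newest, no shifting */

    float acc = 0.0f;
    size_t idx = d->head;
    for (size_t k = 0; k < TAPS; k++) {
        acc += h[k] * d->delay[idx];   /* h[0] pairs with the newest sample */
        idx = (idx + TAPS - 1) % TAPS; /* step back one sample, wrapping */
    }
    return acc;
}

/* Reverse the lowest `bits` bits of index, e.g. 001 -> 100 for bits = 3. */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned k = 0; k < bits; k++) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

/* Reorder 2^bits samples into bit-reversed order (radix-2 FFT input order). */
void bit_reverse_permute(float *x, unsigned bits)
{
    size_t n = (size_t)1 << bits;
    for (size_t i = 0; i < n; i++) {
        size_t j = bit_reverse((unsigned)i, bits);
        if (j > i) {                   /* swap each pair only once */
            float tmp = x[i];
            x[i] = x[j];
            x[j] = tmp;
        }
    }
}
```

Feeding a single 1 followed by zeros through `fir_step` returns the coefficients 1, 2, 3, 4 and then 0, which is the filter's impulse response, without any sample being shifted in memory. For 8 samples, `bit_reverse_permute` produces the index order 0, 4, 2, 6, 1, 5, 3, 7.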

  14. SHARC DSP
     • What makes the SHARC unique?
     • It also has some features not directly related to optimizing numeric computations, such as pipelining and branch handling.
     • Why has this not emerged sooner? Technology has only recently made it economical to integrate these features into a single computing device.

  15. SHARC’s Pipeline
     • Three stages: instruction fetch, decode and execute.
     • An instruction takes three clock cycles to propagate through the pipeline.
     • Even so, the processor executes one instruction per clock cycle, because a new instruction enters the pipeline every cycle while the previous two are still in progress.
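A small C simulation of an ideal three-stage pipeline shows why slide 15's throughput is one instruction per cycle even though each instruction spends three cycles in flight. The sketch assumes no stalls and no branches. Under that assumption, n instructions finish in n + 2 cycles, so the cost of filling the pipeline is paid only once. The function name and table layout are hypothetical.

```c
#define STAGES 3   /* fetch, decode, execute */

/* Fill table[c][s] with the instruction number occupying stage s during
   cycle c (-1 if the stage is empty) and return the total cycle count.
   Assumes an ideal pipeline: one new instruction enters every cycle. */
int simulate_pipeline(int n, int table[][STAGES], int max_cycles)
{
    int cycles = (n > 0) ? n + STAGES - 1 : 0;
    for (int c = 0; c < cycles && c < max_cycles; c++) {
        for (int s = 0; s < STAGES; s++) {
            int instr = c - s;           /* fetched s cycles before cycle c */
            table[c][s] = (instr >= 0 && instr < n) ? instr : -1;
        }
    }
    return cycles;
}
```

For 4 instructions, cycle 2 has instruction 2 in fetch, 1 in decode and 0 in execute, and the run takes 6 cycles. For 100 instructions it takes 102 cycles, approaching one instruction per cycle.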

  16. SHARC’s Branch Handling: Delayed Branching
     • When a branch instruction is encountered, the two instructions that follow it, which have already been fetched and decoded, are executed before the branch takes effect.
     • This keeps the pipeline full and avoids discarding those two instructions and reloading the pipeline.
     • Beneficial in situations such as loops of only a few instructions, where the ratio of wasted clock cycles to instructions is significant.
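A worked cycle count makes the benefit of delayed branching concrete. Take, as a hypothetical example, a loop body of k = 4 instructions closed by a branch. A non-delayed branch costs three cycles (slide 17), so each iteration takes k + 3 cycles. If two of the body instructions can be moved into the two slots after a delayed branch, the branch costs a single cycle and each iteration takes k + 1 cycles:

$$
\text{non-delayed: } k + 3 = 7 \text{ cycles}, \qquad \text{delayed: } k + 1 = 5 \text{ cycles}, \qquad \text{saving } \frac{2}{7} \approx 29\%
$$

The shorter the loop body, the larger this fraction becomes, which is why the slide singles out loops of only a few instructions.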

  17. SHARC’s Branch Handling: Non-delayed Branching
     • Traditional branching.
     • If the instructions cannot be reordered to use delayed branching, non-delayed branching saves space, since no NOPs are needed to fill the two unused slots.
     • It uses only one word of storage.
     • However, it takes three cycles, because the pipeline must be reloaded.

  18. Multi-processing
     • The SHARC is uniquely equipped for multi-processing.
     • Its link ports provide very powerful multi-processing capabilities.
     • Two main programming models, depending on the application.
     • Adapts well to different multi-processing architectures.

  19. Multi-processing: SHARC Links
     • The SHARC has 6 link ports, each of which can transport data at up to 40 Mbytes/sec.
     • Links are designed for point-to-point connections.
     • Data can be transmitted in either direction, but not in both simultaneously.
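Two quick calculations put the link-port figures from slide 19 in perspective. Take 40 Mbytes/sec as 40 × 10^6 bytes per second per link, an assumption about the unit. The first figure is the aggregate rate with all six links busy, each in one direction. The second is the time to send a hypothetical buffer of 1024 32-bit samples over a single link:

$$
6 \times 40\ \text{MB/s} = 240\ \text{MB/s}, \qquad t = \frac{1024 \times 4\ \text{bytes}}{40 \times 10^{6}\ \text{bytes/s}} = 102.4\ \mu\text{s}
$$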

  20. Multi-processing Program Model: MIMD
     • Multiple instruction, multiple data.
     • Good for applications that require multiple instruction threads to execute concurrently.
     • Processors operate individually; each processor executes different code.
     • Typically used for image reconstruction and multi-channel DSP.

  21. Multi-processing Program Model: SIMD
     • Single instruction, multiple data.
     • Works best when all processors execute identical instruction sequences.
     • Does not require overhead for inter-processor synchronization.
     • Typically used for synthetic aperture radar and automatic target recognition.
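A sequential C sketch can stand in for the SIMD model on slide 21. Every simulated processor runs the same routine, `partial_dot`, on its own slice of the data, and the partial results are combined at the end. The processor count, the contiguous split and the final reduction step are assumptions made for illustration. On real hardware each slice would run on a separate SHARC, and how the partial sums are gathered depends on the system design.

```c
#include <stddef.h>

/* The single routine that every simulated processor runs on its own slice. */
static float partial_dot(const float *x, const float *y, size_t start, size_t end)
{
    float sum = 0.0f;
    for (size_t n = start; n < end; n++)
        sum += x[n] * y[n];
    return sum;
}

/* Simulate `procs` processors (assumed > 0) sharing one inner product.
   Processor p gets a contiguous slice; the last one also takes any remainder. */
float simd_dot(const float *x, const float *y, size_t length, size_t procs)
{
    size_t chunk = length / procs;
    float total = 0.0f;
    for (size_t p = 0; p < procs; p++) {
        size_t start = p * chunk;
        size_t end = (p == procs - 1) ? length : start + chunk;
        total += partial_dot(x, y, start, end);   /* combine partial results */
    }
    return total;
}
```

Because every slice runs identical code, no processor ever waits on another until the final combine, which matches the slide's point about avoiding synchronization overhead.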

  22. Multi-processing Architectures: Cluster Design
     • Groups of up to 6 SHARCs in a cluster.
     • The most common way of joining multiple SHARCs.
     • All processors, global I/O and global memory are connected to a common "cluster bus."
     • Each SHARC can "drive" the bus.

  23. Multi-processing Architectures: Mesh Design
     • All SHARCs are joined by their link ports and are connected to a common bus.
     • In SIMD mode, a single master SHARC drives the bus.
     • In MIMD mode, the mesh architecture cannot function if the data is larger than the available on-chip memory.
     • Offers scalability over a wider range of applications.

  24. Summary of What Makes the SHARC Super
     • It performs excellently in DSP applications.
     • It employs a Harvard architecture with very large on-chip memory.
     • It has a respectable megaflop rating.
     • It offers strong multi-processing capabilities.

  25. How Well Suited Is the SHARC to Non-DSP Applications?
     • It is clearly geared toward DSP applications.
     • While it may fare better than other processors, it still lags behind processors designed specifically for non-DSP applications.

  26. Sources
     • www.alacron.com/news/tp_mimd_simd.htm
     • www.analog.com
     • www.cs.seas.gwu.edu/~cs339/cs339-lecture2.pdf
     • www.ixthos.aa.psiweb.com/technical/notes_articles/articles
