Application Performance through Hardware Acceleration
Shobana Padmanabhan, Dan Legorreta, Moshe Looks
CSE 560, Oct 2005
Application Performance • Architecture • Compiler • Algorithm
Liquid architecture platform
[Figure: a program is compiled with gcc on the workstation and loaded onto the FPX FPGA, which hosts the LEON core (cache controller, I-cache, D-cache), the AHB address/data bus, a memory controller with SRAM/SDRAM, and a command controller driven by the control software interface. The example workload is a clustering application.]
• LEON: SPARC V8 compatible
• Open soft core
Application runtime
[Figure: the workstation runs the program on the LEON core in the FPX FPGA and receives results and timing back.]
Slow! Where is time spent?
Can profile all aspects of the micro-architecture:
• Pipeline stalls
• Branch prediction
• Function time / cycles
• Cache hits / misses
• Reads / writes
[Figure: per-function breakdown for .text, main, findMatch, addQuery, computeKey, computeBase, coreLoop, fillQuery, and Rnd.]
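The cache hit/miss counts listed above are gathered in hardware. As a rough software analogue, the Python sketch below counts hits and misses for a direct-mapped cache over an address trace. The line size, number of lines, and the trace itself are assumptions chosen for illustration, not LEON's actual cache configuration.

```python
class DirectMappedCache:
    """Counts hits and misses for a direct-mapped cache (tags only, no data).

    line_size and num_lines are illustrative assumptions, not LEON's settings.
    """

    def __init__(self, line_size=32, num_lines=64):
        self.line_size = line_size
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record one access; return True on a hit, False on a miss."""
        block = address // self.line_size   # memory block containing the byte
        index = block % self.num_lines      # the only line this block may occupy
        tag = block // self.num_lines       # identifies which block is cached
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag  # fill the line, evicting whatever was there
        return False

# Hypothetical address trace; addresses 0 and 2048 map to the same line and conflict.
cache = DirectMappedCache(line_size=32, num_lines=64)
for addr in [0, 4, 8, 32, 0, 2048, 0]:
    cache.access(addr)
print(cache.hits, cache.misses)  # 3 4
```

The last access to address 0 misses only because 2048 evicted it, which is the kind of conflict miss that changing the cache geometry could remove.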
Cycle-accurate profiling for free
[Figure: a statistics module attached to the LEON core on the FPX reports request timings to the workstation, e.g. findMatch 500 ms, coreLoop 300 ms.]
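A flat per-function timing report like the one above can be sketched by mapping each program-counter value to the function whose address range contains it, summing cycles, and dividing by the clock rate. In this Python sketch the symbol table, the addresses, the event trace, and the 50 MHz clock are all hypothetical; a real table would come from the program's ELF symbols.

```python
import bisect

# Hypothetical symbol table: (start address, function name), sorted by address.
# These addresses are made up; a real table would come from the ELF symbols.
SYMBOLS = [
    (0x40000000, "main"),
    (0x40000200, "findMatch"),
    (0x40000600, "coreLoop"),
    (0x40000900, "computeKey"),
]
TEXT_END = 0x40000C00  # hypothetical end of the .text section

def function_for(pc, symbols=SYMBOLS, text_end=TEXT_END):
    """Return the name of the function whose address range contains pc, or None."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0 or pc >= text_end:
        return None
    return symbols[i][1]

def flat_profile(events, clock_hz, symbols=SYMBOLS, text_end=TEXT_END):
    """Sum (pc, cycles) events per function and convert cycles to milliseconds."""
    cycles = {}
    for pc, n in events:
        name = function_for(pc, symbols, text_end)
        if name is not None:
            cycles[name] = cycles.get(name, 0) + n
    return {name: 1000.0 * c / clock_hz for name, c in cycles.items()}

# Hypothetical trace at an assumed 50 MHz clock.
events = [(0x40000210, 2_000_000), (0x40000650, 1_000_000), (0x40000010, 500_000)]
print(flat_profile(events, clock_hz=50_000_000))
# {'findMatch': 40.0, 'coreLoop': 20.0, 'main': 10.0}
```

Each event is charged only to the function containing its PC, so time spent in callees is not added to the caller's total; this is a flat profile rather than a call-graph profile.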
Improve application performance • By reconfiguring the processor • By creating special hardware instructions
Special hardware instruction
[Figure: gcc compiles the program on the workstation; a custom dot-product unit is added to the LEON core on the FPX FPGA.]
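To illustrate what a dot-product instruction could buy, this Python sketch compares a reference software loop with a model of a hypothetical DOT instruction that retires `width` multiply-accumulates per issue. The instruction name, the 4-wide width, the 32-bit wraparound, and the operation counts (which ignore loads, branches, and latency) are all assumptions, not a description of the authors' hardware.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Wrap an integer to a signed 32-bit value, as a 32-bit datapath would."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def dot_software(a, b):
    """Reference loop: one multiply and one add per element pair.
    Returns (result, arithmetic_ops)."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    acc = 0
    for x, y in zip(a, b):
        acc = to_signed32(acc + to_signed32(x * y))
    return acc, 2 * len(a)

def dot_custom(a, b, width=4):
    """Model of a hypothetical DOT instruction that performs `width`
    multiply-accumulates per issue. Returns (result, issues)."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    acc, issues = 0, 0
    for i in range(0, len(a), width):
        partial = sum(x * y for x, y in zip(a[i:i + width], b[i:i + width]))
        acc = to_signed32(acc + partial)  # same result mod 2**32 as the loop
        issues += 1
    return acc, issues

a, b = [1, 2, 3, 4, 5], [6, 7, 8, 9, 10]
print(dot_software(a, b))  # (130, 10)
print(dot_custom(a, b))    # (130, 2)
```

Both versions produce identical 32-bit results; the difference is that the custom instruction replaces ten arithmetic instructions with two issues in this example.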
Reconfigurable architecture
• Generic processor (e.g., Pentium): cheap but application-agnostic; compilers exist, so compiler optimization is the key
• Reconfigurable logic (FPGA): the subject of our study; architecture and compiler research are the key
• Customized logic (custom hardware): ideal for an application but expensive; logic/architecture research is the key