Introduction to VLSI Algorithmic Design Automation Lab Research
Jun-Dong Cho
VLSI Algorithmic Design Automation Lab. http://vada.skku.ac.kr
School of Information and Telecommunication, Sungkyunkwan Univ.
Lab Introduction
• The VLSI Algorithmic Design Automation Lab (vada.skku.ac.kr), directed by Prof. Jun-Dong Cho, studies SoC design problems and is devoted to VLSI/SoC design automation and communication SoCs.
• The lab consists of 5 Ph.D. students and 7 master's students.
• Lab facilities: SPW, Matlab Signal Processing Workshop, Code Composer, Cadence, Xilinx, and high-performance PCs and workstations.
Post-PC = Mobile computing + Intelligent environment
• 10^9 times the bandwidth and 10^6 times the power consumption
• Searching a song by humming in 0.5 sec from a database of 2000 songs takes 3 GOPS; 3D TV also requires several GOPS.
• According to the National Technology Roadmap for Semiconductors, by 2010 a single chip would integrate 4 billion transistors at 50 nm and run at a clock speed of 10 GHz.
• A new design methodology is required to handle wiring delay and intrinsic electrical noise.
• Ultra-low energy (10-100 MOPS/mW), ultra-low cost
• S/W and H/W co-design, S/W-driven design reuse (e.g., Software-Defined Radio)
~2003 VADA Research Themes
• Low Power Reconfigurable MODEM (CDMA, WCDMA, OFDM) Architecture Design
• Lower Power VLSI CAD: H/W & S/W co-design, Architecture-level/Logic-level Optimizer, Placement/Routing Layout Optimizer
2004 VADA Research Theme
• Lower Power Multiprocessor System on a Chip
VADA Class Lectures
• 2004: SoC Architecture
• 2003: Software Defined Radio, Embedded System Design
• ~2002: VLSI Design for Digital Signal Processing, Introduction to Digital Communication, Introduction to Computer Aided Design of Integrated Circuits, Computer Architecture - Microsystems for Multimedia Applications
• 1999: Low Power VLSI Design Optimization
Biography of Prof. Jun-Dong Cho
• 1983-1987: Samsung Electronics, CAD
• 1989: Polytechnic Univ., Computer Science, MS
• 1993: Northwestern Univ., EECS, Ph.D.
• 1993.6: Best paper, the 30th Design Automation Conference (Dallas, TX)
• 1996.6: IEEE Senior Member
• 2000.8-2001.8: Visiting Scientist, Design Automation Team, IBM T.J. Watson Research Center (Yorktown Heights, NY) (2001.5: IBM Invention Achievement Award)
• 2000.10: Sungkyunkwan Univ. Best Professor Award
• 1990: Reviewer for IEEE Trans. on VLSI Systems, IEEE Trans. on Circuits & Systems, IEEE Trans. on CAD of Integrated Circuits
• Program committee member of ICCAD, ISQED, SLIP, ASPDAC, ICVC, ASPASIC
• Books:
  - High-Performance Physical Design for MCM and Packages, World Scientific Co., Oct. 1996
  - Wiley Encyclopedia of Electrical and Electronics Eng., "VLSI Circuit Layout", John Wiley and Sons, Inc., co-authored with M. Sarrafzadeh, April 1999
  - Chapter "Steiner Tree Problems in VLSI Layout Designs" in "Steiner Trees in Industries", Kluwer Academic Publishers, May 2001
  - "Lower Power Digital Core Design for Multimedia and Telecommunications", to be published through IDEC, 2002
Software Implementation of an OFDM-Based DVB-T Receiver System
[Figure: Mode Creation Process. Diagram labels: DVB-T Model, Operation cycle Extraction, TI CCS, DVB-T Modeling, C simulation, SPW, SignalMaster, Real DSP Model, Real-DSP Performance, Real-DSP Co-Sim Board]
SignalMaster™ Emulation Platform: Virtex-II XCV6000 FPGA + TMS320C6701 VLIW DSP
COFDM DVB-T Receiver Hardware/Software Partitioning
• Required computation and feasibility of real-time operation
• Synchronization/operation schedule of each functional module
• The multiprocessor case
[Figure: receiver block diagram. Blocks: FFT; Delay & Phase Rotator; Equalizer; FEC; I/Q gen.; DeMod; FFT; GI, Mode Detect; GI Remove; Coarse STR; Timing Proc.; Fine STR; MRC; TPS; Carrier Recovery; NCO. Legend: C software code / hardware; hardware; hardware or software]
Multiprocessor SoC Platform Architecture
• DVB-T Baseband Receiver => HW/SW partitioning
• HW/SW co-design based on a multiprocessor SoC platform for the DVB-T baseband receiver
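Teak DSP Platform Implementation
1. Teak DSP platform implementation
• DMA for Teak
• Arithmetic blocks
• XY memory interface
• BIU (Bus Interface Unit) for interworking with the ARM platform
2. HW/SW partitioning of the DVB-T receiver (in progress)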
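Communication Interface Implementation
▶ CI (Communication Interface): designed as a crossbar switch to make access to the IPs and shared memory more efficient in the multiprocessor platform.
▶ CI Controller: performs arbitration, reassigning ownership of each slave to a master according to user-defined priorities, and controls the CI cells.
▶ CI Cell: connects a master (Teak, ARM) to a slave (shared memory, IPs), serving the master's transfer request according to information from the controller.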
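The priority-based arbitration described for the CI Controller can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `arbitrate`, the request format, and the slave names `shared_mem` and `ip0` are hypothetical and are not taken from the lab's design, which is implemented in hardware.

```python
from typing import Dict, List


def arbitrate(requests: Dict[str, str], priority: List[str]) -> Dict[str, str]:
    """Grant each requested slave to the highest-priority master asking for it.

    requests maps master name -> requested slave name (hypothetical format).
    priority lists master names from highest to lowest (user-defined).
    Returns slave name -> granted master name for this arbitration round.
    """
    grants: Dict[str, str] = {}
    for master in priority:                  # visit masters in priority order
        slave = requests.get(master)
        if slave is not None and slave not in grants:
            grants[slave] = master           # highest-priority requester wins
    return grants


# Both masters want the shared memory: only the higher-priority one is granted.
print(arbitrate({"ARM": "shared_mem", "Teak": "shared_mem"}, ["Teak", "ARM"]))

# Different slaves: a crossbar can connect both pairs in the same round.
print(arbitrate({"ARM": "ip0", "Teak": "shared_mem"}, ["Teak", "ARM"]))
```

The second call shows why a crossbar helps: requests to different slaves do not compete, so both masters proceed at once, whereas a single shared bus would serialize them.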
Low-Power MPEG4 Codec Design
• Low-power architecture for MPEG4 SoC
• Reduction of loop memory size (Fig. 1)
• Array address translation for low row activation (Fig. 2)
• Memory mapping for low data bus transition
[Fig. 1] [Fig. 2]
3D Image Sensing Platform Implementation
• Hardware design of the image processing for real-time operation
• FPGA implementation of the image processing algorithms and the SDRAM controller through HDL coding
POD Cryptographic Module for a Conditional Access System Using SystemC
• Designing a virtual platform based on the ARM926EJ-S core and AMBA AHB using CoWare's ConvergenSC
• Research in progress using transaction-level modeling in SystemC
Sensor-Network-Based Mobile Home-Care System
• Wireless interface module that transmits blood glucose data measured by a glucose meter over wireless LAN
• Intel XScale PXA255
• 16 MB Flash, 32 MB SDRAM
• Embedded Linux (2.4.18) as the OS
• 10 Mbps wired Ethernet, 11 Mbps WLAN
Other VADA Research
• 2000.1 - 2000.12: Low Power and High Performance Reconfigurable Equalizer for Cable MODEM, Samsung Electronics
• 2000.5 - 2000.11: Fast and Low Power Search Engine for Speech Recognition, Samsung Electronics
• 2000.1 - 2000.12: Reticle Frame Key Layout Placer for IC Reticles, Samsung Electronics
• 1999.2 - 1999.11: Lower Power Decoder for Convolutional Encoder, Samsung Advanced Institute of Technology
Cryptographic Processor Development
• Processes at high speed the cryptographic algorithms needed to handle security protocols (SSL, SET, IPsec, etc.) in server/client systems, security tokens, and smart cards.
• Features
  - Secret-key algorithms: DES, 3DES, AES, SEED
  - Hash algorithms: MD5, SHA-1, HAS-160
  - Message authentication algorithms: HMAC-MD5, HMAC-SHA1, HMAC-HAS160
  - Public-key algorithms: RSA1024, DSA, DH, ECC160
  - Modular arithmetic: addition, multiplication, exponentiation
  - True hardware random number generator
  - 16 Kbyte internal SRAM
  - PCI 2.1 master/slave mode support
  - MPC860 interface
  - Device driver for Windows 2000 Server
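Modular exponentiation, listed among the features above, is the operation at the core of RSA and DH. The slides do not say which algorithm the processor uses, so the following is a minimal sketch of the textbook left-to-right square-and-multiply method, with the function name `mod_exp` chosen here for illustration.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right binary (square-and-multiply) modular exponentiation."""
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus                 # also correct when modulus == 1
    base %= modulus
    for bit in bin(exponent)[2:]:        # scan exponent bits, MSB first
        result = (result * result) % modulus       # square for every bit
        if bit == "1":
            result = (result * base) % modulus     # multiply when the bit is set
    return result


# Cross-check against Python's built-in three-argument pow().
print(mod_exp(4, 13, 497) == pow(4, 13, 497))
```

The loop performs one modular squaring per exponent bit plus one modular multiplication per set bit, which is why a fast hardware modular multiplier dominates public-key performance.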
Cable Modem Equalizer
• Downstream channel receiver for cable modems
Low Power Multimedia Design
• Low-power motion estimator
  - MPEG-2 real-time motion estimator
  - 2-D systolic array with dual PEs (processing elements)
  - Motion estimation block: 70% reduction in memory accesses
• Fast and low-power Viterbi search engine using an inverse hidden Markov model
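To show the computation that a motion estimator performs, here is a minimal software sketch of full-search block matching with the sum of absolute differences (SAD) as the cost. The slides do not state the search algorithm or cost function, so both are assumptions, and the names `sad` and `full_search` are hypothetical. A systolic PE array would evaluate these absolute differences in parallel.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]   # frame[y][x] = pixel intensity


def sad(cur: Frame, ref: Frame, cx: int, cy: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between two n x n blocks."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur: Frame, ref: Frame, cx: int, cy: int, n: int, r: int) -> Tuple[int, int]:
    """Return the motion vector (dx, dy) within +/-r that minimizes SAD."""
    h, w = len(ref), len(ref[0])
    best: Optional[int] = None
    best_mv = (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:   # candidate inside the frame
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv


# Reference frame: a simple ramp. Current frame: the same content moved so that
# the block at (2, 2) matches the reference block at (4, 3).
ref = [[x + 8 * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, cx=2, cy=2, n=2, r=2))
```

Every candidate position rereads an overlapping window of the reference frame, which is where data reuse between neighboring PEs can cut memory accesses.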
Interconnect-Centric Approach to System on a Chip (iSoC) for Low-Power Signal Processing
Jun-Dong Cho, Sungkyunkwan Univ.
Reconfigurable Platform-Based Design Methodology
• Real-time reconfiguration architecture with minimum configuration time
• Design space exploration
• Dynamic Memory and Power management On a Chip (MPoC)
• 66% of chips are not OK on first silicon (2004)
• Mid-90s: 6 months late => 31% earnings loss
• Today: 3 months late = $500M loss
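Motivation
• Wireless processing systems require high throughput and heavy computation but face strict power constraints.
• Reconfigurable SoC implementations seek higher performance through parallelism and rely on IP reuse.
• Algorithm partitioning through performance prediction based on hot-spot bottlenecks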
Development Direction
• Expand communication bandwidth to meet the large burst data transfer demands of a growing range of multimedia products
• Dual-Core Architecture (ARM+DSP) -> Multiprocessor SoC
Recent Research Trends
• Intel's Reconfigurable Radio Architecture (mesh + nearest neighbor)
• Reconfigurable Baseband Processing, Picochip
• Portable Components using Containers for Heterogeneous Platforms, Mercury Computer Systems, Inc.
• Configurable platforms: Altera Excalibur, Xilinx Virtex FPGA
• Adaptive Computing Machine, Quicksilver Tech.
• Mercury, Sky, Galileo, Tundra (crossbars, bridges)
• Virginia Tech's reconfigurable hardware
Full Application Platform
• Users design full applications on top of hardware and software architectures
• Nexperia
• Texas Instruments' OMAP multimedia platform
• Infineon's M-Gold 3G wireless platform
• Parthus' Bluetooth platforms
• ARM's PrimeXsys wireless platform
OMAP™ (Open Multimedia Application Platform)
• The OMAP architecture includes SW/OS support that controls clocking and idle modes for the entire platform.
• The dual-core architecture allows each task to be assigned to the most suitable processor.
Processor-centric platform
• Focuses on access to a configurable processor but does not model complete applications
• Program-in Chip-out (PICO), HP Labs
• UC Berkeley, GARP
• Improv Systems
• ARC
• Tensilica
• Triscend
Fully programmable platform
• Consists of FPGA logic and a processor core
• System on a programmable chip (SOPC)
• Altera's Excalibur, Xilinx's Virtex-II Pro, and Quicklogic's QuickMIPS
• Xilinx-IBM XBlue architecture
Communication-centric platform
• Provides an interconnect architecture but typically not a processor or a full application
• Sonics' SiliconBackplane
• PalmChip's CoreFrame architecture
IBM's CoreConnect
• Bandwidth expanded from the initial 32 bits up to 128 bits
SMART (Sonics Methodology and Architecture for Rapid Time-to-Market)
• Plug-and-play on-chip communications network
• Packet-based
• 50 employees in a year
• Provides IP and a design environment; supports SoC design
• Alliance with Cadence
• SiliconBackplane III targets communications + media
Nexperia Digital Video Platform
• Designing the initial platform, along with the PNX8500, wasn't quick and easy.
• It involved about 300 hardware, software and systems people working between 1999 and 2001, of whom 60 were involved with hardware.
Scheduled Communication
• A tiled architecture
• Each tile is a computational core, and the tile interfaces form the network.
• The core interface allows heterogeneous processing to be used across one or more tiles.
• The system is connected by a statically scheduled mesh interconnect.
• Data moves between neighboring tiles through communication pipelines, which permits a fast clock rate and time-sharing of interconnect resources.
• The ability to reconfigure the cores and the interconnect at run time enables dynamic power management.
Communication Interface
• Stream data that passes through a communication interface is scheduled for a specific communication clock cycle based on data link availability.
• The result of scheduling for each interface is a set of instructions for its associated interconnect memory.
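The scheduling idea above can be sketched as a greedy slot allocator. This is an illustration under assumptions that are not in the slides: each stream follows a fixed route of links, a word advances one link per clock cycle, and streams are placed in the order given. The names `schedule_streams` and `Link`, and the tile names in the example, are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]                      # (source tile, destination tile)


def schedule_streams(routes: Dict[str, List[Link]]) -> Dict[Link, Dict[int, str]]:
    """Statically assign every hop of every stream to a clock cycle.

    A word entering its route at cycle t uses hop k at cycle t + k, so a start
    cycle is valid only if each of those (link, cycle) slots is still free.
    Returns, per link, a cycle -> stream table, standing in for the
    instructions loaded into that interface's interconnect memory.
    """
    table: Dict[Link, Dict[int, str]] = {}
    for stream, route in routes.items():    # streams are placed in given order
        start = 0
        while any((start + k) in table.get(link, {}) for k, link in enumerate(route)):
            start += 1                       # slide the whole pipeline one cycle later
        for k, link in enumerate(route):
            table.setdefault(link, {})[start + k] = stream
    return table


# Three hypothetical streams on tiles A, B, C; s0 and s2 compete for link A->B.
routes = {
    "s0": [("A", "B"), ("B", "C")],
    "s1": [("B", "C")],
    "s2": [("A", "B")],
}
for link, slots in schedule_streams(routes).items():
    print(link, dict(sorted(slots.items())))
```

Because every slot is fixed before execution, the links need no run-time arbitration; each interface simply replays its table cycle by cycle.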
Scheduler
• The scheduler manages 5 lists of threads.
• Symmetric Multi-Processor (SMP): the scheduler may be shared by all processors.
• Distributed: a scheduler exists on every processor.
• Access to the scheduler must be performed in a critical section, under the protection of a lock.
• Other implemented objects
  - Spin lock: the low-level test-and-set access
  - Mutex: serializes access to shared data
  - Semaphore: sem_post is the only function that can be called in interrupt handlers.
Review of several types of scheduler
• Symmetric Multiprocessor (SMP)
  - Unique scheduler shared by all processors and protected
  - The threads can run on any processor, and migrate
• Centralized Non SMP (NON_SMP_CS)
  - Unique scheduler shared by all processors and protected
  - Every thread is assigned to a given processor and can run only on it
• Distributed Non SMP (NON_SMP_DS)
  - As many schedulers as processors, and as many locks as schedulers
  - Every thread is assigned to a given processor and can run only on it
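The three organizations differ only in how many schedulers (and locks) exist and whether a thread is bound to a processor. The following sketch models that difference in software; it is not the lab's implementation, and the names `Scheduler`, `make_system`, `add`, and `pick` are hypothetical. A single ready list stands in for the several thread lists a real scheduler manages.

```python
import threading
from collections import deque
from typing import Deque, List, Optional, Tuple


class Scheduler:
    """One ready list guarded by one lock (the critical section)."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        # Each entry is (thread name, bound cpu or None if it may migrate).
        self.ready: Deque[Tuple[str, Optional[int]]] = deque()

    def add(self, thread: str, cpu: Optional[int] = None) -> None:
        with self.lock:
            self.ready.append((thread, cpu))

    def pick(self, cpu: int) -> Optional[str]:
        """Remove and return the first ready thread this cpu may run."""
        with self.lock:
            for i, (thread, bound) in enumerate(self.ready):
                if bound is None or bound == cpu:
                    del self.ready[i]
                    return thread
        return None


def make_system(kind: str, n_cpus: int) -> List[Scheduler]:
    """Return the scheduler each cpu consults, indexed by cpu number."""
    if kind in ("SMP", "NON_SMP_CS"):
        shared = Scheduler()                          # unique scheduler, one lock
        return [shared] * n_cpus
    if kind == "NON_SMP_DS":
        return [Scheduler() for _ in range(n_cpus)]   # one scheduler and lock per cpu
    raise ValueError(f"unknown scheduler kind: {kind}")


smp = make_system("SMP", 2)
smp[0].add("t0")                    # unbound thread: any cpu may take it
print(smp[1].pick(cpu=1))

cs = make_system("NON_SMP_CS", 2)
cs[0].add("t1", cpu=0)              # bound to cpu 0 in the shared scheduler
print(cs[1].pick(cpu=1), cs[0].pick(cpu=0))

ds = make_system("NON_SMP_DS", 2)
ds[0].add("t2", cpu=0)              # lives only in cpu 0's own scheduler
print(ds[1].pick(cpu=1), ds[0].pick(cpu=0))
```

In the two shared variants every processor contends for the same lock, while the distributed variant removes that contention at the cost of giving up thread migration.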
Implementation ◈ Booting sequence
• The scheduler_created variable must be declared with the volatile type qualifier so that the compiler does not optimize away the seemingly infinite loop that polls it.
Experimental setup ◈ Motion JPEG application
[Figure: Execution times of the MJPEG application]
[Figure: Cycles spent in the CPU idle loop]
Experimental setup ◈ COMM application
• Does not exchange data between processors.
• The only resource shared here is the bus.
• The application uses the processors at about full power.