Explore innovative research and development in embedded operating systems, wireless networks, real-time applications, and hardware-software co-design. Discover advancements in multimedia technology, low-power mobile devices, and efficient storage systems. Dive deep into the world of SOC design, network architecture, and wireless communication protocols.
SOC & Embedded System Group • Embedded System • Embedded OS – 曾建超 • Multimedia – 蔡淳仁, 蔡文錦 • Low power mobile – 曹孝櫟 • Storage – 張立平 • SOC Design & CAD • Network – 林盈達 • Architecture and Systems – 鍾崇斌, 單智君 • Wireless base-band Processor – 許騰尹 • Multimedia SOC Design – 蔡淳仁, 彭文孝 • Electrical Design Automation – 李毅郎
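Research Interests • Chien-Chao Tseng 曾建超 • Institute of Network Engineering, Institute of System Design, College of Computer Science, National Chiao Tung University • cctseng@csie.nctu.edu.tw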
Wireless Access to Internet • My Interests :-) Roaming and Handoffs • 3G/GPRS/PHS • WiMax/WLAN/Bluetooth/PAN • Heterogeneous Wireless Overlay Networks • Multi-interface Handheld Devices
Embedded OS for Multi-interface Handheld Devices • Cross-layer design for Real-time Applications • Linux/Windows XP/CE • Driver, Network, and Application Layers (VoIP) • Heterogeneous Wireless Networks • WLAN/WiMax/3G/GPRS/PHS • Roaming and Handovers • Multi-tier Wireless Network • WPAN, WLAN and Mobile Router • Roaming and Routing • Wireless Mesh and Sensor Networks • Address Assignment and Routing • Secure and Fast Access to Wireless Networks • [Figure labels: 3G/GPRS, PHS, Embedded, WLAN]
Embedded Systems (Assistant Professor 曹孝櫟) Research Directions • Embedded Software for B3G/4G Mobile Devices • Protocol Stacks for 4G access • Embedded Operating Systems and Device Drivers, and their Optimization for Mobile Devices • Cooperate with international and local vendors and institutes to develop 4G/multimode radio SoCs • Establish the reference embedded software for next-generation mobile devices/radio SoCs
Embedded Systems (Assistant Professor 曹孝櫟) R&D Results - Low-Power and Fast-Handover Cellular/WLAN Dual-Mode Mobiles • [Figure panels: Power Consumption Evaluation; System Architecture and Prototype of Cellular/WLAN Dual-Mode Mobile; Handover Latencies Evaluation] • Awards: 2005 Mobile Communications Contest of the Industrial Development Bureau, MOEA • 2005 Software Contest of the National Center for High-Performance Computing • 2006 Embedded Software Contest of the MOE
Prof. Li-Pin Chang 張立平 • Recent research directions • Embedded storage systems • Real-time systems and scheduling algorithms • Hardware-software co-design
Embedded Storage: Efficient wear-leveling algorithm for flash memory • Goal: capture uneven usage across millions of blocks and level it • Result: the fastest, most effective, and most economical approach available • [Figure: access pattern (LBA vs. time) and block usage (erase cycle count vs. block number), with heavily used blocks marked "Worn-out quickly!"]
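As a rough illustration of why leveling matters, the sketch below replays a skewed write pattern on a tiny simulated flash device. It is a generic threshold-based static wear-leveling toy, not the algorithm from the slide; the function name `simulate_wear`, the single spare block, and the `threshold` parameter are all assumptions made for the example, and the linear scan for the coldest block would not scale to millions of blocks.

```python
def simulate_wear(num_blocks, writes, threshold=None):
    """Replay writes (logical block numbers) and return per-block erase counts.

    Hypothetical toy model: num_blocks physical blocks, num_blocks - 1 logical
    blocks, and one spare block used for out-of-place updates.
    """
    erase = [0] * num_blocks
    phys_of = list(range(num_blocks - 1))  # logical -> physical mapping
    spare = num_blocks - 1                 # the only free block

    for lba in writes:
        old = phys_of[lba]
        phys_of[lba] = spare   # write the new data into the spare block
        erase[old] += 1        # erase the stale copy
        spare = old            # the erased block is the new spare

        if threshold is None:
            continue           # no wear leveling
        # Find the least-erased block that currently holds data (O(n) scan).
        cold_lba = min(range(len(phys_of)), key=lambda l: erase[phys_of[l]])
        cold = phys_of[cold_lba]
        if erase[spare] - erase[cold] > threshold:
            phys_of[cold_lba] = spare  # park cold data on the worn block
            erase[cold] += 1           # erase the young block
            spare = cold               # young block takes the next hot write
    return erase


hot_writes = [0] * 1000  # one logical block is rewritten over and over
print("no leveling:", simulate_wear(8, hot_writes))
print("leveling   :", simulate_wear(8, hot_writes, threshold=4))
```

Without leveling, the hot block ping-pongs between two physical blocks that absorb every erase; with leveling, cold data is moved onto worn blocks so that young blocks take over the hot traffic and the erase counts stay close together.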
Real-Time Systems: Overload Management for Real-Time Object Tracking • Inter-arrival time of frames: 4 ms • Workload-scaling factor: 4/7 (57%) • Proportional Adjustment (PP): (c, 4) → (c, 7) • Firm-real-time (FE): (c, 4) → ((4, 7), c, 4) • [Figure: task timelines labelled (1, 4), FE ((4, 7), 2, 4) with dropped instances, and PP (2, 7); two plots of average RMS error]
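The two adjustments can be compared with a little arithmetic. The sketch below assumes that (c, p) denotes execution time c and period p, and that ((m, k), c, p) means "process m out of every k frames"; both readings are assumptions, since the slide does not define the notation. The helper names are hypothetical.

```python
from fractions import Fraction


def utilization_proportional(c, new_period):
    """(c, p) -> (c, new_period): every job runs, but the period is stretched."""
    return Fraction(c, new_period)


def utilization_firm(c, period, m, k):
    """Assumed ((m, k), c, period): only m of every k jobs are executed."""
    return Fraction(m, k) * Fraction(c, period)


def skip_pattern(m, k):
    """Spread m executed jobs evenly over a window of k (1 = run, 0 = drop)."""
    return [1 if (i * m) % k < m else 0 for i in range(k)]


c = 1  # example execution time in ms (assumed value)
original = Fraction(c, 4)
print(utilization_proportional(c, 7) / original)  # scaling factor under PP
print(utilization_firm(c, 4, 4, 7) / original)    # scaling factor under FE
print(skip_pattern(4, 7))
```

Under these assumptions both policies scale the load by the same factor of 4/7; they differ in whether frames are processed less often or some frames are dropped outright.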
Hardware-Software Co-design: Reconfigurable computing for overload management • Past achievement: • Overload management for event-driven real-time embedded systems • Work in progress: • To deal with transient workload bursts with hardware acceleration • Move critical tasks onto FPGA • Computing resource reclamation • On-line floor planning • On-line topology reconfiguration for network-on-chip (NoC)
Embedded Systems (蔡文錦) - Research Directions • Low-power embedded systems • Video compression/decompression
Plan in the near future • Low-power AVC/H.264 video CODEC algorithm and system design
Multimedia Embedded Systems Lab (蔡淳仁) – Research Directions • SoC Design for Advanced Video Codecs • DVB/MHP middleware & Java Runtime • Java Processor for DVB/MHP • Flexible Multimedia Codec SoC Platforms • OS Kernel Scheduler for Tightly-coupled Heterogeneous Multi-core Platforms
Multimedia Embedded Systems Lab R&D Results • H.264 Codec Accelerators on ARM Integrator • Java Processor Accelerating Technologies on Spartan 3 and ML-310 Platforms (based on the open source JOP project) • Video Rate Control for HW/SW Co-designed SoCs (patent application) • Tightly-coupled H.264 encoder on TI-OMAP 5912 • Tightly-coupled kernel scheduler module for ARM-Linux on TI-OMAP 5912
Future Plans • Implementation of a flexible multimedia codec SoC platform • Design of a new Java Processor for DVB/MHP • Design of Hardware-Friendly Psychovisual Models for Video Codecs • Clean Design of a Multi-core OS kernel suitable for Tightly-Coupled Task Scheduling
Architecture and Systems Research Directions (單智君, 鍾崇斌) • Embedded processor and SoC • Java processor, JIT compilation & VM • DSP designs and compilation • Low-power systems • Graphic processor • Superscalar ARM processor • Reconfigurable computing
Architecture and Systems R&D Results • ARM9-compatible processor with video/audio capabilities • Java stack operations folding • Memory Constrained Java Just-in-time Compiler • DSP – instruction set extensions • Low-power Branch-Target-Buffer • Low-power bus encodings • Low-power cache memory • Graphic processor design techniques • Superscalar ARM • Reconfigurable computing
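ARM9-compatible Processor with Audio/Video Capabilities • ARMAVP (ARM Audio Video Processor) is a 32-bit microprocessor with a well-balanced five-stage pipeline: Fetch Unit, Decoder Unit, Execution Unit, Memory access Unit, and Write Back Unit. Each stage is optimized for performance to raise the clock frequency, and efficient mechanisms reduce the impact of slow memory on processor performance • Features • Supports Conditional Execution • ABP buffer design • Reduced instruction fetch time • Precise interrupt control structure • Asynchronous memory access • Dynamic register bank mapping • Fast handling of branch instructions • Versatile and efficient execution paths • Distributed instruction control encoding • Functional verification and evaluation • All functions have been verified on an Altera EP20K600EBC652-1. According to simulation results for the Decode Stage, the design runs at 45 MHz on the FPGA and is expected to reach 210 MHz when implemented as a chip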
DSP – Instruction Set Extensions • Current research topics • Multiple-issue architecture • Exploring ISE in a multiple-issue architecture, such as superscalar or Very Long Instruction Word (VLIW) • Hardware reusability • Reuse the same or similar hardware resources in different ASFUs while keeping the same performance • Overcome the register file read/write port constraint • Try to schedule the inputs and outputs of an ASFU in different time slots
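Low-power Bus Encodings • We propose different low-power bus encoding systems tailored to the characteristics of different bus architectures. Our encoding systems use a variety of encoding methods to transmit data over the bus in the most power-efficient way, thereby saving power. • Low-power bus encoding system • [Figure: bus encoding architecture. Sender: original data → encoder; encoded data plus extra control lines on the bus; receiver: decoder → original data] • Processor with separate data memory and instruction memory • Data address bus: T0_BI_1, Variable-Stride, SRWEC • Instruction address bus: T0 + Discontinuous Address Table • Data bus: Leading-bytes encoding • Instruction bus: BIBITS with Register Relabeling • Processor with a single memory • Mixed instruction/data address bus: I/D Selector, T0 DAT + Stride-Table • Mixed instruction/data bus: I/D Selector, BIBITS_RR + Leading-bytes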
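To make the idea of an encoder, a decoder, and an extra control line concrete, here is a minimal sketch of classic bus-invert coding, a well-known low-power encoding that the "BI" in the slide's scheme names appears to refer to. It is a generic textbook illustration, not the group's T0_BI_1, BIBITS, or leading-bytes schemes; the function names and the 8-bit width are assumptions.

```python
def bus_invert_encode(words, width=8):
    """Return (value_on_bus, invert_flag) pairs; the flag is the extra control line."""
    mask = (1 << width) - 1
    prev = 0  # value currently driven on the data lines
    encoded = []
    for w in words:
        toggles = bin((prev ^ w) & mask).count("1")
        if toggles > width // 2:
            sent, inv = ~w & mask, 1  # sending the complement toggles fewer lines
        else:
            sent, inv = w & mask, 0
        encoded.append((sent, inv))
        prev = sent
    return encoded


def bus_invert_decode(encoded, width=8):
    """Undo the inversion at the receiver using the control line."""
    mask = (1 << width) - 1
    return [(~v & mask) if inv else v for v, inv in encoded]


def count_transitions(values):
    """Total number of bit toggles when values are driven in order, starting from 0."""
    total, prev = 0, 0
    for v in values:
        total += bin(prev ^ v).count("1")
        prev = v
    return total


data = [0x00, 0xFF, 0x00, 0xFF]
enc = bus_invert_encode(data)
on_wire = [v | (inv << 8) for v, inv in enc]  # data lines plus the invert line
print(count_transitions(data), count_transitions(on_wire))
assert bus_invert_decode(enc) == data
```

Fewer toggles on the wires means less dynamic power spent charging and discharging the bus capacitance, at the cost of one extra line and a small encoder/decoder.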
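Low-power Cache Memory • Cache memory accounts for more than 50% of the total processor power consumption • Low-power cache memory design • Loop Buffer: place loop code in a low-power loop buffer to save instruction-fetch power • Power Manager: put infrequently used cache blocks into a low-power mode to save the static power of the cache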
Graphic Processor • Research objective: study next-generation graphics processor architectures, proposing hardware architectures and software verification environments in three main directions: pixel shaders, textures, and depth processing. • Current results are itemized as follows: • A dynamically reconfigurable graphics hardware for resource reallocatable rendering pipeline • A Reconfigurable Texture Mapping Architecture • Implementation of Texture Compression by GPU Driver • Register Renaming for Pixel Shader data/value management • Instruction scheduling mechanism for 3D GPU pixel shader • An Efficient Texture Memory System Design • Alpha Blending without Z Sort
Superscalar ARM • Goal: a superscalar embedded processor featuring • 800 MHz clock rate @ 0.13 µm • 1.8 DMIPS/MHz – superscalar performance under tough pipeline latency • 800K gate count – cost-effective design • Directions and achievements • Micro-architecture • A 12-stage dual-issue superscalar processor with good instruction fetch rate, issue rate, and efficient forwarding • Simulator • A cycle-accurate simulator modeling more details than the well-known SimpleScalar simulator • Compiler • Working on the GCC machine description to optimize performance
Reconfigurable Computing (1/2) • Motivations: • Improving the Design Methodology of Embedded System Hardware • Providing Better Performance with Low Development Cost • Shortening the Time-to-Market of SoC Products • Research Issues: • Hardware/Software Partitioning • Synthesis Technology • Reconfigurable Processing Element Design • [Figure: Reconfigurable Architecture]
Research overview in SOC and Embedded Systems (林盈達) • Research theme: • Content networking with deep packet inspection by software and hardware solutions, with applications in Internet security (intrusion detection, anti-virus, anti-spam, content filtering, MSN/P2P management) • Embedded software • Embedded Linux solutions: 7-in-1, 10-in-1 • A startup company, L7 Networks (L7-Networks.com), 2002, for all-in-one security gateways • SoC • Key component in content networking: string matching, which needs hardware acceleration • FPGA-based development to accelerate Aho-Corasick and Bloom filtering algorithms
Embedded and SoC Group Selected R&D Results (2/2) • 7-in-1 integrated security gateway • String Matching Engine to Accelerate Aho-Corasick Machine • Unified Content Filtering Hardware Platform • String Matching Hardware with Bloom Filters
7-in-1 Integrated Security Gateway • 7-in-1: VPN, Firewall, NAT, Routing, Content Filtering, Intrusion Detection, Bandwidth Management • Launched a startup in 2002: L7 Networks Inc. • [Figure: packet flow between LAN/DMZ and WAN. LAN/DMZ to WAN outbound traffic, stages labelled: MAC Filter, Redirect, In-LAN Filter, Policy Route, Route, Out-WAN Filter, NAT, IPsec VPN, Bandwidth Mgt.; FTP/POP3/SMTP/Web/URL Filter with Many-to-One NAT; sniff to Alerting System and Intrusion Detection. WAN to DMZ/LAN inbound traffic, stages labelled: Out-LAN Filter, Route, Bandwidth Mgt., In-WAN Filter, Redirect, deNAT, IPsec deVPN]
String Matching Engine to Accelerate Aho-Corasick Machine • New Parallel Architecture with Pre-Hashing and Root-Indexing
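For readers unfamiliar with the machine being accelerated, the following is a minimal software sketch of the standard Aho-Corasick automaton (goto, failure, and output functions). It does not model the pre-hashing or root-indexing of the hardware architecture named on the slide; the function names and the example patterns are assumptions chosen for illustration.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto/fail/output tables for a set of string patterns."""
    goto, fail, out = [{}], [0], [[]]  # state 0 is the root
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)

    # Breadth-first pass: failure links point to the longest proper suffix
    # of the current path that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending here
            queue.append(t)
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits


ac = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", ac))
```

The text is scanned once regardless of how many patterns are loaded, which is what makes the automaton attractive for deep packet inspection and a natural target for hardware acceleration.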
Unified Content Filtering Hardware Platform • Resolve content filtering issues • Match without interrupting the CPU • Manage multiple connections • Match non-fixed payloads on the fly • Multiple patterns and multiple matched outputs • [Figure: Content Filtering Hardware reading an array of Text Descriptors in DRAM; each descriptor holds Text ID, Status, Length, First Match Offset, Last Match Offset, Text Pointer, and FA State]
String Matching Hardware with Bloom Filters • Feature Set: 1. Allow maximum shift distance if possible. 2. Reconfigure rules easily. 3. Keep constant hardware complexity. • Platform: Xilinx ML310 Embedded Development Platform with embedded PowerPC 405 processor • Xilinx Virtex-II Pro XC2VP30 FPGA • MontaVista Linux Professional Edition 3.0 • [Figure: shift controller over a text window with entering and leaving bytes; Bloom filter(1), Bloom filter(2), Bloom filter(3); outputs labelled detect prefix(p,1), detect prefix(p,2), detect factor in p]
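The building block here is the Bloom filter, a compact set-membership test that can report false positives but never false negatives. The sketch below shows a software Bloom filter used as a pre-filter in front of an exact comparison; it does not reproduce the shift controller or the prefix/factor detection from the slide. The class name, the sizes, the SHA-256-based hashing, and the sample signatures are all assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: num_hashes hash positions in an array of num_bits bits."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: bytes):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def scan(text: bytes, signatures, length):
    """Slide a fixed-length window; verify exactly only when the filter fires."""
    bloom = BloomFilter()
    exact = set(signatures)
    for sig in signatures:
        bloom.add(sig)
    hits = []
    for i in range(len(text) - length + 1):
        window = text[i:i + length]
        if bloom.might_contain(window) and window in exact:
            hits.append((i, window))
    return hits


print(scan(b"xxvirusyyworm!", [b"virus", b"worm!"], 5))
```

Because adding a rule only sets bits in the array, the rule set can be changed without altering the structure, which fits the slide's goals of easy reconfiguration and constant hardware complexity.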
Embedded and SoC Group Major Projects • Excellence Project: Next Generation Information Communication Networks (Excellence follow-up program, National Science Council, 2004~2008): • 林盈達, 曾文貴 (with 24 faculty members) • Network Benchmarking Lab (ITRI-NCTU Network Benchmarking Lab, www.nbl.org.tw, Industrial Development Bureau, MOEA, 2003~2007) • 林盈達 • Attack Session Extraction and Comparison with Nessus (Cisco San Jose, 2005~2006) • 林盈達 • Content-based Network Security - Content Classification: Design, Implementation, and Evaluation (integrated project, National Science Council, 2004~2006) • 林盈達 (with 李程輝, 孫雅麗) • Open Source Product Testing Tools: In-Lab Live Testing (National Science Council, 2005~2006) • 林盈達
Biography of Ying-Dar Lin 林盈達 • Areas of research interests • Design, implementation, analysis, benchmarking of Internet gateway devices (10-in-1: routing, NAT, firewall, VPN, IDP, CF, anti-virus, anti-spam, IM, P2P, bandwidth management, link load balance, etc.) • Internet security and QoS • Content networking • Test technologies of switch, router, WLAN, security, and VoIP • Publications • International journal: 39 • International conference: 33 • IETF Internet Draft: 1 • Industrial articles: 124 • Books: 2 • Patents: 16 • Tech transfers: 8 • B.S., NTU-CSIE, 1988 • Ph.D., UCLA-CS, 1993 • Professor, NCTU-CS, 1999~ • Founder and Director, ITRI-NCTU Network Benchmarking Lab (NBL; www.nbl.org.tw), 2002~ • Co-Founder, L7 Networks Inc. (www.L7.com.tw), co-invested by D-Link, ZyXEL, and Advantech, 2002 • Consultant, CCL/ITRI, 2002~ • Well-cited paper: Multihop Cellular: A New Architecture for Wireless Communications, INFOCOM 2000, YD Lin and YC Hsu; # of citations: 150
Wireless Baseband Processor (許騰尹) • MIMO OFDM PHY • Ultra Low-power PHY • Generic PHY architecture • Chip Implementations
Wireless Baseband Processor • Spreading: gate count 500, max. freq. 80 MHz • PAM Match Filter: gate count 4800, max. freq. 80 MHz • Clock Generator: gate count 2600, max. freq. 165 MHz • CTRL: gate count 1500, max. freq. 80 MHz • Clock Recovery: gate count 1500, max. freq. 178 MHz • Digital Divider: gate count 900, max. freq. 60 MHz • [Figure: block diagram with Spreading, PAM Match Filter, Clock Generator, CTRL, Clock Recovery, and Divider blocks]
Prototype 802.11b Baseband+MAC chip • [Chip photo labels: D/A, A/D (Q), A/D (I), PLL]
Architecture and Systems R&D Results • ARM9-compatible processor with video/audio capabilities (technology transfer) • Java stack operations folding (patents) • Asynchronous 8051 on FPGA • Low-power Branch-Target-Buffer (patent application) • Low-power bus encodings (patent applications) • Graphic processor design techniques
SOC Electrical Design Automation (李毅郎) – Research Directions • Reliable Interconnect Design • Crosstalk-driven Interconnect Design • Design-for-Manufacture (DFM) Interconnect Design • Layout Migration • VLSI Cell Migration with Topology Preservation • Post-Layout Platform for Verification and Optimization
SOC Electrical Design Automation – R&D Results • Tile-based Gridless ECO Router with Graph Reduction • Two times faster than existing tile-based routers • NEMO: A New Full-Chip Gridless Router • Faster than all academic gridless routers • Crosstalk-driven Track Assignment • Pre-Detailed Routing Design Flow Considering Capacitive- and Inductive-Noise Constraints
Electronic System Level Design (彭文孝) http://mapl.nctu.edu.tw • [Figure panels: Traditional Design Flow; Design Flow with ESL. Labels: System Level Verification and Integration; First Time Silicon Success]
Design Practice: Transaction Level Modeling for H.264 Decoder (彭文孝) http://mapl.nctu.edu.tw • [Figure labels: Bus Arbitration, Cache, SDRAM Controller, Video Pipe, Data Transaction, Control Bus, Output Interface, CPU]
SoC for Multi-Standard Video Codec (彭文孝) http://mapl.nctu.edu.tw • [Figure labels: Video Codec, Color Transform, HD Capturing, Embedded SRAM and On-Chip Bus, System on Chip, Networking, Bus Arbitration, ARM-9 CPU, 3-A Functionalities, Architecture C Model]
VLSI/SOC Research for Graphics System (Prof. 范倫達) • VLSI Information Processing LAB • Advisor: Lan-Da Van (ldvan@cs.nctu.edu.tw) • 3-D Graphics Demo Here!
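VLSI/SOC Research for Adaptive Communications (Prof. 范倫達) • Building a Virtual SOC Platform using CoWare Platform Architect • Provide a virtual system platform on which software engineers can develop programs • Raise the level of system simulation to improve system verification efficiency • Develop performance evaluation metrics: simulation results based on these metrics yield the best configuration of the system architecture and serve as a basis for system development • Simulate the time spent in each function under different hardware/software configurations • Count the bus accesses made by each module under different hardware/software configurations • [Figure: Block diagram of platform. ARM926 (SW) with iTCM and dTCM, instruction and data stubs, AHB and APB buses, clock, reset, and din signals; memory locations (size): ROM 0x0 (0x100000), RAM 0x400 0000 (0x100000), FFT HW 0x1000 0000 (0x4), Display 0xc000 0000 (0x1); AddrBits / DataBits annotations: 20 / 32, 32 / 32, 1 / 8] • [Figure labels: IP Implementation, RAM, FFT/IFFT Chip Design, Virtual SOC Verification Platform]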