College of Computer Science: Research Capacity and R&D Results • Architecture and Systems Research Group • Presenters: 單智君, 陳昌居, 鍾崇斌 • November 30, 2006 (ROC year 95)
Research Groups of the College of Computer Science • Research groups of the Institute of Computer Science and Engineering • Architecture and Systems: 鍾崇斌, 單智君, 陳昌居
Architecture and Systems Research Directions • Embedded processors and SoC • Java processors, JIT compilation & VM • DSP design and compilation • Low-power systems • Graphics processors • Superscalar ARM processor • Reconfigurable computing • Asynchronous circuits
Architecture and Systems R&D Results • ARM9-compatible processor with video/audio capabilities • Java stack operations folding • Memory-constrained Java just-in-time compiler • Asynchronous 8051 for low-power SoC applications • DSP instruction set extensions • Low-power branch target buffer • Low-power bus encodings • Low-power cache memory • Graphics processor design techniques • Superscalar ARM • Reconfigurable computing
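ARM9-compatible Processor with Audio/Video Capabilities • ARMAVP (ARM Audio Video Processor) is a 32-bit microprocessor built on a well-balanced five-stage pipeline: Fetch Unit, Decoder Unit, Execution Unit, Memory Access Unit, and Write Back Unit. Each stage is optimized to raise the clock frequency, and efficient mechanisms reduce the performance impact of slow memory. • Features • Conditional execution support • ABP buffer design • Reduced instruction fetch time • Precise interrupt control structure • Asynchronous memory access • Dynamic register bank mapping • Fast branch handling • Versatile, efficient execution paths • Distributed instruction control encoding • Functional verification and evaluation • All functions have been verified on an Altera EP20K600EBC652-1. Based on simulation of the Decode stage, the design runs at 45 MHz on the FPGA and is expected to reach 210 MHz when implemented as a chip.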
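Conditional execution is a feature of the ARM architecture itself, not something specific to ARMAVP. As background only, here is a minimal sketch of the condition check that every ARM-state instruction goes through in the classic (ARMv4/v5) instruction set. The function names are hypothetical and this is not the ARMAVP implementation; in hardware the same logic is a small combinational block that suppresses an instruction's effects when its condition fails.

```python
# Condition codes of the classic ARM (ARMv4/v5) instruction set.
# Bits [31:28] of every ARM-state instruction hold a 4-bit condition field;
# the instruction only takes effect if the field is satisfied by the
# current N (negative), Z (zero), C (carry) and V (overflow) flags.


def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Return True if an instruction with condition field `cond` executes."""
    table = {
        0x0: z,                   # EQ: equal
        0x1: not z,               # NE: not equal
        0x2: c,                   # CS/HS: carry set / unsigned >=
        0x3: not c,               # CC/LO: carry clear / unsigned <
        0x4: n,                   # MI: negative
        0x5: not n,               # PL: positive or zero
        0x6: v,                   # VS: overflow
        0x7: not v,               # VC: no overflow
        0x8: c and not z,         # HI: unsigned >
        0x9: (not c) or z,        # LS: unsigned <=
        0xA: n == v,              # GE: signed >=
        0xB: n != v,              # LT: signed <
        0xC: (not z) and n == v,  # GT: signed >
        0xD: z or n != v,         # LE: signed <=
        0xE: True,                # AL: always
        0xF: False,               # NV in ARMv4; later versions reuse this encoding
    }
    return table[cond & 0xF]


def should_execute(instr: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Extract the condition field from a 32-bit ARM instruction word."""
    return condition_passed((instr >> 28) & 0xF, n, z, c, v)
```

For example, `0x00810002` encodes `ADDEQ r0, r1, r2`, so `should_execute(0x00810002, n=False, z=True, c=False, v=False)` is true and the same call with `z=False` is false.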
Java Stack Operations Folding • JVM: stack-based machine • JVM performance bottleneck: stack operation dependency • [Figure: Before Folding vs. After Folding. Bytecodes are classified as producers (P) reading a Constant Register (CR) or Local Variable (LV), operators (O), complex instructions, and consumers (C) writing a Local Variable (LV); the datapath shows the Operand Stack, Execution Unit, and Branch Unit. Annotations "1' = 1 fold" and "5' = 4 fold" mark the folded instructions.]
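A minimal sketch of the folding idea, assuming a simplified bytecode model in which each bytecode is a `(mnemonic, operand)` tuple and only one pattern is recognized: two producers, an operator, and a consumer (P P O C), following the slide's P/O/C labels. Real folding hardware recognizes more patterns; `fold`, `category`, and `operand` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical, simplified bytecode model: (mnemonic, operand).
# The operand field is ignored for operators such as iadd.
Bytecode = Tuple[str, int]

PRODUCERS = {"iload", "iconst"}        # P: push a local variable or a constant
OPERATORS = {"iadd", "isub", "imul"}   # O: pop two operands, push one result
CONSUMERS = {"istore"}                 # C: pop the result into a local variable


def category(op: str) -> str:
    if op in PRODUCERS:
        return "P"
    if op in OPERATORS:
        return "O"
    if op in CONSUMERS:
        return "C"
    return "?"                         # anything else is left unfolded


def operand(op: str, value: int) -> Tuple[str, int]:
    """Name the source of a producer: a local-variable slot or a constant."""
    return ("lv", value) if op == "iload" else ("const", value)


def fold(code: List[Bytecode]) -> list:
    """Replace each P P O C group with a single three-operand instruction."""
    out: list = []
    i = 0
    while i < len(code):
        window = code[i:i + 4]
        if [category(op) for op, _ in window] == ["P", "P", "O", "C"]:
            (p1, a), (p2, b), (alu, _), (_, dst) = window
            out.append((alu, ("lv", dst), operand(p1, a), operand(p2, b)))
            i += 4                     # four stack bytecodes issue as one
        else:
            out.append(code[i])
            i += 1
    return out
```

For example, `iload_1; iload_2; iadd; istore_3` folds into one register-style `iadd lv3, lv1, lv2`, removing the three intermediate operand-stack dependencies.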
Memory-Constrained Java Just-in-Time Compiler • Mixed-mode execution • Complex bytecodes are executed by the interpreter • Fast compilation • Two-pass compilation • Simple but effective optimizations • About 300 cycles per compiled bytecode • Small memory usage • About 23 KB static footprint • A 4 KB code buffer is sufficient for common usage
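A minimal sketch of mixed-mode execution, assuming a toy stack bytecode set in which `push`, `add`, and `mul` count as "simple" and `pow` as "complex". Python closures stand in for generated native code; the bytecode names, the simple/complex split, and all function names are hypothetical and do not reflect the actual compiler.

```python
from typing import Callable, List, Optional, Tuple

# Toy stack bytecode: ("push", k), ("add", None), ("mul", None), ("pow", None).
Bytecode = Tuple[str, Optional[int]]
Handler = Callable[[List[int]], None]

SIMPLE = {"push", "add", "mul"}        # hypothetical "compile me" set


def interpret(op: str, arg: Optional[int], stack: List[int]) -> None:
    """Slow path: the interpreter executes complex bytecodes."""
    if op == "pow":
        exponent = stack.pop()
        base = stack.pop()
        stack.append(base ** exponent)
    else:
        raise ValueError(f"unsupported bytecode: {op}")


def compile_simple(op: str, arg: Optional[int]) -> Handler:
    """Fast path: turn a simple bytecode into a directly callable handler."""
    if op == "push":
        return lambda s: s.append(arg)
    if op == "add":
        return lambda s: s.append(s.pop() + s.pop())
    if op == "mul":
        return lambda s: s.append(s.pop() * s.pop())
    raise ValueError(f"not a simple bytecode: {op}")


def jit(code: List[Bytecode]) -> Callable[[], List[int]]:
    handlers: List[Handler] = []
    for op, arg in code:
        if op in SIMPLE:
            handlers.append(compile_simple(op, arg))
        else:
            # The compiled code calls back into the interpreter for this bytecode.
            handlers.append(lambda s, op=op, arg=arg: interpret(op, arg, s))

    def run() -> List[int]:
        stack: List[int] = []
        for handler in handlers:
            handler(stack)
        return stack

    return run
```

Keeping complex bytecodes in the interpreter keeps the code generator, and therefore the static footprint, small, which is the trade-off the slide highlights.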
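Asynchronous 8051 for Low-Power SoC Applications • SA8051 (Balsa Asynchronous 8051) is an 8-bit low-power microcontroller compatible with the Intel MCS-51. It is designed as an asynchronous circuit, and its dynamic power consumption is about one third that of the synchronous version. • Features - No global clock - 4-phase handshake design - Soft-core processor - Low power consumption - Integrates with synchronous IP through handshake interfaces - Optimized data and control paths • Functional verification and evaluation: All functions have been verified on a Xilinx Spartan IIE 300 ft256 FPGA. According to XPower simulation results, dynamic power consumption is about one third that of the synchronous version.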
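A minimal simulation sketch of one 4-phase (return-to-zero) handshake per data item. It assumes the data lines are valid before `req` rises (a bundled-data style); the slide does not say which data encoding SA8051 uses, and `four_phase_transfer` is a hypothetical name. The trace shows why the protocol is called 4-phase: each transfer costs four control-wire transitions, two of which only return the wires to zero.

```python
from typing import List, Tuple


def four_phase_transfer(items: List[int]) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Simulate a 4-phase handshake between a sender and a receiver,
    one data item per handshake. Returns the received items and the
    sequence of req/ack transitions."""
    req, ack = 0, 0
    received: List[int] = []
    trace: List[Tuple[str, int]] = []
    for data in items:
        bus = data                      # sender drives the data lines first
        assert req == 0 and ack == 0    # channel must be idle before a new transfer
        req = 1                         # phase 1: sender raises req
        trace.append(("req", req))
        received.append(bus)            # receiver latches the data...
        ack = 1                         # phase 2: ...then raises ack
        trace.append(("ack", ack))
        req = 0                         # phase 3: sender sees ack, lowers req
        trace.append(("req", req))
        ack = 0                         # phase 4: receiver sees req low, lowers ack
        trace.append(("ack", ack))
    return received, trace
```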
DSP Instruction Set Extensions • [Figure: processor datapath with Register File, ALU, MUL, LD/ST, ..., ASFU, and Main Memory] • Current directions • Application-specific instruction set extension (ISE) generation • Why ISE? • Improve performance • Keep the flexibility and efficiency of the original processor • What is ISE? • Group frequently executed instruction patterns into an extended instruction • Executed in extra hardware, an "Application-Specific Functional Unit (ASFU)"
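A minimal sketch of the first step of ISE generation: finding frequently executed instruction patterns. It simplifies heavily by counting runs of consecutive opcodes in an executed trace; real ISE generators typically search dataflow graphs under input/output constraints. The trace and `frequent_patterns` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def frequent_patterns(trace: List[str], length: int = 2,
                      top: int = 3) -> List[Tuple[Tuple[str, ...], int]]:
    """Count every window of `length` consecutive opcodes in an executed
    instruction trace and return the `top` most frequent ones."""
    windows = (tuple(trace[i:i + length]) for i in range(len(trace) - length + 1))
    return Counter(windows).most_common(top)


# A DSP-like inner loop: the load/multiply/add group dominates.
trace = ["ld", "mul", "add", "ld", "mul", "add", "st", "ld", "mul", "add"]
```

Here `frequent_patterns(trace, length=3, top=1)` returns `[(("ld", "mul", "add"), 3)]`, a multiply-accumulate style group that would be a natural candidate to execute as one extended instruction on an ASFU.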
DSP Instruction Set Extensions (cont.) • Current research topics • Multiple-issue architecture • Exploring ISE in multiple-issue architectures such as superscalar or Very Long Instruction Word (VLIW) processors • Hardware reusability • Reuse the same or similar hardware resources across different ASFUs while keeping the same performance • Overcoming register file read/write port constraints • Schedule the inputs and outputs of an ASFU in different time slots
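A minimal sketch of the time-slot idea for port constraints, assuming the ASFU can latch early operands internally so that its inputs may arrive over several cycles in any order. `schedule_ports` and the register names are hypothetical; a real scheduler would also overlap these slots with other instructions' port usage.

```python
from typing import List, Tuple


def schedule_ports(inputs: List[str], outputs: List[str],
                   read_ports: int, write_ports: int
                   ) -> Tuple[List[List[str]], List[List[str]]]:
    """Split an ASFU's register operands into time slots that respect the
    register file's read and write port limits (one list per cycle)."""
    read_slots = [inputs[i:i + read_ports]
                  for i in range(0, len(inputs), read_ports)]
    write_slots = [outputs[i:i + write_ports]
                   for i in range(0, len(outputs), write_ports)]
    return read_slots, write_slots


# A 4-input, 2-output ASFU on a register file with 2 read ports and 1 write port.
reads, writes = schedule_ports(["r1", "r2", "r3", "r4"], ["r5", "r6"], 2, 1)
```

With two read ports the four inputs need two read slots, and with one write port the two results need two write slots, instead of requiring a register file with four read and two write ports.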
Low-power Branch Target Buffer • BTB lookups for non-branch instructions are useless and only waste power • Branch Distance Generation and Collection: record the number of non-branch instructions between each pair of adjacent branch instructions • Branch Distance Table • Next Upcoming Branch Instruction Location: determine where the next branch instruction is and suspend all BTB lookups until it arrives
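A minimal simulation sketch of branch-distance gating, under assumptions not stated on the slide: the distance table is keyed by the address where a branch-free run of instructions begins, the trace lists fetched instructions in order, and the first pass through a run looks up the BTB on every instruction while recording the distance. Because a run contains no branches, the same start address always reaches the same branch, so a recorded distance stays valid. `btb_lookups` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def btb_lookups(trace: List[Tuple[int, bool]]) -> Tuple[int, int]:
    """trace: (pc, is_branch) for each fetched instruction, in order.
    Returns (lookups without gating, lookups with branch-distance gating)."""
    distance_table: Dict[int, int] = {}  # run-start pc -> non-branch instrs before next branch
    baseline = len(trace)                # ungated BTB: one lookup per fetch
    gated = 0
    run_start: Optional[int] = None      # pc where the current branch-free run began
    run_len = 0                          # non-branch instructions seen in the current run
    skip = 0                             # lookups still to suppress
    for pc, is_branch in trace:
        if run_start is None:
            run_start = pc
            skip = distance_table.get(pc, 0)
        if skip > 0:
            skip -= 1                    # known non-branch: no BTB lookup
        else:
            gated += 1                   # unknown distance or the branch itself
        if is_branch:
            distance_table[run_start] = run_len
            run_start, run_len = None, 0
        else:
            run_len += 1
    return baseline, gated


# A 4-instruction loop (the branch at pc 12 jumps back to pc 0), run 3 times.
loop = [(0, False), (4, False), (8, False), (12, True)] * 3
```

For this loop `btb_lookups(loop)` gives `(12, 6)`: the first iteration pays for every lookup while the distance is learned, and later iterations look up only the branch.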
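Low-power Bus Encodings • For each bus architecture, we propose a low-power bus encoding system tailored to that bus's characteristics. Our encoding systems combine a variety of encoding methods so that data crossing the bus is transmitted in the most power-efficient form, thereby saving power. • Low-power bus encoding systems • [Figure: bus encoding architecture. Sender side: original data enters an encoder, which drives the encoded data plus extra control lines; receiver side: a decoder restores the original data.] • Processor with separate instruction and data memories: • Data address bus: T0_BI_1, Variable-Stride, SRWEC • Instruction address bus: T0 + Discontinuous Address Table • Data bus: Leading-bytes encoding • Instruction bus: BIBITS with Register Relabeling • Processor with a single memory: • Mixed instruction/data address bus: I/D Selector, T0 DAT + Stride-Table • Mixed instruction/data bus: I/D Selector, BIBITS_RR + Leading-bytes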
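The slide does not define T0_BI_1, BIBITS, or the other schemes. As background only, here is a minimal sketch of two classic codes whose names they appear to build on: T0, which freezes the address bus and asserts an extra INC line when the next address equals the previous one plus a fixed stride, and bus-invert, which sends the complement plus an extra INV line when more than half of the lines would otherwise toggle. The stride of 4 and the 8-bit data width are assumptions, and these are not the group's encoders.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bus lines that toggle between two bus values."""
    return bin(a ^ b).count("1")


def t0_encode(addrs: List[int], stride: int = 4) -> List[Tuple[int, int]]:
    """T0 code: (bus value, INC line) for each address."""
    out: List[Tuple[int, int]] = []
    prev = None
    bus = 0
    for a in addrs:
        if prev is not None and a == prev + stride:
            out.append((bus, 1))        # in sequence: freeze the bus, assert INC
        else:
            bus = a                     # out of sequence: send the address as-is
            out.append((bus, 0))
        prev = a
    return out


def t0_decode(encoded: List[Tuple[int, int]], stride: int = 4) -> List[int]:
    out: List[int] = []
    prev = 0
    for bus, inc in encoded:
        prev = prev + stride if inc else bus
        out.append(prev)
    return out


def bi_encode(words: List[int], width: int = 8) -> List[Tuple[int, int]]:
    """Bus-invert code: (bus value, INV line) for each data word."""
    mask = (1 << width) - 1
    prev_bus = 0
    out: List[Tuple[int, int]] = []
    for w in words:
        if hamming(w, prev_bus) > width // 2:
            bus, inv = ~w & mask, 1     # too many toggles: send the complement
        else:
            bus, inv = w, 0
        out.append((bus, inv))
        prev_bus = bus
    return out


def bi_decode(encoded: List[Tuple[int, int]], width: int = 8) -> List[int]:
    mask = (1 << width) - 1
    return [(~bus & mask) if inv else bus for bus, inv in encoded]
```

With T0, a run of sequential fetch addresses such as `0x100, 0x104, 0x108` leaves the address lines unchanged after the first one; with bus-invert, sending `0xFF` after `0x00` toggles only the INV line instead of all eight data lines.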
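Low-power Cache Memory • Cache memory accounts for more than 50% of total processor power consumption • Low-power cache memory design • Loop Buffer: place loop code in a low-power loop buffer to save instruction-fetch power • Power Manager: put infrequently used cache blocks into a low-power mode to save the cache's static power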
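The slide does not say how the Power Manager decides that a block is "infrequently used". A minimal sketch under one common assumption, a decay window: a line that has been idle for `window` cycles is switched to low-power mode, and touching it again costs a wake-up. `drowsy_simulation` and its cost model are hypothetical.

```python
from typing import List, Tuple


def drowsy_simulation(accesses: List[int], num_lines: int,
                      window: int) -> Tuple[int, int]:
    """accesses: the cache line index touched in each cycle.
    Returns (wake-ups, line-cycles spent in low-power mode)."""
    last_use = [0] * num_lines
    low_power = [False] * num_lines
    wakeups = 0
    low_power_cycles = 0
    for cycle, line in enumerate(accesses, start=1):
        if low_power[line]:
            wakeups += 1                # accessing a sleeping line costs a wake-up
            low_power[line] = False
        last_use[line] = cycle
        for i in range(num_lines):
            # A line idle for `window` cycles drops into low-power mode.
            if not low_power[i] and cycle - last_use[i] >= window:
                low_power[i] = True
            if low_power[i]:
                low_power_cycles += 1   # static power saved on this line this cycle
    return wakeups, low_power_cycles
```

A short window saves more static power but causes more wake-ups; a long window does the opposite, so the window size is the main tuning knob in this kind of policy.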
Graphics Processor • Research goal: investigate next-generation graphics processor architectures, proposing hardware architectures and software verification environments in three major areas: pixel shaders, texturing, and depth processing. • Current results: • A dynamically reconfigurable graphics hardware for resource reallocatable rendering pipeline • A Reconfigurable Texture Mapping Architecture • Implementation of Texture Compression by GPU Driver • Register Renaming for Pixel Shader Data/Value Management • Instruction Scheduling Mechanism for 3D GPU Pixel Shaders • An Efficient Texture Memory System Design • Alpha Blending without Z Sort
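The slide does not describe how the alpha-blending result avoids a Z sort. The sketch below only illustrates the underlying problem, using the standard "over" blend C = α·Cs + (1 − α)·Cd per channel: the result depends on the order in which translucent fragments are blended, which is why conventional renderers sort them by depth first. The colors and `over` are illustrative.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def over(src: RGB, alpha: float, dst: RGB) -> RGB:
    """Standard 'over' blend of a source fragment onto the framebuffer color."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))


black: RGB = (0.0, 0.0, 0.0)
red: RGB = (1.0, 0.0, 0.0)
blue: RGB = (0.0, 0.0, 1.0)

# Two half-transparent fragments blended in opposite orders.
red_then_blue = over(blue, 0.5, over(red, 0.5, black))
blue_then_red = over(red, 0.5, over(blue, 0.5, black))
```

Here `red_then_blue` is `(0.25, 0.0, 0.5)` while `blue_then_red` is `(0.5, 0.0, 0.25)`, so the same two fragments yield different pixels depending on draw order.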
Superscalar ARM • Goal: a superscalar embedded processor featuring • 800 MHz clock rate at 0.13 µm • 1.8 DMIPS/MHz: superscalar performance despite long pipeline latency • 800K gate count: a cost-effective design • Directions and achievements • Micro-architecture • A 12-stage dual-issue superscalar processor with a good instruction fetch rate, a good issue rate, and efficient forwarding • Simulator • A cycle-accurate simulator that models more detail than the well-known SimpleScalar simulator • Compiler • Working on the GCC machine description to optimize performance
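The two headline targets combine into an overall Dhrystone throughput target (a design goal, not a measured result):

```latex
1.8\ \frac{\text{DMIPS}}{\text{MHz}} \times 800\ \text{MHz} = 1440\ \text{DMIPS}
```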
Reconfigurable Computing (1/2) • Motivations: • Improving the design methodology of embedded system hardware • Providing better performance at low development cost • Shortening the time-to-market of SoC products • Research issues: • Hardware/software partitioning • Synthesis technology • Reconfigurable processing element design • [Figure: Reconfigurable Architecture]
Reconfigurable Computing (cont.) (2/2) • Detailed design of the reconfigurable architecture • [Figures: A Design of Reconfigurable Architecture; Scalable Design of PE] • Published research results: • Run-time Reconfigurable Scheduling of 3D-Rendering on a Reconfigurable System (CCCT'05) • Design and Implementation of a Reconfigurable Hardware for Secure Embedded Systems (ASIACCS'06)