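An Introduction to Electronic System Level Design 錢偉德, Design Service Division, National Chip Implementation Center (CIC); Video Communication Laboratory, Department of Computer Science, National Tsing Hua University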
Trend • Fashion-driven applications • Question 1: How do we push out a brand new application every 3 months? • Question 2: How do we help our customers build application-driven SoCs?
HW Problems • HW is getting more complicated: • Multiple processors/autonomous engines for parallelism • Sophisticated algorithms for acceleration • High throughput and low latency • Management of dynamic and static power • Smaller chip size
SW Problems • SW becomes a massive task: • SW/HW engineer ratios: • Multimedia: 2:1 • Networking: 3:1 • Wireless: 4:1 • The SW team needs to ensure the HW spec is what they want. • The SW team needs to program for the complicated HW. • 80% of the design is determined when the project is only 20% complete, so these issues are better addressed early.
Design Team • Hardware Team • Components, devices, memory • Glue logic, clock tree, bus, PLL, etc. • FW/SW Team • Device drivers • RTOS, application porting • System Team • Application/algorithm analysis • Architecture design
System Team • Comprehend the system at transaction level • Application oriented • Understanding hardware design is helpful, but it is not a must-have. • Solve big problems in the design phase, not the verification phase
System Design Flow • Algorithm Design & Analysis (Matlab) • Dataflow Analysis (Simulink, SPW, ADS, etc.) • Architecture Design • System Verification • Constraints • HW Implementation • SW/FW Implementation • System Integration
Algorithm & Architecture Design • Algorithm Design • Dataflow Analysis • Memory access • Low-power • Architecture Design • Memory infrastructure • Bus architecture • IP Reuse • Cache/DMA • Multi-Vdd/Multi-Frequency • Platform design • Performance evaluation • Multi-Core SoC
Design Flow • Algorithm Design • Architecture Design • Cycle-Accurate System Modeling • Transaction-Level and Cycle-Accurate Modeling • RTL Design • High-Level Synthesis • FPGA Implementation • Logic Synthesis • Place & Route • Signal Integrity/IR Drop
Typical Project Schedule (time to market) • System Design • Hardware Design • Prototype Build • Hardware Debug • Software Design • Software Coding • Software Debug • Project Complete
HW/SW Co-design Benefits • Debug starts on a co-verification environment. • Integrate earlier • Debug SW sooner • Iterate changes faster • Reduce project risk • Early architecture closure reduces risk by 80% • Start software development 6 months earlier [Schedule chart: System Design, Hardware Design, Prototype Build, Hardware Debug, Software Design, Software Coding, Software Debug, Project Complete]
Simulation Speed Issue • For a language to qualify as a system-level language, simulation SPEED is the key. • The simulation should run no more than 1,000 times slower than the real HW. In other words, 1 second of HW execution time should take at most 16 minutes and 40 seconds of simulation time. • To achieve this kind of performance, the system is best modeled at transaction level.
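The 1,000x figure above can be checked with a few lines of arithmetic. This is only a sketch; the function names `simulation_time_seconds` and `format_hms` are made up for this illustration and are not part of any tool mentioned in the slides.

```python
def simulation_time_seconds(hw_seconds, slowdown=1000):
    """Wall-clock simulation time needed to cover hw_seconds of real hardware time."""
    return hw_seconds * slowdown


def format_hms(seconds):
    """Render a number of seconds as hours, minutes and seconds."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m {secs}s"


# 1 second of hardware time at a 1,000x slowdown.
print(format_hms(simulation_time_seconds(1)))
```

Changing `slowdown` shows how quickly a slower model becomes impractical for running real software on the simulated system.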
Solution: Virtual Platform • High-Speed Simulation • SystemC-Based Models • Transaction-Level Modeling Methodology • Abstraction levels range from Programmer’s View to Cycle-Accurate
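To make the idea of transaction-level modeling concrete, here is a minimal sketch written in Python rather than SystemC. It assumes a memory target that completes a whole read or write in one function call and reports an approximate latency, instead of toggling bus signals cycle by cycle. The classes `MemoryTarget` and `Initiator`, the `transport` method and the latency values are all hypothetical and chosen for illustration; they are not the SystemC TLM API.

```python
class MemoryTarget:
    """A memory modeled at transaction level: one call per read or write."""

    def __init__(self, size, read_latency_ns=20, write_latency_ns=10):
        self.data = bytearray(size)
        self.read_latency_ns = read_latency_ns    # assumed value
        self.write_latency_ns = write_latency_ns  # assumed value

    def transport(self, command, address, payload=None, length=4):
        """Carry out a whole transaction and return (data, latency in ns)."""
        size = len(payload) if command == "write" else length
        if address < 0 or address + size > len(self.data):
            raise ValueError("address out of range")
        if command == "write":
            self.data[address:address + size] = payload
            return None, self.write_latency_ns
        if command == "read":
            return bytes(self.data[address:address + size]), self.read_latency_ns
        raise ValueError(f"unknown command: {command}")


class Initiator:
    """A bus master that accumulates the latency reported by its target."""

    def __init__(self, target):
        self.target = target
        self.local_time_ns = 0

    def write(self, address, payload):
        _, delay = self.target.transport("write", address, payload)
        self.local_time_ns += delay

    def read(self, address, length=4):
        data, delay = self.target.transport("read", address, length=length)
        self.local_time_ns += delay
        return data


cpu = Initiator(MemoryTarget(size=256))
cpu.write(0x10, b"\x01\x02\x03\x04")
print(cpu.read(0x10), cpu.local_time_ns)
```

Because each transaction is a single call, the simulator does far less work per bus access than a pin-level model, which is where the speed advantage comes from.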
Current System Design Methodology [Flow diagram: system-level modeling in C/C++, refined into Verilog/VHDL; simulation and analysis of the simulation results; when done, synthesis; then to tape-out, test and product delivery]
Abstraction Level of Hardware Models * Willamette HDL, Inc.
SystemC 2.1 Language Architecture (IEEE 1666) • Layered Libraries: Verification Library, TLM Library, etc. • Methodology-Specific Libraries: Master/Slave Library, etc. • Primitive Channels: Signal, Mutex, Semaphore, FIFO, etc. • Data Types: 4-valued Logic Type, 4-valued Logic Vectors, Bits and Bit Vectors, Arbitrary Precision Integers, Fixed-Point Types • Core Language: Modules, Ports, Interfaces, Channels, Event-Driven Simulation, Events, Processes • C++ Language Standards
SystemC & C++ • SystemC is a set of C++ class definitions and a methodology for using these classes. • The C++ class definitions are systemc.h and the matching library. • The methodology is the use of the simulation kernel and the modeling style. • You can use all of the C++ syntax and semantics, the run-time library, the STL and so on. • However, you need to follow the SystemC methodology closely to make sure the simulation executes correctly.
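The "simulation kernel" mentioned above is an event-driven scheduler. The following is a minimal Python sketch of that general idea only: a time-ordered event queue that runs callbacks. The `Kernel` class and its methods are hypothetical and are not the SystemC API; the real SystemC kernel also has delta cycles, processes with sensitivity lists and channel update phases, none of which are modeled here.

```python
import heapq
import itertools


class Kernel:
    """A tiny discrete-event scheduler: callbacks run in time order."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps insertion order

    def schedule(self, delay, callback):
        """Run callback after delay time units from the current time."""
        heapq.heappush(self._queue, (self.now + delay, next(self._order), callback))

    def run(self, until=None):
        """Process events until the queue is empty or time would pass until."""
        while self._queue:
            time, _, callback = self._queue[0]
            if until is not None and time > until:
                break
            heapq.heappop(self._queue)
            self.now = time
            callback()


kernel = Kernel()
ticks = []


def tick():
    """A clock-like process that reschedules itself every 10 time units."""
    ticks.append(kernel.now)
    kernel.schedule(10, tick)


kernel.schedule(0, tick)
kernel.run(until=30)
print(ticks)
```

Simulated time only advances when an event is taken from the queue, so idle periods cost nothing. Model code that bypasses the scheduler (for example, by busy-waiting) breaks this assumption, which is one reason the methodology has to be followed closely.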
SystemC & HDL • SystemC is a Hardware Description Language (HDL) that spans from system level down to gate level. • Modules written in traditional HDLs like Verilog and VHDL can be translated into SystemC, but not vice versa. Reason: Verilog and VHDL do not support transaction-level modeling. • SystemVerilog is Verilog plus assertions, an idea borrowed from programming languages. SystemC supports assertions as well, through C++ syntax and semantics.
SystemVerilog vs. SystemC • SystemVerilog is Verilog plus verification (assertions). • This characterization is not entirely fair, but it reflects how the language is used today. • SystemVerilog and SystemC work together to complete the design platform from system level to gate level. • SystemC covers everything above RTL. • SystemVerilog covers RTL and below.
Simulation Speed Up • CPU: Pentium IV 1.6GHz • RAM: 512MB • OS: RedHat Linux 8.0 • ConvergenSC Ver. 2005.1.1 • NC-SIM Ver. 4.0 • Carbon Ver. C2006.04 SP2
Configuration 1 [Block diagram: ARM926EJ-S core (instruction and data ports, IRQ, FIQ) on a single-layer AHB AMBA bus with internal ROM, internal RAM (RAM0, RAM1), external memory (XB) behind the static memory interface (SMI), and a dual-master-port DMA controller (Slave, Master1, Master2, DMA_Int); an AHB2APB bridge leads to the APB AMBA bus with the display controller, input device (JPEG input), interrupt controller and APB_cfg; clock generator and reset controller]
Bus Contention Analysis • Notice there is always bus contention (maximum and average curves). • What is this activity? And this? • There are several different problems. Let’s start by zooming in on a suspected DMA transfer. • This is probably DMA input to external memory.
During these time intervals there is contention approximately 60 and 90 percent of the time (maximum and average curves). So our next step is to examine the transaction counts by initiators and targets in this time period.
Bus Utilization • Utilization is down. • Target Count: AHB to Int ROM, APB to AHB, AHB to Int RAM, AHB to Ext Mem • Initiator Count: ARM to ROM, APB to DMA Master2, ARM to RAM, DMA Master1 to Ext Mem
Utilization Avg • Initiator Avg: DMA Master1 to Ext Mem, APB to DMA Master2 • Target Avg (read time, write time): APB to AHB, AHB to Int ROM, AHB to Int RAM, AHB to Ext Mem
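The count and average views above boil down to grouping a transaction trace by initiator and target. A minimal sketch, assuming a trace of `(initiator, target, start, end)` tuples; this format, the sample values and the function name `summarize_transactions` are invented for illustration and are not the analysis tool's export format.

```python
from collections import defaultdict


def summarize_transactions(trace):
    """Return count and average duration per (initiator, target) pair."""
    totals = defaultdict(lambda: [0, 0])  # pair -> [count, total duration]
    for initiator, target, start, end in trace:
        entry = totals[(initiator, target)]
        entry[0] += 1
        entry[1] += end - start
    return {
        pair: {"count": count, "avg_duration": total / count}
        for pair, (count, total) in totals.items()
    }


# Made-up sample trace; times are in arbitrary units.
trace = [
    ("ARM", "Int ROM", 0, 2),
    ("ARM", "Int ROM", 2, 4),
    ("ARM", "Int RAM", 4, 7),
    ("DMA Master1", "Ext Mem", 1, 9),
]
summary = summarize_transactions(trace)
for pair, stats in sorted(summary.items()):
    print(pair, stats)
```

A pair with few but long transactions (here the DMA transfer to external memory) stands out in the average view even when its count is low.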
DMA Activity • We have confirmed that: • It is DMA activity. • There is contention approximately 60 to 90% of the time over these time intervals. • The DMA is contending with the CPU for AHB access. • We determined the transaction counts and their durations. Our next step is to examine the activity in this time period.
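A contention percentage like the one quoted above can be derived from request intervals. The sketch below assumes that "contention" means two or more initiators have an outstanding request on the shared bus at the same time, and that each initiator has at most one request outstanding; the analysis tool may define it differently. The function name `contention_fraction` and the sample intervals are hypothetical.

```python
def contention_fraction(requests, window_start, window_end):
    """Fraction of the window during which at least two requests overlap.

    requests is a list of (initiator, start, end) tuples.
    """
    events = []
    for _, start, end in requests:
        start = max(start, window_start)
        end = min(end, window_end)
        if start < end:
            events.append((start, +1))
            events.append((end, -1))
    # At equal times, -1 sorts before +1, so back-to-back requests do not overlap.
    events.sort()

    active = 0
    contended = 0
    previous = window_start
    for time, delta in events:
        if active >= 2:
            contended += time - previous
        active += delta
        previous = time
    return contended / (window_end - window_start)


# Made-up example: the CPU and one DMA master overlap from t=4 to t=6.
requests = [("ARM", 0, 6), ("DMA Master1", 4, 10)]
print(contention_fraction(requests, 0, 10))
```

Evaluating the same function over a narrower window is the numerical counterpart of zooming in on a time period of interest.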
In these views we are zoomed in at the very beginning of the time period of interest (maximum and average curves). We can see the increase in activity. During these time intervals there is contention approximately 33 percent of the time. Our next step is to examine the target and initiator counts.
Bus Utilization • Target Count: AHB to Int ROM, AHB to Int RAM • Initiator Count: ARM to IntROM, ARM to IntRAM • An increase in CPU to RAM activity due to...
Bus Contention Average and Function Trace • The increase is due to the initialization of memory allocation (these are low-level routines called from the process_sos function as it prepares for the Huffman decoding activity).
Increased CPU to RAM Activity • The contention and utilization problems are primarily due to an increase in CPU to RAM activity. • There is contention approximately 33% of the time over this time interval. • The software activity is the initialization of memory allocation. • Our next step is to examine the activity in this time period.
Bus Utilization • Target Count and Initiator Count: ARM to RAM, ARM to Display • Primarily an increase in ARM to ROM activity
The contention and utilization problems are primarily due to: • ARM core to ROM and RAM activity • Dual DMA activity • Next, examine two possible solutions. [Block diagram: Configuration 1]
Configuration 2 [Block diagram: Configuration 1 with a bus matrix added. Labels: input stages in0, in1, in2; output stages out0, out1, out2; AHB2, AHB3, AHB4; ARM core instruction and data ports; DMA controller (Slave, Master1, Master2, DMA_Int); external memory (XB), internal ROM, internal RAM (RAM0, RAM1), SMI; AHB2APB bridge with APB display controller, input device, interrupt controller and APB_cfg; clock generator and reset controller]
Configuration 3 [Block diagram: two bus matrices. Labels: input stages in0 to in4; output stages out0, out1, out2; AHB, AHB2; ARM core instruction and data ports; DMA controller (Slave, Master1, Master2, DMA_Int); external memory (XB), internal ROM, internal RAM (RAM0, RAM1), SMI; APB display controller, input device, interrupt controller and APB_cfg; clock generator and reset controller]
Bus Contention Analysis of Three Configurations • Configuration 1 (single AHB): CPU to memory contention; DMA contention • Configuration 2 (3 AHB with 1 multi-layer): no CPU to memory contention; less DMA contention • Configuration 3 (single AHB with 2 multi-layers): no bus contention
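Why a multi-layer bus matrix reduces contention can be sketched with a simplified model. The assumption here is that on a single shared bus any two overlapping requests contend, while on a bus matrix two requests contend only when they address the same target. Real AHB arbitration, the exact topology of Configurations 2 and 3, and the sample trace below are not taken from the slides; the names `contended_time` and `compare` are hypothetical.

```python
from collections import defaultdict


def contended_time(intervals):
    """Total time during which at least two (start, end) intervals overlap."""
    events = sorted(
        [(start, +1) for start, _ in intervals] + [(end, -1) for _, end in intervals]
    )
    active = 0
    contended = 0
    previous = None
    for time, delta in events:
        if active >= 2:
            contended += time - previous
        active += delta
        previous = time
    return contended


def compare(requests):
    """Return (single-bus contended time, per-target contended time on a matrix).

    requests is a list of (initiator, target, start, end) tuples.
    """
    single_bus = contended_time([(start, end) for _, _, start, end in requests])
    by_target = defaultdict(list)
    for _, target, start, end in requests:
        by_target[target].append((start, end))
    matrix = {target: contended_time(iv) for target, iv in by_target.items()}
    return single_bus, matrix


# Made-up trace: the CPU fetches from ROM while two DMA masters use external memory.
requests = [
    ("ARM", "Int ROM", 0, 6),
    ("DMA Master1", "Ext Mem", 4, 10),
    ("DMA Master2", "Ext Mem", 8, 12),
]
single_bus, matrix = compare(requests)
print(single_bus, matrix)
```

In this toy trace the CPU-versus-DMA overlap disappears once ROM and external memory sit on separate layers, while the two DMA masters still collide on external memory. That mirrors the "no CPU to memory contention, less DMA contention" pattern reported for Configuration 2.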
Cache Analysis • We begin with the cache disabled and examine the software execution. [Block diagram: Configuration 1]
IDCT activity Huffman Decoding
There is a large number of accesses to the internal ROM because the ARM core does not have the cache enabled. We will build a system with the cache enabled and compare the results.
Cache disabled vs. cache enabled: We will now compare the Master to Slave Access views.
Cache disabled vs. cache enabled: Notice the striking reduction in ROM access.
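The reduction in ROM access can be illustrated with a toy instruction-fetch model. This sketch assumes a direct-mapped cache and counts each line fill as one ROM access; the actual ARM926EJ-S cache organization is different, and the line size, cache size and fetch trace below are invented for illustration. The function name `rom_accesses` is hypothetical.

```python
def rom_accesses(addresses, cache_lines=0, line_size=16):
    """Count ROM accesses for a sequence of instruction fetch addresses.

    With cache_lines == 0 the cache is disabled and every fetch goes to ROM.
    Otherwise a direct-mapped cache is modeled and only misses reach the ROM.
    """
    if cache_lines == 0:
        return len(addresses)

    tags = [None] * cache_lines
    misses = 0
    for address in addresses:
        line = address // line_size
        index = line % cache_lines
        if tags[index] != line:
            tags[index] = line  # fill the line from ROM
            misses += 1
    return misses


# A 16-instruction loop (4 bytes per instruction) executed 100 times.
fetches = [pc for _ in range(100) for pc in range(0, 64, 4)]
print(rom_accesses(fetches), rom_accesses(fetches, cache_lines=8))
```

Tight loops such as the decoding kernels in a JPEG decoder refetch the same instructions many times, so once the loop fits in the cache almost all fetches are served without touching the ROM.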