ARM (Advanced RISC Machine) Presented by Pitipund Lorchirachoonkul 43650225 Uchot Jitpaisarnsook 43650373
RISC Overview • A large uniform register file • A load-store architecture • Simple addressing modes • Uniform and fixed-length instruction fields
ARM Overview • Control over both the ALU and shifter • Auto-increment and auto-decrement addressing modes • Load and store multiple instructions • Conditional execution of all instructions
ARM Registers • 31 registers, each 32 bits wide: 16 are visible and the others are used to speed up exception processing • Program counter (R15) • Link register (R14) • Other registers
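As a rough illustration of conditional execution, the sketch below models how the 4-bit condition field of an instruction could be checked against the N, Z, C and V flags. The helper name `condition_passed` is hypothetical and not from the slides; the field values follow the standard ARM condition-code numbering (EQ = 0 through AL = 14).

```python
def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with condition field `cond` would execute.

    cond is the 4-bit condition field; n, z, c, v are the flag values as booleans.
    """
    table = {
        0x0: z,                     # EQ: equal
        0x1: not z,                 # NE: not equal
        0x2: c,                     # CS: carry set
        0x3: not c,                 # CC: carry clear
        0x4: n,                     # MI: negative
        0x5: not n,                 # PL: positive or zero
        0x6: v,                     # VS: overflow
        0x7: not v,                 # VC: no overflow
        0x8: c and not z,           # HI: unsigned higher
        0x9: (not c) or z,          # LS: unsigned lower or same
        0xA: n == v,                # GE: signed greater than or equal
        0xB: n != v,                # LT: signed less than
        0xC: (not z) and (n == v),  # GT: signed greater than
        0xD: z or (n != v),         # LE: signed less than or equal
        0xE: True,                  # AL: always
    }
    if cond not in table:
        raise ValueError("condition field not modelled in this sketch")
    return bool(table[cond])
```

An instruction whose condition fails simply has no effect, which is what lets short if/else sequences run without branches.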
Types of Exceptions • Two levels of interrupt • Memory aborts • Attempted execution of an undefined instruction • Software interrupts
ARM Instruction Set • Branch • Data-processing • Load and store • Coprocessor
Branch Instructions • General branch instructions • Branch with Link • Software interrupt
Data-processing Instructions • Data-processing instructions proper • Multiply instructions • Status register transfer instructions
Load and Store Instructions • Load or store a single register • Load or store multiple registers • Swap a register value with the value of a memory location
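To make the load/store-multiple idea concrete, here is a minimal sketch of an increment-after store and load with base-register writeback. The function names `stmia` and `ldmia`, the dictionary used as memory, and the 16-entry register list are all illustrative assumptions, not part of the slides.

```python
def stmia(regs, mem, base, reg_list, writeback=True):
    """Store the listed registers to ascending word addresses starting at regs[base]."""
    addr = regs[base]
    for r in sorted(reg_list):      # lowest-numbered register goes to the lowest address
        mem[addr] = regs[r]
        addr += 4                   # one 32-bit word per register
    if writeback:
        regs[base] = addr           # auto-increment: base now points past the block


def ldmia(regs, mem, base, reg_list, writeback=True):
    """Load the listed registers from ascending word addresses starting at regs[base]."""
    addr = regs[base]
    for r in sorted(reg_list):
        regs[r] = mem[addr]
        addr += 4
    if writeback:
        regs[base] = addr


# Usage: copy two words through memory with one store and one load.
regs = [0] * 16
mem = {}
regs[0], regs[1], regs[2] = 0x1000, 11, 22
stmia(regs, mem, base=0, reg_list=[1, 2])   # memory at 0x1000 and 0x1004 is written
regs[5] = 0x1000
ldmia(regs, mem, base=5, reg_list=[3, 4])   # R3 and R4 receive the stored values
```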
Coprocessor Instructions • Data-processing instructions • Register transfers • Data-transfer instructions
The CPU Core • 5-stage pipeline • Harvard architecture • ARM v4T compliant • 110,000 transistors • TSMC 0.18 µm: • 0.3 mW/MHz (1.8 V) • 220 MHz (1.65 V) • 1 mm²
Jazelle instruction set • ARM instruction set • Thumb instruction set • Java ByteCodes
Java ByteCodes • Directly executed bytecodes • Emulated bytecodes • Undefined bytecodes
Directly executed bytecodes
140 bytecodes executed directly in hardware:
• constant loads (iconst_0, dconst_0, …)
• variable loads/stores (iload, dstore, …)
• array loads/stores (iaload, dastore, …)
• integer data operations (iadd, isub, i2b, …)
• branches (ifeq, if_icmpeq, …)
• quick constant pool loads (ldc_quick, …)
• quick static/field operations (getfield_quick, …)
Emulated bytecodes
94 bytecodes emulated in software:
• floating point (ddiv, dadd, dmul, …)
• integer division (idiv, irem, ldiv, lrem)
• switch (tableswitch, lookupswitch)
• invoke (invokevirtual, invokestatic, …)
• return (ireturn, return, …)
• new (new, newarray, …)
• unresolved ldc (ldc, ldc_w, ldc2_w)
• unresolved field/static (getstatic, putfield, …)
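The split between directly executed and emulated bytecodes can be pictured as a simple dispatch decision. The sketch below is only an illustration: it lists just the example bytecodes named on the two slides (not the full 140 and 94), and the function name `dispatch` and its return strings are hypothetical.

```python
# Example bytecodes named on the slides; the real sets are much larger.
DIRECT = {
    "iconst_0", "dconst_0", "iload", "dstore", "iaload", "dastore",
    "iadd", "isub", "i2b", "ifeq", "getfield_quick",
}
EMULATED = {
    "ddiv", "dadd", "dmul", "idiv", "irem", "ldiv", "lrem",
    "tableswitch", "lookupswitch", "invokevirtual", "invokestatic",
    "ireturn", "return", "new", "newarray", "ldc", "ldc_w", "ldc2_w",
    "getstatic", "putfield",
}


def dispatch(bytecode):
    """Say where a bytecode would be handled in this simplified model."""
    if bytecode in DIRECT:
        return "hardware"
    if bytecode in EMULATED:
        return "software handler"
    return "not listed in this sketch"
```

In this model an emulated bytecode would be routed to a software routine, which matches the slides' description of a table of software handlers.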
Jazelle Operation • New ARM instruction: BXJ Rm • Instruction fields: bits 31…28 = Cond, bits 3…0 = Rm • If the condition holds, then J = 1 and PC = Rm: the processor enters Java state and begins bytecode execution at (Rm)
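The BXJ behaviour described on this slide can be sketched as a field extraction followed by a state change. The names `decode_bxj_fields` and `execute_bxj` are hypothetical; the slide only defines the Cond and Rm fields, so the middle bits of the example word `0xE12FFF25` are an assumption and are not checked here.

```python
def decode_bxj_fields(word):
    """Extract the two fields the slide describes from a 32-bit instruction word."""
    cond = (word >> 28) & 0xF   # bits 31..28: condition field
    rm = word & 0xF             # bits 3..0: register holding the target address
    return cond, rm


def execute_bxj(word, regs, state, condition_holds):
    """Apply the slide's semantics: if the condition holds, J = 1 and PC = Rm.

    regs is a 16-entry list (regs[15] is the PC); state is a dict holding the J bit.
    condition_holds is supplied by the caller to keep the sketch small.
    """
    cond, rm = decode_bxj_fields(word)
    if condition_holds:
        state["J"] = 1           # enter Java state
        regs[15] = regs[rm]      # continue execution at the address held in Rm
    return cond, rm


# Usage: R5 holds a (made-up) bytecode address.
regs = [0] * 16
regs[5] = 0x2000
state = {"J": 0}
fields = execute_bxj(0xE12FFF25, regs, state, condition_holds=True)
```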
Jazelle Operation • Addition of the 'J' bit to the CPSR • CPSR layout: bits 31…27 = Flags, bit 24 = J, bit 7 = I, bit 6 = F, bit 5 = T, bits 4…0 = Mode • J=0: processor in ARM or Thumb state (depending on the T bit) • J=1, T=0: processor in Java state
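The J and T bits together select the instruction-set state. A minimal sketch of that decoding, using the bit positions given on the slide, might look like the following; the function name `processor_state` and the returned strings are illustrative choices.

```python
J_BIT = 1 << 24   # Jazelle state bit, per the slide
T_BIT = 1 << 5    # Thumb state bit, per the slide


def processor_state(cpsr):
    """Classify the instruction-set state encoded in a CPSR value."""
    j = bool(cpsr & J_BIT)
    t = bool(cpsr & T_BIT)
    if not j:
        return "Thumb" if t else "ARM"
    if not t:
        return "Java"
    return "not described on the slide"   # J=1, T=1 is not covered


# Usage: the low bits (0x10 here) stand in for an arbitrary mode field.
example = processor_state(J_BIT | 0x10)
```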
Register Re-use and Stack Optimization
Use of ARM registers in Jazelle state:
• R0-R3: used to cache the Java expression stack
• R4: local variable 0 ('this' pointer)
• R5: pointer to table of SW handlers
• R6: Java stack pointer
• R7: Java variables pointer
• R8: Java constant pool pointer
• R9-R11: reserved for JVM (not used by h/w)
• R12, R14: scratch usage / Java return address
• R13: machine stack pointer
• R15: Java PC
Interrupt Behavior / Real-time Performance
Jazelle is compatible with ARM programming conventions for interrupt handlers:
• Interrupt entry (Java program in Java state to interrupt handler in ARM state): CPSR->SPSR, pc->r14
• Interrupt handler:
    STM r13!, {reg. list}    ; save regs used in interrupt handler
    LDM r13!, {reg. list}    ; restore regs
    SUBS pc, r14, #4         ; return & restore state
• Interrupt return (ARM state back to Java state): CPSR<-SPSR, pc<-(r14-4)
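The entry and return steps on this slide amount to saving and restoring two pieces of state. The sketch below models only what the slide shows (CPSR/SPSR copy, pc/r14 copy, and the return with an offset of 4); mode switching, banked registers and interrupt masking are left out, and the dictionary layout and function names are assumptions made for illustration.

```python
J_BIT = 1 << 24   # Java state bit in the CPSR
T_BIT = 1 << 5    # Thumb state bit in the CPSR


def enter_interrupt(cpu):
    """Interrupt entry: CPSR -> SPSR, pc -> r14, handler runs in ARM state."""
    cpu["spsr"] = cpu["cpsr"]
    cpu["r14"] = cpu["pc"]
    cpu["cpsr"] &= ~(J_BIT | T_BIT)   # clear J and T: ARM state for the handler


def return_from_interrupt(cpu):
    """Models SUBS pc, r14, #4: pc <- r14 - 4 and CPSR <- SPSR."""
    cpu["pc"] = cpu["r14"] - 4
    cpu["cpsr"] = cpu["spsr"]         # restores the J bit, so Java state resumes


# Usage: a Java program (J = 1) is interrupted and later resumed.
cpu = {"cpsr": J_BIT | 0x10, "spsr": 0, "pc": 0x8004, "r14": 0}
enter_interrupt(cpu)
in_handler_cpsr = cpu["cpsr"]
return_from_interrupt(cpu)
```

Because the J bit travels with the saved CPSR, the handler itself needs no Java-specific code, which is the compatibility point the slide makes.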
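Competitor Comparison / Review of Existing Solutions
Columns: execution performance (CM/MHz), real-time system performance, memory cost, hardware implementation cost, legacy code / RTOS support.
• Software emulation (Sun JDK, ARM9): 0.67 CM/MHz; memory cost ~16 kbyte; hardware cost -; legacy code / RTOS support: yes
• Software emulation (ARM JDK, ARM9): 1.7 CM/MHz; memory cost ~16 kbyte; hardware cost -; legacy code / RTOS support: yes
• JIT: 6.2* CM/MHz; real-time performance poor; memory cost > 100 kbyte; hardware cost -; legacy code / RTOS support: yes
• Co-processor (e.g. Jedi Tech, JSTAR): 2.9 CM/MHz; memory cost -; hardware cost ~25k gates; legacy code / RTOS support: yes
• Dedicated processor: 3 CM/MHz; memory cost -; hardware cost 20-30k gates; legacy code / RTOS support: no
• ARM with architecture extensions: 5.5 CM/MHz; real-time performance excellent; memory cost ~8 kbyte; hardware cost ~12k gates; legacy code / RTOS support: yes
ARM with architecture extensions is the only solution to meet all of the performance and application requirements.
*Note: JIT performance excludes compilation overhead.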
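For a quick sense of scale, the CM/MHz figures in the comparison can be turned into ratios relative to the ARM architecture-extension entry. This is plain arithmetic on the numbers quoted in the slide; the dictionary keys are shortened labels chosen here, and the JIT figure still excludes compilation overhead, as the slide's note says.

```python
# CM/MHz figures copied from the comparison slide.
cm_per_mhz = {
    "software emulation (Sun JDK)": 0.67,
    "software emulation (ARM JDK)": 1.7,
    "JIT": 6.2,
    "co-processor": 2.9,
    "dedicated processor": 3.0,
}
ARM_EXTENSIONS = 5.5

# Ratio > 1 means the ARM architecture extensions score higher per MHz.
ratios = {name: round(ARM_EXTENSIONS / value, 1) for name, value in cm_per_mhz.items()}
```

On these figures only the JIT scores higher per MHz, and that row carries the poor real-time rating and the largest memory cost.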