Application of Binary Translation to Java Reconfigurable Architectures

Antonio Carlos S. Beck Filho (caco@inf.ufrgs.br)
Luigi Carro (carro@inf.ufrgs.br)
Instituto de Informática - GME, Universidade Federal do Rio Grande do Sul

Introduction
• The embedded system market is expanding
• More performance is required

• Moreover:
  • Design cycles are shorter
  • The complexity of these embedded systems is increasing as well
  • Devices are battery dependent

These embedded systems are adopting Java
• Java-enabled devices such as cellular phones and PDAs:
  • 176 million in 2001
  • 721 million in 2006 [1]
• 80% of cellular phones will support Java [2]
• 10 times more embedded system developers than general-purpose software developers by 2010 [3]

[1] D. Takahashi, "Java Chips Make a Comeback," Red Herring, 2001.
[2] G. Lawton, "Moving Java into Mobile Phones," Computer, vol. 35, no. 6, pp. 17-20, 2002.
[3] R. W. Atherton, "Moving Java to the Factory," IEEE Spectrum, pp. 18-23, 1998.

The Java language:
• Object oriented
  • Modeling
  • Programming
  • Validation
• Widely spread
• Safe
• Small ROM footprint (compact, CISC-like bytecode)
• Multiplatform

Motivation
• How can performance be increased while keeping power consumption low?
• Using a reconfigurable array!
  • But special tools and compilers are needed!
  • No software portability! And what about the design cycle?

Outline
• Java processors
• Using binary translation with reconfigurable arrays
• The reconfigurable array
• Results
  • Area
  • Performance
  • Power consumption
• Conclusions and future work

Femtojava Low-Power
• Five pipeline stages: Instruction Fetch, Decoder, Operand Fetch, Execution, and Write Back
• Instruction Fetch: a 9-byte instruction queue handles variable-size instructions
• Decoder: generates the micro-ops and checks data dependences
• Operand Fetch: a register bank with two ports; the stack and the local variable storage are implemented in this register file
  • This allows comparisons with RISC machines!
• Execution: six functional units (multiplier, ALU, shifter, constant generator, branch, and LD/ST)
• Write Back: results are written back to the stack or to the local variable storage

VLIW Architecture
• Two instructions per VLIW packet
• The VLIW packet has a variable size: here, a packet holds 1 or 2 instructions
• Decoder: one decoder per instruction slot (Decoder 1 and Decoder 2); Decoder 2 does not support method calls and returns
• Operand Fetch: each flow has its own operand stack (Register Bank 1 and Register Bank 2), while the method's local variable pool is shared
  • No mechanism is necessary for communication among the flows!
• Execution: the six functional units (multiplier, ALU, shifter, constant generator, branch, and LD/ST) are replicated in each flow
• Write Back: results are written to the operand stack of each flow or to the local variable storage of the first register bank

Why use a reconfigurable array?
• Hypothesis: substituting a sequence of instructions with a combinational circuit saves power (we lose area)
• Let us see the multiplication algorithm example:
  • $T_{C,\mathrm{alg}} = n\,(T_{PFF} + nT + T_{set})$
  • $T_{C,\mathrm{CC}} = n \cdot nT = n^2 T$ (very pessimistic)
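Taking the two timing expressions for the multiplication example at face value, and reading the symbols as n iterations that each pay a flip-flop propagation time $T_{PFF}$, n gate delays $T$, and a setup time $T_{set}$ (this reading of the symbols is our assumption), the difference between the sequential algorithm and the combinational circuit works out as:

```latex
\begin{aligned}
T_{C,\mathrm{alg}} - T_{C,\mathrm{CC}}
  &= n\,(T_{PFF} + nT + T_{set}) - n^2 T \\
  &= n\,(T_{PFF} + T_{set}) \; > \; 0
\end{aligned}
```

So even with the pessimistic $n^2 T$ estimate, the combinational circuit avoids the per-iteration register overhead $n\,(T_{PFF} + T_{set})$.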

Binary Translation
• BT takes binary code for one machine and produces equivalent binary code for a different machine
• Advantages of BT when used with reconfiguration:
  • Parallelism can be detected and the array reconfigured at run time
  • No special tools or compilers are needed anymore!
  • The software-compatibility problem is solved

• How does it work?
  • Observe the bytecodes, looking for frequently executed sequences
  • Save each such sequence in a special cache
  • When the sequence is found again, the array is reconfigured and set as the active functional unit
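The observe, save, and reconfigure loop can be sketched as a small Python toy model. This is a minimal sketch under stated assumptions, not the detector hardware from the slides: `BinaryTranslator`, `HOT_THRESHOLD`, and `translate` are hypothetical names, the threshold value is invented for illustration, and the "configuration" is only a placeholder string.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical threshold: how many times a sequence must be observed before
# it is translated and saved (the slides do not give a value).
HOT_THRESHOLD = 2

BytecodeSeq = Tuple[str, ...]


class BinaryTranslator:
    """Toy model of the observe / save / reconfigure loop (hypothetical)."""

    def __init__(self, threshold: int = HOT_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()
        # Special cache: bytecode sequence -> array configuration.
        self.cache: Dict[BytecodeSeq, str] = {}

    def translate(self, seq: BytecodeSeq) -> str:
        # Placeholder for generating the real configuration bits.
        return "config:" + "/".join(seq)

    def execute(self, seq: BytecodeSeq) -> str:
        """Return the unit that runs `seq`: "array" or "processor"."""
        if seq in self.cache:
            # Sequence found again: reconfigure the array with the cached
            # configuration and use it as the active functional unit.
            return "array"
        self.counts[seq] += 1
        if self.counts[seq] >= self.threshold:
            # Frequently executed: save the translated sequence in the cache.
            self.cache[seq] = self.translate(seq)
        return "processor"


bt = BinaryTranslator()
block = ("bipush 6", "bipush 7", "imul")
print([bt.execute(block) for _ in range(3)])
# ['processor', 'processor', 'array']
```

The sequence runs on the processor until it crosses the threshold; from then on it is served by the array from the cache. A real detector would also need a bounded cache and a replacement policy, which the slides do not discuss.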

Bytecodes Detection
Considering these bytecodes:
    bipush 10
    bipush 5
    imul
    bipush 3
    bipush 4
    ishl
    iadd
    istore
    bipush 6
    bipush 7
    imul
• The instructions within each block depend on each other!
• These two blocks are independent!
• Operand Block 1: first sequence
• Operand Block 2: second sequence
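One way to find the operand blocks in the example is to track the depth of the operand stack and close a block whenever it returns to empty, since no later instruction can then pop a value produced inside the block (values passed through local variables, e.g. istore followed by iload, would still need a separate check). This is a minimal Python sketch of that idea; the stack-depth rule, `STACK_EFFECT`, and `split_operand_blocks` are assumptions for illustration, not the detector described in the slides.

```python
from typing import List

# Net operand-stack effect of each bytecode used in the example
# (standard JVM semantics; istore's local-variable index is ignored here).
STACK_EFFECT = {
    "bipush": +1,  # push a constant
    "imul": -1,    # pop two values, push one
    "ishl": -1,    # pop value and shift count, push one
    "iadd": -1,    # pop two values, push one
    "istore": -1,  # pop one value into a local variable
}


def split_operand_blocks(code: List[str]) -> List[List[str]]:
    """Split bytecodes into blocks wherever the operand stack becomes empty.

    Assumption: a block ends when the stack depth returns to 0; the slides
    do not state the exact rule used by the detector.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    depth = 0
    for instr in code:
        opcode = instr.split()[0].lower()
        depth += STACK_EFFECT[opcode]
        current.append(instr)
        if depth == 0:
            blocks.append(current)
            current = []
    if current:  # trailing block that still leaves a value on the stack
        blocks.append(current)
    return blocks


code = ["bipush 10", "bipush 5", "imul", "bipush 3", "bipush 4", "ishl",
        "iadd", "istore", "bipush 6", "bipush 7", "imul"]
for i, block in enumerate(split_operand_blocks(code), start=1):
    print(f"Operand Block {i}: {block}")
```

For the slide's example the stack depth goes 1, 2, 1, 2, 3, 2, 1, 0 over the first eight bytecodes, so Operand Block 1 ends at istore and the last three bytecodes form Operand Block 2.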

The Reconfigurable Array
• The array is coarse-grain:
  • A great number of sequences can be saved in the cache
  • Reconfiguration is fast
• It is formed by one or more basic cells
  • Each cell has one multiplier and a sequence of seven sets of basic functional units
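To make the basic-cell description concrete, the following Python sketch estimates whether an operand block fits in a single cell, using the slide's figures (one multiplier, seven sets of basic functional units) as resource limits. The mapping rules (constants need no level, each arithmetic bytecode occupies the level after its deepest operand) and the names `block_requirements` and `fits_in_one_cell` are hypothetical; the slides do not describe how operations are placed in the array.

```python
from typing import Dict, List

# Hypothetical resource model of one basic cell, taken from the slide:
# one multiplier plus a sequence of seven sets of basic functional units.
MULTIPLIERS_PER_CELL = 1
LEVELS_PER_CELL = 7

BINARY_OPS = {"iadd", "ishl", "imul"}


def block_requirements(block: List[str]) -> Dict[str, int]:
    """Estimate multipliers and dependence depth needed by an operand block.

    Assumptions (not from the slides): bipush constants need no array level,
    each binary operation sits one level after its deepest operand, and
    istore only consumes a value.
    """
    stack: List[int] = []  # combinational level of each value on the stack
    depth = 0
    multipliers = 0
    for instr in block:
        opcode = instr.split()[0].lower()
        if opcode == "bipush":
            stack.append(0)
        elif opcode in BINARY_OPS:
            right, left = stack.pop(), stack.pop()
            level = max(left, right) + 1
            depth = max(depth, level)
            if opcode == "imul":
                multipliers += 1
            stack.append(level)
        elif opcode == "istore":
            stack.pop()
        else:
            raise ValueError(f"unsupported bytecode: {instr}")
    return {"depth": depth, "multipliers": multipliers}


def fits_in_one_cell(block: List[str]) -> bool:
    req = block_requirements(block)
    return (req["depth"] <= LEVELS_PER_CELL
            and req["multipliers"] <= MULTIPLIERS_PER_CELL)


block1 = ["bipush 10", "bipush 5", "imul", "bipush 3", "bipush 4",
          "ishl", "iadd", "istore"]
print(block_requirements(block1), fits_in_one_cell(block1))
# {'depth': 2, 'multipliers': 1} True
```

Under these assumptions, imul and ishl can run side by side at level 1 and iadd at level 2, so the first operand block needs one multiplier and two levels.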

General Overview
[Figure: block diagram with the Detector Unit, the Reconfiguration Cache, and the Array]

Power Simulator
• CACO-PS: Cycle-Accurate COnfigurable Power Simulator
• Based on switching activity: $P_d = \alpha \cdot f_c \cdot C \cdot V_{dd}^2$
• The result is given as the number of gate capacitances that switch
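The dynamic-power formula and the "number of gate capacitances that switch" metric can be illustrated with a small Python sketch. It is not CACO-PS itself: `toggles`, `switched_gate_caps`, `dynamic_power`, and the `caps_per_bit` weight are hypothetical, and the sketch only shows how switching activity (bits toggling between consecutive cycles) yields a count that roughly plays the role of $\alpha \cdot C$ in gate-capacitance units.

```python
from typing import List


def toggles(prev: int, curr: int) -> int:
    """Number of bits that switch between two consecutive signal values."""
    return bin(prev ^ curr).count("1")


def switched_gate_caps(trace: List[int], caps_per_bit: int = 1) -> int:
    """Total gate capacitances switched over a cycle-by-cycle value trace.

    caps_per_bit is a hypothetical fan-out weight (gate capacitances driven
    by each bit); a real simulator would derive it per component.
    """
    return sum(toggles(a, b) * caps_per_bit for a, b in zip(trace, trace[1:]))


def dynamic_power(alpha: float, f_c: float, c: float, v_dd: float) -> float:
    """P_d = alpha * f_c * C * Vdd^2 (watts, for SI inputs)."""
    return alpha * f_c * c * v_dd ** 2


bus = [0b0000, 0b1010, 0b0110, 0b0110]
print(switched_gate_caps(bus, caps_per_bit=3))  # (2 + 2 + 0) * 3 = 12
```

Counting switched capacitances instead of watts keeps the result independent of a particular technology's $f_c$, $C$, and $V_{dd}$, which is what makes it convenient for comparing architectures.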

Results
• A set of algorithms was executed on the architectures:
  • Sine calculation
  • Sort – Bubble
  • Sort – Select
  • Sort – Quick (10 and 100 elements)
  • Search – Binary
  • Search – Sequential
  • IMDCT (plus three unrolled versions)
  • Floating-point sums emulation
  • Full MP3 player

Performance
[Figure: performance results]
• The same number of different sequences of instructions
• Parallelism exposed by loop unrolling
• No more parallelism available!
• There is room for improvement!
• Compare these two and you can save reconfiguration memory
