t-kernel Tutorial (Session A), July 6, 2006. Lin Gu, Dept. of Computer Science, University of Virginia. lingu@cs.virginia.edu, http://www.cs.virginia.edu/~lg6e
Outline • Overview • Code modification and transitions • Bridging • Re-visit the code modification • Differentiated virtual memory
Application domain • Targeting application systems with very-low-power, networked, and unattended operations [Photo: Duane Birkey, ecuadorphotos.tripod.com]
Goals • Design a new OS kernel for wireless sensor networks (WSNs) supporting • OS protection • Virtual memory • Preemptive priority scheduling • Without assuming traditional hardware support [Diagram: the OS (t-kernel) sits between application complexity and the hardware abstraction]
t-kernel – Approach • Load-time code modification • The naturalized program becomes a cooperative program that supports OS protection, virtual memory, and preemptive scheduling [Diagram: the OS (t-kernel) transforms the application into a naturalized program]
t-kernel – Naturalization Process • Naturalizer • Processes binary instructions and generates naturalized instructions (called natins) • Page by page • Paging • Storage management • Dispatcher • Controls execution [Diagram: the naturalizer, dispatcher, and paging module run on the μC alongside 128KB program memory, 4KB RAM, and the application in 512KB nonvolatile flash storage]
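To make the page-by-page flow concrete, the sketch below shows, in C, the overall shape a naturalizer main loop could take: fetch one application page, translate its instructions one by one, append the finis, and write out the natin page. This is not the actual code in naturalizer.c; the helper names (fetch_app_page, translate_insn, emit_finis, write_natin_page), the page sizes, and the types are hypothetical.

    #include <stdint.h>
    #include <stddef.h>

    #define APP_PAGE_WORDS   128   /* assumed page size, for illustration only */
    #define NATIN_PAGE_WORDS 256

    typedef struct {
        uint16_t words[NATIN_PAGE_WORDS];
        size_t   used;
    } natin_page_t;

    /* Hypothetical helpers; the real ones live in naturalizer.c and paging.c. */
    extern size_t fetch_app_page(uint16_t vpc, uint16_t *buf, size_t max_words);
    extern size_t translate_insn(uint16_t vpc, const uint16_t *insn, natin_page_t *out);
    extern void   emit_finis(natin_page_t *out, uint16_t start_vpc);
    extern void   write_natin_page(const natin_page_t *out);

    /* Naturalize one application page starting at start_vpc, one instruction at a
     * time: ALU instructions are copied, branches are rewritten into transition
     * logic (see the branch-regulating slides below). translate_insn is assumed
     * to consume at least one word per call. */
    void naturalize_page(uint16_t start_vpc)
    {
        uint16_t app[APP_PAGE_WORDS];
        size_t n = fetch_app_page(start_vpc, app, APP_PAGE_WORDS);

        natin_page_t out = { .used = 0 };
        uint16_t vpc = start_vpc;

        for (size_t i = 0; i < n; ) {
            size_t consumed = translate_insn(vpc, &app[i], &out);
            i   += consumed;
            vpc += (uint16_t)consumed;
        }

        emit_finis(&out, start_vpc);   /* version, code length, start VPC */
        write_natin_page(&out);
    }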
Source code tree • Source code will be available at http://www.sf.net/projects/vert [Diagram: source tree with the dispatcher, naturalizer, and paging module]
Outline • Overview • Code modification • Dispatcher and transitions • Re-visit the code modification • Differentiated virtual memory
Code modification – CPU control • Traditional solution: clock interrupts • Masking interrupts is privileged • Adopted by some WSN OSes • Challenge: NO privilege support • The application can turn off clock interrupts and the sensor node halts:
    cli             ; disable interrupts
    self: rjmp self
Code modification – Branch regulating • Branches are modified into transition logic • Guarantees that the kernel and the application alternately take hold of the CPU • A-K transition – application to kernel • K-A transition – kernel to application • Performance is an important factor in the design [Diagram: branch instructions in the app page become transition logic in the natin page; ALU instructions are unchanged]
Code modification – Branch regulating • VPC – Virtual program counter • Generated by a compiler or a programmer • HPC – Host program counter • The physical program counter on the hardware platform [Diagram: each branch in the app page is expressed in VPCs; the transition logic in the natin page maps them to HPCs]
Code modification – Branch regulating • "cli" is copied • We allow the application to turn off interrupts • "rjmp" is modified to an "rcall" • A branch helper stub (branch_stub) helps transfer the control flow to the kernel
Before:
    cli
    self: rjmp self
After (natin page):
    cli
    rcall branch_stub
    self (VPC)
branch_stub (saves r31 and SREG, disables interrupts, and calls townGate in the kernel; the kernel later performs the K-A transition back to the application):
    push r31
    in r31, 0x3f
    cli
    call townGate
Code modification – Branch regulating • The dispatcher retrieves the destination VPC from the natin page (the return address pushed by "rcall branch_stub" points at the inline VPC word) • HPC = lookup(VPC) [Diagram: the stack holds the rcall return address, the saved r31, and the return address into the dispatcher]
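Viewed from the kernel side, the slide above implies roughly the following C sketch: branch_stub's rcall has pushed a return address that points at the inline VPC word, so the dispatcher reads the destination VPC from the natin page, maps it to an HPC, and resumes the application there. The function and helper names are illustrative only and are not the actual townGate code in dispatcher.c.

    #include <stdint.h>

    /* Hypothetical helpers: read a word of program memory, the lookup described
     * on the following slides, and a resume primitive that sets the HPC. */
    extern uint16_t read_progmem_word(uint16_t addr);
    extern uint16_t lookup(uint16_t vpc);        /* VPC -> HPC */
    extern void     resume_at(uint16_t hpc);     /* K-A transition back to the app */

    /* A-K transition: conceptually what happens after branch_stub calls into the
     * kernel. ret_addr is the return address pushed by "rcall branch_stub"; in
     * the natin page it points at the word holding the destination VPC. */
    void town_gate_sketch(uint16_t ret_addr)
    {
        uint16_t dest_vpc = read_progmem_word(ret_addr);
        uint16_t hpc      = lookup(dest_vpc);
        /* Bridging (later slides) may also patch the rcall/VPC pair here. */
        resume_at(hpc);
    }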
t-kernel – lookup(VPC)? • Problem: how to look up entry points? • Estimated index table size: 9KB • Entry table in RAM? Exceeds the 4KB RAM size. • In flash? Slow. • Solution: In-Page Indexing [Diagram: transition logic inside each natin page (e.g., natin pages 102 and 107) maps a VPC to the HPC of its entry point]
t-kernel – In-Page Indexing • Embed indexing in naturalized code • Part of K-A transition performs VPC look-up logic
t-kernel – Required RAM • Minimum: 0 bytes in RAM • To enhance speed, an index cache • Hash has 16 possible results [Diagram: the VPC is split into tag, hash index, and offset fields (widths shown: 3, 8, and 6 bits); the hash of a VPC selects an index cache entry]
t-kernel – Three-level lookup • 1. VPC look-aside buffer (fast) • 2. Two-associative VPC table • 3. Brute-force search on the natin pages (slow)
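A minimal C sketch of this three-level lookup, assuming a 16-entry look-aside buffer indexed by a hash of the VPC (the previous slide says the hash has 16 possible results), a small two-way-associative VPC table, and a brute-force scan of the natin pages as the last resort. The table size, the hash function, and the helper brute_force_scan are assumptions for illustration; the real logic is split between the naturalized code (in-page indexing) and the kernel.

    #include <stdint.h>

    #define VLB_SLOTS   16    /* "hash has 16 possible results" */
    #define TABLE_SETS  32    /* assumed size of the two-associative VPC table */

    typedef struct { uint16_t vpc; uint16_t hpc; uint8_t valid; } map_entry_t;

    static map_entry_t vlb[VLB_SLOTS];            /* level 1: VPC look-aside buffer */
    static map_entry_t vpc_table[TABLE_SETS][2];  /* level 2: two-associative table */

    /* Level 3 fallback: walk the in-page indexes of the natin pages (slow). */
    extern uint16_t brute_force_scan(uint16_t vpc);

    /* Illustrative hash; the only requirement here is 16 possible results. */
    static uint8_t vpc_hash(uint16_t vpc) { return (uint8_t)((vpc ^ (vpc >> 4)) & 0x0fu); }

    uint16_t lookup(uint16_t vpc)
    {
        /* Level 1: fast, direct-mapped look-aside buffer. */
        map_entry_t *e = &vlb[vpc_hash(vpc)];
        if (e->valid && e->vpc == vpc)
            return e->hpc;

        /* Level 2: two-associative VPC table. */
        map_entry_t *set = vpc_table[vpc % TABLE_SETS];
        for (int way = 0; way < 2; way++) {
            if (set[way].valid && set[way].vpc == vpc) {
                *e = set[way];                     /* refill the look-aside buffer */
                return set[way].hpc;
            }
        }

        /* Level 3: brute-force search over the natin pages. */
        uint16_t hpc = brute_force_scan(vpc);
        e->vpc = vpc; e->hpc = hpc; e->valid = 1;  /* cache for next time */
        return hpc;
    }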
Code modification – Branch regulating • lookup() maps a VPC to an HPC • The dispatcher controls the execution of the application • Starts at VPC = 0 (or any start address) • Performs lookup(VPC) and sets the HPC [Diagram: the dispatcher mediates between the app page and the natin page, resolving branch VPCs to HPCs]
Outline • Overview • Code modification and transitions • Bridging • Re-visit the code modification • Differentiated virtual memory
Bridging • Bridging: to accelerate execution, shortcut between the branch source and destination • Differentiate forward and backward branches • Forward: Dest VPC > Source VPC • Backward: Dest VPC <= Source VPC
Original code:
    rjmp Next
    ...
    Next: add r0, r1
Naturalized code (before bridging, the branch goes through branch_stub and the dispatcher):
    rcall branch_stub
    Next (VPC)
    ...
    add r0, r1
    ...
    branch_stub: push r31
    in r31, 0x3f
    cli
    call townGate
Bridging – Forward branch • The dispatcher looks up the HPC for "Next" • The naturalizer patches the natin page • Handled by "townLogic" (town transition)
Before patching:
    rcall branch_stub
    Next (VPC)
    ...
    add r0, r1
After patching (the rcall/VPC pair is replaced with a direct jump):
    jmp HPC
    ...
    add r0, r1
[Diagram: branch_stub and the stack frame holding the return addresses and r31 are only involved on the first execution, when the dispatcher resolves the HPC]
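The patch itself can be pictured with the C sketch below: once the HPC is known, the two-word rcall/VPC pair is overwritten with a two-word direct jump, so the patched site never re-enters the dispatcher. This is only an illustration of the idea behind townLogic() and rewritePgmPage(); the buffer helpers and the jmp encoder are hypothetical, and real patching rewrites the flash page as a whole.

    #include <stdint.h>

    /* Hypothetical natin-page buffer access; real patching goes through
     * rewritePgmPage() in naturalizer.c because program flash is rewritten a
     * page at a time. Addresses are word offsets into the natin page. */
    extern void natin_write_word(uint16_t hpc_addr, uint16_t word);
    extern void natin_commit_page(uint16_t hpc_addr);

    /* Placeholder encoder for a two-word "jmp dest"; not the real AVR encoding. */
    extern void encode_jmp(uint16_t dest_hpc, uint16_t words[2]);

    /* Bridge a forward branch: the rcall/VPC pair at stub_hpc becomes a direct
     * jmp to the resolved destination (both occupy two words, so the patch fits
     * in place). */
    void bridge_forward(uint16_t stub_hpc, uint16_t dest_hpc)
    {
        uint16_t jmp_words[2];
        encode_jmp(dest_hpc, jmp_words);

        natin_write_word(stub_hpc,     jmp_words[0]);  /* was: rcall branch_stub */
        natin_write_word(stub_hpc + 1, jmp_words[1]);  /* was: inline destination VPC */
        natin_commit_page(stub_hpc);
        /* The destination page should also record this page in its link-in record
         * (see the "Link-in and finis" slides) so the jump can be invalidated if
         * that page changes. */
    }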
Bridging – Backward branch • The naturalizer patches the natin page • SystemCounter is 8-bit • One trap into the kernel per 256 backward branches • "headmaster" does a sanity check in the dispatcher
Before patching:
    ...
    Prev: add r0, r1
    ...
    rcall branch_stub
    Prev (VPC)
After patching (the counter is kept in r31 after it has been saved):
    ...
    add r0, r1
    ...
    push r31
    in r31, 0x3f
    push r31
    lds r31, SystemCounter
    inc r31
    sts SystemCounter, r31
    brne go
    cli
    call headmaster
    go: pop r31
    out 0x3f, r31
    pop r31
    jmp HPC
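In C terms, the patched backward branch above behaves like the sketch below: it increments an 8-bit counter and traps into the kernel only when the counter wraps, i.e., once every 256 backward branches, giving the dispatcher a periodic chance to run its sanity check. Only the names SystemCounter and headmaster come from the slide; the rest is illustrative.

    #include <stdint.h>

    static volatile uint8_t SystemCounter;   /* 8-bit, wraps every 256 increments */

    /* Kernel-side sanity check in the dispatcher (headmaster in dispatcher.c). */
    extern void headmaster(void);

    /* What the patched backward branch does before jumping to its HPC; saving
     * r31/SREG and disabling interrupts around the trap is done by the stub. */
    void backward_branch_check(void)
    {
        if (++SystemCounter == 0)    /* taken once per 256 backward branches */
            headmaster();
    }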
Bridging and CPU control • Accelerates execution for both forward and backward branches • Still guarantees CPU control by the OS • No infinite loops that lock out the kernel [Diagram: bridged natin pages (e.g., 102 and 107) jump directly to each other, while the dispatcher keeps periodic control through the remaining transition logic]
Outline • Overview • Code modification and transitions • Bridging • Re-visit the code modification • Differentiated virtual memory
Jump instructions • Example: rjmp • Modified by: translateRjmp() in naturalizer.c • Patched by: townLogic() in dispatcher.c and rewritePgmPage() in naturalizer.c
Before:
    rjmp DEST
After:
    rcall branch_stub
    DEST (VPC)
    ...
    branch_stub: push r31
    in r31, 0x3f
    cli
    call townGate
Conventional conditional branches • Example: breq • Modified by: translateBranch() in naturalizer.c • Patched by: townLogic() in dispatcher.c and rewritePgmPage() in naturalizer.c
Before:
    breq DEST
    fall_thru: ...
After (both the taken path and the fall-through path go through branch_stub):
    breq taken
    rcall branch_stub
    fall_thru (VPC)
    taken: rcall branch_stub
    DEST (VPC)
    ...
    branch_stub: push r31
    in r31, 0x3f
    cli
    call townGate
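For illustration, a translation routine in the spirit of translateBranch() can be sketched as below. It emits symbolic assembly text instead of binary instruction words, and the emit helper and label scheme are hypothetical; the real naturalizer rewrites binary instructions into the natin page directly.

    #include <stdarg.h>
    #include <stdio.h>

    /* Emit one line of symbolic naturalized code; the real naturalizer appends
     * encoded instruction words to the natin page instead of printing text. */
    static void emit(const char *fmt, ...)
    {
        va_list ap;
        va_start(ap, fmt);
        vprintf(fmt, ap);
        va_end(ap);
        putchar('\n');
    }

    /* Rewrite "breq DEST": the condition now only selects which inline VPC is
     * handed to branch_stub, so both outcomes reach the kernel until bridged. */
    void translate_breq_sketch(unsigned dest_vpc, unsigned fallthru_vpc)
    {
        emit("    breq taken_%u", dest_vpc);   /* per-site local label */
        emit("    rcall branch_stub");
        emit("    .word %u", fallthru_vpc);    /* fall-through VPC */
        emit("taken_%u:", dest_vpc);
        emit("    rcall branch_stub");
        emit("    .word %u", dest_vpc);        /* taken-branch VPC */
    }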
Skip instructions • Example: sbrs – skip the next instruction if the bit in the register is set • Modified by: translateSkip() in naturalizer.c • Patched by: townLogic() in dispatcher.c, rewritePgmPage() in naturalizer.c
Before (e.g., sbrs r18, 1 followed by add r0, r1; the instruction after the skippable one is labeled skipped):
    sbrs r18, 1
    noskip: <insn>
    skipped: ...
After:
    sbrs r18, 1
    rjmp _noskip
    _skipped: rcall branch_stub
    skipped (VPC)
    _noskip: natins for <insn>
Page boundary • Add a town transition to the next VPC when a natin page is full • VPC_Miss: a kernel service that handles the situation where a VPC is not serviced in this natin page
Original code:
    ...
    insn1: <insn1>
    insn2: <insn2>
Natin page (ends with a town transition to insn2):
    ...
    natin(s) for <insn1>
    rjmp nextpage
    jmp VPC_Miss
    nextpage: rcall branch_stub
    insn2 (VPC)
Link-in and finis • Bridging makes direct jumps between natin pages • The incoming pages are recorded in the link-in record of the natin page • Needed for invalidating direct jumps when a natin page is changed (new entry point inserted, or swapped out) [Diagram: a natin page (page 109) ends with its last town transition, branch_stub, and a link-in record listing incoming pages 0, 1, 2, ...]
Link-in and finis • Version No: used by the bridging • Code length: number of VPCs in this natin page • Start VPC: the first VPC in this natin page
Example (two natin pages):
    _VPC100: rcall branch_stub
    VPC205
    ...
    branch_stub: ...
    Finis: version = 6, code length = 23, start VPC = VPC100

    _VPC200: ...
    _VPC205: ...
    branch_stub: ...
    Finis: version = 7, code length = 28, start VPC = VPC200
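The bookkeeping on these two slides can be pictured as a small trailer structure at the end of each natin page, as in the sketch below. The field set (version, code length, start VPC, link-in record) comes from the slides; the exact layout, field widths, and the capacity of the link-in record are assumptions.

    #include <stdint.h>

    #define MAX_LINK_IN 8    /* assumed capacity of the link-in record */

    /* Trailer ("finis") stored at the end of each natin page. */
    typedef struct {
        uint8_t  version;               /* bumped when the page is regenerated,
                                           used by the bridging to validate jumps */
        uint8_t  code_length;           /* number of VPCs in this natin page */
        uint16_t start_vpc;             /* first VPC in this natin page */
        uint8_t  num_link_in;
        uint8_t  link_in[MAX_LINK_IN];  /* natin pages that jump directly here */
    } natin_finis_t;

    /* When this page changes (new entry point inserted, or swapped out), the
     * pages listed in link_in have their direct jumps into this page invalidated,
     * forcing them back through the dispatcher. */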
Outline • Overview • Code modification and transitions • Bridging • Re-visit the code modification • Differentiated virtual memory
t-kernel – Differentiated Memory Access • Physical address sensitive memory (PASM) • Virtual/physical addresses are the same • The fastest access • Stack memory • Virtual/physical addresses directly mapped • Fast access with boundary checks • Heap memory • May involve a transition to kernel • The slowest, sometimes involves swapping
PASM • Example: lds/sts at PASM – load/store at a physical-address-sensitive memory location • Not modified, runs at native speed
Before and after (unchanged):
    lds r18, 0x20
    sts 0x20, r18
Stack memory area • Example: push register; std pointer+d, register • Pointer: a register pair pointing to an address, e.g., Y = r29:r28 • Modified by: translateLdd(), translateStd(), translateLd(), translateSt(), translateLduu(), translateStuu() in naturalizer.c
Before:
    push r18
    std Y+2, r18
After ("push" is unchanged; "std" gets a boundary check, and the elided heap path is shown on the next slide):
    push r18
    adiw r28, 2
    cpi r29, 0x10
    brcs instack
    ...
    instack: std Y, r18
    sbiw r28, 2
Heap memory area • Example: st pointer, register • Modified by: translateLdd(), translateStd(), translateLd(), translateSt(), translateLduu(), translateStuu() in naturalizer.c • Kernel services involved: scall_st, scall_ld
Before:
    st Y, r18
After (stack addresses are stored directly; heap addresses go through scall_st):
    cpi r29, 0x10
    brcs instack
    push r31
    push r29
    push r30
    push r28
    in r30, 0x3f
    push r30
    movw r30, r28
    mov r29, r18
    call scall_st
    pop r30
    out 0x3f, r30
    pop r28
    pop r30
    pop r29
    pop r31
    rjmp inheap
    instack: st Y, r18
    inheap: ...
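Read as C, the rewritten store makes a run-time decision like the sketch below: addresses under the stack boundary are stored directly (the fast, directly mapped path), while heap addresses trap into the kernel through scall_st, which may have to swap a page in. The boundary constant mirrors the cpi r29, 0x10 check above; the helper signatures, including the one assumed for scall_st, are illustrative.

    #include <stdint.h>

    #define STACK_BOUND 0x1000u   /* matches the "cpi r29, 0x10" high-byte check */

    /* Kernel service for heap stores (scall_st in dispatcher.c); the signature
     * here is an assumption. It maps the virtual address, possibly swapping a
     * page in, and performs the store. */
    extern void scall_st(uint16_t vaddr, uint8_t value);

    /* Hypothetical direct store for the directly mapped stack area. */
    extern void direct_store(uint16_t addr, uint8_t value);

    /* What the naturalized "st Y, r18" decides at run time (PASM accesses never
     * reach this point because they are not modified at all). */
    void naturalized_store_sketch(uint16_t addr, uint8_t value)
    {
        if (addr < STACK_BOUND)
            direct_store(addr, value);  /* stack: directly mapped, bounds-checked */
        else
            scall_st(addr, value);      /* heap: slow path through the kernel */
    }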
t-kernel – Evaluation [Chart: virtual memory overhead (old version), evaluated on a 7.3827MHz Mica2 mote]
t-kernel – Swapping • Traditional virtual memory: hard disks can (theoretically) be written an unlimited number of times • Flash on MICA2: 10,000 erasure/write cycles • WSNs often have write-unfriendly external storage • Bad swapping may wear out the flash in one day (at 1,000 swaps/hour)
t-kernel – Swapping • Problem: how to swap with write-unfriendly storage and small RAM? • Direct mapping • Minimizes the in-RAM data structure • Could wear out a flash in less than 1 day (assuming 1,000 swaps/hour) • Page table with external addresses • Maximizes longevity • Wear leveling • Needs 352B at minimum (8.5% of RAM) • Solution: Partitioned swapping [Diagrams: direct mapping ties each virtual memory page to a fixed flash location; a page table maps each page number to an external flash address]
t-kernel – Partitioned Swapping • Super page partition: fast swaps • Overflow partition: extends longevity • 32 bytes in RAM, 266 days of lifetime (at 1,000 swaps/hour), 20% fast swaps
t-kernel – Partitioned Swapping • No single partitioning fits all applications • Application-directed partitioning • Associativity parameter
t-kernel – Partitioned Swapping • Balance between swapping speed and longevity • An example: associativity = 2 • 266 days, 20% fast swaps (assuming 1,000 swaps/hour)
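One plausible reading of partitioned swapping, under the assumptions stated here, is sketched below: a page being swapped out first tries its few direct-mapped slots in the super page partition (a fast swap with no in-RAM table), and otherwise falls into a wear-leveled overflow partition that protects flash longevity. The associativity parameter is interpreted as the number of super-page slots a page may use; all sizes, data structures, and helper names are illustrative, and the real logic is in swapRam() in paging.c together with the flash chunk operations listed on the next slide.

    #include <stdint.h>

    #define ASSOCIATIVITY   2    /* example value from the slide */

    /* Hypothetical flash-layer helpers built on the chunk operations in paging.c. */
    extern int     super_slot_usable(uint16_t page, uint8_t way);  /* free or owned */
    extern void    write_super_slot(uint16_t page, uint8_t way, const uint8_t *data);
    extern uint8_t pick_overflow_slot(void);                       /* wear-leveled */
    extern void    write_overflow_slot(uint8_t slot, uint16_t page, const uint8_t *data);

    /* Swap one RAM page out. Returns 1 for a fast swap into the super page
     * partition, 0 for a slow swap into the overflow partition. */
    int swap_out_page(uint16_t page, const uint8_t *data)
    {
        /* Try the page's direct-mapped slots first: no in-RAM table is needed. */
        for (uint8_t way = 0; way < ASSOCIATIVITY; way++) {
            if (super_slot_usable(page, way)) {
                write_super_slot(page, way, data);
                return 1;
            }
        }
        /* Otherwise spread writes across the overflow partition to extend flash
         * lifetime (wear leveling), at the cost of a slower swap. */
        uint8_t slot = pick_overflow_slot();
        write_overflow_slot(slot, page, data);
        return 0;
    }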
t-kernel – Partitioned Swapping • Related kernel code • scall_ld, scall_st in dispatcher.c • swapRam() in paging.c • Flash operations (in paging.c) • writeExtFlashChunk() • readExtFlashChunk()
Resources … to be continued • L. Gu and J. A. Stankovic, t-kernel: Providing Reliable OS Support for Wireless Sensor Networks, SenSys 2006 • L. Gu and J. A. Stankovic, t-kernel: a Translative OS Kernel for Sensor Networks, UVA CS Tech Report CS-2005-09, 2005 • Web page under construction: http://www.cs.virginia.edu/~lg6e/tkernel/ • To be available: source code in the vert repository: www.sf.net/projects/vert • Comments and bug reports: lingu@cs.virginia.edu