Practical Embedded System Issues Edwin Olson MIT CSAIL eolson@mit.edu May 8, 2008
Today’s Goal • Give you a feeling of how an embedded system works • How is embedded development different from general-purpose development? • Show you some implementation nuts and bolts • Help you avoid common blunders when implementing embedded systems in the future
Agenda • What is an embedded system? • Introduction to Embedded Development • Implementing a pre-emptive kernel • Processor/Memory model • Context switching and scheduling • Implementing a device driver • Many ways to do it… most wrong! • A simple embedded system • DARPA Urban Challenge ADU
Embedded Systems Development • What is an embedded system? • A system designed for a specific (fixed) function • Hardware and software design are often coupled • Resources (memory, CPU) often limited • Real-time issues more common • Ludicrous cost sensitivity • Smaller teams: • Developers are a lot “closer” to the hardware
Embedded System Development • Examples: • Mars rovers • Industrial automation • Automotive ECU • OrcBoard • ADU
ARM (Advanced RISC Machine) • ARM is big in the embedded world • Low power • Range of products • Applications • iPhone • Game Boy Advance • Automotive • 90 processors shipped per second
Microcontrollers/SoC • System-on-chip • CPU + RAM + FLASH + I/O devices integrated into one package • Advantages • Lower system cost • Faster development • Greater reliability • Physically smaller • Disadvantages • Less customizable • Less expandable • Example part (Luminary ARMv7-M): 32-bit CPU at 50 MHz; 64 KB RAM; 256 KB FLASH; peripherals: ADC, PWM, QEI, UART, I2C, SPI, CAN, EMAC
ARM v7m Memory Map • Single 32 bit address space • Code, Data, IO Space • No MMU (Memory Management Unit) • No virtual memory • All processes exist in same space • Some memory protection (woo!) • Guard pages • No-execute regions
Memory map:
    0x00000000  FLASH     vectors, code, data init
    0x20000000  RAM       data, bss, kernel stack, task 0 stack, task 1 stack, …heap…
    0x40000000  IO Space  uart, ethernet, timer, …
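Because code, data, and peripherals share one flat address space, a plain pointer can reach any of them. The sketch below classifies an address using the three base addresses from the memory map; the UART register address and the `REG32` helper are hypothetical names used only for illustration, and the upper bound of the I/O region is deliberately left open.

```c
#include <stdint.h>

/* Regions taken from the memory map above. */
typedef enum { REGION_FLASH, REGION_RAM, REGION_IO } region_t;

#define RAM_BASE 0x20000000u
#define IO_BASE  0x40000000u

/* Classify an address by the region it falls in. The end of the I/O
 * region is not modelled here (an assumption of this sketch). */
static region_t region_of(uint32_t addr)
{
    if (addr < RAM_BASE) return REGION_FLASH;
    if (addr < IO_BASE)  return REGION_RAM;
    return REGION_IO;
}

/* Hypothetical peripheral register address, for illustration only. */
#define EXAMPLE_UART_REG 0x4000C000u

/* On the target, a peripheral register is accessed through a volatile
 * pointer so the compiler performs a real bus access every time. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))
```

On the target, `REG32(EXAMPLE_UART_REG) = value;` would be an ordinary store that happens to land on a peripheral. Do not dereference it on a desktop machine.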
Power On • On a general purpose system (Linux/Solaris) many things happen for you automatically • In embedded world, you’re responsible! • What happens when we turn the system on? • How does our code start?
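“Booting” • Vectors at 0x00000000: reset address, IRQ handler addresses • On reset, crt() runs before main()
    #include <stdint.h>

    // section boundaries; these symbols come from the linker script
    extern uint32_t _etext[], _data[], _edata[], __bss_start__[], __bss_end__[];

    int myGlobalVariable = 50;   // initialized global
    int myOtherGlobal;           // uninitialized global

    void main(void);

    void crt(void)
    {
        // initialize hardware, e.g., PLL

        // copy .data section
        for (uint32_t *src = _etext, *dst = _data; dst != _edata; src++, dst++)
            (*dst) = (*src);

        // zero .bss section
        for (uint32_t *p = __bss_start__; p != __bss_end__; p++)
            (*p) = 0;

        main();
    }

    void main(void) { … }
Memory layout: FLASH (vectors, code, data init) • RAM (data, bss, …heap…) • IO Space (uart, ethernet, timer, …)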
Global Variables • Initialized globals (myGlobalVariable = 50) belong to .data: crt() copies their initial values from the “data init” region of FLASH into “data” in RAM • Uninitialized globals (myOtherGlobal) belong to .bss: crt() zeroes that region of RAM • Both steps happen before main() runs
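The copy and zero steps can be tried on a desktop machine. This is a minimal sketch in which ordinary arrays stand in for the linker-defined regions; every `fake_` name is an assumption of the sketch, not part of the original startup code.

```c
#include <stdint.h>

/* Copy initial values from their load address (FLASH on the target)
 * to their run address (RAM on the target). */
static void copy_data(const uint32_t *load, uint32_t *start, const uint32_t *end)
{
    while (start != end)
        *start++ = *load++;
}

/* Zero the region that holds uninitialized globals. */
static void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start != end)
        *start++ = 0;
}

/* Host-side stand-ins for the regions (assumptions of this sketch). */
static const uint32_t fake_flash_data_init[2] = { 50, 7 };
static uint32_t fake_ram_data[2] = { 0xdeadbeefu, 0xdeadbeefu };
static uint32_t fake_ram_bss[3]  = { 0xdeadbeefu, 0xdeadbeefu, 0xdeadbeefu };

/* Mirrors the two loops in crt(), minus hardware setup and main(). */
static void fake_crt(void)
{
    copy_data(fake_flash_data_init, fake_ram_data, fake_ram_data + 2);
    zero_bss(fake_ram_bss, fake_ram_bss + 3);
}
```

Before `fake_crt()` runs, the RAM arrays hold junk, which is roughly what real RAM looks like at power-on. Afterwards the first holds the FLASH values and the second is all zeros.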
Using a real-time kernel • Often have several things to do at once: • Operate peripherals • Command interface • Watchdogs • Cooperative versus pre-emptive multitasking
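Initializing Application
    void task0(void);
    void task1(void);

    void main(void)
    {
        nkern_init();
        nkern_task_create(task0, PRIORITY_HIGH, 1024);   // task 0 stack appears in RAM
        nkern_task_create(task1, PRIORITY_LOW, 1024);    // task 1 stack appears in RAM
        nkern_bootstrap();   // never returns
    }

    void task0(void) { … }
    void task1(void) { … }
Memory layout: FLASH (vectors, code, data init) • RAM (data, bss, kernel stack, task 0 stack, task 1 stack, …heap…; the stacks are obtained with malloc) • IO Space (uart, ethernet, timer, …)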
Context Switching • At any point in time, either: • A user task is running: nkern_running_task • An IRQ or kernel is running • How do we switch from one task to another? • What state does a task have? • CPU register state • Task stack state • Virtual memory mapping state (no MMU)
Context Switching • Our strategy: • We’ll store registers on task’s stack • Just need to remember each task’s stack pointer • Each task represented by an nkern_task_t
    typedef struct {
        uint32_t sp;                  // stack pointer for task
        uint32_t priority;            // task priority
        nkern_wait_list_t waitlist;   // discuss these later…
        uint64_t wait_utime;
    } nkern_task_t;

    nkern_task_t *nkern_running_task; // pointer to currently-running task
Context switching cartoon • SysTick timer generates periodic interrupts • Configured by writing to I/O memory space • [Figure: timeline with rows for IRQs, IRQ handler, Task 0, and Task 1] • How often should SysTick timer generate interrupts?
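The tick rate is a trade-off: faster ticks let sleeping tasks wake sooner, but more CPU time goes to the handler. As a rough sketch, the reload value for a chosen rate can be computed as below. The 50 MHz clock comes from the example part earlier in the deck; the 1 kHz rate in the usage note is only an assumed example, and `systick_reload` is a hypothetical helper name.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* the SysTick counter is 24 bits wide */

/* Returns the reload value for the requested tick rate, or 0 if the
 * period cannot be represented (too long for 24 bits, or too short). */
static uint32_t systick_reload(uint32_t cpu_hz, uint32_t tick_hz)
{
    if (tick_hz == 0)
        return 0;
    uint32_t cycles = cpu_hz / tick_hz;   /* CPU cycles per tick */
    if (cycles < 2 || cycles - 1 > SYSTICK_MAX_RELOAD)
        return 0;
    return cycles - 1;   /* counting reload..0 takes reload + 1 cycles */
}
```

With a 50 MHz clock, `systick_reload(50000000u, 1000u)` gives 49999, which is one interrupt per millisecond. A 1 Hz tick does not fit in 24 bits at this clock, so the helper returns 0.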
Before the IRQ • Suppose task0 is executing • …Doing something useful… • …Using its stack and registers… • Stack pointer (SP) points to next available memory location • [Figure: RAM holding the Task 0 stack (used stack with local variables and function calls, then SP, then unused stack) and the Task 1 stack]
An IRQ Occurs… • Hardware pushes some of task0’s registers onto the stack: • PC, xPSR • r0, r1, r2, r3, r12, r14 (LR) • Hardware then invokes the IRQ handler… • [Figure: the Task 0 stack now holds the used stack (local variables, function calls) followed by the hardware-saved registers; SP has moved past them into the previously unused stack]
A Useless IRQ Handler • Saves the running task’s state, then restores the same task (simplified assembly)
    systick_irq_handler:
        stmdb sp!, {r4-r11}             // push remaining registers
        ldr   r0, =nkern_running_task
        ldr   r0, [r0]
        str   sp, [r0]                  // save SP in nkern_task_t
                                        // task state is now saved!
        …
        ldr   r0, =nkern_running_task
        ldr   r0, [r0]
        ldr   sp, [r0]                  // load SP from nkern_task_t
        ldmia sp!, {r4-r11}             // pop registers
                                        // task is now restored!
        rti                             // return from interrupt (shorthand, not a real ARM instruction)
                                        // hardware will take over
[Figure: the Task 0 stack holds the used stack, the hardware-saved registers, then the software-saved registers, with SP below them; the Task 1 stack is untouched] • Why does the hardware save only some of the registers?
SysTick IRQ Handler • Same save and restore, with a call to the scheduler in between (simplified assembly)
    systick_irq_handler:
        stmdb sp!, {r4-r11}             // push remaining registers
        ldr   r0, =nkern_running_task
        ldr   r0, [r0]
        str   sp, [r0]                  // save SP in nkern_task_t
        …
        // call nkern_scheduler: pick a different task to run
        …
        ldr   r0, =nkern_running_task
        ldr   r0, [r0]
        ldr   sp, [r0]                  // load SP from nkern_task_t
        ldmia sp!, {r4-r11}             // pop registers
        rti                             // return from interrupt (shorthand, not a real ARM instruction)
                                        // hardware will take over
[Figure: both the Task 0 stack and the Task 1 stack hold a used stack (local variables, function calls), hardware-saved registers, software-saved registers, and unused stack; SP now points into the stack of the task that was picked]
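The restore path always pops eight software-saved registers and then lets the hardware pop its own frame, so a task that has never run needs a stack that already looks as if it had been interrupted. The slides do not show how nkern does this. The sketch below is one common way to build such a frame, with a plain array standing in for the task's stack and `task_stack_init` as a hypothetical helper name.

```c
#include <stdint.h>

#define SW_SAVED_WORDS 8            /* r4-r11, pushed by the handler */
#define XPSR_THUMB_BIT 0x01000000u  /* must be set for Thumb execution */

/* Build an initial frame on a fresh stack. stack_top points one word past
 * the highest usable word. Returns the initial SP, pointing at saved r4. */
static uint32_t *task_stack_init(uint32_t *stack_top,
                                 uint32_t entry_pc, uint32_t exit_lr)
{
    uint32_t *sp = stack_top;

    /* Hardware-saved part, highest address first. */
    *--sp = XPSR_THUMB_BIT;   /* xPSR */
    *--sp = entry_pc;         /* PC: where the task starts running */
    *--sp = exit_lr;          /* LR: where the task "returns" to */
    *--sp = 0;                /* r12 */
    for (int i = 0; i < 4; i++)
        *--sp = 0;            /* r3, r2, r1, r0 */

    /* Software-saved part: r11 down to r4. */
    for (int i = 0; i < SW_SAVED_WORDS; i++)
        *--sp = 0;

    return sp;
}
```

The returned pointer is what would be stored in the task's `sp` field. The first time the scheduler picks the task, the handler pops r4-r11 and the hardware pops the remaining eight words, which starts execution at `entry_pc`.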
Scheduler: preemptive round-robin • Each priority level maintains a queue of runnable tasks. • Algorithm: • Put the old task at the end of its priority queue or into the sleep queue • Examine queue of sleeping tasks • Wake those whose sleep interval has expired • Find the highest-priority, non-empty queue • Remove and return first item in queue.
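The algorithm above can be sketched with one linked FIFO per priority level plus a sleep queue. This is an illustration, not nkern's implementation: the type and function names are hypothetical, priority 0 is assumed to be the highest, and a task counts as sleeping while its `wait_utime` is in the future.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PRIORITIES 2           /* assumption: 0 is the highest priority */

typedef struct task {
    uint32_t priority;
    uint64_t wait_utime;           /* wake-up time; 0 if not sleeping */
    struct task *next;
} task_t;

typedef struct { task_t *head, *tail; } queue_t;

static queue_t run_queue[NUM_PRIORITIES];
static queue_t sleep_queue;

static void enqueue(queue_t *q, task_t *t)
{
    t->next = NULL;
    if (q->tail) q->tail->next = t; else q->head = t;
    q->tail = t;
}

static task_t *dequeue(queue_t *q)
{
    task_t *t = q->head;
    if (t) {
        q->head = t->next;
        if (!q->head) q->tail = NULL;
        t->next = NULL;
    }
    return t;
}

/* Returns the next task to run, or NULL if nothing is runnable. */
static task_t *schedule(task_t *old, uint64_t now)
{
    /* 1. Old task goes to the end of its queue, or to the sleep queue. */
    if (old) {
        if (old->wait_utime > now) enqueue(&sleep_queue, old);
        else enqueue(&run_queue[old->priority], old);
    }

    /* 2. Wake sleepers whose interval has expired. */
    task_t *pending = sleep_queue.head;
    sleep_queue.head = sleep_queue.tail = NULL;
    while (pending) {
        task_t *next = pending->next;
        if (pending->wait_utime <= now) enqueue(&run_queue[pending->priority], pending);
        else enqueue(&sleep_queue, pending);
        pending = next;
    }

    /* 3. Take the first task from the highest-priority non-empty queue. */
    for (int p = 0; p < NUM_PRIORITIES; p++) {
        task_t *t = dequeue(&run_queue[p]);
        if (t) return t;
    }
    return NULL;   /* a real kernel would run an idle task here */
}
```

Two equal-priority tasks alternate on successive calls. A sleeping higher-priority task takes over on the first call at or after its wake-up time.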
What else do we need? • If our tasks are always runnable, we’re done! • If a task wants to wait for a fixed time: • nkern_running_task->wait_utime = now() + delay • manually cause a SysTick IRQ • What if a task wants to wait for some asynchronous event? • Receive data from ethernet/serial • A button press
An example problem • Serial-To-LCD • Receive characters from serial port, write them to a graphical LCD display
Serial-to-LCD: dumb • Continuously check for occurrence of event • Miserable! • High CPU usage • Higher latency for other tasks • Higher power consumption • … and really common • (thanks to reusing vendor-supplied sample code)
    void serial_echo_task()
    {
        while (1) {
            while (!serial_rx_data_available())
                ;   // spin wait
            char c = serial_rx_get_data();
            lcd_draw_character(c);
        }
    }
Serial-to-LCD: half a good idea • Suppose the hardware can generate an IRQ when our desired event occurs • What’s wrong with this? • Long-running IRQ • lcd_draw_character() called from IRQ context • Destroys real-time performance of system!
    void serial_irq()
    {
        if (serial_rx_data_available()) {
            char c = serial_rx_get_data();
            lcd_draw_character(c);
        }
    }
Serial-to-LCD: smart • Add task to a waitlist • Scheduler will stop scheduling that task • An IRQ will “wake up” the task
    nkern_wait_list_t *serial_rx_wait_list;

    void serial_echo_task()
    {
        while (1) {
            nkern_wait(serial_rx_wait_list);   // won’t return until data avail
            char c = serial_rx_get_data();
            lcd_draw_character(c);
        }
    }

    void serial_irq()
    {
        if (serial_rx_data_available())
            nkern_wake_all(serial_rx_wait_list);
    }
Serial-to-LCD: smarter! • Trigger a reschedule from within the IRQ • Don’t have to wait for SysTick before the task can wake up. • Greatly reduces servicing latency
    void serial_irq()
    {
        if (serial_rx_data_available()) {
            nkern_wake_all(serial_rx_wait_list);
            nkern_schedule();
        }
    }
One more tiny change needed for “smartest”… hint: prevent unnecessary calls to nkern_schedule()
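One way to read the hint: have the wake function report whether it actually woke anyone, and reschedule only then. The host-side sketch below models that idea with hypothetical stand-ins (`wait_list_t`, `wake_all`, `fake_schedule`, `fake_serial_irq`); none of these are nkern's real API.

```c
#include <stdbool.h>
#include <stddef.h>

#define WAITLIST_MAX 4

typedef struct { int id; bool runnable; } task_t;
typedef struct { task_t *waiting[WAITLIST_MAX]; size_t count; } wait_list_t;

static int schedule_calls;                       /* counts reschedules */
static void fake_schedule(void) { schedule_calls++; }

/* Park a task on the list; fails if the list is full. */
static bool wait_list_add(wait_list_t *wl, task_t *t)
{
    if (wl->count == WAITLIST_MAX) return false;
    t->runnable = false;
    wl->waiting[wl->count++] = t;
    return true;
}

/* Wake every waiter; returns true only if at least one task was woken. */
static bool wake_all(wait_list_t *wl)
{
    bool woke = wl->count > 0;
    for (size_t i = 0; i < wl->count; i++)
        wl->waiting[i]->runnable = true;
    wl->count = 0;
    return woke;
}

/* IRQ body: reschedule only when the set of runnable tasks changed. */
static void fake_serial_irq(wait_list_t *rx_waiters, bool rx_data_available)
{
    if (rx_data_available && wake_all(rx_waiters))
        fake_schedule();
}
```

If no task is waiting when a character arrives, the IRQ returns without touching the scheduler, so the interrupted task resumes immediately.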
Serial-to-LCD: Moral • All four methods are “correct” • Implement Serial-to-LCD’s requirements • Are interchangeable • You might find any one of these in a system! • But only the fourth method is good! • The first will consume tons of CPU and power • The second can violate real-time requirements of other tasks in the system • The third will have high latency • We can still shoot ourselves in the foot, even if we are using the right tools (e.g., real-time OS)
Serial IRQ Handler: Uncensored
    #define NKERN_IRQ_ENTER nkern_in_interrupt_flag++;
    #define NKERN_IRQ_EXIT  nkern_in_interrupt_flag--;

    #define IRQ_TASK_SAVE \
        asm volatile ("mrs r12, PSP                 \r\n\t" \
                      "stmdb r12!, {r4-r11}         \r\n\t" \
                      "ldr r0, =nkern_running_task  \r\n\t" \
                      "ldr r0, [r0]                 \r\n\t" \
                      "str r12, [r0]                \r\n\t");

    #define IRQ_TASK_RESTORE \
        asm volatile ("ldr r0, =nkern_running_task  \r\n\t" \
                      "ldr r12, [r0]                \r\n\t" \
                      "ldr r12, [r12]               \r\n\t" \
                      "ldmia r12!, {r4-r11}         \r\n\t" \
                      "msr psp, r12                 \r\n\t" \
                      "ldr pc, =0xfffffffd          \r\n\t" \
                      ".ltorg                       \r\n\t");

    // noinline: keep the C body a separate function with a normal prologue/epilogue
    static void serial0_irq_real(void) __attribute__ ((noinline));
    static void serial0_irq_real()
    {
        uint32_t status = UART0_FR_R;
        uint32_t reschedule = 0;

        if (!(status & ((1<<RX_IRQ) | (1<<RX_TIMEOUT_IRQ)))) {   // data is available.
            UART0_ICR_R = (1<<RX_IRQ) | (1<<RX_TIMEOUT_IRQ);
            reschedule |= _nkern_wake_all(&rx_waitlist[0]);
        }

        if (!(status & (1<<TX_IRQ))) {                           // room to send
            UART0_ICR_R = 1<<TX_IRQ;
            reschedule |= _nkern_wake_all(&tx_waitlist[0]);
        }

        if (reschedule)
            _nkern_schedule();
    }

    // naked: the compiler emits no prologue/epilogue, so the macros fully control the stack
    static void serial0_irq(void) __attribute__ ((naked));
    static void serial0_irq(void)
    {
        IRQ_TASK_SAVE;
        NKERN_IRQ_ENTER;
        serial0_irq_real();
        NKERN_IRQ_EXIT;
        IRQ_TASK_RESTORE;
    }
What’s up with the __attribute__ stuff?
Top-Half/Bottom-Half Handlers • Consider Ethernet peripheral • When a packet arrives, a lot of processing can result • Checksums to verify • IP fragments to reassemble • TCP windows to update • Applications to wake up • Want to keep IRQ Handler as fast as possible: • Modern strategy • IRQ Handler is minimalist. Does least amount of work possible, wakes up another thread to finish the work. • Thread processes incoming data while respecting priorities of other tasks in system.
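A common way to hand data from the minimalist IRQ handler to the worker thread is a single-producer, single-consumer ring buffer: the IRQ only stores a byte and returns, and the thread drains the buffer at its own priority. The sketch below is an assumed illustration, not code from the system described here. It relies on there being exactly one writer for each index on a single core.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u   /* power of two, so indices wrap with a mask */

typedef struct {
    volatile uint32_t head;   /* written only by the producer (IRQ) */
    volatile uint32_t tail;   /* written only by the consumer (task) */
    uint8_t data[RING_SIZE];
} ring_t;

/* Top half: called from the IRQ. Returns false if the buffer is full
 * and the byte had to be dropped. */
static bool ring_put(ring_t *r, uint8_t byte)
{
    if (r->head - r->tail == RING_SIZE)
        return false;
    r->data[r->head & (RING_SIZE - 1u)] = byte;
    r->head++;
    return true;
}

/* Bottom half: called from the task. Returns false if nothing is queued. */
static bool ring_get(ring_t *r, uint8_t *out)
{
    if (r->head == r->tail)
        return false;
    *out = r->data[r->tail & (RING_SIZE - 1u)];
    r->tail++;
    return true;
}
```

The IRQ would call `ring_put()` and then wake the worker. The worker loops on `ring_get()` until it returns false and then goes back to waiting.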
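System Data Flow • [Figure: data-flow diagram. Sensors send raw sensor data to a 40-CPU blade cluster (non-real-time Linux) that does sensor processing and path planning; the cluster sends a motion plan to the ADU; the ADU also receives Manual Override, Run, Stop, and E-Stop inputs and drives steering, gas, brake, and shifter]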
Sensor Processing (Blades) • Obstacle Detection, tracking • Using LIDAR, RADAR
Sensor Processing (Blades) • Detect road paint • Using Cameras
Sensor Processing (Blades) • Estimate Lanes
Motion Planning (Blades) • [Figure: planner view. Labels: Goal Point; Car; Lamp Post; curb seen by “hazards” but not yet by the lane tracker. Legend: Gray = Bad, Red = Infeasible]
Motion Planning (Blades) • Search for a series of steer/gas commands that get us closer to goal • [Figure labels: Obstacle, Oncoming lane, Off-road, Current Position]
Waveform Generation • Blade cluster (not real time) generates waveform plan in advance • ADU executes plan, generating waveforms in real-time • [Figure: gas/brake and steering wheel position plotted against time, with markers for “plan received”, “now”, and “next plan deadline”]
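The slides do not say how a plan is encoded. Purely as an illustration, suppose a plan is a list of timestamped setpoints and the real-time side interpolates linearly between them, refusing to extrapolate beyond the plan. `plan_point_t` and `plan_sample` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One setpoint of a hypothetical plan: a time and a target output value. */
typedef struct { uint64_t utime; int32_t value; } plan_point_t;

/* Linearly interpolate the plan at time `now`.
 * Returns 0 and writes *out on success; returns -1 if `now` lies outside
 * the plan, so the caller can fall back to a safe behavior. */
static int plan_sample(const plan_point_t *pts, size_t n,
                       uint64_t now, int32_t *out)
{
    if (n == 0 || now < pts[0].utime || now > pts[n - 1].utime)
        return -1;

    for (size_t i = 1; i < n; i++) {
        if (now <= pts[i].utime) {
            uint64_t span = pts[i].utime - pts[i - 1].utime;
            if (span == 0) { *out = pts[i].value; return 0; }
            int64_t dv = (int64_t)pts[i].value - pts[i - 1].value;
            int64_t dt = (int64_t)(now - pts[i - 1].utime);
            *out = (int32_t)(pts[i - 1].value + dv * dt / (int64_t)span);
            return 0;
        }
    }

    *out = pts[n - 1].value;   /* reached only for a one-point plan */
    return 0;
}
```

Returning an error past the last point matters as much as the interpolation. If the next plan misses its deadline, the real-time side must notice instead of holding a stale command.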
ADU • Interface between the vehicle and our primary computers • Need physical interface to real world • DACs, UARTs, CAN • Basic mode switching of car (even if cluster is off) • Detect failures/bugs in our main software • Maintains a big finite state machine • “RUN”, “PAUSE”, “STANDBY” • Transitions caused by: • Commands from blade cluster • Human button presses • Time-outs
ADU Finite State Machine • [Figure: state diagram. States: Standby, Run, Shift, Stop, ManualOverride, Error. Transition labels: Shift Command, Shift Done, 5 seconds, Invalid command, Watchdog timeout, “Run” Button, “Stop” Button, Manual Override engaged, Manual Override released, Assertion Failure]
ADU Tasks • tick_task() • Trigger periodic FSM state transitions • Watchdog timer • emc_poll_task() • Query car’s drive-by-wire for status periodically • emc_async_task() • Execute queued commands to drive-by-wire system (shifting, turn signals) • dac_task() • Generate drive/steer analog output signals • shift_task() • Perform and monitor transmission shifting • udp_command_task() • Handle incoming ethernet command packets • button_task() • Sample button inputs • Debounce • Trigger state transitions • music_task() • Play music/make sounds to report state changes • status_task() • Sends periodic status messages via UDP • lcd_task() • Display status on LCD display • debug_task() • Dump kernel statistics over serial port on command • Task categories shown on the slide: FSM/Watchdog, Command Inputs, Vehicle control interface, Diagnostic/Monitoring
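The slides list debouncing under button_task() without saying how it is done. A minimal counter-based sketch follows: the debounced state changes only after the raw input has read the same value for several consecutive samples. The sample count and the names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 3u   /* assumption: 3 identical samples in a row */

typedef struct {
    bool stable;      /* debounced state */
    bool last_raw;    /* most recent raw sample */
    uint8_t run;      /* how many times in a row last_raw has been seen */
} debounce_t;

/* Feed one raw sample. Returns true exactly when the debounced state
 * changes, which is when a state transition should be triggered. */
static bool debounce_update(debounce_t *d, bool raw)
{
    if (raw != d->last_raw) {
        d->last_raw = raw;
        d->run = 1;
    } else if (d->run < DEBOUNCE_SAMPLES) {
        d->run++;
    }

    if (d->run >= DEBOUNCE_SAMPLES && raw != d->stable) {
        d->stable = raw;
        return true;
    }
    return false;
}
```

A task would call this once per sampling period, typically sleeping a few milliseconds between samples. Contact bounce shorter than the window never reaches the state machine.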
Communications • How do we interface a non real-time system to a real-time system? • TCP? • Retransmissions, windowing heuristics lead to hard-to-predict latency • UDP? • Nominally unreliable • Others: • CAN, RS-232, RS-485, USB…
Communications: UDP • Error rates over local LAN? • Gigabit ethernet bit error rate = 10^-12 • Dominant loss mode: host buffer overflows • Idempotent commands are robust to packet loss • Don’t send: “Turn steering wheel clockwise”, send “Set target steering wheel position to 0.92”. • Our strategy: • Retransmit idempotent commands at a rate high enough that the packet-loss failure mode is negligible
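A small sketch of why idempotent commands tolerate retransmission: applying an absolute setpoint three times leaves the same state as applying it once, while a relative command accumulates. The helper at the end estimates how often every copy of a command is lost, under the simplifying assumption that losses are independent. Buffer overflows tend to drop packets in bursts, so treat that figure as optimistic. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical vehicle state: steering target in thousandths. */
typedef struct { int32_t steering_target_milli; } vehicle_t;

/* Relative command ("turn clockwise by X"): NOT idempotent. */
static void apply_turn_by(vehicle_t *v, int32_t delta_milli)
{
    v->steering_target_milli += delta_milli;
}

/* Absolute command ("set target to X"): idempotent. */
static void apply_set_target(vehicle_t *v, int32_t target_milli)
{
    v->steering_target_milli = target_milli;
}

/* Probability that all `copies` transmissions are lost, assuming each is
 * lost independently with probability p_loss. */
static double prob_all_lost(double p_loss, unsigned copies)
{
    double p = 1.0;
    for (unsigned i = 0; i < copies; i++)
        p *= p_loss;
    return p;
}
```

With a 1% loss rate and three copies, the chance that none arrives is about one in a million under the independence assumption. A duplicated absolute command is harmless, so the sender can simply repeat it.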
But why Ethernet? • Very high bit rate • Low error rate • Low latency • Multi-point • Cheap • Noise Immunity • Electrical Isolation
Conclusions • Embedded systems are neat! • Tightly coupled hardware+software • Possible (necessary?) to understand the whole system • Interesting & unique challenges: • Limited resources: CPU, memory, power, size, cost • Simple real-time schedulers • Real time savers • Not a panacea: success requires knowledge and care • ADU as a simple embedded system