dr. Stefan Dulman Embedded Software Group. Embedded Software. TI2720-C. 6 . Debugging techniques. Grace Hopper 1947. Overview. Debugging techniques Debugging a distributed system Lab information Preparation for the exam. Debugging.
dr. Stefan Dulman Embedded Software Group Embedded Software TI2720-C 6. Debugging techniques
Overview • Debugging techniques • Debugging a distributed system • Lab information • Preparation for the exam
Debugging • Write good code! Testing only uncovers a fraction of bugs • Testing and debugging can be very difficult • HW is involved (radio communication over devices?) • Different perception on bugs for ES • MS IE crashed? • Elevator not working? Telephone calling the wrong person? Cash machine? Medical instruments?
Testing on the host machine • Try to find bugs early in the development process • HW might not be available early on • Exercise all of the code • Unlikely situations are difficult to test (events on two devices) • Develop reusable, repeatable tests • Leave an “audit trail” of the test results • ES usually do not have hard drives • Conclusion: do not test on the target more than needed!
Testing on the host machine • [Figure: Target system = hardware-independent code + hardware-dependent code + hardware. Test system = the same hardware-independent code + test scaffold code + display, disk, keyboard]
Testing on the host machine • Divide code into two categories • Hardware-dependent code • Hardware-independent code • Scaffold code provides the same entry points as the HW-dependent code • HW-dependent code can be debugged on HW only • Replace calls to HW functions with meaningful information • Rather than replacing the function in() for radio, replace the layer above with a printf
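A minimal sketch of this split, assuming a hypothetical radio entry point radio_send_frame() and a hypothetical caller send_beacon(); neither name comes from the lecture. On the target, radio_send_frame() would drive the radio. In the scaffold the same entry point only records and prints the frame:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Entry point of the (hypothetical) hardware-dependent radio layer. */
void radio_send_frame(const uint8_t *frame, size_t len);

/* Hardware-independent code under test: builds a frame and sends it. */
void send_beacon(uint8_t station_id)
{
    const uint8_t frame[3] = { 0xab, 0xff, station_id };
    radio_send_frame(frame, sizeof frame);
}

/* Test scaffold: same entry point, no hardware access.
   It records the frame and prints a readable line for the audit trail. */
uint8_t scaffold_last_frame[16];
size_t  scaffold_last_len;

void radio_send_frame(const uint8_t *frame, size_t len)
{
    if (len > sizeof scaffold_last_frame)
        len = sizeof scaffold_last_frame;
    memcpy(scaffold_last_frame, frame, len);
    scaffold_last_len = len;

    printf("--> Sending frame:");
    for (size_t i = 0; i < len; i++)
        printf(" %02x", frame[i]);
    printf("\n");
}
```

The hardware-independent code is compiled unchanged for both systems; only the file that implements radio_send_frame() differs between the target build and the host build.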
Testing on the host machine • Calling interrupt routines • Interrupt routines are a major part of the system • Calling interrupt routines needs to be done from the scaffold • Difficult? Not really • Split the code of the ISR in two: HW-dependent and HW-independent • Place the HW-independent code in a separate function • Test this code extensively • Example: character processing on the serial line
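The serial-line example could look roughly like the sketch below. All names, the line-buffer policy and the UART address are assumptions made for illustration. The real ISR (compiled only when ON_TARGET is defined) does nothing but read the data register and pass the character on, so the scaffold can call serial_process_char() directly:

```c
#include <stdbool.h>
#include <stddef.h>

#define LINE_MAX_LEN 32

static char   line_buf[LINE_MAX_LEN];
static size_t line_len;
static bool   line_ready;

/* Hardware-independent part: handle one received character.
   Can be called from the real ISR or directly from a test scaffold. */
void serial_process_char(char c)
{
    if (line_ready)
        return;                       /* previous line not consumed yet */
    if (c == '\n') {
        line_buf[line_len] = '\0';
        line_ready = true;
    } else if (line_len < LINE_MAX_LEN - 1) {
        line_buf[line_len++] = c;
    }
    /* characters that do not fit in the buffer are dropped */
}

/* Returns the completed line, or NULL if none is pending. */
const char *serial_take_line(void)
{
    if (!line_ready)
        return NULL;
    line_ready = false;
    line_len = 0;
    return line_buf;
}

#ifdef ON_TARGET
/* Hardware-dependent part (target only). The register address is made up. */
#define UART_DATA_REG (*(volatile unsigned char *)0x4000u)
void uart_rx_isr(void)
{
    serial_process_char((char)UART_DATA_REG);
}
#endif
```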
Calling the timer interrupt • Timer ISR – one of the most used and important parts of the system • Alternatives: • Emulate HW to “naturally” call this ISR • Force calls from the test scaffold • Second option is preferred • The overhead is not large on a PC system • You control when the timer ISR is called with respect to other interrupts • Have several events occur “at the same time”
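A sketch of the second option, assuming a simple table of software timers (the names and the two-timer setup are illustrative only). The scaffold decides exactly when a tick happens, so two timers can be made to expire on the same tick:

```c
#include <stdint.h>

#define NUM_TIMERS 2

static uint16_t remaining[NUM_TIMERS];      /* ticks left; 0 = timer stopped */
static unsigned expired_count[NUM_TIMERS];  /* how often each timer has fired */

void timer_start(int id, uint16_t ticks) { remaining[id] = ticks; }

/* Hardware-independent body of the timer ISR: one call = one tick. */
void timer_tick(void)
{
    for (int id = 0; id < NUM_TIMERS; id++) {
        if (remaining[id] != 0 && --remaining[id] == 0)
            expired_count[id]++;
    }
}

unsigned timer_expirations(int id) { return expired_count[id]; }

/* Scaffold helper: let n ticks pass, exactly when the test decides. */
void scaffold_advance(unsigned n)
{
    while (n-- > 0)
        timer_tick();
}
```

Starting both timers with the same tick count and calling scaffold_advance() makes them expire within one timer_tick() call, a situation that is hard to provoke on real hardware.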
Script files • Automation is almost always needed • Validate a piece of code under the same input conditions • “Trigger” ISRs in the same manner each time • Using script files requires some small overhead • Parser needed • Very simple instructions (2-3 characters) • Comments must be allowed • Data has to be entered in ASCII and hexadecimal • Results are reported in a formatted form
Script files - interleaving
Input script:
# Frame arrives (beacon with no element)
# DstSrc Ctrl TypStn Timestamp
mr/56 ab 0123456789ab 30 00 6a6a
# backoff time expires (software should send frame)
kt0
# timeout expires again
kt0
#Some time passes
kn2
kn2
# another beacon arrives
# DstSrc Ctrl TypStn Timestamp
mr/56 ab 0123456789ab 30 00 6a6a
Output file (input lines interleaved with the software's output):
# Frame arrives (beacon with no element)
# DstSrc Ctrl TypStn Timestamp
mr/56 ab 0123456789ab 30 00 6a6a
# backoff time expires (software should send frame)
kt0
--> Sending frame: ab ff ab ...
# timeout expires again
kt0
#Some time passes
kn2
kn2
--> Sending frame ab ff ab ...
# another beacon arrives
# DstSrc Ctrl TypStn Timestamp
mr/56 ab 0123456789ab 30 00 6a6a
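A parser for such a script can stay small. The sketch below is an assumption-laden simplification, not the parser used in the course: it treats "#" lines as comments, reads "kt<digit>" as "timer <digit> expires" and "kn<digit>" as "<digit> ticks pass" (meanings guessed from the comments in the script above), and accepts a simplified "mr" line followed by hex bytes, leaving out the "/56" field:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { CMD_NONE, CMD_TIMER_EXPIRES, CMD_TICKS_PASS, CMD_FRAME, CMD_ERROR } cmd_kind;

typedef struct {
    cmd_kind kind;
    int      arg;          /* timer id or tick count */
    uint8_t  data[32];     /* frame bytes for CMD_FRAME */
    size_t   data_len;
} script_cmd;

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse pairs of hex digits, skipping spaces. Returns 0 on success. */
static int parse_hex(const char *s, script_cmd *cmd)
{
    cmd->data_len = 0;
    while (*s != '\0') {
        if (isspace((unsigned char)*s)) { s++; continue; }
        int hi = hex_value(s[0]);
        int lo = (hi >= 0) ? hex_value(s[1]) : -1;
        if (lo < 0 || cmd->data_len == sizeof cmd->data)
            return -1;
        cmd->data[cmd->data_len++] = (uint8_t)(hi * 16 + lo);
        s += 2;
    }
    return 0;
}

/* Turn one script line into a command; comments and blank lines give CMD_NONE. */
script_cmd parse_script_line(const char *line)
{
    script_cmd cmd = { CMD_NONE, 0, {0}, 0 };
    while (isspace((unsigned char)*line)) line++;
    if (*line == '\0' || *line == '#')
        return cmd;
    if (line[0] == 'k' && (line[1] == 't' || line[1] == 'n')
        && isdigit((unsigned char)line[2])) {
        cmd.kind = (line[1] == 't') ? CMD_TIMER_EXPIRES : CMD_TICKS_PASS;
        cmd.arg = line[2] - '0';
    } else if (line[0] == 'm' && line[1] == 'r' && line[2] == ' ') {
        cmd.kind = (parse_hex(line + 2, &cmd) == 0) ? CMD_FRAME : CMD_ERROR;
    } else {
        cmd.kind = CMD_ERROR;
    }
    return cmd;
}
```

The scaffold's main loop would read the script line by line, echo each line to the output file, and dispatch on cmd.kind to call the corresponding ISR body.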
More advanced scripting • Automate state machine behavior • Each time some actuation takes place, a certain input (ISR) should be triggered • This behavior is an alternative to the regular behavior • Emulate the communication medium between devices • Wireless communication – interference, multiple devices • Barcode reader, any wireless interface… • [Figure: Scanner A, Scanner B, Register A and Register B all connected through the test scaffold software, which acts as the communication medium]
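One way to emulate the medium is a small routing table inside the scaffold. The sketch below is an assumption, not the lecture's implementation: four node instances, a per-direction link table that a script could switch on and off, and a one-frame inbox per node:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SCANNER_A, SCANNER_B, REGISTER_A, REGISTER_B, NUM_NODES };
#define MAX_FRAME 16

typedef struct { uint8_t data[MAX_FRAME]; size_t len; bool full; } inbox;

static inbox inboxes[NUM_NODES];
static bool  link_up[NUM_NODES][NUM_NODES];   /* link_up[from][to] */

/* The test script decides which directions of which links work. */
void medium_set_link(int from, int to, bool up) { link_up[from][to] = up; }

/* Called by the code under test instead of the radio driver.
   Returns the number of nodes that received the frame. */
int medium_broadcast(int from, const uint8_t *frame, size_t len)
{
    int delivered = 0;
    if (len > MAX_FRAME)
        return 0;
    for (int to = 0; to < NUM_NODES; to++) {
        if (to == from || !link_up[from][to] || inboxes[to].full)
            continue;                 /* no link, or receiver still busy */
        memcpy(inboxes[to].data, frame, len);
        inboxes[to].len = len;
        inboxes[to].full = true;
        delivered++;
    }
    return delivered;
}

/* Fetch a pending frame for a node; returns false if there is none. */
bool medium_receive(int node, uint8_t *out, size_t *len)
{
    if (!inboxes[node].full)
        return false;
    memcpy(out, inboxes[node].data, inboxes[node].len);
    *len = inboxes[node].len;
    inboxes[node].full = false;
    return true;
}
```

Because each direction has its own entry in link_up, the same table can also model asymmetric links, where A hears B but B does not hear A.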
Objections!!! • Objections to debugging on host machine appear • Engineer fails to quantify correctly the needed effort • Boss wants code to be ready fast • Everyone fails to realize how much time debugging will take • Most common objection • Testing on host is useless: most of code is HW-dependent
Objections!!! • “Building a test scaffold is difficult” is a misconception • Writing a simple parser is easy even if done in C • Code output capture is a simple formatting function • Debugging takes significantly longer than code writing • “The RTOS needs to run on the host system” • If so, choose an RTOS that does (many versions available) • If not, emulate system calls! Use a shell around the OS
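A shell around the OS can be as thin as one wrapper per system call. The sketch below assumes the target runs uC/OS-II and wraps only its delay call OSTimeDly(); shell_delay(), blink_twice() and the scaffold LED hook are hypothetical names. On the host the wrapper just accounts for simulated time:

```c
#include <stdint.h>

#ifdef ON_TARGET
#include "ucos_ii.h"                  /* assumption: target runs uC/OS-II */
void shell_delay(uint16_t ticks) { OSTimeDly(ticks); }
#else
/* Host build: no RTOS, just keep track of simulated time. */
static uint32_t sim_ticks;
void shell_delay(uint16_t ticks) { sim_ticks += ticks; }
uint32_t shell_sim_time(void) { return sim_ticks; }
#endif

/* Application code: identical on host and target, calls the shell only. */
void blink_twice(void (*set_led)(int on))
{
    for (int i = 0; i < 2; i++) {
        set_led(1);
        shell_delay(10);
        set_led(0);
        shell_delay(10);
    }
}

/* Scaffold stand-in for the LED driver: counts state changes. */
int scaffold_led_changes;
void scaffold_set_led(int on)
{
    (void)on;
    scaffold_led_changes++;
}
```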
Objections!!! • Other objections • Software interaction with hardware • Example: use the wrong address to access the UART • Response and throughput • Shared-data problems (assembly level) • Portability issues (big-endian vs. little-endian) • They are true! • You cannot (usually) test these on the host • Testing on the host is not aimed at these issues
Instruction set simulators • The processor execution is emulated in software • Software executing assembly version of real code • Useful for testing some of the issues not covered by debugging on the host • Advantages • Determine response time and throughput • Testing assembly-language routines • Resolving portability issues • Testing code dealing with peripherals • What they cannot do • Shared-data bugs – unless scripting is allowed or you are lucky • Simulate the whole platform
The ASSERT macro • Widely used tool: assert(condition); • If condition is true, nothing happens • If condition is false, user code is launched • Print a message and stop • Advantages • Program state is tested on the spot: localized bugs • Example usage: • Test input values of a function (passing NULL pointers?) • Test context of function calls • assert(!"bad return from function")
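A short sketch of both usages with the standard assert from <assert.h>; the functions and the enum are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Sum the first n samples. Callers must pass a valid buffer. */
long sum_samples(const int *samples, size_t n)
{
    assert(samples != NULL);          /* catch NULL pointers on the spot */
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += samples[i];
    return total;
}

typedef enum { STATE_IDLE, STATE_SENDING } radio_state;

const char *state_name(radio_state s)
{
    switch (s) {
    case STATE_IDLE:    return "idle";
    case STATE_SENDING: return "sending";
    }
    /* A string literal is never NULL, so !"..." is always false:
       the assert fires and its message contains the text. */
    assert(!"unknown radio state");
    return "?";
}
```

Compiling with NDEBUG defined removes the checks, so the asserted expressions must not have side effects the program depends on.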
The ASSERT macro • Assert on the target platform • Less verbose as systems do not usually have displays • Redefine functionality of assert • Disable interrupts and do: “while(1){};” • Turn on LEDs in a given pattern (led-based debugging) • Write variables to a certain location of memory • Write the location of the current instruction • Execute illegal instruction to trigger debugger or stop
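A redefined assert for the target might look like the sketch below. The macro name, the LED register, the memory location and the interrupt-disable stub are all hypothetical stand-ins; on a real board they would map to fixed addresses and a CPU-specific instruction. The endless loop is compiled only when ON_TARGET is defined, so the same code can be exercised on the host:

```c
#include <stdint.h>

/* Stand-ins for target facilities; all three names are hypothetical. */
volatile uint16_t assert_fail_line;   /* would live at a known RAM address */
volatile uint8_t  led_port;           /* would be a memory-mapped LED register */
static void disable_interrupts(void) { /* e.g. a CPU-specific instruction */ }

void target_assert_failed(uint16_t line)
{
    disable_interrupts();
    assert_fail_line = line;          /* readable later with a debugger or monitor */
    led_port = 0xAA;                  /* fixed LED pattern signals "assert hit" */
#ifdef ON_TARGET
    for (;;) { }                      /* stop here; never return on the target */
#endif
}

#define TARGET_ASSERT(cond) \
    do { if (!(cond)) target_assert_failed((uint16_t)__LINE__); } while (0)

/* Example use: reject an out-of-range channel number. */
int select_channel(int channel)
{
    TARGET_ASSERT(channel >= 0 && channel < 16);
    return channel;
}
```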
Using lab tools • Multimeter • Measure voltage between various points • Use it to detect if circuits are powered • Test state of enable/disable pins • Measure resistance between points • Make sure the device is off – detect short or open circuits • Oscilloscope • Software engineer's usage (time, voltage, trigger) • Use it as a voltmeter • Check if the circuit is working at all by looking at changing waveforms • Check if signals are changing as expected (communication lines) • Check timing issues
Logic analyzer • A “must-have” tool for embedded software design • Cheap USB-based versions available • Can track a set of digital signals simultaneously • Only VCC and ground signals understood • Logic analyzers are storage devices • Significantly more complex triggering mechanisms • Timing mode (makes use of “real” time) • Test if an event ever occurs • Measure time with respect to code length and response • Control output signal patterns
Logic analyzer • State mode (makes use of external time signals) • Use the RE signal on memories to monitor memory access • Trace of execution display: signal, hex value, assembly code, line nr. • Trigger on a special event (memory access) and go backwards in time • Trigger writing of bad values in RAM (NULL pointers as parameters) • Filters on the data values are available
Using lab tools • In-circuit emulators • Hardware devices emulating the whole processor • Allow breakpoints and memory/register inspection • Desktop debugger + logic analyzer in one tool • Logic analyzers are preferred: any processor, timing mode, better filters, easy to install • Monitors • Two parts: on the target system and on the PC • Can set up breakpoints, make use of serial communication • JTAG port (Joint Test Action Group)
Overview • Debugging techniques • Debugging a distributed system • Lab information • Preparation for the exam
Distributed systems • Example: wireless sensor network deployment • Setup: • Large scale network (tens, hundreds of devices) • Radio communication • Difficult to access, outside deployments • Distributed algorithms Jan Beutel: Deployment techniques for Sensor Networks
Example: Great Duck Island • Details • Summer 2002, Great Duck Island, Gulf of Maine, USA • 5000 burrows of Leach’s Storm Petrels • Hardware • 43 nodes deployed for four months • Sensors: light, temperature, humidity, pressure, infrared • Solar-powered gateway connected to a basestation (satellite connection) • 123 days of experiment -> 1.1M samples (6.6M expected)
GDI experiment • Failures: • Hardware: water entered casings • Sensors shared A/D converter • One malfunctioning sensor corrupted all readings on nodes • Transparent casing: high temperature inside • Clocks & oscillators not working as expected • Second deployment • Multihop network • Density too large: batteries drained • Basestation failure due to harsh weather
LOFAR-Agro • Summer 2005, Holland, precision agriculture • “Murphy loves potatoes” • Wrong commit to SVN • Nodes were flashed with buggy code • Software update delivered (and stored in external memory) • Bug led to continuous update of the network • Batteries dead after 4 days • Routing and MAC layers used different buffer sizes • Watchdog protection -> triggered on all nodes within 2-6h • Data collection – 2% • Gateway issues: power outage in the morning (solar) • Nodes stored data also in FLASH -> bug
Deployment problems • Node problems • Low battery, corrupt readings, software resets • Counter overflows, WDT, incorrect downloads, sinks • Link problems and path problems • Message loss, network congestion, asymmetric links • Networking protocols, routing loops • Global problems • Partitioned networks, emergent behavior (resulting in latency)
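Counter overflows are among the node problems that can be shown in a few lines. The sketch below is a generic illustration, not code from any of the deployments: a 16-bit tick counter wraps after 65535, so comparing "now >= deadline" directly gives the wrong answer around the wrap, while comparing the unsigned difference does not:

```c
#include <stdbool.h>
#include <stdint.h>

/* Naive check: breaks when the counter wraps between start and deadline. */
bool deadline_reached_naive(uint16_t now, uint16_t deadline)
{
    return now >= deadline;
}

/* Wrap-safe check: the difference is taken modulo 65536.
   Valid as long as now and deadline are less than 32768 ticks apart. */
bool deadline_reached(uint16_t now, uint16_t deadline)
{
    return (uint16_t)(now - deadline) < 0x8000u;
}
```

With a deadline of 65530 and the counter wrapped to 3, the naive version reports that the deadline has not been reached, although 9 ticks have passed since it expired.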
Understanding the system • Debugging does not involve only software! • Four components • Hardware • Software • Communication • Environment
Node instrumentation • Software instrumentation • Extract the (partial) state of the system • Source vs. Binary • Probe effects, trampoline • Operating system vs. Application • Dynamic instrumentation • Aspect oriented programming • User specifies patterns • Hardware instrumentation
Analyzing the system • Monitoring and visualization • Inferring network state from node state • Failure detection • Root cause analysis • Node-level debugging • Replay and checkpointing
Overview • Debugging techniques • Debugging a distributed system • Lab information • Preparation for the exam
Lab information • Lab assignment (groups+schedule) -> Blackboard • Passing lab = passing three assignments -> Blackboard • Information for the lab • X32 home page: http://x32.ewi.tudelft.nl/ • Read: "X32 Programmer's Manual", "Running uC/OS on the X32", "X32 Source-level Debugger", "X32 Design (MSc Thesis)" • Come prepared to the lab!!! • Read information and write the code at home • Without written code you will NOT be allowed in the lab • Contact student assistants by email (Blackboard!) for questions
Overview • Debugging techniques • Debugging a distributed system • Lab information • Preparation for the exam
Exam • Details • Material: Book + 2 articles + lecture notes (see website) • Monday, week 2.10, 14.00-17.00 (double check official schedule) • Multiple choice questions • English language • Questions/clarifications if needed – let me know! • DO register for the exam • You need to pass the lab to enter the exam • You need to pass the exam to get a mark • Final mark: 75% written exam; 25% practicum mark
Question Which of the following statements is NOT correct? An Embedded System … • a) always has human safety requirements • b) may have interactions with the physical world • c) may control a technical entity • d) is embedded in the physical world
Question Which factor determines the least delay until the execution of an interrupt? The shortest period of time … • a) during which the interrupt is enabled • b) it takes to execute any higher priority ISR • c) during which the context of the current task is restored • d) during which another higher priority task is executing
Question Which of the following statements is correct? A reentrant function … • a) may be called by different tasks • b) may not be called by different tasks • c) must use hardware in a non‐atomic way • d) may call any other function
Question Which requirements determine the choice of architecture (RR, RRI, FQS or RTOS) for an embedded system? • a) Reliability requirements • b) Safety requirements • c) Scalability requirements • d) User interface requirements
Question • Which of the following statements is correct? The shared data problem can be solved through … • a) volatile sections • b) non‐reentrant sections • c) atomic sections • d) critical sections
Question • Which of the following statements is correct? The X32 platform … • a) is not equipped with an IR controller • b) allows IR nesting • c) prohibits IR priorities • d) prohibits IR preemption
Question Which of the following statements is correct? An interrupt can be disabled in order to … • a) disable a critical section • b) protect a critical section • c) protect other interrupts • d) enable context switches
Question Which of the following statements is correct? • a) We cannot have a shared data problem in an RR architecture • b) We cannot have a shared data problem in an RRI architecture • c) We cannot have interrupt vectors in an FQS architecture • d) We cannot have atomic sections in an RTOS architecture
Question Which of the following statements is correct? Using interrupts impairs … • a) task response time • b) higher priority ISR response time • c) processor response time • d) none of the above
Question Why should time‐slicing NOT be used in an RTOS? • a) Equally important tasks require equal processor attention • b) It creates too many data sharing problems • c) Alternatively, all tasks could become one task • d) It should be used; time slicing is essential in an RTOS
Question Which of the following statements is correct? An interrupt service routine is supposed to ... • a) disable the non‐maskable interrupt • b) restore the lowest‐priority interrupt • c) restore the context and return • d) increase the program counter