Tizen Architectural Specification: Crash Reporting. TIZEN ADS 0000, Ver. 0.4, 2013-11-18. Leonid Moiseichuk
Introduction • Legacy solutions • Architecture • Detailed Architecture • Appendix
Introduction • Crash reporting for an embedded system differs from desktops/servers in its resource limitations and in the number of devices involved. • For example, we cannot keep debugging symbols installed because they consume several hundred megabytes of space; thus we cannot get a backtrace on the device. • On the other hand, centralized secure crash-information storage opens new opportunities for cross-component analysis to identify the most important issues to fix first, often across several products if the server side supports it. • This presentation shows how we can improve the existing solution by extending it towards a server-based approach.
Introduction | Feature Overview • Easy crash collection on release images – just install the Crash Reporter packages. They could also be preinstalled in all images, but in a disabled state. • Kernel and application crashes/oopses will be collected, as well as any kind of device runtime information, bringing crash-reason coverage close to 100% • No symbols are required, but they can be installed if a developer needs to analyze traces on the device, e.g. for security reasons • All available information will be collected at the moment of the crash, which simplifies later analysis by the developer fixing the issue. • Centralized processing makes it possible to identify the most critical/frequent issues and to verify an integrated fix using statistics from the device population, i.e. the absence/reduction of new crashes with the same backtrace. • Integration with test cases (auto-upload), JIRA and possibly source-indexing services (we have used Mozilla MXR) will greatly reduce the effort of reporting, identifying, prioritizing, fixing and verifying issues. • Secure delivery of dumps/crashes from the device to collection servers, depending on dump type and application.
Legacy architecture: Tizen • The legacy implementation has the following areas that would be nice to improve: • kernel oopses (crashes) are not supported • the preload library libsys-assert.so leads to unwanted code execution during every application startup and requires symbols to be installed on the device • the Crash Worker starts its work a short but uncontrollable time after the crash, so the reported data is already outdated • core dump files are large (in theory up to 3 GB), so the core file cannot always be copied as the workflow expects • crash processing jeopardizes consumer qualities of the device, such as performance and reliability (many copy operations on large files, gzip usage, the same storage space reused for data) • the server part (ccr.samsung.com) was turned off for security reasons, which makes cross-analysis very difficult, practically impossible.
Android native – Google Breakpad • Google Breakpad is the best multi-platform solution, but: • It requires linking into every process; according to the documentation this means code changes, though it can be done as a shared library (a minimal initialization sketch follows this list) • Not all application crashes can be handled – only those occurring after Breakpad is initialized, and only if the signal is not already handled by the application • It produces minidumps based on: • ptracing the crashed process – requires CAP_SYS_PTRACE • processing the core file – which might be up to 3 GB in size • Dump generation is done by a server process – so we may have no dumps when the server has not started, has crashed, or has already shut down • Incidentally, debuggerd in Android 4.2.1 leaks about 25 MB of dirty memory per 1000 crashes • The file format is very strict and not compressed • The processor side is already implemented and works for clients on Linux, Android, Windows, iOS, Mac OS X, Solaris, and on ARM, x86, x86_64, PPC, PPC64, MIPS, SPARC • Kernel panics are not supported at all, even though the Android panic facility is part of the kernel (apanic.c and apanic_mmc.c) • VM crashes are not supported
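As an illustration of the per-process integration Breakpad requires, here is a minimal sketch of initializing its Linux client in-process; the dump directory /tmp/dumps is a placeholder, and the constructor arguments follow Breakpad's public Linux exception_handler.h. Note that crashes are only caught from the point the handler is installed, which is exactly the coverage gap mentioned above.

```cpp
// Minimal in-process Breakpad initialization on Linux (sketch).
// Build against the Breakpad client library, e.g.:
//   g++ app.cc -Ibreakpad/src -lbreakpad_client -lpthread
#include "client/linux/handler/exception_handler.h"

#include <cstdio>

// Invoked after the minidump is written; must stay async-signal-safe.
static bool DumpCallback(const google_breakpad::MinidumpDescriptor& descriptor,
                         void* /*context*/, bool succeeded) {
  if (succeeded)
    fprintf(stderr, "minidump written to %s\n", descriptor.path());
  return succeeded;  // returning false re-raises the signal
}

int main() {
  // "/tmp/dumps" is a hypothetical spool directory for this sketch.
  google_breakpad::MinidumpDescriptor descriptor("/tmp/dumps");
  google_breakpad::ExceptionHandler handler(
      descriptor, /*filter=*/nullptr, DumpCallback,
      /*callback_context=*/nullptr,
      /*install_handler=*/true, /*server_fd=*/-1);  // -1: dump in-process
  // ... application code; only crashes after this point produce minidumps ...
  return 0;
}
```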
Android VM – e.g. ACRA/Acralyzer • There is a huge number of VM-based crash reporters; their common problems: • They cover only the VM cases and provide only basic information about the system • They expect the server to be available online, use insecure and non-throttled connectivity, or have flawed logic (e.g. "cannot send – delete file"; a safer pattern is sketched below) • The analyzer (server) part is primitive or simply proprietary
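One way to avoid the "cannot send – delete file" failure mode is to treat the on-device report directory as a spool and delete a report only after a server acknowledgment. A minimal sketch under that assumption; Upload() is a hypothetical stub standing in for the real, secure transport:

```cpp
// Keep-until-acknowledged upload loop (sketch): a report is deleted only
// after the server confirms receipt, avoiding the "cannot send - delete
// file" failure mode called out above.
#include <chrono>
#include <filesystem>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical transport: returns true only on a server acknowledgment.
// Stub for this sketch; a real implementation would upload securely.
static bool Upload(const fs::path& report) { (void)report; return false; }

void DrainSpool(const fs::path& spool) {
  // Snapshot the directory first so removal does not disturb iteration.
  std::vector<fs::path> pending;
  for (const auto& entry : fs::directory_iterator(spool))
    pending.push_back(entry.path());
  for (const auto& report : pending) {
    if (Upload(report)) {
      fs::remove(report);  // delete only after the server's ACK
    } else {
      // Keep the file for the next pass instead of losing it, and back off
      // so connectivity problems do not turn into a busy loop.
      std::this_thread::sleep_for(std::chrono::minutes(1));
      break;  // throttle: stop this pass, retry later
    }
  }
}
```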
Architecture: configurability • The uploader and crash reporting are controlled from Settings; they could be part of the product but in disabled mode • The uploader partition might not be used on production devices (not many crashes are expected) • Configuration for type=crash may look as follows (a lookup sketch follows the tree): • /etc/dumper -- main configuration folder • config -- general configuration file • crash/ -- configuration for crash reporting • config -- file to be used for all crashes • app1/ -- crash settings for app1 if non-standard • config -- e.g. own upload server or files • app2/ -- crash settings for app2 if non-standard • etc. • statistics/ -- configuration for statistics • config -- file to be used for all statistics uploads
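A minimal sketch of how the dumper could resolve this layout, falling back from the per-application file to the type-wide one and finally to the general file; the paths follow the tree above, but the exact fallback rules are an assumption of this sketch:

```cpp
// Resolve the reporting config for an application (sketch).
// Falls back: /etc/dumper/<type>/<app>/config -> /etc/dumper/<type>/config
//          -> /etc/dumper/config
#include <string>
#include <sys/stat.h>

static bool FileExists(const std::string& path) {
  struct stat st;
  return stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode);
}

// type is "crash" or "statistics"; app is the binary name (empty = defaults).
std::string ResolveConfig(const std::string& type, const std::string& app) {
  const std::string base = "/etc/dumper/" + type + "/";
  if (!app.empty() && FileExists(base + app + "/config"))
    return base + app + "/config";  // per-application override
  if (FileExists(base + "config"))
    return base + "config";         // file used for all crashes/statistics
  return "/etc/dumper/config";      // general configuration file
}
```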
Architecture: remarks • The proposed architecture is not final; it is largely a process, since a crash reporting service requires constant work to cover new builds and requests from customers. Running through /proc/../core_pattern avoids any impact on userspace until a crash happens (a minimal pipe-helper sketch follows below). The absence of a daemon and the use of separate non-reflashable partitions guarantee that crashes are delivered even from a bricked device after re-flashing, if a reboot happens during uploading, and in similar cases. • Adapting these proposals to other components (MobileCare etc.) is the next step in this process; most likely some pieces should be reused or replaced where they are already implemented in a better way than I could imagine based on my experience. • Lifelogging (e.g. memory, power, system logs), kernel OOPS support, basic and extended crash dumping, and support for reporting VM problems from Java, Python, etc. can be done in further steps and in parallel.
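For concreteness, a minimal sketch of the core_pattern pipe approach: a helper is registered once in /proc/sys/kernel/core_pattern, and the kernel executes it at crash time, streaming the core dump to its stdin, so nothing is preloaded into applications. The helper path /usr/libexec/crash-dumper and the output directory /opt/crash are hypothetical; the %p/%s/%t specifiers and the pipe semantics follow the kernel's core(5) documentation.

```cpp
// Minimal core_pattern pipe helper (sketch). Register it once, e.g.:
//   echo '|/usr/libexec/crash-dumper %p %s %t' > /proc/sys/kernel/core_pattern
// The kernel then runs this helper on every crash, piping the core to stdin.
#include <cstdio>
#include <string>
#include <unistd.h>

int main(int argc, char* argv[]) {
  if (argc < 4) return 1;
  // argv order matches the specifiers registered above:
  // %p = PID of the crashed process, %s = signal number, %t = UNIX time.
  const std::string out = std::string("/opt/crash/core.") + argv[1] + "." +
                          argv[2] + "." + argv[3];  // hypothetical location
  FILE* f = fopen(out.c_str(), "w");
  if (!f) return 1;
  // Stream the core dump from stdin; the kernel pipes it to us directly,
  // so no intermediate copy of a potentially multi-GB file is needed.
  char buf[64 * 1024];
  ssize_t n;
  while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
    fwrite(buf, 1, static_cast<size_t>(n), f);
  fclose(f);
  // A real dumper would also snapshot /proc/<pid>/maps and other runtime
  // state here, while the crashed process entry still exists.
  return 0;
}
```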