OMAP4 SysLink Overview SysLink Team 1/13/2011
Topics • SysLink Architecture Overview • SysLink Notify Overview • SysLink Protocol Overview • Device Error Handling • Ducati Tracing – Trace Daemon
SysLink - Introduction • SysLink • IPC for OMAP4 and beyond • Is an evolution of both DSP/BIOS Link & DSP/BIOS Bridge • Main Features • Supports symmetrical IPC interfaces • Decoupled IPC & MM Frameworks • Scalable and Modular IPC architecture • Retains the IPC and Device Management features developed for OMAP3 IPC • Device Loading & Exception Handling • Dynamic Memory Mapping • Power Management • Support for ELF format • Flexibility to support parallel and custom 3rd party IPC • Enables remote procedure calls
OMAP4 SysLink Architecture • [Architecture diagram: SysLink components split across user space and kernel space]
SysM3-AppM3 (SysLink Functionality Split) • All SysLink IPC modules are available on both SysM3 and AppM3 cores, with a few functional differences • Common IPC Components/Features • Notify, MessageQTransportShm, NameServerRemoteNotify, SLPM, MessageQ, NameServer, SharedRegion • IPC synchronization between each pair of processors • Traces • Exception Context Buffer • SysM3 only • Mailbox Interrupt Rx • A9 to AppM3 interrupt rerouting • M3-A9 Power Management notification
Notify Driver - Features • Rationale • Enables multiple clients to multiplex over a single physical interrupt between a pair of processors. • Keeps the payload to a minimum, and allows higher-level protocols to define their own transport memory needs. • Generic enough to handle physical interrupts that carry no messaging capability • Features • Simple and quick method of sending a 32-bit message • Multiple clients can register for the same event • Clients may be processes, threads/tasks, or kernel modules • A callback function can be registered with an associated parameter to receive events from the remote processor • All clients get notified when an event is received • Multiple clients can send notifications to the same event • Events are sent across to the remote processor • A 32-bit payload value can be sent along with the event • Synchronous messaging • Event IDs are prioritized: 0 is the highest priority, 31 the lowest • If multiple events are set, the highest-priority event is always signaled first • Notifications can be unregistered when no longer required • A minimal usage sketch follows below.
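A minimal sketch of registering for and sending an event, assuming the TI IPC/SysLink 2.x-style Notify API; the header paths, interrupt line id 0, event number, and payload value are illustrative, not taken from this deck:

```c
#include <xdc/std.h>
#include <ti/ipc/Notify.h>
#include <ti/ipc/MultiProc.h>

#define EVENT_ID  5u   /* hypothetical event number (0 = highest priority) */

/* Callback invoked when the remote processor signals EVENT_ID. */
static Void eventCallback(UInt16 procId, UInt16 lineId, UInt32 eventId,
                          UArg arg, UInt32 payload)
{
    /* payload carries the 32-bit value sent along with the event */
}

Void notifyExample(Void)
{
    UInt16 remoteProcId = MultiProc_getId("SysM3");

    /* Multiple clients may register for the same event. */
    Notify_registerEvent(remoteProcId, 0, EVENT_ID, eventCallback, (UArg)NULL);

    /* Send EVENT_ID to the remote processor with a 32-bit payload. */
    Notify_sendEvent(remoteProcId, 0, EVENT_ID, 0xC0FFEEu, FALSE);

    /* Unregister when no longer required. */
    Notify_unregisterEvent(remoteProcId, 0, EVENT_ID, eventCallback, (UArg)NULL);
}
```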
Notify Send Event – From A9 to AppM3 • [Sequence diagram: an A9 application calls Notify_sendEvent via ioctl into the Notify kernel driver; notify_ducatidrv_send_event writes the event into the Notify shared memory and omap_mbox_msg_send raises a mailbox IPI to SysM3; on SysM3, the mailbox ISR (NotifyDriverShm_isr) runs ti_sdo_ipc_notify_exec, which reroutes the event to AppM3 (NotifyDriverShm_isr_appm3); AppM3 clears the notify flag, invokes the callback in the AppM3 application, and acknowledges the mailbox.]
Notify Send Event – From AppM3 to A9 • [Sequence diagram: the AppM3 application calls Notify_sendEvent; NotifyDriverShm_sendEvent sets the notify flag in Notify shared memory and InterruptDucati_intSend (the mailbox HW driver) raises the mailbox interrupt to the A9; on the A9, notify_shmdrv_isr runs notify_exec, and notify_add_buf_by_pid unblocks the pending notify_drv_read (the user-space read() system call), returning the callback pointer to user space.]
SysLink IPC Protocol - Modules Here are the main SysLink IPC modules. • Data mover modules • MessageQ • ListMP • Notify • Helper modules • SharedRegion • MultiProc • NameServer • NameServerRemoteNotify • NotifyDrivers • NotifyDriverShm • Transports • MessageQTransportShm • Heaps • HeapBufMP • HeapMemMP • Gates • GateMPs - GatePeterson, GateHWSpinLock
ListMP • Purpose • Provides an open-ended queue • Enables variable-size messaging automatically • Design Features • Basic multi-processor doubly-linked circular list (ListMP). • Simple multi-reader, multi-writer protocol. • ListMP management is protected internally using an appropriate multiprocessor lock module – GateMP • Uses shared memory elements with portable pointers. • Building block for the MessageQ transport and the HeapBufMP buffer. • Usage • Instances of ListMP objects can be created and deleted by any processor. • Each ListMP instance is identified by a system-wide unique string name. • To use a ListMP instance, clients open it by name and receive a handle. • Elements can be put at the tail of the list and removed from the head of the list • Elements can be inserted into and removed from an intermediate location • Provides an API for list traversal. • A usage sketch follows below.
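A hedged sketch of one processor creating and writing a named list while another opens it by name, assuming the TI IPC-style ListMP API; the instance name "myList", the element layout, and the use of SharedRegion 0's heap are illustrative:

```c
#include <xdc/std.h>
#include <ti/ipc/ListMP.h>
#include <ti/ipc/SharedRegion.h>
#include <xdc/runtime/Memory.h>

/* Shared list elements must embed ListMP_Elem as their first field. */
typedef struct {
    ListMP_Elem elem;   /* portable links, managed by ListMP */
    UInt32      data;   /* user payload */
} MyElem;

Void listWriter(Void)
{
    ListMP_Params params;
    ListMP_Handle list;
    MyElem *node;

    /* Creator: any processor may create the named instance. */
    ListMP_Params_init(&params);
    params.name = "myList";
    list = ListMP_create(&params);

    /* Elements live in shared memory, here SharedRegion 0's heap. */
    node = Memory_alloc(SharedRegion_getHeap(0), sizeof(MyElem), 0, NULL);
    node->data = 42;
    ListMP_putTail(list, (ListMP_Elem *)node);
}

Void listReader(Void)
{
    ListMP_Handle list;
    MyElem *node;

    /* Client: open by system-wide unique name, receive a handle. */
    ListMP_open("myList", &list);
    node = (MyElem *)ListMP_getHead(list);  /* NULL if the list is empty */
}
```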
MessageQ • Purpose • Homogeneous or heterogeneous multi-processor messaging. • Message delivery functionality is localized to pluggable transports, achieving flexibility. • Design Features • Basic IPC queue with a single reader and multiple writers. • Supports the structured sending and receiving of variable-length messages. • Supports two message priorities. • No restrictions on the number of messages on the queue. • Uses NameServer for name management • Uses a transport (MessageQTransportShm) for actual message delivery • Usage • The reader owns the queue and creates it. • Writers open a queue by name. • Writers can also use a received message to send a response back. • A reader/writer sketch follows below.
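A minimal reader/writer sketch, assuming the TI IPC-style MessageQ API; the queue name, heap id, and message layout are illustrative:

```c
#include <xdc/std.h>
#include <ti/ipc/MessageQ.h>

#define HEAP_ID  0u  /* a heap previously registered via MessageQ_registerHeap() */

typedef struct {
    MessageQ_MsgHeader header;  /* required first field */
    UInt32 payload;
} MyMsg;

/* Reader owns and creates the queue. */
Void reader(Void)
{
    MessageQ_Handle queue = MessageQ_create("readerQ", NULL);
    MessageQ_Msg msg;

    MessageQ_get(queue, &msg, MessageQ_FOREVER);  /* block for a message */
    /* ... process ((MyMsg *)msg)->payload ... */
    MessageQ_free(msg);
}

/* Writer opens the queue by name and sends a message. */
Void writer(Void)
{
    MessageQ_QueueId queueId;
    MyMsg *msg;

    MessageQ_open("readerQ", &queueId);
    msg = (MyMsg *)MessageQ_alloc(HEAP_ID, sizeof(MyMsg));
    msg->payload = 7;
    MessageQ_put(queueId, (MessageQ_Msg)msg);
}
```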
IpcMgr • Purpose • One-step setup for system integrators • Dynamic configurability of shared memory objects, without static assignment and management of memory, which becomes complex as IPC processor pairs scale. • Design Features • Automates the setup of the IPC modules – Notify, MessageQTransportShm, NameServerRemoteNotify – between a pair of processors. • Provides a handshaking mechanism between each pair of processors (A9-SysM3, A9-AppM3, SysM3-AppM3) to synchronize startup sequences. • Provides IPC infrastructure setup synchronization by passing system configuration info from a slave processor to a master processor within a pair. • Creates the various IPC modules from the shared IPC infrastructure information, for ready-made use between multiple processors. • Usage • Configures the IPC modules in each user process's context by calling the setup() and destroy() functions of the IPC modules (see the sketch below). • Integrated with the device management API
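A rough sketch of the per-process setup calls, based on the Ipc_getConfig/Ipc_setup calls shown in the bring-up sequence later in this deck; the header name and exact signatures are assumptions about the SysLink user-side library:

```c
#include <xdc/std.h>
#include <IpcUsr.h>   /* assumed SysLink user-side header */

Void ipcSetupExample(Void)
{
    Ipc_Config config;

    Ipc_getConfig(&config);   /* fetch the default module configuration */
    Ipc_setup(&config);       /* set up Notify, transports, name servers */

    /* ... use MessageQ, ListMP, heaps, etc. in this process ... */

    Ipc_destroy();            /* tear down the per-process IPC state */
}
```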
MultiProc • Purpose • A simple, unified interface for processor identification across multiple processors. • Design Features • Basic lowest-level helper module that centralizes the management of processor ids. • Essentially a lookup table of processor names and their ids. • MultiProc order dictates the master/slave assignment of certain asymmetric modules such as IpcMgr. • Usage • Processor ids start at 0 and ascend without skipping values. • The id can be available at configuration time or at initialization time. • Configured automatically in each user process. • OMAP4 MultiProc Configuration (see the sketch below) • MPU: 3 • SysM3: 2 • AppM3: 1 • DSP: 0
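A small sketch of the id lookups, assuming the TI IPC-style MultiProc API; the OMAP4 id values are the ones listed above:

```c
#include <xdc/std.h>
#include <ti/ipc/MultiProc.h>

Void multiProcExample(Void)
{
    /* OMAP4 configuration from this slide: DSP=0, AppM3=1, SysM3=2, MPU=3 */
    UInt16 self  = MultiProc_self();              /* this core's own id    */
    UInt16 sysM3 = MultiProc_getId("SysM3");      /* name -> id (here 2)   */
    String name  = MultiProc_getName(self);       /* id -> name            */
    UInt16 count = MultiProc_getNumProcessors();  /* total table entries   */

    /* (values unused; this is purely illustrative) */
}
```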
SharedRegion • Purpose • Eases address translation across multiple processors, each with its own address space • The HeapMemMP within a SharedRegion provides a readily available shared memory manager. • Design Features • The SharedRegion module is used to identify shared memory regions in the system. • Provides SharedRegion pointers (SRPtrs), i.e. portable pointers, from one processor to another. • Each SharedRegion can have a shared memory heap (HeapMemMP) created within the region, which can be used as a generic shared memory allocator. • The module creates a shared memory region lookup table containing each processor's view of every shared region in the system • Each processor has its own lookup table • Each processor's view of a particular shared memory region can be found at the same table index across all lookup tables • Each table entry is a base/length pair. This table, along with the 32-bit shared region pointer (SRPtr), is used to do a quick address translation. • An SRPtr is built from a few bits identifying the SharedRegion plus a relative offset within it (see the sketch below). • Usage • SharedRegion information from the M3 base images is read during IpcMgr synchronization to populate the region tables on the A9 as well as in each user process.
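A sketch of the address translation round-trip, assuming the TI IPC-style SharedRegion API:

```c
#include <xdc/std.h>
#include <ti/ipc/SharedRegion.h>

Void srPtrExample(Ptr localAddr)
{
    /* Find which shared region this local address falls into. */
    UInt16 regionId = SharedRegion_getId(localAddr);

    /* Local address -> portable 32-bit SRPtr (region bits + offset). */
    SharedRegion_SRPtr srPtr = SharedRegion_getSRPtr(localAddr, regionId);

    /* On the receiving processor: SRPtr -> that processor's own view,
       via its own lookup table entry for the same region index. */
    Ptr remoteView = SharedRegion_getPtr(srPtr);

    /* (remoteView unused; this is purely illustrative) */
}
```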
NameServer • Purpose • Distributed table management, to avoid update synchronization problems and cyclic dependencies. • Design Features • NameServer enables an application and other modules to store and retrieve values based on a name • The NameServer module manages local name/value pairs • Each NameServer instance manages a different name/value table (customization possible) • Names within a table must be unique, but the same name can be used in different tables. • Supports values of different lengths. • Name lookup across processors is supported by a separate transport layer • Usage • NameServer/NameServerRemoteNotify is used by IPC components such as HeapBufMP and ListMP to get the portable (SharedRegion) pointer that gives the location of these module objects in shared memory. This is mainly used while opening an instance of the above modules (see the sketch below) • NameServer/NameServerRemoteNotify is used by MessageQ to exchange MessageQ ids between Ducati and the MPU
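A hedged sketch of a local name/value table, assuming the TI IPC-style NameServer API; the table name and key are illustrative:

```c
#include <xdc/std.h>
#include <ti/ipc/NameServer.h>

Void nameServerExample(Void)
{
    NameServer_Params params;
    NameServer_Handle ns;
    UInt32 value = 0x1234;
    UInt32 found;
    UInt32 len = sizeof(found);

    NameServer_Params_init(&params);
    params.maxValueLen = sizeof(UInt32);

    /* Each instance manages its own name/value table. */
    ns = NameServer_create("myTable", &params);

    /* Names must be unique within a table. */
    NameServer_add(ns, "objAddr", &value, sizeof(value));

    /* Lookup is local here; cross-processor lookup goes through the
       NameServerRemoteNotify transport (NULL = search all). */
    NameServer_get(ns, "objAddr", &found, &len, NULL);
}
```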
Gates - GateMP • Purpose • Unified interface for different proxy implementations • Common allocation and management of the different gate interfaces from any processor • Design Features • Critical-region mechanism to protect common resources • Abstracted interface over various hardware/software gate implementations • Simple enter/leave APIs (no timeouts), as sketched below. • Implementations • GateHWSpinLock • Hardware gate based on the SpinLock hardware IP • Supports protection between any number of processors • Default gate for the current OMAP4 SysLink software. • GatePeterson • Software gate based on Peterson's algorithm, for protection between 2 processors. • GatePeterson objects are created in shared memory. • Default gate in OMAP4 SysLink 1.0 software, or for SoCs without the H/W SpinLock IP.
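A minimal enter/leave sketch, assuming the TI IPC-style GateMP API; the gate name is illustrative:

```c
#include <xdc/std.h>
#include <ti/ipc/GateMP.h>

Void gateExample(Void)
{
    GateMP_Params params;
    GateMP_Handle gate;
    IArg key;

    GateMP_Params_init(&params);
    params.name = "myGate";
    /* The protection parameters select the underlying implementation,
       e.g. the hardware spinlock on OMAP4 or GatePeterson elsewhere. */
    gate = GateMP_create(&params);

    key = GateMP_enter(gate);   /* enter the critical region (no timeout) */
    /* ... touch the shared resource ... */
    GateMP_leave(gate, key);    /* leave the critical region */
}
```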
Heaps • HeapBufMP • Purpose • Fast, deterministic allocation and free times for HeapBufMP buffer blocks. Memory calls are non-blocking. • Design Features • A fixed-size buffer heap implementation that can be used in a multiprocessor system with shared memory • HeapBufMP manages a single fixed-size buffer, split into equally sized allocatable blocks. • HeapBufMP buffer blocks are maintained using a ListMP object. • The module is instance based; each instance requires shared memory • The HeapBufMP module uses a NameServer instance to store instance information when created • Blocks can be allocated and freed on different processors (see the sketch below). • HeapMemMP • Purpose • Serves as the default allocator for defining custom shared memory objects, and also as a macro-allocator for other shared memory heaps. • Design Features • A variable-size buffer heap implementation that can be used in a multiprocessor system. • Non-deterministic allocation and free times; flexible but less efficient than HeapBufMP. • Can be created by default within a SharedRegion. • All other features are the same as HeapBufMP.
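A hedged HeapBufMP sketch, assuming the TI IPC-style API; the heap name, block sizes, and heap id are illustrative. Registering the heap with MessageQ is a typical use, since MessageQ_alloc draws from a registered heap:

```c
#include <xdc/std.h>
#include <ti/ipc/HeapBufMP.h>
#include <ti/ipc/MessageQ.h>

#define MSG_HEAP_ID  0u   /* illustrative MessageQ heap id */

Void heapExample(Void)
{
    HeapBufMP_Params params;
    HeapBufMP_Handle heap;
    Ptr block;

    /* Fixed-size blocks: deterministic, non-blocking alloc/free. */
    HeapBufMP_Params_init(&params);
    params.name      = "msgHeap";
    params.blockSize = 256;
    params.numBlocks = 16;
    heap = HeapBufMP_create(&params);

    block = HeapBufMP_alloc(heap, 256, 0);   /* size, align */
    HeapBufMP_free(heap, block, 256);

    /* Typical use: register as a MessageQ allocator. */
    MessageQ_registerHeap((Ptr)heap, MSG_HEAP_ID);
}
```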
IPU-PM • Purpose • Represents the IPU Sub-system and IPU-managed resources to the Host OS PM frameworks. • Provides sub-system Power Management operations asynchronously to the main processor. • Decouples Idle processing and state transitions for maximum power savings. • Hibernation gets the IPU Sub-system out of the way of System PM when it is not in active use. That is, it allows the greater Core power domain (wherein the IPU resides) to be transitioned to low power states by the MPU with no coordination overhead. • Common resource pools provide the greatest system flexibility. • Design Features • Provides Power Management for the IPU Sub-system and its directly managed resources. • Provides a local interface to IPU clients for acquiring, activating, and tracking Shared Resources whose pools are managed on the MPU side. • Provides APIs for setting frequency, latency, and bandwidth Constraints for applicable HW modules. • Provides for Resource Cleanup in the event of an unrecoverable error in the IPU SS. • Participates in system power management operations such as Suspend/Resume. • Provides a framework for forwarding System Event notifications to registered IPU clients. • Supports system suspend and resume operations with context save and restore. • Supports self-Hibernation -- a zero power, rapid recovery state when the IPU SS is not in active use. • Supports efficient I2C controller sharing across MPU/IPU processor domains. • Usage • Resources are activated via the System PM (provided with power and clocks) when requested and deactivated when released. • Constraints may be placed to ensure required performance and limit low power / high latency transitions to meet the use case requirements. • Clients may register for system event notifications such as suspend and resume to take appropriate local actions.
SysLink Daemon • SysLink Daemon • A user-side daemon process, mainly used to load the Ducati cores • Privilege – User: Media, Group: Video • Also responsible for restarting the Ducati cores in case of an MMU fault or other remote-core exceptions. • A common location to create the HeapBufMPs needed by all applications. • Rationale for using the SysLink Daemon • Provides a persistent process that is detached from the parent process that starts it. • The loader is pushed to user space to avoid file access from the kernel.
Ducati Bring Up Sequence • [Sequence diagram across SysLink Daemon, ProcMgr Lib, Remote Proc (kernel), SysM3 (BIOS/Ipc-BIOS) and AppM3 (BIOS/Ipc-BIOS): the daemon calls Ipc_getConfig and Ipc_setup; Load (SYS), Start (SYS) and Release RST bring up SysM3, which runs Ipc_start, Ipc_attach MPU and Ipc_attach AppM3; the daemon then performs Load (APP), Start (APP) and Release RST for AppM3, which runs Ipc_start, Ipc_attach MPU and Ipc_attach SysM3.] A rough code sketch of the daemon-side steps follows below.
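A rough daemon-side sketch of loading and starting one core through the ProcMgr library named in the sequence above; the ProcMgr signatures below are approximations (they vary across SysLink releases) and the image name is illustrative:

```c
#include <xdc/std.h>
#include <ProcMgr.h>   /* assumed SysLink user-side header */

Void loadSysM3(UInt16 procId)
{
    ProcMgr_Handle       handle;
    ProcMgr_AttachParams attachParams;
    ProcMgr_StartParams  startParams;
    UInt32               fileId;
    String               image = "ducati_sysm3_baseimage.xem3"; /* illustrative */

    ProcMgr_open(&handle, procId);

    ProcMgr_getAttachParams(handle, &attachParams);
    ProcMgr_attach(handle, &attachParams);          /* map memories, etc.  */

    ProcMgr_load(handle, image, 0, NULL, &fileId);  /* Load (SYS)          */

    ProcMgr_getStartParams(handle, &startParams);
    ProcMgr_start(handle, &startParams);            /* Start (SYS), RST released */
}
```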
RCM Client • Features • Establishes a connection with an RCM server by name. • Allocates memory for remote command messages. • Sends remote command messages and receives the return context. • Supports both synchronous and asynchronous execution modes. • Design • Multiple RCM client instances are supported. • The RCM client runs in the caller's thread context, i.e. the IL client/OMX proxy component. On Ducati, the RCM client runs in the calling task's context. • An RCM client can connect to only a single RCM server. • Multiple RCM clients can connect to a single RCM server. • The RCM client is implemented as a user-side library. • Supports one or many RCM clients per process. • Usage • Remote OMX component initialization, function invocation and callbacks, by invoking remote functions (RPC SKEL functions) on the remote core (see the sketch below)
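A hedged sketch of a synchronous remote call, assuming the SysLink RcmClient API; the server name, skeleton function name, and payload size are illustrative, and the exact signatures may differ by release:

```c
#include <xdc/std.h>
#include <RcmClient.h>   /* assumed header */

Void rcmClientExample(Void)
{
    RcmClient_Handle   client;
    RcmClient_Params   params;
    RcmClient_Message *msg;
    RcmClient_Message *retMsg;
    UInt32             fxnIdx;

    RcmClient_Params_init(&params);

    /* Connect to a server by name (a client connects to one server). */
    RcmClient_create("OmxRcmServer", &params, &client);

    /* Resolve the remote (skeleton) function name to an index. */
    RcmClient_getSymbolIndex(client, "OMX_GetHandle_skel", &fxnIdx);

    /* Allocate a command message, fill it, and execute synchronously. */
    RcmClient_alloc(client, 64, &msg);      /* 64-byte payload, illustrative */
    msg->fxnIdx = fxnIdx;
    RcmClient_exec(client, msg, &retMsg);   /* blocks for the return context */
}
```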
RCM Server • Features • Receives messages from the connected RCM client(s). • Invokes remote functions (RPC skel funcs). • Registers remote functions (RPC skel funcs). • Unregisters remote functions (RPC skel funcs). • Gets the return value from the RPC SKEL. Two kinds of return values: the RCM return value (used if there is an error in the RCM layers) and the OMX return value (the actual return value of the OMX_XXX call) • Sends the return context • Design • One RCM server instance per process (multiple can be supported) • The RCM server runs in its own thread context. • The RCM server thread waits on receiving a message. As soon as it receives one, it unblocks and executes the required remote function. • Multiple RCM clients can connect to a single RCM server. • The RCM server is implemented as a user-side library (see the sketch below)
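The matching server-side sketch, again assuming the SysLink RcmServer API with approximate signatures; the server and skeleton names mirror the client sketch above and are illustrative:

```c
#include <xdc/std.h>
#include <RcmServer.h>   /* assumed header */

/* A remote (skeleton) function: receives the marshalled message data. */
static Int32 OMX_GetHandle_skel(UInt32 dataSize, UInt32 *data)
{
    /* ... unmarshal arguments, call the real OMX function ... */
    return 0;   /* RCM-level return value; the OMX return value
                   travels back inside the return context */
}

Void rcmServerExample(Void)
{
    RcmServer_Handle server;
    RcmServer_Params params;
    UInt32           fxnIdx;

    RcmServer_Params_init(&params);
    RcmServer_create("OmxRcmServer", &params, &server);

    /* Register the skeleton so clients can look it up by name. */
    RcmServer_addSymbol(server, "OMX_GetHandle_skel",
                        OMX_GetHandle_skel, &fxnIdx);

    /* Start the server thread; it blocks waiting for messages. */
    RcmServer_start(server);
}
```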
Handled Errors • HW detected errors • Unicache MMU faults (detected on Ducati and forwarded to the A9 through Notify) • ACTION: We terminate Ducati, as this is a major error • L2 MMU faults (detected on the A9) • ACTION: We terminate Ducati, as this is a major error • Detection is also done on Ducati, as an MMU fault handler was added on the Cortex-M3 for ES2.0; for precise faults it can retrieve the PC where the issue occurred. • Ducati data aborts (detected on Ducati and forwarded to the A9) • ACTION: We terminate Ducati, as this is a major error • SW detected errors • Tasks stuck in endless loops, caught using a watchdog timer (detected on the A9) • ACTION: We terminate Ducati, as this is a major error. • A9 process terminated abnormally • ACTION: Clean up the resources associated with this process • SysLink Daemon crash • ACTION: Restart the daemon
Resource management software design (1/2) • The SysLink Daemon is in charge of: • Loading the base image code • Initializing/uninitializing Ducati • The A9-side Ducati Resource Manager kernel driver is in charge of: • Tracking resources requested by Ducati • Each Linux user process that uses Ducati performs the following operations (depending on its needs): • Opens a device file on the SysLink IPC kernel driver to be able to communicate with the remote processor • Opens a device file on the IO-MMU kernel driver to be able to share buffers using the SysLink DMM feature • Opens a device file on the TILER kernel driver to be able to allocate TILER buffers • Opens a device file on the Remote Proc kernel driver to be able to get notifications when the remote processor is stopped/started. • The Linux kernel automatically closes the device files if the process does not do so explicitly, and the resources are cleaned up in that context.
Resource management software design (2/2) • Resources that can be allocated by Ducati tasks: • TILER memory • Allocations are requested from the process that created the Ducati task, on behalf of that task. • The TILER driver is in charge of tracking allocated memory per process. • Regular memory, using SW DMM for huge memory blocks: • Allocations are requested from the process that created the Ducati task. The DMM driver is in charge of tracking the allocated memory per process. • Local BIOS heaps (from SDRAM or L2RAM): • Release of this memory must be managed on the Ducati side, as the A9 is not aware of it. • HW resources • The Ducati-side Resource Manager is in charge of tracking allocated HW resources at task level. • Release of these resources is managed on the Ducati side, as the A9 tracks at subsystem level only. • Power resources (same as for HW resources) • The Ducati Power Manager is in charge of tracking set power constraints by module… at task level. • Release of these constraints must be managed on the Ducati side, as the A9 tracks at subsystem level only.
MMU Fault Handling • Handles MMU faults and provides a stack dump from when the MMU fault occurred. • Expectations from users of SysLink • Register and wait for the MMU fault notification. • On MMU fault notification, close the IPC handle(s). If any user still has open handle(s), SysLink does not initiate the recovery process • Ducati is reloaded once all user processes have closed their IPC handles • See the figure on the next slide for the sequence flow of how an MMU fault is handled.
Ducati Exceptions (Data Aborts) • Ducati can crash for reasons other than MMU faults; these errors are handled using the SYS error mechanism. • Ducati sends a SYS error event to the device manager handler on the A9 side. • The event is then notified to registered users on the A9. • Expectations from users • Register for SYS error notification • On error notification, close the IPC handle(s) • See the next slide for the sequence flow
Ducati Hang • Ducati hangs, where a task on Ducati spins in an endless loop, are detected using a GP timer/watchdog-like mechanism. • Expectations from users • Register for hang notification • On event notification, close the IPC handles
A9 Process Abnormal Termination • Ducati components/tasks need to know when an A9 process has terminated abnormally, so they can release the corresponding Ducati resources. • Such an occurrence is communicated to Ducati using the PID_DEATH event. • The Ducati component registers with the SysLink framework to receive this notification. • See the following slide for a sequence flow illustrating the case where an A9 process that exchanged buffers with Ducati is terminated.
SysLink Daemon Crash • SysLink Daemon is in charge of loading the Base Image and setting up IPC • Ducati is put in reset state upon SysLink Daemon crash. • Expectations from Users • Register for PROC_STOP event • On event notification, close the IPC handles • See the following slide for sequence flow
Ducati Crash Info - Example Error Type • MMU fault (source core not identifiable) • SysError (source core, SysM3 or AppM3, is identified) Execution State • Type of task (Task, Swi, Hwi) • Task handle • Address and size of the stack • Snapshot of internal state registers Stack • Stack of the offending thread • Bottom-up (grows from bottom to top) • 0xbebebebe marks unfilled blocks • Depends on the build profile (whole_program_debug results in an optimized, shorter stack)
Ducati Crash Info - Locating Error Source • Note the PC (R15) from the Execution State • Look up the corresponding symbol in the map file • The map file is found under the package/cfg folder of the binary source folder. It has the same name as the loaded base image, with “.map” appended. • Look under the “GLOBAL SYMBOLS: SORTED BY Symbol Address” section to easily find the symbol whose memory range encloses the PC address • Search for the symbol in the source code. The location can be further confirmed by matching function arguments and variable addresses with the values of R0, R1, R2, R3, etc. • The LR indicates the return point of the current subroutine and can also be used to confirm the location by identifying the nesting pattern. The debug profile is more meaningful in this regard, as optimization might not preserve the call sequence.
Ducati Tracing - Trace Daemon • The SysLink Trace Daemon dumps out the traces from the Ducati cores. • The tracing mechanism is based on a circular buffer in shared memory; each Ducati M3 has its own buffer. • The trace daemon wakes up at regular intervals to dump out any traces available from Ducati. The interval at which it dumps the traces can be changed. • On BIOS, SysMin is configured to route prints to the trace buffer. • Use System_printf and System_flush to print and flush the traces to shared memory, as sketched below. • Starting the trace daemon: • ./syslink_tracedaemon.out • Dump format: • [APPM3]: APP M3:MultiProc id = 1 • [APPM3]: APPM3: Ipc_attach to host ...Host DONE • [APPM3]: APPM3: IPC attach to SYSM3... DONE • ------ • [SYSM3]: Ipc_start status = 0 • [SYSM3]: Ipc_attach to host ...Host DONE • [SYSM3]: Ipc_attach to AppM3 ...AppM3 DONE
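A minimal Ducati-side sketch of emitting a trace line using the standard XDC runtime calls named above, assuming the BIOS .cfg selects SysMin as the System proxy so output lands in the shared-memory trace buffer read by the daemon:

```c
#include <xdc/std.h>
#include <xdc/runtime/System.h>

Void traceExample(UInt32 status)
{
    /* Formatted print into the SysMin circular buffer. */
    System_printf("Ipc_start status = %d\n", (Int)status);

    /* Flush so the daemon can pick the characters up from shared memory. */
    System_flush();
}
```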