Efficient Real-Time Multicore Image Processing on TI C66x: Final Presentation. Yaron Doweck, Yael Einziger. Supervisor: Mike Sumszyk. Spring 2011 Semester Project.
Project Goal
• Learn to use the new TI C6678 multi-core platform and to exploit its abilities and advantages.
• Implement a real-time tracking algorithm using multi-core programming and VLIB.
• Create a framework for multi-core processing, Ethernet video streaming and DSP-FPGA communication.
Contents
• KeyStone Architecture
• SYS/BIOS
• VLIB
• Tracking Algorithm
• Tracking System
• Performance Analysis
• Encountered Difficulties
• Future Projects
Learning the Platform
In the first part of the project, the main goal was learning:
• The C6678 platform
• TI development environments
• The Multicore SDK (MCSDK)
• The SYS/BIOS real-time OS
KeyStone Architecture
• 8 cores
• External memory controller
• 3 EDMA controllers
• Multicore Navigator
• Network coprocessor
• Semaphore module
KeyStone Architecture: TeraNet
• DDR3: up to 10666 MB/s
• Shared memory: 4 access ports, each up to 16000 MB/s
• TeraNet switch fabric: up to 256 GB/s
C66 CorePac
• C66 DSP: up to 20 GFLOPS @ 1.25 GHz
• 32 KB L1D cache/SRAM: 32000 MB/s
• 32 KB L1P cache/SRAM: 32000 MB/s
• 512 KB L2 cache/SRAM: 16000 MB/s
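Two of the peak figures quoted on the KeyStone slides can be reproduced with simple arithmetic. The sketch below is only a plausibility check: the clock rate comes from the slide, while the operations-per-cycle value, the DDR3-1333 transfer rate and the 64-bit bus width are assumptions that do not appear in the slides.

```python
# Back-of-the-envelope check of two peak figures quoted on the slides.

CORE_CLOCK_HZ = 1.25e9            # from the slide: 1.25 GHz
FLOPS_PER_CYCLE = 16              # assumption: single-precision operations per cycle per core

DDR3_TRANSFERS_PER_S = 1333.33e6  # assumption: DDR3-1333
DDR3_BUS_BYTES = 8                # assumption: 64-bit data bus


def peak_gflops(clock_hz, flops_per_cycle):
    """Peak floating-point rate of one core, in GFLOPS."""
    return clock_hz * flops_per_cycle / 1e9


def peak_mb_per_s(transfers_per_s, bus_bytes):
    """Peak memory bandwidth in MB/s (1 MB = 10^6 bytes)."""
    return transfers_per_s * bus_bytes / 1e6


print(peak_gflops(CORE_CLOCK_HZ, FLOPS_PER_CYCLE))               # 20.0
print(int(peak_mb_per_s(DDR3_TRANSFERS_PER_S, DDR3_BUS_BYTES)))  # 10666
```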
SYS/BIOS
• SYS/BIOS is an advanced, lightweight real-time operating system from Texas Instruments.
• It is designed for use in embedded applications that need real-time scheduling and synchronization.
• SYS/BIOS is delivered as a set of pre-compiled packages that provide the modules that make up the OS.
• Each module can be loaded and configured separately (only the selected modules are loaded, making the OS as light as possible).
SYS/BIOS
Main SYS/BIOS modules used in the project:
• BIOS – Manages the OS.
• Task – Creates and manages threads.
• Hwi – Hardware interrupts.
• Semaphore – Creates and manages semaphores.
• IPC – Inter-processor communication.
• Timestamp – Provides a timestamp service for performance analysis.
VLIB
• VLIB is an extensible library of more than 40 software kernels that are optimized for TI's C64x+ digital signal processor (DSP) core.
• These kernels perform background modeling and subtraction, object feature extraction, tracking, recognition and low-level pixel processing, providing a foundation for developing video analytics applications.
VLIB
• TI has also provided developers with a bit-exact version of the library for testing and debugging in a PC (Windows) environment.
• The VLIB version used in this project is an unofficial release compiled with C66x support, obtained from TI's Video Surveillance team (the developers of VLIB).
Tracking Algorithm
[Diagram: stages 1 and 2]
Statistical Background Subtraction
• Unlike moving objects, the background of the image doesn't change.
• However, there are still small variations over time due to changes in lighting, camera noise, moving trees, etc.
• Hence, by studying how each pixel varies over time, we can deduce whether it belongs to a moving object or to the background.
• API: VLIB_subtractBackgroundS16
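The per-pixel idea can be sketched as follows. This is a generic running-Gaussian model in plain Python, not the VLIB kernel: the threshold factor `k`, the update rate `alpha` and both function names are illustrative assumptions, and the frame is treated as a flat list of pixel values.

```python
def subtract_background(frame, mean, var, k=3.0):
    """Return a binary mask: 1 where a pixel deviates from its Gaussian model.

    A pixel is foreground when its squared distance from the running mean
    exceeds k^2 times the running variance (k is an assumed threshold).
    """
    return [1 if (x - m) ** 2 > (k * k) * v else 0
            for x, m, v in zip(frame, mean, var)]


def update_model(frame, mask, mean, var, alpha=0.05):
    """Exponentially weighted update of mean/variance, background pixels only."""
    for i, x in enumerate(frame):
        if mask[i] == 0:
            d = x - mean[i]
            mean[i] += alpha * d
            var[i] = (1 - alpha) * var[i] + alpha * d * d


mean = [100.0] * 6
var = [4.0] * 6
frame = [101, 99, 100, 160, 102, 98]   # the fourth pixel belongs to a moving object

mask = subtract_background(frame, mean, var)
update_model(frame, mask, mean, var)
print(mask)  # [0, 0, 0, 1, 0, 0]
```

Skipping the update for foreground pixels keeps a passing object from being absorbed into the background model; whether to do so is a design choice, not something the slides specify.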
Connected Components Labeling
• Groups foreground pixels that have other foreground pixels as 8-connected neighbors, and labels discrete groupings as components.
• Once accomplished, component properties can be measured and used to extract foreground information. These properties include bounding box, centroid and area.
• API: VLIB_createConnectedComponentsList
[Diagram: binary foreground image, then connected components labeling]
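A minimal sketch of 8-connected labeling with the three properties named above. It uses a breadth-first flood fill on a list-of-rows binary image; this is an illustration of the technique, not the algorithm VLIB uses internally, and the function name and output layout are assumptions.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected foreground regions of a binary image (list of rows).

    Returns one dict per component with its area, bounding box and centroid.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not image[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill one component starting from this unvisited pixel.
            seen[r0][c0] = True
            pending = deque([(r0, c0)])
            pixels = []
            while pending:
                r, c = pending.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):          # all 8 neighbors (and the pixel
                    for dc in (-1, 0, 1):      # itself, which is already seen)
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and image[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            pending.append((nr, nc))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({
                "area": len(pixels),
                "bbox": (min(xs), min(ys), max(xs), max(ys)),  # x_min, y_min, x_max, y_max
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),  # (x, y)
            })
    return components


image = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],   # joined to the group above only through a diagonal
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
for comp in connected_components(image):
    print(comp)
```

With 4-connectivity the pixel at row 2 would form its own component; with 8-connectivity it belongs to the first group, giving two components in total.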
Tracking
• Tracker association is done by matching each component to the closest tracker from the previous image.
• If no existing tracker is close enough, a new tracker is associated with the component.
• After all components are associated, any remaining unmatched trackers are discarded.
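The three association rules above can be sketched as a nearest-neighbor match on centroids. The distance threshold `max_dist`, the ID scheme and the rule that a tracker may be claimed by only one component per frame are assumptions made for this example; the slides do not specify them.

```python
import math


def associate(trackers, centroids, next_id, max_dist=20.0):
    """Match components to trackers from the previous frame.

    trackers:  dict tracker_id -> (x, y) from the previous image
    centroids: list of (x, y) component centroids in the current image
    next_id:   first unused tracker ID (assumed larger than all existing IDs)
    Returns (updated_trackers, next_id). Trackers that no component claimed
    are simply absent from the result, i.e. discarded.
    """
    updated = {}
    for cx, cy in centroids:
        best_id, best_d = None, max_dist
        for tid, (tx, ty) in trackers.items():
            if tid in updated:
                continue  # assumption: one component per tracker
            d = math.hypot(cx - tx, cy - ty)
            if d <= best_d:
                best_id, best_d = tid, d
        if best_id is None:
            # No existing tracker is close enough: start a new one.
            best_id, next_id = next_id, next_id + 1
        updated[best_id] = (cx, cy)
    return updated, next_id


previous = {1: (10, 10), 2: (50, 50), 3: (200, 200)}
current = [(12, 11), (120, 80)]

trackers, next_id = associate(previous, current, next_id=4)
print(trackers)  # {1: (12, 11), 4: (120, 80)}
```

Tracker 1 follows its component, the far-away component gets the new tracker 4, and trackers 2 and 3 are dropped because nothing matched them.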
Tracking System
The developed tracking system demonstrates multicore programming on the DSP using its powerful features:
• Network coprocessor.
• EDMA engine.
• Multicore Navigator.
• Synchronization modules.
• Event-driven operations.
Tracking System: General Flow
[Flow diagram: stages 1*, 2, 3, 4*]
* Stages (1) and (4) were implemented using OpenCV 2.3 with 2 separate threads.
Tracking System: DSP Data Flow
[Diagram labels: From PC, Ethernet controller, Packet DMA, image in DDR3, EDMA3, shared memory double buffer, cache controller, L1 cache, processing, trackers' data, Packet DMA, To PC]
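The double buffer in the data flow can be illustrated with a small ping-pong sketch: one half is filled with the next frame while the other half is processed. The class and function names are hypothetical, a slice copy stands in for the DMA transfer, and a byte sum stands in for the image processing. On the DSP the fill and the processing overlap in time; here they run one after the other to keep the example deterministic.

```python
class DoubleBuffer:
    """Two equally sized buffers: one is filled while the other is processed."""

    def __init__(self, size):
        self._buffers = [bytearray(size), bytearray(size)]
        self._fill = 0  # index of the buffer currently being filled

    def fill_buffer(self):
        return self._buffers[self._fill]

    def process_buffer(self):
        return self._buffers[1 - self._fill]

    def swap(self):
        """Exchange roles once the fill and the processing are both done."""
        self._fill = 1 - self._fill


def run(frames, size):
    buf = DoubleBuffer(size)
    results = []
    # One extra iteration drains the last frame out of the pipeline.
    for i in range(len(frames) + 1):
        if i < len(frames):
            buf.fill_buffer()[:] = frames[i]            # stands in for a DMA transfer
        if i > 0:
            results.append(sum(buf.process_buffer()))   # stands in for processing
        buf.swap()
    return results


frames = [bytes([1, 2, 3, 4]), bytes([10, 20, 30, 40])]
print(run(frames, size=4))  # [10, 100]
```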
Tracking System: Shared Memory
• Core 0 (processing) sends a message to notify Core 1 (processing) that the foreground image is ready.
• Transfers: DDR3, EDMA3, Packet DMA (to Ethernet).
• Shared memory contents: queue; background model (mean, var); list of trackers; image at T; image at T-1; binary foreground images.
Tracking System: Pipeline Image Processing
• Each block is event driven: it wakes up only when a specific event happens.
• Core 0: an EDMA interrupt triggers statistical background subtraction.
• Core 1: a multicore message triggers connected components labeling and tracking.
• Synchronization: semaphores and the multicore messaging service.
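The event-driven two-stage pipeline can be mimicked on a PC with two threads. This is only an analogy under stated assumptions: `threading.Semaphore` stands in for the EDMA-completion event, `queue.Queue` stands in for the multicore messaging service, and the per-stage work is replaced by trivial placeholders. None of the names below are SYS/BIOS or IPC APIs.

```python
import queue
import threading


def run_pipeline(frames):
    frame_ready = threading.Semaphore(0)  # stands in for the EDMA-complete interrupt
    inbox = []                            # stands in for the shared-memory frame buffer
    messages = queue.Queue()              # stands in for the multicore message queue
    results = []

    def core0():
        for _ in frames:
            frame_ready.acquire()                  # sleep until a frame has arrived
            frame = inbox.pop(0)
            # Placeholder for statistical background subtraction.
            foreground = [1 if p > 128 else 0 for p in frame]
            messages.put(foreground)               # notify core 1
        messages.put(None)                         # end-of-stream marker

    def core1():
        while True:
            foreground = messages.get()            # sleep until a message arrives
            if foreground is None:
                break
            # Placeholder for connected components labeling and tracking.
            results.append(sum(foreground))

    t0 = threading.Thread(target=core0)
    t1 = threading.Thread(target=core1)
    t0.start()
    t1.start()
    for frame in frames:                           # plays the role of the DMA engine
        inbox.append(frame)
        frame_ready.release()
    t0.join()
    t1.join()
    return results


print(run_pipeline([[0, 200, 255], [10, 20, 30], [129, 129, 0]]))  # [2, 0, 2]
```

Neither stage polls: each blocks until its event arrives, so core 1 can work on frame N while core 0 already handles frame N+1.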
Performance Analysis
• Ethernet (PC-DSP): 10 MB/s, > 34 FPS.
• DDR3 memory throughput: 10666 MB/s via EDMA3.
• SRAM (local/shared) memory throughput: 16000 MB/s, direct access.
[Diagram labels: DDR3, Ethernet controller, Packet DMA, To PC, Packet DMA, shared memory double buffer, processing, L1 cache, cache controller]
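The Ethernet figure can be related to frame rate with one division. The slide does not state which frame size or pixel format the 34 FPS refers to; one reading that reproduces the number assumes 8-bit grayscale frames of 480x640 and "10 MB/s" meaning 10 * 2^20 bytes per second. Both are assumptions of this sketch.

```python
def max_fps(link_bytes_per_s, width, height, bytes_per_pixel=1):
    """Upper bound on frame rate when the link carries raw frames only."""
    return link_bytes_per_s / (width * height * bytes_per_pixel)


LINK = 10 * 2**20  # assumption: "10 MB/s" read as 10 * 2^20 bytes per second

for height, width in [(120, 160), (240, 320), (480, 640)]:
    print(f"{height}x{width}: {max_fps(LINK, width, height):.1f} FPS")
# 120x160: 546.1 FPS
# 240x320: 136.5 FPS
# 480x640: 34.1 FPS
```

Under these assumptions the link is not the bottleneck for any of the webcam's frame sizes at 30 FPS.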
Performance Analysis
• In conclusion, the system can process frames of 120x160 or 240x320 at up to 30 FPS.
• Shared SRAM size: L2 double buffering requires 2 bytes/pixel. The Gaussian model (for background subtraction) requires 4 bytes/pixel. Largest possible frame size: 240x320.
• Webcam: up to 30 FPS. Frame size 120x160, 240x320 or 480x640.
• By processing only a part of the image at a time, the size of the double buffer can be significantly reduced, allowing a larger frame size.
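The memory figures above translate into a per-frame footprint as follows. The bytes-per-pixel values come from the slide; the strip height of 16 rows is a hypothetical choice for illustration, and the sketch assumes that only the double buffer shrinks while the Gaussian model still covers the whole frame.

```python
DOUBLE_BUFFER_BPP = 2   # from the slide: 2 bytes/pixel
GAUSSIAN_MODEL_BPP = 4  # from the slide: 4 bytes/pixel


def full_frame_bytes(height, width):
    """Footprint when the double buffer holds whole frames."""
    return height * width * (DOUBLE_BUFFER_BPP + GAUSSIAN_MODEL_BPP)


def strip_bytes(height, width, strip_rows):
    """Footprint when the double buffer holds only `strip_rows` rows at a time."""
    double_buffer = strip_rows * width * DOUBLE_BUFFER_BPP
    model = height * width * GAUSSIAN_MODEL_BPP
    return double_buffer + model


for height, width in [(120, 160), (240, 320), (480, 640)]:
    print(f"{height}x{width}: {full_frame_bytes(height, width)} bytes")
# 120x160: 115200 bytes
# 240x320: 460800 bytes
# 480x640: 1843200 bytes

print(strip_bytes(480, 640, strip_rows=16))  # 1249280
```

For a 480x640 frame, a 16-row strip shrinks the double buffer from 614400 bytes to 20480 bytes.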
Performance Analysis: Reduced Memory Analysis
• Memory can be significantly reduced by processing a part of the frame at a time.
• The implementation of the Connected Components algorithm will be more complicated.
• Shared memory contents: queue; background model (mean (short), var (short)); list of connected components; image at T (char); image at T-1 (char); binary foreground images.
VLIB's Performance
• VLIB is optimized for the TI C64x+ DSP.
• As part of the performance analysis, VLIB's performance on the C66 core was compared with its performance on the C64x+ core.
* Since Connected Components requires a lot of memory, the image was placed in L1 while the Connected Components buffer was placed in L2 memory.
Encountered Difficulties
Documentation:
• Incomplete platform documentation.
• Lack of documentation for the MCSDK examples.
• Had to learn by trial and error.
• Posted questions on TI's E2E forums.
Encountered Difficulties
Software bugs:
• Software bugs in the development environment.
• Software bugs in the MCSDK examples.
• Software versions had to be updated repeatedly.
• Some bugs are still unsolved.
• Unable to receive large UDP transmissions.
TI Support Forums
• TI's E2E forums were highly effective in solving problems.
• The posted questions were answered by TI employees almost immediately.
Possible Improvements
• The tracking system can be enhanced to support larger frame sizes.
• Motion estimation (e.g. a Kalman filter) can be added for better tracking capabilities.
The Project as a Framework
• The final report was written as a user's guide to the DSP, the development environment and the tracking system.
• The program on the DSP side is highly modular and can be easily adapted for any type of multi-core pipeline processing.