Departamento de Informática, Universidad de Extremadura, SPAIN
Porting P4 to Digital Signal Processing Platforms
Juan Antonio Rico Gallego, Juan Carlos Díaz Martín, José Manuel Rodríguez García, Jesús María Álvarez Llorente, Juan Luis García Zapata
EuroPVM/MPI 2003. Venice, September 29 – October 2
Index
• Introduction and goals
• IDSP: A Distributed Framework for DSPs
• Implementing the P4 functionality upon IDSP
• Measuring the P4 Overhead
• Conclusions
• Current and Future Work
Introduction and goals
DSP processors have specialized architectures for running real-time digital signal processing. Fields of application:
• Communications
• Voice and data compression
• Mobile telephony
• Speech processing
• Image and video processing
• Medical
• and more ...
Introduction and goals
Target machines: networks of DSP multicomputers such as those from Sundance™, Motorola™ or Hunt Engineering™.
[Figure: Sundance SMT310Q PCI carrier board with four TI C6201 DSPs]
Introduction and goals
Target machines: the Texas Instruments C6000 family of DSPs
• 150 MHz, capable of delivering 900 MFLOPS
• 16 or 32 MB of 100 MHz SDRAM
• 64 KB of cache / internal RAM
• 128 KB of flash-programmable and erasable ROM
• No MMU for virtual memory management
Very limited resources, targeted at embedded systems.
Introduction and goals
High computational complexity and real-time requirements: most applications can be decoupled and distributed among two or more processors.
Current DSP software poses a portability problem:
• It is platform specific
• It provides only low-level communication libraries
• It gives poor support for building portable parallel applications
A distributed programming standard like MPI is needed.
IDSP: A Distributed Framework for DSPs
DSP/BIOS: the Texas Instruments kernel for the C6000 family of DSP processors (21 KB)
• Thread management: TSK_create, TSK_delete
• Thread synchronization: SEM_pend, SEM_post
• Timing services: CLK_gethtime
• Tracing and analysis
IDSP: A Distributed Framework for DSPs
IDSP is our own development. It extends DSP/BIOS with distributed facilities (30 KB). IDSP runs on:
• DSK (1 x C6000)
• Sundance multicomputer SMT310Q (4 x C6000)
Thread management: OPER_create, OPER_destroy, GROUP_create, GROUP_destroy
Thread point-to-point communication: COMM_send, COMM_recv, COMM_asend, COMM_arecv, COMM_wait, COMM_test, ...
IDSP: A Distributed Framework for DSPs
An operator is a thread that runs an algorithm (FFT, etc.). An IDSP application is a group of operators communicating by message passing.
[Figure: an application drawn as five operators (1 to 5) connected by input and output streams]
An IDSP address has four components:
• Machine
• Group
• Operator
• Port
IDSP: A Distributed Framework for DSPs
IDSP shows a microkernel architecture built around a message-passing kernel.
[Figure: algorithm operators and system servers (Group server GROUP_, Operator server OPER_, I/O server CIO_, P4 address mapper) attached to the COMM_ kernel acting as a software bus; operators reach the system servers by RPC]
Implementing the P4 functionality upon IDSP
MPICH is a portable implementation of MPI. It shows a three-layer design:
• MPI macros
• Abstract Device Interface (ADI)
• Channel Interface, P4 being a well-known example
We have put P4 on top of IDSP.
[Figure: software stack MPI / ADI / P4 / IDSP, running over DSP/BIOS on each of three C6000 processors]
Implementing the P4 functionality upon IDSP
The P4 re-entrancy problem: P4 is process based, whereas IDSP is thread based. Each operating-system process gets its own copy of the P4 library and its global variables, but IDSP threads share a single copy.
[Figure: one P4 library per process on an operating system versus one modified P4 library shared by the threads on IDSP]
A thread-safe version of P4 has been built by:
• Putting P4 global variables in the private zone of each IDSP thread
• Using mutual exclusion mechanisms
Implementing the P4 functionality upon IDSP
P4 is based upon TCP/IP Berkeley sockets, but IDSP provides its own addressing scheme (IDSP addresses instead of IP addresses). We have built IDSP/Sockets, a thin and efficient implementation of Berkeley sockets atop IDSP.
[Figure: P4 over IDSP/Sockets over IDSP, running over DSP/BIOS on each C6000, linked by the communication network]
Implementing the P4 functionality upon IDSP
The IP/IDSP mapping:
p4_send(rank, ...) → send(IP_address, ...) → COMM_send(IDSP_address, ...)
[Figure: sender and receiver user operators, each with an Idsp_addr/Ip_addr cache, and the Address Mapping Server; steps 1 to 4 involve Register(idsp_addr, ip_addr) and idsp_addr = Get(ip_addr)]
Every user operator keeps a cache of addresses.
Implementing the P4 functionality upon IDSP
Signals: P4 uses UNIX signals for time-outs and process management, but DSP/BIOS does not provide signals!
DSP threads, however, interact with the kernel quite frequently for data I/O. IDSP takes advantage of this to support the UNIX signal mechanism:
• A special message is sent to the target thread
• The target thread receives this message on its next socket read
Implementing the P4 functionality upon IDSP
The startup process: P4 uses a text file specifying program files and machines:
    Local 0
    Sun2 1 /home/user/P4pgms/sun/prog1
    Sun3 2 /home/user/P4pgms/sun/prog2
    rs6000 1 /home/user/P4pgms/rs6000/prog1
But embedded systems don't use disks! The IDSP approach is as follows:
• Every operator has a well-known integer identifier
• A limited number of operators is linked
• GROUP_create takes an array of operator identifiers
• Currently, it assigns each operator to the least loaded machine
Measuring the P4 Overhead
Overhead of the socket interface on IDSP: time to send short messages between two operators.
[Figure: plot comparing send and COMM_send]
Measuring the P4 Overhead
Overhead of the P4 interface on IDSP: time to send short messages between two operators.
[Figure: plot comparing p4_send and COMM_send]
Conclusions
• IDSP, a message-passing interface for DSPs, has been defined and implemented
• IDSP performance on the TI C6000 DSP architecture is currently reasonably good (50 µs for short messages)
• We have been able to support P4 upon the small IDSP interface
• P4 performance upon IDSP is good, but not good enough for high-performance distributed digital signal processing
• A more finely tuned channel interface layer is needed for DSPs
Current and Future Work
• IDSP is currently being augmented with MPI-like point-to-point primitives such as COMM_waitany
• A DSP-specific channel interface layer will be developed; the ADI and MPI will be supported on top of it
• The 64-bit C6400 family will be targeted soon
Thank you very much!
Implementing the P4 functionality upon IDSP
Groups: MPI implements the concept of a group, and IDSP has a different concept of group. How is this managed? The groups and processes of an MPI application run in the context of an IDSP group.
Implementing the P4 functionality upon IDSP
Listener process: P4 uses an auxiliary process for doing background work, but IDSP has no auxiliary thread. How does IDSP do this work? We use asynchronous communication for:
• Doing this background work
• Sending the initial information that threads need in order to run (threads take no parameters at startup)
[Figure: each operator has a communication port (SEND, RECEIVE) and an additional port (CONNECTION_REQ, DIE, INITIAL_INFO)]
An IDSP thread runs an algorithm, which differs from an MPI/P4 process: all MPI/P4 processes run the same program.
Implementing the P4 functionality upon IDSP
The IP/IDSP mapping: P4 maps process ranks into IP addresses, and IDSP/Sockets maps IP addresses into IDSP addresses.
[Figure: sender and receiver user operators, each with an Idsp_addr/Ip_addr cache, and the Address Mapping Server; the steps involve Register(idsp_addr, ip_addr) and idsp_addr = Get(ip_addr)]
Every user operator keeps a cache of addresses.