A Software/Hardware Co-Design Framework for the ‘Internet of Eyes’ Cathal Garry, Derek Molloy Entwine Centre for IoT, Dublin City University
Introduction • The main challenge examined in this paper was to bring ‘eyes’ to the Internet of Things in real time • Background research indicates that the current technologies that can facilitate this are: • Cloud Computing • GPUs • FPGAs • Neuromorphic Chipsets • SDSoCs
What are SDSoCs? • An SDSoC is an integrated circuit that contains a processor, a number of peripherals and some programmable logic • SDSoCs like the Xilinx Zynq chipset consist of two main components on the same SoC: • The processing system (PS) • The programmable logic (PL) • The PL is used to create custom IP (intellectual property), which is linked to the processing system using standard AMBA AXI interfaces • The processing system is used to run a software stack, which can access this custom IP in the programmable logic
What are SDSoCs? [Xilinx SDSoC Overview] • The PL can be configured by software either before operation or dynamically at run time • Effectively allows software to redefine the hardware • This simplifies the software development flow
Advantages and Disadvantages • Cloud computing can offer real-time image processing while saving on local power consumption, but in areas with restricted network access latency can be a problem • GPUs offer real-time image processing at the edge but have high power requirements • FPGAs can offer real-time image processing with relatively low power consumption, but they require a developer to have a high level of expertise • Neuromorphic chipsets like the Movidius compute stick are relatively new to the market and require a high level of expertise in order to implement a solution • Of the options considered, the SDSoC is the only one that can offer low power consumption and real-time image processing while keeping development complexity relatively low
Architecture • The aim of this architecture was to develop a solution that could provide low power consumption and real-time image processing using an SDSoC • The proposed architecture is made up of three components: • The producer • The handler • The consumer • The architecture was applied to a chosen application: a variable speed limit controlled motorway
The Producer • The SDSoC that was chosen for this research was the Xilinx Zynq chipset • The processing system on the Xilinx Zynq chipset contains an ARM Cortex-A9 processor along with a number of standard peripherals like UART and I2C • The programmable logic contains a number of system gates, DSP blocks and RAM • There are many Zynq platforms available on the market; the one that was chosen for this research was the PYNQ platform [Xilinx Zybo]
PYNQ Platform • PYNQ, or Python Productivity for Zynq, is a Xilinx platform that provides a software stack that allows developers to access the benefits of an FPGA without first learning advanced hardware design skills • The PYNQ platform provides this support through Python libraries for accessing the PL • The PYNQ platform can be operated over UART or through Jupyter notebooks [Xilinx PYNQ]
PYNQ Platform • The PYNQ platform runs an Ubuntu-based Linux which is optimized for developer productivity and provides support for many standard drivers and libraries • The framework also provides a feature called Overlays which allows the hardware in the PL to be reprogrammed at run time • The PYNQ framework can be ported to other Zynq-based platforms as well • The application of a variable speed limit (VSL) controlled motorway was implemented by splitting the application between the PS and PL
The Producer – Processing System • The PS was used to monitor the vehicle count and send data to the handler using MQTT • For each frame in the input video stream, the PS read the result from a register in the custom IP over an AXI Lite interface • The PS also stored a history of the count values provided by the custom IP in the PL. This history was then used to derive a congestion level for the motorway
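The PS-side loop described on this slide could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: `read_count` stands in for the AXI Lite register read, `publish` stands in for an MQTT client's publish call, and the window size, thresholds and topic name are all hypothetical.

```python
from collections import deque
from statistics import mean

WINDOW = 30            # hypothetical: number of recent frames kept in the history
THRESHOLDS = (5, 15)   # hypothetical vehicle-count boundaries for low/medium/high


def congestion_level(history):
    """Map the mean vehicle count over the history to a coarse level."""
    avg = mean(history)
    if avg < THRESHOLDS[0]:
        return "low"
    if avg < THRESHOLDS[1]:
        return "medium"
    return "high"


def run_producer(read_count, publish, frames, topic="motorway/congestion"):
    """Read the count once per frame, keep a rolling history, publish the level."""
    history = deque(maxlen=WINDOW)
    for _ in range(frames):
        history.append(read_count())               # stand-in for the register read
        publish(topic, congestion_level(history))  # stand-in for an MQTT publish


# Usage with stand-ins for the hardware and the network
counts = iter([2, 3, 20, 30, 40])
sent = []
run_producer(lambda: next(counts), lambda t, m: sent.append((t, m)), frames=5)
```

Passing the register read and the publish call in as functions keeps the history and congestion logic testable without a board or a broker.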
Producer – Programmable Logic • The PL was used to implement a custom IP for counting the number of vehicles in a given image frame • This was performed by implementing a number of image processing techniques in Vivado HLS • The result of this image processing was stored in a register which could be accessed over an AXI Lite interface
The Handler and Consumer • The remaining parts of the architecture are the handler and consumer • The handler acts as an intermediate agent between the producers and consumers in the network • The consumer acts as an endpoint for the producers' data • It can receive data from a single producer or from multiple producers in order to make a decision
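The three roles can be illustrated with a small in-memory sketch. It is only an illustration: the `Handler` class is a stand-in for the intermediary role (in the described system the producer sends its data over MQTT), and the class names, topic, producer IDs and speed limit values are hypothetical.

```python
from collections import defaultdict


class Handler:
    """In-memory stand-in for the handler that relays producer data."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, producer_id, value):
        for callback in self._subscribers[topic]:
            callback(producer_id, value)


class Consumer:
    """Keeps the latest value per producer and decides on the combined view."""

    def __init__(self):
        self.latest = {}

    def on_message(self, producer_id, value):
        self.latest[producer_id] = value

    def speed_limit(self):
        # Hypothetical rule: the most congested producer sets the limit (km/h).
        limits = {"low": 120, "medium": 100, "high": 60}
        return min(limits[level] for level in self.latest.values())


# Usage: two producers report through the handler, one consumer decides
handler = Handler()
consumer = Consumer()
handler.subscribe("motorway/congestion", consumer.on_message)
handler.publish("motorway/congestion", "camera-1", "low")
handler.publish("motorway/congestion", "camera-2", "high")
limit = consumer.speed_limit()
```

Because the consumer only sees messages relayed by the handler, producers can be added or removed without changing the consumer.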
Power Consumption • The power consumption was measured using a number of different profiles including: • Different types and amounts of programmable logic in the PL • Different processor states in the PS • The worst-case power consumption when performing some image processing in the PL and in the PS was 2.5 Watts
Response Time • The response time was measured using a number of different platforms and processors • The tests were also varied using a number of different image processing tasks across different image resolutions • The worst-case response time for the PL when processing a 1080p image at 30fps was 40ms • This increased to 50ms when processing a 1080p image at 60fps
MQTT Latency • MQTT was used as the transfer protocol so it was important to determine the latency across the network • The MQTT latency was measured by varying the number of messages published per second • The latency is in the msec range as long as the number of published messages is below 100 per second • Above this rate the latency increases by 1000x
Register Access Times • Since the result is produced for each frame in a video stream it was important to determine the register access time from the PL • The register access time was measured over a varying number of reads per second across a number of iterations • The worst-case latency was ~100usec, which is well within the ~16.7ms frame period of a 60fps video
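The headroom claimed here follows from simple arithmetic on the frame period. The short sketch below works it through; the function names are illustrative only.

```python
def frame_period_us(fps):
    """Time available per frame, in microseconds."""
    return 1_000_000 / fps


def budget_fraction(access_us, fps):
    """Fraction of the frame period consumed by one register access."""
    return access_us / frame_period_us(fps)


period = frame_period_us(60)          # about 16,667 usec per frame at 60fps
fraction = budget_fraction(100, 60)   # about 0.006, i.e. 0.6% of the frame period
```

A single ~100usec read therefore uses well under 1% of the time available per frame at 60fps.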
Overlay Switching • Overlay switching allows the user to change the logic in the PL at run time – e.g., change from day time to night time image processing algorithm • This test measures how long it takes for the programmable logic to change and for the image processing to restart • The worst overlay switching time found in this test was 30 seconds
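A measurement of this kind could be structured as below. This is a hedged sketch rather than the test used in the research: `load_overlay` and `restart_pipeline` are hypothetical callables supplied by the caller. On a PYNQ board, `load_overlay` might wrap something like `Overlay("night.bit")` from the `pynq` package, where the bitstream name is an assumption.

```python
import time


def time_switch(load_overlay, restart_pipeline):
    """Return the seconds taken to load a new overlay and restart processing."""
    start = time.perf_counter()
    load_overlay()        # e.g. reprogram the PL with a different bitstream
    restart_pipeline()    # e.g. re-initialise the video pipeline on the new logic
    return time.perf_counter() - start


# Usage with dummy callables standing in for the hardware steps
calls = []
elapsed = time_switch(lambda: calls.append("load"),
                      lambda: calls.append("restart"))
```

Timing both steps together matches the quantity described on the slide: the time until image processing is running again, not just the time to reprogram the PL.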
Other Analysis • The implementation in this research found that some types of image processing are better suited to SDSoCs than others: • Image processing techniques that are very spatially localized in nature perform better in the programmable logic • The further away a required pixel is (spatially), the more memory is required to store it • An alternative approach is to split the image processing between the PS and PL (e.g., for higher level reasoning)
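The memory cost of spatial distance can be made concrete with a rough estimate. The sketch below assumes a raster-scan streaming design that buffers whole image lines so a K x K window can see pixels from earlier rows. That is a common approach in FPGA image pipelines, but the slides do not confirm it is the design used here, and the function name and 8-bit pixel depth are assumptions.

```python
def line_buffer_bits(width, kernel, bits_per_pixel=8):
    """Bits of on-chip line buffering for a kernel x kernel sliding window.

    Assumes (kernel - 1) full image lines are buffered.
    """
    return (kernel - 1) * width * bits_per_pixel


small = line_buffer_bits(1920, 3)    # 3x3 window on a 1920-pixel-wide frame
large = line_buffer_bits(1920, 11)   # 11x11 window on the same frame
```

Under these assumptions the 3x3 window needs 30,720 bits of buffering and the 11x11 window needs 153,600 bits, so reaching five times further in rows costs five times the memory.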
Conclusion • The proposed architecture provides a scalable IoT solution using software/hardware co-design for real-time IoE applications • The research provides an implementation and evaluation of this architecture through the development of a full stack IoE application
Questions? Acknowledgements This research was supported by Xilinx Inc., who provided the PYNQ platform used in this project. In particular we would like to thank Cathal McCabe and Peter Ogden from Xilinx Inc., who provided technical support during the research. We would also like to thank the Intel Corporation for their support throughout the DCU master’s program and the development of this research.
References • [1] Xilinx SDSoC Overview, “SDSoC Overview”, [Online]. Available: www.xilinx.com/support.html • [2] Xilinx Zybo, “Zybo Reference Manual”, [Online]. Available: www.xilinx.com/support/documentation/university/XUP%20Boards/XUPZYBO/documentation/ZYBO_RM_B_V6.pdf • [3] Xilinx PYNQ, “PYNQ Python Productivity for Zynq”, Xilinx Documentation