Water Analytics Platform on AWS
Team Members: Srinivasan Vembuli, Rikio Chiba, Romeo Luka
Under the supervision of Prof. Murlikrishna Viswanathan
Background
• The Department of Environment, Water and Natural Resources (DEWNR) leads the management of South Australia’s most valuable resource.
• DEWNR collects water data from various sources and disseminates it to other agencies.
• The data is currently stored in multiple systems:
  - Hydstra (legacy FoxPro DB)
  - SQL Server
• The data is currently used by the Bureau of Meteorology (BOM) for its analytics applications and by DEWNR in its Water Connect website applications.
WDTF
• The Water Data Transfer Format (WDTF), an XML-based format developed in 2008, is a national standard for transferring water information.
• Over 240 organisations are required to give specified water information to the Bureau under the Water Regulations 2008.
• BOM uses data from the current system in WDTF.
Existing System Architecture
[Diagram. Column headings: Data Source, Storage / Application, Output. Labelled elements: Field Sensors, Other Data, Raw Data (three times), Hydstra (FoxPro DB), SQL Server, Data Mart, GIS Application, Analysis, WDTF.]
Problem Definition
• The current architecture relies on multiple systems running on legacy software, i.e., Hydstra (FoxPro DB).
• This leads to increased costs and inefficiency in service delivery.
• The current architecture does not fully utilise WDTF as the universal data format standard.
Project Objectives
• Help DEWNR use data in WDTF to generate analytical data similar to BOM's for public consumption (Open Data: OTF is a facilitator for SA Gov.).
• Develop a cloud-based ETL system to manage water data (in WDTF) from across Australia.
• Provide useful analytics and insights from this data using different data mining and visualization techniques.
• Examples include time series analysis of aggregated groundwater/surface-water data and real-time mapping of water data using dashboards and mapping APIs.
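As a rough illustration of the time series analysis named in the objectives, the sketch below averages readings per day and then smooths the daily series with a trailing moving average. The function names and the sample values are made up for this example; the slides do not say which aggregation or smoothing method the project uses.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_means(readings):
    """Group (ISO timestamp, value) pairs by calendar day and average each day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date()
        by_day[day].append(value)
    # Sort by date so the result is a chronological series.
    return {day: mean(values) for day, values in sorted(by_day.items())}


def moving_average(values, window):
    """Trailing moving average; the first window - 1 points produce no output."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


# Hypothetical groundwater levels in metres, for illustration only.
sample = [
    ("2015-03-01T00:00:00", 12.0),
    ("2015-03-01T12:00:00", 12.5),
    ("2015-03-02T00:00:00", 13.0),
    ("2015-03-03T00:00:00", 14.0),
]

daily = daily_means(sample)
smoothed = moving_average(list(daily.values()), window=2)
```

Aggregating to one value per day first keeps the smoothing window meaningful when sensors report at different rates.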
Solution
• Host water data in the cloud
• Establish an integrated data analysis platform
• Publish water data for use by third-party organizations
Architecture on AWS
[Diagram. Components: Local FTP Server, Amazon S3, Amazon EC2, Amazon EMR, Amazon Redshift, Analysis. Step labels: Copy WDTF files; Parse WDTF files (JSON); Store the data to Redshift. AWS Data Pipeline schedules four daily tasks.]
• Why do we need to use Redshift?
• Why do we need to use EMR?
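One possible shape for the "Store the data to Redshift" step is a Redshift COPY statement that loads JSON objects from S3. The sketch below only builds the SQL text. The table name, bucket path and IAM role ARN are hypothetical placeholders, and the slides do not confirm that the project loads data this way.

```python
def build_copy_statement(table, s3_uri, iam_role_arn):
    """Return a Redshift COPY statement that loads JSON objects from S3.

    With FORMAT AS JSON 'auto', Redshift matches JSON object keys to the
    column names of the target table.
    Illustration only: the values are interpolated without any escaping.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto';"
    )


# All three arguments are placeholders, not values from the project.
statement = build_copy_statement(
    table="water_observations",
    s3_uri="s3://example-bucket/wdtf-json/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

A daily task could run the resulting statement against the cluster after that day's JSON files have been written to S3.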
Current Project Status
• Wrote two Perl parser programs:
  - The first parser unzips Zip files to generate XML files.
  - The second parser converts the XML files to JSON.
  Zipped file -> (Unzip Parser) -> XML files -> (Convert Parser) -> JSON
• Researching how to convert the JSON files to tables using EMR and plug them into Redshift for analytics.
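The two-stage flow above (unzip, then XML to JSON) can be sketched as follows. This is a Python illustration, not the team's Perl code. The sample XML is a made-up stand-in rather than a real WDTF document, the JSON layout (tag, attributes, text, children) is an assumption, and real WDTF files use XML namespaces, which ElementTree would include in each tag name.

```python
import io
import json
import zipfile
import xml.etree.ElementTree as ET


def unzip_xml(zip_bytes):
    """Stage 1: return {member name: XML text} for each .xml file in the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: archive.read(name).decode("utf-8")
            for name in archive.namelist()
            if name.lower().endswith(".xml")
        }


def element_to_dict(element):
    """Recursively convert an XML element into a plain dict."""
    node = {"tag": element.tag}
    if element.attrib:
        node["attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def xml_to_json(xml_text):
    """Stage 2: convert one XML document into a JSON string."""
    return json.dumps(element_to_dict(ET.fromstring(xml_text)))


# Build a tiny in-memory archive so the example runs without any input files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "sample.xml",
        '<observations site="A1"><value unit="m">12.5</value></observations>',
    )

json_docs = {
    name: xml_to_json(text)
    for name, text in unzip_xml(buffer.getvalue()).items()
}
```

Keeping the two stages as separate functions mirrors the two-parser split on the slide and lets each stage be tested on its own.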
Deliverables
• Prototype of the proposed architecture
• Technical document
• Project report