DESIGN OF LARGE SCALE DATA ARCHIVAL AND RETRIEVAL SYSTEM FOR TRANSPORTATION SENSOR (WRITE-ONCE-READ-MANY TYPE) DATA • by Nirish Dhruv, Department of Computer Science • Advisor: Dr. Taek Kwon, Department of Electrical and Computer Engineering • Graduate Committee: Dr. Donald Crouch (Department of Computer Science), Dr. Carolyn Crouch (Department of Computer Science), Dr. Taek Kwon (Department of Electrical and Computer Engineering)
Background • ITS sensor networks produce huge amounts of data • Because of its size, the data is presently used only for operations and monitoring • Examples: RWIS (Road Weather Information System), WIM (Weigh-in-Motion) and traffic detector networks • Efficient archival/retrieval is needed for planning and research
Problem Statement • Present TMC archive • Flat, zip-compressed format • Difficult to extract spatially correlated data • Need: efficient archival/retrieval of spatially and/or temporally correlated data
Existing File Format and Archive • Unified Traffic Data Format (UTDF) • One record per 30-second interval, 2880 records per day: record 1 = 00:00:00, record 2 = 00:00:30, record 3 = 00:01:00, . . . , record 2880 = 23:59:30 • ###.v30 file (2880 bytes): volume, 1 byte per record • ###.o30 file (5760 bytes): occupancy, 2 bytes per record • The ###.v30 and ###.o30 files for 4000 sensors are zipped into one yyyymmdd.traffic file
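The record layout on this slide can be illustrated with a short decoding sketch. It is not the author's code: the slide gives only the field widths, so the unsigned interpretation of both fields and the big-endian byte order of the occupancy values are assumptions, and the function names are hypothetical.

```python
import struct

RECORDS_PER_DAY = 2880   # one record every 30 seconds (from the slide)
SAMPLE_SECONDS = 30


def parse_v30(raw: bytes) -> list[int]:
    """Decode a ###.v30 payload: one byte of volume per record.

    Assumption: each byte is an unsigned count.
    """
    if len(raw) != RECORDS_PER_DAY:
        raise ValueError(f"expected {RECORDS_PER_DAY} bytes, got {len(raw)}")
    return list(raw)


def parse_o30(raw: bytes, byte_order: str = ">") -> list[int]:
    """Decode a ###.o30 payload: one 2-byte occupancy value per record.

    Assumption: unsigned 16-bit values, big-endian unless byte_order="<".
    """
    if len(raw) != 2 * RECORDS_PER_DAY:
        raise ValueError(f"expected {2 * RECORDS_PER_DAY} bytes, got {len(raw)}")
    return list(struct.unpack(f"{byte_order}{RECORDS_PER_DAY}H", raw))


def record_time(record_number: int) -> str:
    """Map a 1-based record number to the hh:mm:ss label used on the slide."""
    if not 1 <= record_number <= RECORDS_PER_DAY:
        raise ValueError("record number must be in 1..2880")
    seconds = (record_number - 1) * SAMPLE_SECONDS
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

With these sizes, one day for 4000 sensors is 4000 × (2880 + 5760) = 34,560,000 bytes before zip compression.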
Review of Large Data Archive • Data Warehouse • Inflow: get data from various systems • Upflow: put data into a more compact form • Downflow: move the compact form to archival storage • Outflow: output data to consumers as required • Metaflow: manage the warehouse itself
Why Data Warehouse? • Simplicity • Better Quality of Data • Fast Access • Platform Independent
Hierarchical Data Format (HDF) • File format and library for storing scientific data • Software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data. • Platform Independent
Common Data Format (CDF) • Self-describing data abstraction for the storage and manipulation of multi-dimensional data in a discipline-independent format • File format and a library • Transparent data compression • Platform Independent • API available in C, FORTRAN, Java, and Perl
Creating Traffic CDF • Input: traffic archive • C program (.EXE) built with the CDF 2.7 C API (DLL, Lib and cdf.h file) • Output: traffic.cdf
Traffic Data Archive in CDF • Designing Data Structure for traffic data • Setting Dimensions • Setting Variances • Setting CDF variables, CDF data types, CDF attributes (meta-data), and compression algorithm
Data Retrieval in CDF • Inputs: CDF archive (.cdf) and station definition • C program using the CDF API (.EXE) • Output: volume count (.txt)
Data Archive in SQL Server • Source: traffic data archive (zipped binary files) • Visual Basic interface with the Dynazip ActiveX control • ADODB connection over 32-bit ODBC (DSN) • Target: traffic data archive (SQL Server 2000)
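To make the relational approach concrete, here is a minimal sketch of a row-per-sample table. It swaps in Python's built-in sqlite3 for the SQL Server 2000, Visual Basic and ADODB stack named on the slide. The table name, column names and single-day scope are hypothetical, since the slide does not show the actual schema.

```python
import sqlite3


def build_archive(samples):
    """Load one day of samples into an in-memory table.

    samples: iterable of (detector_id, record_number, volume, occupancy).
    The schema is an illustrative assumption, not the author's design.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE traffic (
               detector_id   INTEGER NOT NULL,
               record_number INTEGER NOT NULL,  -- 1..2880, 30-second slots
               volume        INTEGER,
               occupancy     INTEGER,
               PRIMARY KEY (detector_id, record_number)
           )"""
    )
    conn.executemany("INSERT INTO traffic VALUES (?, ?, ?, ?)", samples)
    conn.commit()
    return conn


def total_volume(conn, detector_ids):
    """Sum the stored volume over all records of the given detectors."""
    detector_ids = list(detector_ids)
    placeholders = ",".join("?" * len(detector_ids))
    row = conn.execute(
        "SELECT COALESCE(SUM(volume), 0) FROM traffic "
        f"WHERE detector_id IN ({placeholders})",
        detector_ids,
    ).fetchone()
    return row[0]
```

In a layout like this, every 30-second sample becomes its own row with its own key, which is one plausible source of the storage overhead noted in the conclusions.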
Retrieval Task • For each of 492 stations, compute the station volume as the sum of its detectors' volumes • Station 1: 10069N • Detectors: 3263, 3264, 3265, 3266 • Station 1 volume computation: 3263(Vol) + 3264(Vol) + 3265(Vol) + 3266(Vol) • Station 2: 10069S • Station 2 volume computation • . . . • Station 492: 17750W • Station 492 volume computation • Output text file: 10069N Total Vol, 10069S Total Vol, . . . , 17750W Total Vol
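The station-level computation above can be sketched independently of the storage backend. This is an illustration under stated assumptions: "Total Vol" is read as the sum over all records of the day, the tab-separated output format is a guess, and the function names are hypothetical. Only the detector list for station 10069N comes from the slide.

```python
def station_totals(stations, detector_volumes):
    """Sum detector volumes per station.

    stations: dict mapping station id -> list of detector ids.
    detector_volumes: dict mapping detector id -> list of per-record volumes.
    Returns a dict mapping station id -> total volume.
    """
    totals = {}
    for station_id, detectors in stations.items():
        # Add up every record of every detector belonging to the station.
        totals[station_id] = sum(sum(detector_volumes[d]) for d in detectors)
    return totals


def format_report(totals):
    """Render one 'station<TAB>total' line per station (assumed format)."""
    return "\n".join(f"{sid}\t{total}" for sid, total in totals.items())
```

For example, with `stations = {"10069N": [3263, 3264, 3265, 3266]}` and a volume list for each of those four detectors, `format_report(station_totals(stations, volumes))` gives one line for station 10069N.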
Conclusions • A transportation archive built on CDF could be a better archive for the following reasons • More data can be stored with almost no additional storage • Indexed data allows random access • Open standard, portable and free • Can be used directly with many scientific visualization and analysis packages
Conclusions • An RDBMS is less suitable for large-scale traffic data for the following reasons • Large storage requirements due to overhead • Retrieval is comparatively slow • Initial investment is expensive
Future Work • Using XML with CDF for the web • Scaling CDF • Adding more features: variables and attributes