Reading
1. Yanif Ahmad, Randal Burns, Michael Kazhdan, Charles Meneveau, Alex Szalay, Andreas Terzis, Scientific data management at the Johns Hopkins Institute for Data Intensive Engineering and Science, SIGMOD Record, Volume 39, Issue 3, February 2011, http://dl.acm.org/citation.cfm?id=1942776.1942782&coll=DL&dl=ACM&CFID=66206057&CFTOKEN=48992457
2. Ani Thakar, Alex Szalay, Migrating a (large) science database to the cloud, in HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, June 2010, http://dl.acm.org/citation.cfm?id=1851539&bnc=1
CSCE 824 - Spring 2011

Reading
3. M. Stonebraker, U. Cetintemel, "One Size Fits All": An Idea Whose Time Has Come and Gone, in ICDE '05: Proceedings of the 21st International Conference on Data Engineering, IEEE Computer Society, Washington, DC, USA, 2005, http://www.computer.org/portal/web/csdl/abs/proceedings/icde/2005/2285/00/22850002abs.htm

Traditional Database Management Systems
• Focus on business data management
• Provide uniform capabilities regardless of the data characteristics
• Need: capabilities to meet new application requirements

Examples of New Needs
• Stream data processing
• Large-scale scientific databases
• Data warehousing

Streaming Data
• Sensor-based applications
• Real-time systems: sophisticated alerting, location-based services
• Historical data
• Financial applications
• Support applications, such as electronic trading, legal compliance, real-time market analysis, etc.
• Performance requirements

Performance: SDMS vs. RDBMS
• Empirical results (see reference paper #3)
• Issues:
  • Inbound processing model
  • Correct primitives for stream processing (aggregates, "timeout," "slack")
  • Seamless integration of DBMS processing with application processing (client-server vs. embedded applications)
  • Transactional behavior (weaker notion of recovery, tolerance, no ACID requirements)
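
The stream primitives named above (aggregates, "timeout," "slack") can be illustrated with a small sketch. It assumes one possible reading of the terms: an aggregate is computed per fixed-length window, "slack" is extra event time a window stays open for late tuples, and "timeout" forces results out when no input arrives. The class `WindowedAverage` and its methods are hypothetical names for illustration, not the API of any stream engine or of the reference paper.

```python
from dataclasses import dataclass, field


@dataclass
class WindowedAverage:
    """Hypothetical windowed aggregate; an illustration, not a real engine API."""
    window: float                                   # window length (event time)
    slack: float                                    # extra time to wait for late tuples
    windows: dict = field(default_factory=dict)     # window start -> (sum, count)
    watermark: float = float("-inf")                # largest timestamp seen so far

    def push(self, ts, value):
        """Process one tuple as it arrives; return the windows that just closed."""
        start = (ts // self.window) * self.window
        if start + self.window + self.slack <= self.watermark:
            return []                               # window already emitted: drop tuple
        total, count = self.windows.get(start, (0.0, 0))
        self.windows[start] = (total + value, count + 1)
        self.watermark = max(self.watermark, ts)
        return self._close()

    def timeout(self, now):
        """Advance time without input so idle windows still produce results."""
        self.watermark = max(self.watermark, now)
        return self._close()

    def _close(self):
        # Emit every window whose end plus slack is at or before the watermark.
        closed = []
        for start in sorted(self.windows):
            if start + self.window + self.slack <= self.watermark:
                total, count = self.windows.pop(start)
                closed.append((start, total / count))
        return closed
```

Tuples are processed on arrival rather than stored first and queried later, which is the inbound processing model listed above. Nothing is logged or recoverable here, in line with the weaker transactional expectations.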

Security for Streaming Data?
• What is the difference between the security needs of streaming vs. traditional (e.g., relational) data?
• How to enforce security?
• Security punctuation
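
One way to picture security punctuation is as policy metadata interleaved with the data tuples of a stream. The sketch below assumes a deliberately simple model: each punctuation lists the roles allowed to read the tuples that follow it, and an enforcement operator filters the stream for one querying role. `SecurityPunctuation` and `enforce` are hypothetical names; the slide only names the concept and does not specify a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityPunctuation:
    """Hypothetical in-stream marker carrying an access policy."""
    allowed_roles: frozenset    # roles that may read the tuples that follow


def enforce(stream, role):
    """Yield only the data tuples that `role` is allowed to read."""
    allowed = frozenset()       # assumption: deny until a punctuation arrives
    for item in stream:
        if isinstance(item, SecurityPunctuation):
            allowed = item.allowed_roles    # policy changes mid-stream
        elif role in allowed:
            yield item


stream = [
    ("patient-1", 72),                                    # no policy yet: denied
    SecurityPunctuation(frozenset({"doctor", "nurse"})),
    ("patient-1", 75),
    SecurityPunctuation(frozenset({"doctor"})),
    ("patient-2", 110),
]
doctor_view = list(enforce(stream, "doctor"))
nurse_view = list(enforce(stream, "nurse"))
```

Unlike a relational table, where permissions are attached to stored objects, the policy here travels with the data and can change while the query is running.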

Scientific Databases
• Massive amounts of data
• Heterogeneous data
• Sensor data, satellite data, scientific simulation data, etc.
• Goal: better understanding of physical phenomena
• Genomic databases, geological exploration, astronomy, etc.

Scientific Databases
• Need efficient analysis and querying capabilities
• Multi-dimensional indexing (e.g., genomic sequence indexing)
• Specific applications (e.g., visualization of seismic data)
• Specific aggregations (e.g., data mining for biological correlation)
• Efficient data archiving, staging, lineage, and error propagation techniques
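
As a rough illustration of multi-dimensional indexing, the sketch below uses a uniform grid over two coordinates so that a range query touches only the cells overlapping the query box. The grid technique, the class `GridIndex`, and the sample coordinates are assumptions chosen for brevity; production systems typically use more elaborate structures such as R-trees or k-d trees.

```python
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform-grid index over 2-D points."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)      # (cx, cy) -> [(x, y, payload), ...]

    def _cell(self, x, y):
        # Map a coordinate pair to the grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, payload):
        self.cells[self._cell(x, y)].append((x, y, payload))

    def range_query(self, x_min, y_min, x_max, y_max):
        """Return payloads of all points inside the closed query box."""
        cx_min, cy_min = self._cell(x_min, y_min)
        cx_max, cy_max = self._cell(x_max, y_max)
        hits = []
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                # Only cells overlapping the box are scanned.
                for x, y, payload in self.cells.get((cx, cy), []):
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        hits.append(payload)
        return hits


index = GridIndex(cell_size=10.0)
index.insert(12.5, 41.0, "object-A")        # made-up sample observations
index.insert(13.1, 44.9, "object-B")
index.insert(250.0, -30.0, "object-C")
nearby = index.range_query(10.0, 40.0, 15.0, 45.0)
```

The point of the structure is that a query over a small region never scans points in distant cells, which matters when the dataset is far larger than the region of interest.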

Example Scientific Data Management
• Reference #1
• Basic research:
  1. Forming hypotheses and theories
  2. Designing experiments for their validation
  3. Collecting data by experimentation
  4. Analyzing data to guide new insights for further research

Scientific Computing
• Steps 3 and 4 (collecting and analyzing data) are data intensive
• Need to improve computational power
• Parallel processing
• Grid and supercomputers
• Special application logic
• Preservation of scientific data

Current Technologies and Scientific Databases
• Reference #2: How to migrate a large-scale scientific database to a cloud environment?
• Difficult engineering process
• Limited capabilities of the database user
• Based on a commercial cloud

Data Warehousing
• Repository of organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
• Data mart (single subject area)
• Enterprise data warehouse (integrated data marts)
• Metadata

Data Warehousing
• Difference between OLTP and OLAP
• Data management: updates, indexing, dependencies, etc.
• OLAP: needs read-optimized storage
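
The contrast between OLTP and OLAP access patterns can be sketched with two in-memory layouts of the same data. This is a toy illustration, assuming that "read-optimized storage" refers to a column-oriented layout; the table contents and the helper `to_columns` are made up for the example.

```python
# Row layout: each record is stored together, which suits OLTP-style
# lookups and updates of individual records.
rows = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": 80.0},
    {"order_id": 3, "region": "east", "amount": 200.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column layout: each attribute is stored contiguously, so an analytic
# query reads only the attributes it needs.
columns = to_columns(rows)

# OLTP-style access: fetch one complete record.
order = rows[1]

# OLAP-style access: aggregate over one or two columns.
total_amount = sum(columns["amount"])
east_amount = sum(
    amount
    for region, amount in zip(columns["region"], columns["amount"])
    if region == "east"
)
```

The trade-off is that updating a single record in the column layout touches every column, which is acceptable when the workload is dominated by reads.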

Next Class
• Geographical Databases