
Database Research: The Past, The Present, and The Future

This presentation analyzes the evolution of data management and the current and future trends in the field of database research. It discusses the driving forces behind database research, past advancements, and potential research problems in the future. It also highlights the importance of integrating text, data, code, and streams, and the need for new user interfaces and trustworthy systems.



Presentation Transcript


  1. Database Research:The Past, The Present, and The Future Yi-Shin Chen Department of Computer Science National Tsing Hua University yishin@cs.nthu.edu.tw http://www.cs.nthu.edu.tw/~yishin/

  2. Outline • Motivation • The Past • Evolution of Data Management [Gray 1996] • The Lowell Database Research Self Assessment Report • Where did it come from? • What does it say? • The Present • The Future

  3. Motivation • Database research is driven by new applications, technology trends, new synergies with related fields, and innovation within the field itself.

  4. Evolution of Data Management • Manual Record Managers (to 1900) • Punched-Card Record Managers (1900–1955) • Programmed Record Managers (1955–1965) • 1950: Univac developed magnetic tape • 1951: Univac I delivered to the US Census Bureau • Birth of high-level programming languages • Batch processing • Cons: transaction errors could not be detected in time; the business did not know its current state • On-line Network Databases (1965–1980) • Indexed sequential records • Data independence • Concurrent access • Con: navigational programming interfaces are too low-level; programmers must use very primitive, procedural database operations

  5. Evolution of Data Management (Contd.) • Relational Databases & Client-Server Computing (1980–1995) • 1970: E.F. Codd outlined the relational model • Gives database users high-level, set-oriented data access operations • Uniform representation • 1985: first SQL standard • Unexpected benefits • Client-Server: because of SQL, ODBC • Parallel processing: relational operators naturally support pipeline and partition parallelism • Graphical user interfaces: easy to render a relation • Oracle, Informix, Ingres • Multimedia Databases (1995–2000) • Richer data types • OO databases: unifying procedures and data (Universal Server) • Projects that push the limits • NASA EOS/DIS project
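The set-oriented data access that slide 5 credits to Codd's relational model can be sketched with Python's built-in sqlite3 module; the table, columns, and data below are invented for illustration. One declarative statement replaces record-at-a-time navigational code, and the optimizer picks the access plan.

```python
# Minimal sketch of set-oriented relational access (assumed toy schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("alice", "db", 120), ("bob", "db", 100), ("carol", "os", 90),
])

# Set-oriented: one declarative query instead of navigating records.
rows = conn.execute(
    "SELECT dept, AVG(salary) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()
print(rows)  # [('db', 110.0), ('os', 90.0)]
```

The same computation against a navigational (network-model) database would require hand-written loops over parent-child record sets, which is the low-level interface the previous slide criticizes.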

  6. Research Self Assessment • A group of senior database researchers gathers every few years to assess the state of database research and point out potential research problems • Laguna Beach, Calif. in 1989 • Palo Alto, Calif. in 1990 and 1995 • Cambridge, Mass. in 1996 • Asilomar, Calif. in 1998 • Lowell, Mass. in 2003 • The sixth ad-hoc meeting • Lasted two days • 25 senior database researchers • Output: the Lowell database research self-assessment report • More information: http://research.microsoft.com/~gray/lowell/

  7. Attendees • Serge Abiteboul, Martin Kersten, Rakesh Agrawal, Michael Pazzani, Phil Bernstein, Mike Lesk, Mike Carey, David Maier, Stefano Ceri, Jeff Naughton, Bruce Croft, Hans Schek, David DeWitt, Timos Sellis, Mike Franklin, Avi Silberschatz, Hector Garcia-Molina, Rick Snodgrass, Dieter Gawlick, Mike Stonebraker, Jim Gray, Jeff Ullman, Laura Haas, Gerhard Weikum, Alon Halevy, Jennifer Widom, Joe Hellerstein, Stan Zdonik, Yannis Ioannidis Photos captured from http://www.research.microsoft.com/~gray/lowell/Photos.htm

  8. The Main Driving Forces • The focus of database research • Information storage, organization, management, and access • The main driving forces • Internet • Particularly by enabling “cross-enterprise” applications • Requires stronger facilities for security and information integration • Sciences • Generate large and complex data sets • Need support for information integration, managing the pipeline of data products produced by data analysis, storing and querying “ordered” data, and integrating with the world-wide data grid

  9. The Main Driving Forces (Contd.) • Traditional DBMS topics • Technology keeps changing the rules → reassessment • E.g.: the ratios of capacity/bandwidth change → reassess storage management and query-processing algorithms • Maturation of related technologies, for example: • Data-mining technology → DB component, NLP querying • Information retrieval → integrate with DB search techniques • Reasoning with uncertainty → fuzzy data

  10. Next Generation Infrastructure • Discuss the various infrastructure components that require new solutions or are novel in some other way • Integration of Text, Data, Code and Streams • Information Fusion • Sensor Data and Sensor Networks • Multimedia Queries • Reasoning about Uncertain Data • Personalization • Data Mining • Self Adaptation • Privacy • Trustworthy Systems • New User Interfaces • One-Hundred-Year Storage • Query Optimization

  11. Integration of Text, Data, Code and Streams • Rethink basic DBMS architecture, supporting: • Structured data → traditional DBMS • Text → information retrieval • Space and time → spatial and temporal DBs • Image and multimedia data → image retrieval/multimedia DBs • Procedural data → user-defined functions • Triggers → make facilities scalable • Data streams and queues → data stream management

  12. Integration of Text, Data, Code and Streams (Contd.) • Start with a clean sheet of paper • SQL, XML Schema, XQuery are too complex • Vendors will pursue extended-XML/SQL strategies • The research community should explore a reconceptualization

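One of the new data kinds slides 11–12 say a rethought DBMS must support is data streams with continuous queries. A minimal sketch of a sliding-window stream aggregate, with window size and readings made up for illustration:

```python
# Hedged sketch: a sliding-window average over an unbounded stream,
# the kind of continuous query a data stream manager evaluates.
from collections import deque

def sliding_avg(stream, window=3):
    """Yield the average of the most recent `window` items."""
    buf = deque(maxlen=window)  # old items fall out automatically
    for x in stream:
        buf.append(x)
        yield sum(buf) / len(buf)

readings = [10, 20, 30, 40]  # hypothetical sensor readings
averages = list(sliding_avg(readings, window=3))
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

Unlike a stored relation, the stream is consumed incrementally: each arriving item updates the answer rather than re-running a query over a finite table.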
  13. Information Fusion • The typical approach: Extract-Transform-Load (ETL) tools feeding a data warehouse • Because of the Internet • Millions of information sources • Some data can only be accessed at query time • Perform information integration on-the-fly • Need semantic-heterogeneity solutions • Work with the “Semantic Web” people • Other challenges • Security policy: information in each database is not free • Probabilistic world of evidence accumulation • Web-scale
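The on-the-fly integration slide 13 contrasts with warehouse-style ETL can be sketched as a mediator that maps heterogeneous source schemas onto one target schema at query time. The source layouts, field names, and data below are all hypothetical:

```python
# Hedged sketch of query-time information fusion over two sources
# with different schemas; nothing is bulk-loaded into a warehouse.
def fuse(query_name, sources):
    """Fetch from each source at query time and unify the schemas."""
    results = []
    for fetch, mapping in sources:
        for rec in fetch():  # data is accessed only when the query runs
            unified = {target: rec[src] for target, src in mapping.items()}
            if unified["name"] == query_name:
                results.append(unified)
    return results

# Two heterogeneous sources, each with a schema mapping to (name, org).
src_a = (lambda: [{"name": "codd", "affil": "IBM"}],
         {"name": "name", "org": "affil"})
src_b = (lambda: [{"person": "codd", "employer": "IBM Research"}],
         {"name": "person", "org": "employer"})

fused = fuse("codd", [src_a, src_b])
print(fused)
```

The hard part the slide points at, resolving semantic heterogeneity, is hidden here in the hand-written mappings; discovering those mappings automatically is the research problem.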

  14. Sensor Data and Sensor Networks • Characteristics • Draw more power when communicating than when computing • Rapidly changing configurations • Might not be completely calibrated

  15. Multimedia Queries • Challenges • Create easy ways to: • Analyze • Summarize • Search • View • Require better facilities for managing multimedia information

  16. Reasoning about Uncertain Data • Traditional DBMSs have no facilities for either approximate data or imprecise queries • (Almost) all data are uncertain or imprecise • DBMSs need built-in support for data imprecision • The “lineage” of the data must be tracked • Query processing must become stochastic • The query answers will get better • The system should characterize the accuracy offered
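Slide 16's point that query processing must become stochastic can be sketched minimally: each tuple carries an existence probability, and a selection returns answers tagged with the confidence the system can offer. The data values and threshold are invented:

```python
# Hedged sketch: selection over uncertain tuples, where each tuple has
# a probability of being correct and answers report that confidence.
tuples = [
    ("sensor-1", 21.5, 0.90),   # (id, reading, P(tuple is correct))
    ("sensor-2", 35.0, 0.40),
    ("sensor-3", 36.2, 0.95),
]

def select_hot(rows, threshold=30.0):
    """Return ids of readings above threshold, with their confidence."""
    return [(rid, p) for rid, val, p in rows if val > threshold]

answers = select_hot(tuples)
print(answers)  # [('sensor-2', 0.4), ('sensor-3', 0.95)]
```

A real probabilistic DBMS would also propagate these probabilities through joins and aggregates and track each answer's lineage, which is where the open problems lie.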

  17. Personalization • Query answers should depend on the user • Relevance feedback should also depend on the person and the context • A framework for including and exploiting appropriate metadata for personalization is needed • Need to verify that the information system is producing a “correct” answer

  18. Data Mining • Focus on efficient ways to discover models of existing data sets • Developed algorithms include classification, clustering, association-rule discovery, summarization, etc. • Challenges: • Develop algorithms for seeking unexpected “pearls of wisdom” • Integrate data mining with querying, optimization, and other database facilities such as triggers
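Of the developed algorithms slide 18 lists, association-rule discovery is easy to sketch: count co-occurring item pairs in transactions and keep rules that meet minimum support and confidence. The baskets and thresholds below are toy values:

```python
# Hedged sketch of pairwise association-rule discovery (toy data).
from itertools import combinations
from collections import Counter

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"milk", "eggs"}, {"bread"}]

item_count = Counter(i for b in baskets for i in b)
pair_count = Counter(frozenset(p) for b in baskets
                     for p in combinations(sorted(b), 2))

rules = []
for pair, n in pair_count.items():
    support = n / len(baskets)          # how often the pair occurs
    for a in pair:
        b = next(iter(pair - {a}))
        confidence = n / item_count[a]  # P(b in basket | a in basket)
        if support >= 0.5 and confidence >= 0.6:
            rules.append((a, b, round(confidence, 2)))

print(sorted(rules))
```

Real miners (e.g., Apriori) prune the exponential space of larger itemsets using the fact that an itemset can only be frequent if all its subsets are.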

  19. Self Adaptation • Modern DBMSs are more complex • Administrators must understand disk partitioning, parallel query execution, thread pools, and user-defined data types • Shortage of competent database administrators • Goals • Perform tuning using a combination of a rule-based system, a database of knob settings, and configuration data • No knobs: all tuning decisions are made automatically • Requires knowledge of user behavior and workloads • Recognize internal malfunctions, identify data corruption, detect application failures, and do something about them
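The rule-based tuning slide 19 describes can be sketched as a function from observed workload statistics to knob settings. The rules, knob names, and thresholds below are hypothetical, not taken from any real DBMS:

```python
# Hedged sketch of "no knobs" self-tuning: rules map workload
# statistics to configuration settings automatically.
def tune(stats):
    knobs = {"buffer_pool_mb": 128, "parallel_workers": 2}  # defaults
    if stats["cache_hit_ratio"] < 0.9:
        knobs["buffer_pool_mb"] *= 4      # rule: buffer cache is thrashing
    if stats["avg_query_rows"] > 1_000_000:
        knobs["parallel_workers"] = 8     # rule: large scans parallelize well
    return knobs

settings = tune({"cache_hit_ratio": 0.75, "avg_query_rows": 5_000_000})
print(settings)  # {'buffer_pool_mb': 512, 'parallel_workers': 8}
```

The research problem is closing the loop: the system must gather these statistics itself, apply the rules online, and verify that each adjustment actually helped.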

  20. Privacy • Security systems • Revitalize data-oriented security research • Specify the purpose of the data request • Access decisions should be based on • Who is requesting the data • To what use it will be put

  21. Trustworthy Systems • Trustworthy systems • Safely store data • Protect data from unauthorized disclosure • Protect data from loss • Make it always available to authorized users • Ensure the correctness of query results and data-intensive computations • Digital rights management • Protect intellectual property rights • Allow private conversation

  22. New User Interfaces • How best to render data visually? • During the 1980s we had QBE and VisiCalc • Since then, nothing… • Need better new ideas in this area • Query languages • SQL and XQuery are not for end users • Possible choices? • Keyword-based queries → Information-Retrieval community • Browsing → increasingly popular • Ontology + speech or NL → Semantic Web + NLP
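The keyword-based querying slide 22 borrows from the Information-Retrieval community can be sketched as an inverted index over tuples, queried with plain words instead of SQL. The sample rows are invented:

```python
# Hedged sketch: keyword search over database rows via an inverted index.
from collections import defaultdict

rows = {1: "Codd relational model paper",
        2: "Gray transaction processing",
        3: "relational query optimization"}

index = defaultdict(set)          # word -> set of row ids
for rid, text in rows.items():
    for word in text.lower().split():
        index[word].add(rid)

def keyword_search(query):
    """Return ids of rows containing every query keyword."""
    ids = [index[w] for w in query.lower().split()]
    return sorted(set.intersection(*ids)) if ids else []

print(keyword_search("relational"))        # [1, 3]
print(keyword_search("relational query"))  # [3]
```

An end user types words and gets matching rows; the open problem is mapping such keywords onto the structured schema (which tables, which joins) behind the scenes.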

  23. One-Hundred-Year Storage • Archived information is disappearing • Captured on a deteriorating medium • Captured on a medium requiring obsolete devices • The application that can interpret the information no longer works • A DBMS system can: • Keep content accessible in a useful form • Automate the process of migrating content between formats • Maintain the hardware and software that each document needs • Manage the metadata along with the stored document

  24. Query Optimization • Optimization of information integrators • For semi-structured query languages, e.g., XQuery • For stream processors • For sensor networks • Inter-query optimization involving large numbers of queries
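The classic query-optimization task slide 24 revisits for new settings is cost-based join ordering. A minimal sketch of Selinger-style dynamic programming over subsets of relations; the cardinalities, selectivity, and cost model are made up for illustration:

```python
# Hedged sketch: pick a join order by dynamic programming, charging
# each plan the sizes of the intermediate results it produces.
from itertools import combinations

card = {"R": 1000, "S": 100, "T": 10}   # hypothetical base-table sizes
SEL = 0.1                               # assumed per-join selectivity

def result_size(subset):
    size = 1.0
    for r in subset:
        size *= card[r]
    return size * SEL ** (len(subset) - 1)

rels = sorted(card)
# best[subset] = (total intermediate cost, cheapest join order)
best = {frozenset([r]): (card[r], (r,)) for r in rels}
for n in range(2, len(rels) + 1):
    for subset in map(frozenset, combinations(rels, n)):
        for last in subset:             # relation joined last
            sub_cost, sub_order = best[subset - {last}]
            cost = sub_cost + result_size(subset)
            if subset not in best or cost < best[subset][0]:
                best[subset] = (cost, sub_order + (last,))

cost, order = best[frozenset(rels)]
print(order)  # ('T', 'S', 'R') — smallest relations joined first
```

The new-settings twist the slide raises is that for integrators, streams, and sensor networks the optimizer often lacks the cardinality statistics this enumeration depends on.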

  25. Next Steps • A test bed for information-integration research • Revisit the solved problems → sea changes • Avoid drawing too narrow a box around what we do → explore opportunities for combining database and related technologies

  26. Thank You. Any Questions?

  27. References • Jim Gray. “Evolution of Data Management.” Computer 29(10), October 1996, pp. 38–46. • http://www.research.microsoft.com/~gray/lowell/
