
Readings in Data Management Spring 2008

Readings in Data Management Spring 2008. Computer Science Department Rutgers University. Seminar Information. Web page: http://www.cs.rutgers.edu/~amelie/courses/dbseminar.html Meets Thursday 1-2:30pm in CoRE A. Organization. Weekly presentation on a DB topic (30 minutes)



Presentation Transcript


  1. Readings in Data Management, Spring 2008, Computer Science Department, Rutgers University

  2. Seminar Information • Web page: http://www.cs.rutgers.edu/~amelie/courses/dbseminar.html • Meets Thursday 1-2:30pm in CoRE A

  3. Organization • Weekly presentation on a DB topic (30 minutes) • We will select 2-3 topics to focus on over the course of the semester • For each topic: • First week: an overview paper (survey or influential work) • Subsequent weeks: more complex papers on the subject • Possibly a few external presentations, such as: • Students preparing for DB conference talks or quals • Invited speakers • Discussion of the paper

  4. Topics • First topic: Probabilistic Databases • We will select the next topics from the following (non-exhaustive) list: • Question answering • Web Search • Personal Information Spaces • Query Optimization • Data Cleaning • Data Integration • Data Mining • Query Processing Techniques • Adaptive, Automatic, Autonomic Systems • OLAP • Stream Aggregation • Storage, Indexing, and System Architecture • XML Processing • Preference functions • Spatial and High-Dimensional Data • Recovery • Privacy in DBMS • …

  5. What I expect from you • 1-2 presentations over the course of the semester • First-year students will be given “overview” presentation assignments at the beginning of each topic • More senior students will present more research-focused papers • The number of presentations depends on the number of students in the seminar • Everyone should read the paper in advance and prepare 1-2 questions or discussion topics • Participation in discussion • There are no “stupid” questions! If you did not understand something, chances are others did not either

  6. Presentations • I will select a list of papers to present for each topic • Start with an introductory paper • Then papers that go deeper into one or more aspects of the problem • You are welcome to suggest papers on the topic, as long as they are related (so that we can have more meaningful discussions) • Papers that I have overlooked • Papers on a different aspect of the topic that you would like to focus on

  7. First topic: Probabilistic Databases • Uncertainty/Imprecision in data • Query Semantics • Probabilistic Data Representation. The next few slides are from Dan Suciu’s tutorial, more at

  8. Databases Today are Deterministic • An item either is in the database or is not • A tuple either is in the query answer or is not • This applies to all kinds of data models: • Relational, E/R, NF2, hierarchical, XML, …

  9. What is a Probabilistic Database? • “An item belongs to the database” is a probabilistic event • “A tuple is an answer to the query” is a probabilistic event • Can be extended to all data models
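The possible-worlds reading of these bullets can be made concrete with a small sketch. It assumes the simplest model, a tuple-independent table in which each tuple is present independently with its own probability; the table `R`, its contents, and the helper functions are hypothetical illustrations, not taken from the tutorial. Every subset of the tuples is a possible world, and the probability that a value is a query answer is the total probability of the worlds in which it appears.

```python
from itertools import product

# Hypothetical tuple-independent table: each (name, city) tuple is present
# independently with the probability next to it.
R = [
    (("Alice", "Paris"), 0.9),
    (("Bob", "Paris"), 0.5),
    (("Carol", "Rome"), 0.4),
]

def possible_worlds(table):
    """Yield (world, probability) for every subset of the table's tuples."""
    for mask in product([True, False], repeat=len(table)):
        world = [t for (t, _), keep in zip(table, mask) if keep]
        prob = 1.0
        for (_, p), keep in zip(table, mask):
            # Present tuples contribute p, absent ones contribute 1 - p.
            prob *= p if keep else 1 - p
        yield world, prob

def answer_probability(table, query, answer):
    """P(answer is in query(world)), summed over all possible worlds."""
    return sum(prob for world, prob in possible_worlds(table)
               if answer in query(world))

def cities(world):
    # Deterministic query evaluated inside one world:
    # SELECT DISTINCT city FROM R
    return {city for _, city in world}

print(answer_probability(R, cities, "Paris"))
print(answer_probability(R, cities, "Rome"))
```

Here "Paris" is an answer unless both Paris tuples are absent, so its probability is 1 - 0.1 × 0.5 = 0.95, while "Rome" depends on a single tuple and gets 0.4. The query itself stays an ordinary deterministic function; only the set of worlds it runs over is uncertain.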

  10. Two Types of Probabilistic Data • Database is deterministic; query answers are probabilistic • Database is probabilistic; query answers are probabilistic

  11. Long History Probabilistic relational databases have been studied from the late 1980s until today: • Cavallo & Pittarelli: 1987 • Barbara, Garcia-Molina & Porter: 1992 • Lakshmanan, Leone, Ross & Subrahmanian: 1997 • Fuhr & Roellke: 1997 • Dalvi & Suciu: 2004 • Widom: 2005

  12. So, Why Now? Application pull: • The need to manage imprecisions in data. Technology push: • Advances in query processing techniques

  13. Application Pull Need to manage imprecisions in data • Many types: non-matching data values, imprecise queries, inconsistent data, misaligned schemas, etc. The quest to manage imprecisions is a major driving force in the database community • Ultimate cause for many research areas: data mining, semistructured data, schema matching, nearest neighbor

  14. Technology Push Processing probabilistic data is fundamentally more complex than other data models • Some previous approaches sidestepped this complexity. There exists a rich collection of powerful, non-trivial techniques and results, some old and some very recent, that could lead to practical management techniques for probabilistic databases.
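One way to see both the complexity and the hope: answering even the Boolean query "is at least one tuple present?" by enumerating possible worlds touches 2^n worlds, yet when the tuples are independent the same probability has an O(n) closed form, 1 - ∏(1 - p_i). The sketch below, with hypothetical function names and randomly generated probabilities, checks that the two agree. It illustrates only this easy case (for some other queries exact evaluation is #P-hard) and is not the evaluation algorithm of any of the suggested papers.

```python
import math
import random
from itertools import product

def brute_force_exists(probs):
    """P(at least one tuple present), by enumerating all 2^n possible worlds."""
    total = 0.0
    for mask in product([True, False], repeat=len(probs)):
        if any(mask):  # the query is true in every non-empty world
            w = 1.0
            for p, keep in zip(probs, mask):
                w *= p if keep else 1 - p
            total += w
    return total

def closed_form_exists(probs):
    """Same probability in O(n); valid only because the tuples are independent."""
    return 1 - math.prod(1 - p for p in probs)

random.seed(0)
probs = [random.random() for _ in range(12)]  # 4096 worlds for brute force
print(brute_force_exists(probs), closed_form_exists(probs))
```

Adding one tuple doubles the work of `brute_force_exists` but adds a single multiplication to `closed_form_exists`; recognizing when such shortcuts are sound is what the query-processing techniques on the previous slides are about.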

  15. Suggested Papers to Discuss • Nilesh Dalvi, Dan Suciu: Efficient Query Evaluation on Probabilistic Databases. VLDB 2004 • Minos Garofalakis et al.: Probabilistic Data Management for Pervasive Computing: The Data Furnace Project. IEEE Data Eng. Bull. 29(1), 2006 • Omar Benjelloun, Anish Das Sarma, Chris Hayworth, Jennifer Widom: An Introduction to ULDBs and the Trio System. IEEE Data Eng. Bull. 29(1), 2006 • Prithviraj Sen, Amol Deshpande: Representing and Querying Correlated Tuples in Probabilistic Databases. ICDE 2007
