Welcome to CPSC 534B: Information Integration

Welcome to CPSC 534B: Information Integration. Laks V.S. Lakshmanan laks@cs.ubc.ca Rm. 315. Course Objectives. Most applications of information technology require effective and efficient management of information. Information may reside anywhere – not just in DBs.

Presentation Transcript


  1. Welcome to CPSC 534B: Information Integration. Laks V.S. Lakshmanan, laks@cs.ubc.ca, Rm. 315

  2. Course Objectives • Most applications of information technology require effective and efficient management of information. • Information may reside anywhere – not just in DBs. • Information can be heterogeneous. • Information of interest may not all be in one place. • Hence the need for Information Integration (II). • II is an enabler for a whole class of new applications.

  3. Course Objectives (contd.) • Key technologies: • RDBMS • Heterogeneous database systems • View integration and management • Semistructured data and XML (data on the web) • Main goal: learn about key concepts, techniques, algorithms, languages, and abstractions that make II possible. And have some fun.

  4. Tentative Schedule: Basic Tools (GOFDB) • Week of Jan. 5: Overview/review of FOL. • Jan. 12: Review of relational algebra, calculus, Datalog, SQL, integrity constraints. • Jan. 19: Query containment and equivalence. • Conjunctive • Negation & aggregation

  5. Tentative Schedule: Integration Take 1 – Global Info. Systems • Jan. 26: Integration models – Global As View and Local As View; query answering using views (an application). II Take 2 – Dealing with heterogeneity • Feb. 2: SchemaLog and SchemaSQL. • Feb. 9: Schema Integration & Matching. • Feb. 16: Break!

  6. Tentative Schedule (contd.): II Take 3 – Dropping (rigid) structure • Feb. 23: Intro to semistructured data and XML (data model) • XPath & Tree Pattern Queries • Mar. 1: XPath (contd.); XQuery. • Mar. 8: XQuery (contd.); TAX algebra / structural join algorithms • Mar. 15: XML Storage • Native • Relational • Mar. 22: XML + Information Retrieval

  7. Tentative Schedule (contd.): II Take 4 – Semantic Web (The final frontier?) • Mar. 29: Semantic Web and II • Project talks and demos: April 5 onward.

  8. Marking Scheme • Assignments 45% • Project 55% • Reading papers • Critiquing them • Innovating • Implementing • Reporting and presenting • Projects can involve teams of 2-3 people (subject to approval). • Each team to include  1 MCS student.

  9. Suggested Project Themes • Ideas/suggestions offered throughout the course, so be attentive! • Data cleaning: key step required in data integration. • Mining DTD/schema for XML docs: what you do when you must deal with XML data with no accompanying DTD/schema. • XML schema integration: different XML data sources may follow different DTD/schemas. How do you provide a unified integrated view to the user?

  10. Project Themes (contd.) • XML query containment/equivalence: given queries (in XQuery or XPath), rewrite them into more efficient ones, possibly using DTDs or integrity constraints. • XML query operator evaluation algorithms: develop cost models and cost-based physical optimization strategies. • XML and data security: how do you ensure queries are evaluated securely, without divulging anything you are not supposed to?

  11. Project Themes (contd.) • XML and Information Retrieval: effective ways of querying documents marked up using XML (e.g., Shakespeare’s plays); how do you combine IR and database-style XML querying? • Data integration issues for biology: scientific data tends to be heterogeneous. How do you meet the data integration challenges there? • Query Answering using Views (QAV) for XML: extend the QAV technology developed for RDBMS to XML querying.

  12. Project Themes (contd.) • Detecting similarity between XML documents: develop notions of similarity between XML docs and implement algorithm(s) for detecting similarity. • Ranking answers to keyword search queries over XML data: develop and implement algorithms for ranking answers, based on “quality” of match. • XML interop: leverage the semantic web and ontologies for matching schemas (XML or relational) and develop/implement algorithms for answering cross-queries.

  13. Project Themes (contd.) • Explore higher-order logics for tree (XML) querying: example candidates are HiLog and (extensions of) SchemaLog. [Can be purely conceptual, or part conceptual and part implementation.]
