Getting Started Writing a Thesis/Dissertation
Dr. Karen C. Davis, Electrical & Computer Engineering Dept.
ECES 877 Advanced Data Models and Query Optimization • query optimization: logical, physical • advanced data models: object-relational, data warehouse, XML • Spring 2007: coming to a classroom near you!
Relational Algebra Query Trees [figure] (from Sujan Turlapaty’s thesis defense, “Performance Analysis of Self-Maintainable Data Warehousing Algorithms,” 11/99)
Multiple View Processing Plan (MVPP) [figure: MVPP for queries Q1, Q2, and Q3 over Customer (C), Orders (O), Lineitem (L), Nation (N), and Part (P), with nodes v1–v15] • view chromosome: 101100010100001 • index chromosome: 1100110 • fitness: sum of the query processing costs of the individual queries using the views and indexes selected (from Sirisha Machiraju’s thesis defense, “Space Allocation for Materialized Views and Indexes Using Genetic Algorithms,” June 2002)
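The fitness rule on this slide can be sketched in Python. This is a minimal sketch, not the method from the thesis. It assumes that each bit of a chromosome marks whether the candidate at that position (v1, v2, … for views) is selected, and it leaves the per-query cost model to a caller-supplied function. The names `decode`, `fitness`, and `toy_cost` are hypothetical, as are all numbers in the toy cost model.

```python
from typing import Callable, FrozenSet, List

# Assumed encoding: bit i (0-based) of a chromosome selects candidate i + 1.
def decode(chromosome: str) -> FrozenSet[int]:
    """Return the 1-based positions of the 1-bits (e.g., views v1, v3, ...)."""
    return frozenset(i + 1 for i, bit in enumerate(chromosome) if bit == "1")

# Hypothetical cost-model signature: (query, selected views, selected indexes) -> cost.
CostFn = Callable[[str, FrozenSet[int], FrozenSet[int]], float]

def fitness(view_chromosome: str, index_chromosome: str,
            queries: List[str], cost: CostFn) -> float:
    """Sum of the individual query processing costs under the selected
    views and indexes; a GA using this fitness would minimize it."""
    views = decode(view_chromosome)
    indexes = decode(index_chromosome)
    return sum(cost(q, views, indexes) for q in queries)

# Toy cost model with made-up numbers, for illustration only.
BASE_COST = {"Q1": 100.0, "Q2": 80.0, "Q3": 60.0}
HELPFUL_VIEW = {"Q1": 1, "Q2": 2, "Q3": 15}   # hypothetical view that answers each query
HELPFUL_INDEX = {"Q1": 1, "Q2": 3, "Q3": 5}   # hypothetical index that speeds up each query

def toy_cost(query: str, views: FrozenSet[int], indexes: FrozenSet[int]) -> float:
    # A materialized answering view replaces the base cost with a small fixed cost.
    cost = 10.0 if HELPFUL_VIEW[query] in views else BASE_COST[query]
    # A selected helpful index halves whatever cost remains.
    if HELPFUL_INDEX[query] in indexes:
        cost *= 0.5
    return cost

total = fitness("101100010100001", "1100110", ["Q1", "Q2", "Q3"], toy_cost)
print(total)  # 90.0
```

This sketch does not model the space budget for views and indexes; how that constraint enters the genetic algorithm is not shown on the slide.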
BH System Architecture [figure] (from Michael Brant, “Binding Hash Technique for XML Query Optimization,” 2006)
My Students
Ph.D. (2)
• Satish Venkatesan, Database Modeling for Electronic Design Automation Environments, 1996 (awarded the ECECS Outstanding Dissertation Award, 1996)
• Yunsong Zhan, XML-based Data Integration for Application Interoperability, 2002
M.S. (24)
• Lun Ye, A Compiler Cooperative Dynamic Memory Management System for C++, 1993
• Ron Meade, EasyOpt: A Design Optimization Interface Package, 1994
• Rao Seshagiri Kasinadhuni, Design and Performance Issues of Client-Server DBMS Architectures, 1994
• Samir Nigam, Transformation-based Semantic Query Optimization for Object-Oriented Databases, 1994
• Baskaran Dharmarajan, The Property Map: A Theoretical Foundation and Query Optimization Algorithms, 1997
• Mala Rajamani, Reduction and Maintenance of Self-maintainable Views for Data Warehousing, 1997
• Veena Pandiri, A Global Framework for Distributed Agent-based Systems, 1997
• Radha Ganapathy, Selection of Self-Maintainable Views to Materialize in a Data Warehouse, 1998
• Vishal Sheth, Extended Property Maps: An Efficient Access Mechanism for Retrieval from Large Data Sets, 1998
• Gayathri Krishnan, Physical Schema Design for Object Databases, 1998
• Shobha Ravishankar, Object-Oriented Index Selection and Integration, 1998
• Ji Qin, Access Plan Generation for Property Maps and Multidimensional Indexes, 1999
• Sujan Turlapaty, Performance Analysis of Self-Maintainable Data Warehousing Algorithms, 1999
• Unmi Tina Kang, Path Inherited Dictionary Index (PIDI): An Integrated Object-Oriented Database Index, 2000
• Jennifer Grommon-Litton, Heuristic Design Algorithms and Evaluation Methods for Property Maps, 2000
• Rajeswari Malladi, Applying Multiple Query Optimization in Mobile Databases, 2001
• Xioaming Du, Dynamic Channel and Broadcast Disk Organization in Mobile Databases, 2001
• Krishnamoorthy Janakiraman, Entity Identification Using Data Mining Techniques, 2001
• Casie Phipps, Migrating an Operational Database Schema to Data Warehouse Schemas, 2002
• Ashima Gupta, Performance Comparison of Property Map Indexing and Bitmap Indexing for Data Warehousing, 2002
• Sirisha Machiraju, Space Allocation for Materialized Views and Indexes Using Genetic Algorithms, 2002
• Ravi Darira, A Design Framework for Property Maps, 2006
• Micheale Brant, Binding Hash Technique for XML Query Optimization, 2006
• Janet Rajan, A Framework for Medical Acronym Disambiguation, 2007
Thesis/Dissertation Organization title page, abstract, dedication, table of contents, list of figures, list of tables • Introduction • Related Research • Foundations • Results (may be several chapters) • Conclusions and Future Work • Appendices
Introduction • introduce the general topic area • narrow the focus to a specific topic • motivate the research: why is it needed? who will benefit from the research? • conclude with a clear statement of the problem • give a statement of the work • provide an overview of the thesis (one sentence per chapter)
Sample Introduction • conventional database systems are increasingly leveraged for organizational decision-making • analysis systems are different from conventional operational systems because … • … because of these differences, designing a data warehouse has challenges … • this thesis addresses specific phases of design
Research Objectives • general research objective: one sentence describing what you hope to accomplish (not how!)
Parallel Sections: Statement of the Work • specific research objectives: partition the general objective into sub-goals • research plan/methodology/tasks/approach: revisit the objectives • your approach to solving the problem • each objective has an associated task or approach to satisfy the objective • expected contributions: revisit the methodology • what will you know or have when you’ve done the task? • potential impact of your work
Sample Parallel Sections • specific research objectives accomplish the general research objective • methodology defines the approach to accomplishing the objectives • expected contributions describe what will be accomplished by executing the methodology
Related Research • focused on your topic; not a tutorial! • compare/contrast to your approach • tables comparing features across research efforts are a concise, readable way to summarize
Foundations • work you build on (your own or someone else’s) • definitions, theorems, models, system
Research • discuss conventions, setup, hypotheses of experiments, and proofs • why did you do it? • what did you learn from it? • present results with figures, algorithms, tables, and graphs • sample! Do not dump everything into the body; put everything in appendices and discuss representative results in the body of the thesis or dissertation
Example Experiment Setup [figure: a PMap Creator feeding a PMap storage and performance measurement simulator, and a REBSI storage and performance measurement simulator, each run on a query set to produce performance and storage cost] • parameters: PMap properties [1…6]; word size (ws) (pu, pstring) {16, 32}; tuple size (t) {1,000,000, 50,000}; blocksize (SB) {2048, 4096, 8192}; REBSI scaling factor (sf) {min, …, 10} Goals • What are the comparative storage and retrieval costs of REBSI and PMaps in different scenarios? • How are individual and relative performance affected by parameters such as blocksize, database size, query selectivity, attribute cardinality, kind of query, and property ordering? • Can PMap design and performance be improved using this knowledge? • Under what conditions is each index the better choice?
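The PMap parameter values listed above define a grid of experiment runs. A minimal sketch, assuming a full-factorial design in which every combination is run, enumerates that grid with `itertools.product`. The dictionary keys and the reading of “[1…6] properties” as the integers 1 through 6 are assumptions; the REBSI scaling factor is left out because its range ({min, …, 10}) is not fully specified.

```python
from itertools import product

# Parameter values as listed on the slide (PMap side only).
PROPERTY_COUNTS = range(1, 7)          # assumed reading of "[1…6] properties"
WORD_SIZES = [16, 32]                  # word size (ws)
TUPLE_SIZES = [1_000_000, 50_000]      # tuple size (t)
BLOCK_SIZES = [2048, 4096, 8192]       # blocksize (SB)

def pmap_configurations():
    """Return one dict per combination of parameter values (full-factorial design)."""
    return [
        {"properties": p, "ws": ws, "t": t, "SB": sb}
        for p, ws, t, sb in product(PROPERTY_COUNTS, WORD_SIZES, TUPLE_SIZES, BLOCK_SIZES)
    ]

configs = pmap_configurations()
print(len(configs))  # 72 = 6 * 2 * 2 * 3
print(configs[0])    # {'properties': 1, 'ws': 16, 't': 1000000, 'SB': 2048}
```

Listing the grid explicitly shows the number of runs before the experiment starts and gives each run a record to which its measured costs can be attached.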
Example Presentation of Results [figure] Queries are ordered by the difference between REBSI (min_sf) and PAvg. • Observations: • REBSI performance improves as the sf becomes larger. • REBSI performance improves as cardinality becomes smaller. • PAvg performance deteriorates as cardinality becomes smaller. • PAvg is better than REBSI min_sf (4) for all queries. • PMin retrieves far fewer pages than any REBSI configuration. • PMap retrieves fewer pages for multi-attribute queries than for single-attribute queries. Guidelines: • number figures (e.g., Figure 3.2) • refer to the figures in the text: “In Figure 3.2, results for the HCAQS are shown.” or “Figure 3.2 shows HCAQS results.” • explain the conventions: “The x-axis shows individual queries and the y-axis shows index pages retrieved. The queries are ordered by decreasing cardinality.” • offer observations to help the reader see what is important or interesting: “REBSI performance improves as the cardinality decreases.” • discuss possible reasons for the observed results • give general conclusions
Conclusions and Future Work • revisit objectives • what was accomplished? • what was learned? • topics for future work • extensions • open questions
Conclusions: • BH method works well for deeply nested queries with few branches (non-bushy) • BH indexing technique requires further optimization • BindingCollection is a flexible data structure • Can be used to generate witness trees for processing embedded XPath expressions • Can be used to process XPath expressions directly • Can use different indexing schemes Future Work: • Modify the indexing technique to increase performance and perform inequality matching • Expand Post-order Traversal to support more TAX pattern tree features: e.g., value-based joins • Conduct a more extensive performance study
Citations • allow the reader to follow up on the topic • fill in background information • judge what you’ve said by reading original sources • relieve you of the burden of going over all territory on a subject • strengthen/justify your point • respect your peers by acknowledging their contributions [vL78]
Citations • not a part of speech! • not a part of speech! • not a part of speech! #11: never, ever, use a bracketed number as if it were the name of an author or a work [vL78]: • BAD: “In [23], algorithms are presented …” • GOOD: “Jones presents algorithms … [23].”
Writing Style • avoid vague words (e.g., “deals with,” “handles”) • avoid contractions • be consistent in spelling, punctuation, and capitalization style • use the same grammatical style for items in a list • develop flow/transitions between paragraphs, sections, chapters • avoid empty sections • merge/eliminate single-item sublists or subsections • place punctuation inside quotes • avoid second person (“you …”) • try to write in only one verb tense, preferably the present tense • use “including …” instead of “etc.” • use “such as” instead of “like” • put math in definitions, theorems, proofs; explain in English to build the reader’s intuition • use “:” instead of “-” in technical writing • space after “)” and “:” (not before!) • use “that” instead of “which” when not counting things
References
[s99] Strunk, W., Jr., The Elements of Style, New York: Bartleby.com, 1999, http://www.bartleby.com/141.
[vL78] van Leunen, M.-C., A Handbook for Scholars, New York: Alfred A. Knopf, 1978.
Current Research Work • Sandipto Banerjee, Ph.D. • Bartley Richardson, Ph.D. • Lydia Fitzgerald, M.S. • Bill Nicholson, M.S./Ph.D.