340 likes | 356 Views
Umbrella 0.5 Project Report #1. Heterogeneous Data Source Access Using Standard SQL Queries Eric Matson – MSE Kansas State University. Agenda. Overview Methods & Tools Requirements Project Management and Control GUI Walkthrough/Prototyping Summary. Overview. Purpose Goals
E N D
Umbrella 0.5Project Report #1 Heterogeneous Data Source Access Using Standard SQL Queries Eric Matson – MSE Kansas State University
Agenda • Overview • Methods & Tools • Requirements • Project Management and Control • GUI Walkthrough/Prototyping • Summary
Overview • Purpose • Goals • Description • Architecture • Key Elements • Process • Test Lab • Scope • Future Directions • Literature Search Results
Purpose • Query heterogeneous data sources using standard SQL statements • Achieve database integration and access at the query level • Allow the data to be “black box” whereas users don’t need to have understanding to source, architecture, or technical details
Goals • Complete the Requirements of CIS895 • Successfully Navigate Software Engineering Process • Prove Umbrella Concept • Prove Ability to Process Umbrella Queries • Split the query into to process heterogeneous sources • Manage the connection processing with the data sources • Return and create new relation(s) • Create easy to use Graphical User Interface (GUI)
Description • Enter queries using standard SQL statements • Query heterogeneous data sources • Text Files • Formatted • Delimited • Relational Databases • PostgreSQL • MySQL • Informix • Knowledge Base Formats • Prolog Rules • CLIPS Deftemplate facts • Non-relational Databases • ODBC Data Sources • Unify Data together as single relation
Description – Architecture Relational Database Parser D a t a C o n n e c t o r s Object Database API Splitter TCP/IP Text File Router Umbrella GUI Unifier KB File Profile Data Server Data Sources Engine
Description – Key Elements • GUI: User Interface • Umbrella Engine Agent • Parser: Checks the query for correctness and validity • Splitter: Splits query to appropriate data sources • Router: Manages routing and connection to data source • Unifier: Unites data upon return • Profile: Database to manage data source information • Data Connector: Interface to data sources • Data Server: • Text: “Wraps” data and serves the Umbrella Engine • KB: “Wraps” fact and serves the Umbrella Engine
Description - Process SELECT * FROM Job, Name, Addr WHERE Job.SSN = Name.SSN AND Job.SSN = Addr.SSN SSN Title Salary Job Parse SELECT * FROM Job SSN Title Salary First Middle Last Street State ZIP Split SSN First Middle Last Unify SELECT * FROM Name DC Name DC Route DC SSN Street State ZIP SELECT * FROM Addr Addr
Description - Test Lab Server Client Network PostgreSQL Linksys 10 Mb Hub Windows 98 MySQL Red Hat 7.1 192.168.1.2 192.168.1.1 Text
Scope • Functionality • Umbrella 0.5 Query Engine • Graphical User Interface (GUI) • Data Servers for Text Data Sources • Data Connectors • SQL Select Statement in Limited Format • Data Sources • PostgreSQL: ANSI Compliant ORDBMS • MySQL: non-ANSI Compliant RDBMS • Formatted Text • Delimited Text
Future Directions • Version 0.6 – SQL release • Implement full grammar support for SQL • Enhance dataset join algorithm • Enhance GUI • Add ODBC, Berkeley DB, Interbase and ADABAS data connectors • Version 0.7 – KD/AI release • Add Lisp, Prolog, CLIPS data connectors • Enhance Query Efficiency Algorithms • Version 0.8 – Integration release • Add API layer for access by C/C++, Java, VB • Add tools for advanced searches and Data Mining • Add Security Module • Version 0.9 – Data Connector release • Add Oracle, Informix, UDB, Sybase data connectors • Add Object Database data connectors • Add XML data connector • Version 1.0 – Mainframe release • Add IMS, DB2, VSAM data connectors
Literature Search • Looked at US/Canadian Research Units • Companies • Universities • Similar Research/Product Development • WWW.Jibe.com - Only JDBC/ODBC sources • Queens University(Canada)/IBM – Older pre-Internet “Explosion” • Stanford Mariposa Project • Summary • Some Similar Projects • Most are Tangential in Scope and Function
Methods & Tools (M&T) • Development Process • Object Modeling • Formalism/Constraints • Development Tools • Infrastructure
M&T – Development ProcessCOMET(Concurrent Object Modeling and architectural design mEThod – Hassan Gomaa) Planning Delivery Non-COMET Process COMET Process Requirements Modeling Analysis Modeling Design Modeling Incremental Software Construction Throwaway Prototyping Incremental Software Integration Systems Testing Incremental Prototyping Phase 1 Phase 2 Phase 3
M&T – Object Modeling • Unified Modeling Language (UML) used • UML is standard OO modeling language • Well understood • UML Tool • Poseidon • ArgoUML • Both are very similar, Poseidon is an extension of ArgoUML
M&T – Formalism/Constraints • Object Constraint Language (OCL) • Easier to understand • Less mathematical • Less developed • USE • Graphical tool to support OCL
M&T – Development Tools • Sun Java 1.3 JDK/JRE • Newest release of development kit • Write Once, Run Anywhere (Portable) • Forte (Netbeans) Integrated Development Environment • Community Edition (CE) • Easy to use • Free!
M&T - Infrastructure Server OS: Red Hat Linux 7.1 Object/Relational Database Server: PostgreSQL Relational Database Server: MySQL
Requirements • Object Diagram • Class Descriptions • SQL Grammar • SQL Example
Requirements – Object Diagram DataConn Parser Engine RelDataConn TextDataConn Query Splitter JDBCConn FormDataConn DelDataConn SubQuery Interagator MySQLConn PostgresConn Profile Router JDBCProfile TextProfile TextDesc Unifier FormTextProf DelTextProf ResultSet DataServer
Requirements – Class Descriptions • Engine: Control agent of Umbrella architecture • Parser: Checks query for grammar/valid data objects • Splitter: Splits queries into valid sub-queries • Router: Routes queries to data source/manages connection • Unifier: Unifies sub-query result sets into single relation • Query: SQL statement • SubQuery: Created by Splitter for specific data source • ResultSet: Data returned from sub-query of data source • TextDesc: Defines characteristics of text data source • Interrogator: Queries a new data source for information on field definitions • DataServer: Server that supports access to text data sources
Requirements – Class Descriptions • Profile: Defines characteristics of a data source • JDBCProfile • FormattedTextProfile • DelimitedTextProfile • DataConnector: Defines specifics to query a data source • RelationalDataConnector • JDBCDataConnector • MySQLDataConnector • PostgreSQLDataConnector • TextDataConnector • FormattedTextDataConnector • DelimitedTextDataConnector
Requirements – SQL Grammar • <statement> ::= SELECT <select> FROM <table>.| SELECT <select> FROM <table> WHERE <condition>. • <select> ::= <field> | <select>,<field> | * • <field> ::= <fieldname> | <tablename>.<fieldname> • <fieldname> ::= <alpha> {<alpha>}{<digit>} • <table> ::= <tablename> | <tablename> {,<tablename>} • <tablename> ::= <alpha> {<alpha>}{<digit>} • <conditionals> ::= <relation> {AND <relation>}{OR <relation>} • <relation> ::= <field> <operand> <rightside> • <operand> ::= < | > | = • <rightside> ::= <field> | <vartype> • <vartype> ::= <string> | <integer> | <real> • <string> ::= “” | “<alpha> {<alpha>}{<digit>} “ • <integer> ::= <digit> {<digit>} • <real> ::= <integer> . <integer> • <alpha> ::= a|b|c…x|y|z|A|B|C…X|Y|Z • <digit> ::= 0|1|2|3|4|5|6|7|8|9
Requirements – SQL Example • SELECT * FROM address; • SELECT ssn, age FROM employee; • SELECT lastname, firstname FROM employee WHERE age > 40; • SELECT lastname, firstname, salary FROM employee, hr WHERE employee.ssn = hr.ssn AND employee.age > 25;
Project Planning and Control • Project Planning • Work Breakdown Structure (WBS) • Gantt Chart • Description of Estimation Techniques • Project Control • Effort/Budget • Project Log and Time Tracking • Change Management (Feedback Loops)
Project Planning - WBS 1 Umbrella Goals: 1.1 Planning 1.1.1 Define Purpose • Tasks: .25 < hours < 8 1.2 Requirements Modeling 1.1.2 Define Scope Outcome: 1.3 Throwaway Prototyping 1.1.3 Define Purpose • Stay on schedule 1.4 Analysis Modeling 1.1.4 Define Goals • Prevent “creep” 1.5 Design Modeling … 1.6 Incr. SW Construction 1.7 Incr. SW Integration 1.8 Incr. Prototyping 1.9 Systems Testing 1.10 Delivery
Project Planning – Gantt Chart September October November December January Planning Req’s Modeling Throwaway Prototyping Analysis Mod. Design Mod. Incr. SW Const Incr. SW Integ. Incr. Prototyping Systems Testing Delivery Review #1 Review #2 Review #3
Project Planning – Estimation Techniques • Estimate Tasks on Defined WBS • All tasks broken into “chunks” of work between .25 and 8 hours • Keep project on schedule • WBS Estimation Used • Small Project Team • Well Understood Capabilities of Project Member(s) • Change Budget Allocated at 20% of Initial Estimate • Works with Iterative Schemes • Change is Planned and Accepted
Project Control – Effort/Budget • Effort • Effort Data Hours Man/Mo (168 hr/mo) • Predicted 565 3.36 • Current 183.25 1.09 • Budget • Time ($50/hr) $28,250 • Current $9,162.50
Project Control – Time Log/Issues • Time Log • Track time spent each day on project tasks • WBS Number • Hours • Viewed on Engineering Notebook Site • Issues • As design issues are realized • Issues Documented • Solutions Arrived at and Documented • Folded in to Change Management Process • If Impact and In Scope
Project Control – Change Mgmt. • Development Process (COMET) Oriented for Change • Feedback changes will be tracked • Change Control Number • Task Name • Added Hours to Project Schedule • Description of Change • WBS Number of Artifacts Altered • Dependencies
Graphical User Interface File Connection Query Help Connections --- 192.168.1.2 ---- /opt/data ----formatted.txt ---- PostgreSQL Server ----Employee ----Salary --- 192.168.1.1 ---- MySQL Server ----Items ----Invoice ----Customers ---- /var/lib/data ----delimited.txt Queries --- Employees and Customers --- Employee ---- ssn, lastname --- Customers ---- ssn Define Query Name: Employees and Customers Query: SELECT ssn, lastname FRO 111-22-3333 Elvis Presley 222-33-4444 Richard Nixon 333-44-5555 Mikhail Gorbechev Results Status: Subquery1: OK Subquery2: Failed
Summary • Umbrella accesses heterogeneous data sources • Defined set of tools and methodologies • High Level Requirements Defined • Project Management Well Defined • Easy to Use GUI • Next Steps • Questions • Feedback • Signoff • Proceed to Phase #2