Purdue-Tivoli Partnership: Exploiting Purdue’s Technological Prowess
Walid G. Aref, Associate Professor, CS Dept.
XML Databases and Data Mining
Database Systems Research
• The objectives of my research are to:
  • Build efficient database engines for new data types:
    • spatial/geographical databases
    • semi-structured/unstructured web databases
    • multimedia databases
  • Develop algorithms to answer new types of database queries:
    • data mining algorithms
    • spatial query processing algorithms
Proposed Projects
• Building an XML-based Database System Prototype
• Data Mining of Event Traces in a Distributed System
XML & Web Databases
• XML as the uniform language of the Internet for information interchange
• Target:
  • Seamless integration of databases and non-database information sources
  • View/query the web as a huge database
  • Query web structure and contents
  • Personalize/adapt based on customers’ needs/patterns
XML/Web Databases: Phase 1
• Design a data model for an XML document database
• Design an algebra and query language for the XML data model
• Build a prototype XML database engine
• Design and prototype XML views for:
  • legacy relational database systems
  • HTML pages
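One way to picture the Phase 1 item "XML views for legacy relational database systems" is a default view that publishes each table row as an XML element, over which a path query can then be evaluated. The sketch below assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules; the `employee` table, the `relational_to_xml` helper, and the row-per-element mapping are hypothetical choices for illustration, not the project's data model or query language.

```python
import sqlite3
import xml.etree.ElementTree as ET


def relational_to_xml(conn, table):
    """Publish each row of `table` as a <row> element with one child per column.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    """
    cur = conn.execute(f"SELECT * FROM {table} ORDER BY rowid")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for values in cur:
        row = ET.SubElement(root, "row")
        for col, value in zip(columns, values):
            ET.SubElement(row, col).text = str(value)
    return root


# Hypothetical legacy table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Ann", "CS"), ("Bob", "ECE"), ("Cy", "CS")])

view = relational_to_xml(conn, "employee")
# Path query over the XML view: names of employees whose <dept> child is "CS".
cs_names = [row.findtext("name") for row in view.findall("./row[dept='CS']")]
print(cs_names)  # ['Ann', 'Cy']
```

A real view layer would also need to map keys and foreign keys to nesting; this flat mapping only shows the publishing step.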
XML & Web Databases: Phase 2
• Develop indexing techniques for XML databases
• Develop query processing and optimization techniques for XML queries
• Maintain user profiles for personalized views and query answers
• Some applications:
  • Web mining: clustering web users based on access patterns
  • Web/site searching and mapping to an XML database
  • Modeling a distributed system’s topology and resources using XML
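As one illustration of "indexing techniques for XML databases", a structural path index (loosely in the spirit of DataGuide-style summaries) maps each distinct root-to-node label path to the nodes that lie on it, so a simple path query becomes a single lookup instead of a document traversal. This is a hedged sketch of one well-known technique, not the indexing method the project proposes; `build_path_index` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(root):
    """Map each root-to-node label path (e.g. '/site/person/name') to its nodes.

    Nodes are visited in pre-order, so each list is in document order.
    """
    index = defaultdict(list)
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        index[path].append(node)
        # Push children right-to-left so the leftmost child is visited first.
        for child in reversed(list(node)):
            stack.append((child, f"{path}/{child.tag}"))
    return index


doc = ET.fromstring(
    "<site><person><name>Ann</name></person>"
    "<person><name>Bob</name></person>"
    "<item><name>Lamp</name></item></site>"
)
index = build_path_index(doc)
# A path query becomes one dictionary lookup instead of a full traversal.
print([n.text for n in index["/site/person/name"]])  # ['Ann', 'Bob']
print(len(index))  # 5 distinct paths
```

The trade-off is typical of path indexes: exact-path lookups are cheap, but queries with descendant steps (`//name`) still have to scan the index keys.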
XML & Web Databases: Project Plan
• Phase 1 deliverables (duration: 18 months)
  • Prototype XML database system
  • Search capability + views of relational databases/HTML documents
  • End-of-Phase-1 project report + possible publications/patent filings
• Phase 2 deliverables (duration: 18 months)
  • Efficient prototype XML database system with indexing, query processing, and optimization capabilities
  • Web site mapping into an XML database, and a db/web query engine with user-profiling and web-mining capabilities
  • Final project report + possible publications/patent filings
• Estimated dollar value: $150K
Data Mining of System Event Traces
• Target:
  • Find common event sequences
  • Detect irregular event patterns
  • Predict future events and attempt to prevent them
• Mining common event sequence patterns
• Mining user actions and responses
• Taking system topology and structure into consideration
Data Mining of System Event Traces: Phase 1
• Develop an understanding of the Tivoli suite of Enterprise Management Software
• Obtain sample traces of events and user actions, along with the corresponding sample system/network topologies
• Analyze the data and design the needed structures and schemas
• Build a prototype data warehouse for event traces, actions, and topologies
Data Mining of System Event Traces: Phase 2
• Apply existing data mining techniques
• Develop new algorithms for mining the traces given the system/network topologies
• Develop incremental data mining techniques for event traces
• Analyze the data mining results
• Iterate through the above process
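The targets "find common event sequences" and "detect irregular event patterns", together with the Phase 2 item on incremental mining, can be illustrated with a deliberately simple stand-in: counting fixed-length contiguous event sequences (n-grams) as trace batches arrive, then applying a support threshold. Real sequential-pattern mining would also handle gaps, timing windows, and topology; the class name, event names, and threshold below are hypothetical.

```python
from collections import Counter


class IncrementalSequenceMiner:
    """Counts length-k contiguous event sequences over batches of a trace.

    The last k-1 events of each batch are carried into the next one, so a
    sequence spanning a batch boundary is counted exactly once, giving the
    same counts as a single pass over the concatenated trace.
    """

    def __init__(self, k):
        self.k = k
        self.counts = Counter()
        self.tail = []

    def update(self, batch):
        events = self.tail + list(batch)
        for i in range(len(events) - self.k + 1):
            self.counts[tuple(events[i:i + self.k])] += 1
        self.tail = events[-(self.k - 1):] if self.k > 1 else []

    def frequent(self, min_support):
        """Common sequences: those seen at least `min_support` times."""
        return {seq: c for seq, c in self.counts.items() if c >= min_support}

    def is_irregular(self, seq, min_support):
        """Flag a sequence as irregular if it is rarer than `min_support`."""
        return self.counts[tuple(seq)] < min_support


# Hypothetical event names; two batches arriving from a trace collector.
miner = IncrementalSequenceMiner(k=2)
miner.update(["login", "disk_full", "restart"])
miner.update(["login", "disk_full", "restart"])
print(miner.frequent(min_support=2))
# {('login', 'disk_full'): 2, ('disk_full', 'restart'): 2}
print(miner.is_irregular(("restart", "login"), min_support=2))  # True
```

Because only the counts and a (k-1)-event tail are kept, each new batch is processed without rescanning older trace data, which is the property an incremental miner needs.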
Data Mining of System Event Traces: Project Plan
• Phase 1 deliverables (duration: 18 months)
  • Prototype data warehouse of distributed-system event/action traces and system topologies
  • Phase 1 project report + possible publications/patent filings
• Phase 2 deliverables (duration: 18 months)
  • Prototype data warehouse with data mining capabilities
  • New incremental algorithms for mining traces that take topologies into account
  • Sample study, results of mining the traces, and recommendations
  • Final project report + possible publications/patent filings
• Estimated dollar value: $110K
Past Research Projects
• Distributed Scalar Storage Servers
  • Using Network-Attached Storage Devices (NASD)
• Design and prototype of a distributed real-time file system (multimedia server)
  • Past research with Panasonic/Matsushita
  • 1 European patent granted, 3 U.S. patent filings
  • Several journal and conference publications
• Data mining in time-series databases
  • In collaboration with IBM Almaden, San Jose
  • Two patent filings
Ongoing Research Projects
• Multimedia database systems (ongoing)
  • Indexing techniques
  • Textual/caption annotation
  • Retrieval by content
  • Data mining of multimedia data
  • ACM SIGMOD 95, IEEE ICDE 95, 3 U.S. patents
• Spatial database systems (ongoing)
  • Prototype systems:
    • Sand: prototype spatial database system (University of Maryland, College Park, 1990)
    • Spatial index attachment in the Starburst extensible DBMS (IBM Almaden Research, San Jose, CA, 1992)
• Program committee member, VLDB 2000 (Intl. Conf.)
Purdue-Tivoli Partnership: Exploiting Purdue’s Technological Prowess
Elisa Bertino, Professor
Access control mechanisms for XML document sources
Long Term Objectives
• The objectives of my research are:
  • Development of tools supporting the specification of access control policies for XML documents
  • Development of access control mechanisms for XML documents
  • Development of automatic classification tools for XML documents
Impact
• The system we develop will support access control administration for heterogeneous sources of XML documents
• The system will support a language for the high-level definition of access control policies
• The system will support import/export of XML documents among different sources
• It will enable selective distribution of documents to large user communities
Research Methodology
• We plan to develop the access control system and the administration environment on top of a DBMS supporting XML
Research Plan
• Definition of an access control model for XML documents
• Definition of an XML library for encoding access control policies and authorizations
• Integration of the access control model with user credential mechanisms
• Implementation of the access control model
• Secure dissemination of XML documents
• Extension of the access control model to deal with multimedia data
• Development of access control policies for specific applications (such as workflow systems)
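To make "an access control model for XML documents" and selective distribution concrete, here is a hedged sketch of element-level read authorizations that propagate to subtrees, resolved by a "most specific wins, denial overrides" rule, and used to compute the pruned view a given subject may receive. The `Authorization` record, the absolute-path syntax, and the conflict-resolution policy are assumptions chosen for illustration; they are not the model defined in the work cited under Past Research.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    subject: str
    path: str        # absolute label path, e.g. "/record/salary"
    positive: bool   # True grants read access, False denies it


def accessible_view(root, subject, auths):
    """Return a pruned copy of the document that `subject` may read, or None.

    Assumed policy: an authorization on a path propagates to its whole
    subtree, a deeper (more specific) authorization overrides an inherited
    one, denial beats grant on the same path, and the default is deny.
    """
    def decide(path, inherited):
        matches = [a for a in auths if a.subject == subject and a.path == path]
        if not matches:
            return inherited
        return all(a.positive for a in matches)

    def prune(node, path, inherited):
        allowed = decide(path, inherited)
        kept = []
        for child in node:
            pruned = prune(child, f"{path}/{child.tag}", allowed)
            if pruned is not None:
                kept.append(pruned)
        if not allowed and not kept:
            return None
        # A denied ancestor of a readable node is kept as bare structure only.
        copy = ET.Element(node.tag, node.attrib if allowed else {})
        copy.text = node.text if allowed else None
        copy.extend(kept)
        return copy

    return prune(root, "/" + root.tag, False)


doc = ET.fromstring("<record><name>Ann</name><salary>100</salary></record>")
auths = [
    Authorization("clerk", "/record", True),
    Authorization("clerk", "/record/salary", False),
]
print(ET.tostring(accessible_view(doc, "clerk", auths)))
# b'<record><name>Ann</name></record>'
```

A subject with no matching authorization gets `None` under the default-deny assumption; other defaults (default-grant, or denial that also hides structure) are equally plausible design choices.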
Past Research
• Formal definition of an authorization model for XML documents
• Automatic classification of semi-structured and XML documents
• Relevant papers:
  • E. Bertino et al., “An Approach to Classify Semi-Structured Objects,” Proc. ECOOP 99.
  • E. Bertino et al., “Controlled Access and Dissemination of XML Documents,” submitted for publication.
  • E. Bertino et al., “An Approach for the Specification and Enforcement of Authorization Constraints in Workflow Management Systems,” ACM Trans. on Information and System Security, Vol. 2, No. 1, pp. 65-104, February 1999.
Purdue-Tivoli Partnership: Exploiting Purdue’s Technological Prowess
William J. McIver, Jr., Visiting Assistant Professor*
Souk Nets: A Component-based Database Integration Paradigm
Long Term Objectives
• The objectives of my research are to:
  • Develop a component-based paradigm tailored to database integration.
  • Design a language to use within this paradigm.
  • Develop a set of components for implementing database integration solutions.
  • Optimize component container approaches for database applications.
Impact
• Allow data integrators to leverage the benefits of component-based software construction:
  • Allow prefabricated functionality to be reused.
  • Perform more robust reuse.
  • Perform modular checking in the face of evolution.
• Produce a component-based paradigm tailored to data source integration:
  • Current component-based approaches (e.g., EJBs) are lacking in this domain:
    • Too low-level
    • Imperative & tedious
  • Current containers are inefficient for database access.
Impact
• Enable better construction of database integration solutions:
  • Rapid
  • Robust
  • Reusable
  • Analyzable
  • Fault tolerant
  • Evolvable
• Enable the construction of more efficient component-based database applications.
Research Methodology
• Conceptual
  • Identify canonical use cases for this technology.
  • Factor the domain of data integration solutions:
    • Federation & schema integration, global query language approaches, point solutions, etc.
  • Design a covering set of components to implement these solutions.
    • Explore the use of reflection, contracts, design patterns, and metadata approaches.
    • Build in reasoning capabilities for composition, changes to interfaces, and fault situations.
  • Design a high-level, cross-platform language.
Research Methodology
• Theoretical
  • Extend the LINDA notion of Tuple Space (Gelernter & Carriero)
    • Accommodate complex objects
    • Object Spaces
  • Employ WoFNets semantics (Ellis & Keddara 1999) to interconnect Object Spaces.
    • A variant of Colored Petri nets
    • Provides a formal semantics
    • Supports dynamic change
      • Applicable to handling evolution of requirements
      • Possible applicability to the active networks paradigm
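The LINDA tuple-space abstraction that this work proposes to extend rests on three operations: `out` adds a tuple to a shared space, `rd` blocks until a tuple matching a template exists and returns it, and `in` does the same but also removes it. A minimal thread-based sketch follows; it covers plain tuples only, not the complex-object Object Spaces or their WoFNet interconnection, and the class and method names (`in_`, because `in` is a Python keyword) are hypothetical.

```python
import threading


class TupleSpace:
    """Minimal Linda-style tuple space: out() adds, rd() reads, in_() removes.

    A template field matches a tuple field if it is a type (a Linda "formal",
    checked with isinstance) or an equal value (an "actual").
    """

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        return len(template) == len(tup) and all(
            isinstance(v, f) if isinstance(f, type) else f == v
            for f, v in zip(template, tup)
        )

    def out(self, *tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake readers blocked in rd()/in_()

    def _take(self, template, remove, timeout):
        found = None

        def ready():
            nonlocal found
            found = next((t for t in self._tuples
                          if self._matches(template, t)), None)
            return found is not None

        with self._cond:
            if not self._cond.wait_for(ready, timeout):
                raise TimeoutError(f"no tuple matches {template!r}")
            if remove:
                self._tuples.remove(found)
            return found

    def rd(self, *template, timeout=None):
        return self._take(template, remove=False, timeout=timeout)

    def in_(self, *template, timeout=None):  # 'in' is a Python keyword
        return self._take(template, remove=True, timeout=timeout)


ts = TupleSpace()
ts.out("employee", "Ann", 42)
first = ts.in_("employee", str, int)      # removes ('employee', 'Ann', 42)
producer = threading.Thread(target=ts.out, args=("employee", "Bob", 7))
producer.start()
second = ts.rd("employee", "Bob", int, timeout=5)  # blocks until Bob's tuple exists
producer.join()
print(first, second)  # ('employee', 'Ann', 42) ('employee', 'Bob', 7)
```

Matching by content rather than by address is what decouples producers from consumers here, which is the property an Object Space extension would carry over to complex objects.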
Research Methodology
• Theoretical (continued)
  • Transitions in Souk nets constitute components:
    • Structural and value transformations
    • Filters
    • Control flow
    • Event subscription, notification, and handling
    • User-defined transitions
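The idea that transitions are the components can be illustrated with a simplified colored-Petri-net firing rule: a transition consumes data tokens from its input places, a guard plays the role of a filter, and a transform function plays the role of a structural/value transformation whose result is deposited in the output place. This is a hedged sketch of ordinary Petri-net firing only; it does not implement WoFNet semantics or dynamic change, and the `Transition`/`fire` names and the CRM/billing example are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List


@dataclass
class Transition:
    """A component-like transition over places holding data tokens.

    `guard` plays the role of a filter and `transform` of a structural/value
    transformation. Input places are assumed to be distinct.
    """
    name: str
    inputs: List[str]
    output: str
    guard: Callable[..., bool]
    transform: Callable[..., object]


def fire(marking: Dict[str, List[object]], t: Transition) -> bool:
    """Fire `t` once on the first token combination that satisfies its guard."""
    pools = [list(enumerate(marking.get(place, []))) for place in t.inputs]
    for combo in product(*pools):
        tokens = [token for _, token in combo]
        if t.guard(*tokens):
            # Consume one token from each input place ...
            for place, (idx, _) in zip(t.inputs, combo):
                del marking[place][idx]
            # ... and produce the transformed token in the output place.
            marking.setdefault(t.output, []).append(t.transform(*tokens))
            return True
    return False  # not enabled: no combination passes the guard


# Hypothetical integration step: join customer records from two sources.
marking = {
    "crm": [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}],
    "billing": [{"id": 2, "balance": 30}],
}
join = Transition("join", ["crm", "billing"], "merged",
                  guard=lambda a, b: a["id"] == b["id"],
                  transform=lambda a, b: {**a, **b})
fired = fire(marking, join)
print(fired, marking["merged"])  # True [{'id': 2, 'name': 'Bob', 'balance': 30}]
```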
Research Methodology
• Experimental
  • Evaluate the modeling capabilities of the paradigm and language.
    • Employ selected use cases.
  • Conduct performance evaluations of the run-time system.
  • Iterate on language design and system implementation.
Research Plan
• Schedule & Milestones
  • Year 1
    • Identify a canonical set of use cases for database integration.
    • Implement baseline prototypes of use cases for analysis.
    • Complete the first version of the component paradigm.
  • Year 2
    • Implement an environment based on the component paradigm.
    • Begin an iterative evaluation process (through Year 3).
  • Year 3
    • Implement a revised environment.
Research Plan
• Deliverables
  • Software artifacts from each milestone
  • Results of each milestone reported in appropriate publications and conferences
  • Software demonstrations
• Staffing & Budget (estimated)
  • Principal investigators: 1 to 2 FTE
  • Graduate students: 3 to 4
  • Budget: $800,000
Past Research
• The Sanctuary Project
  • A mediator-based database integration environment for CORBA- and DCOM-based heterogeneous data sources. Supported by NSF grant IRI-9632595.
  • Used to perform data migration from Unidata/VMARK CODASYL databases to O2 object-oriented databases, to construct object-oriented applications atop CODASYL applications, and to support the integration of the object-oriented Catalyst software engineering environment with ODBC-compliant DBMSs.
  • John Todd, Roger King, William J. McIver, Jr., Richard Osborne, Christian Och, Nathan Getrich, Brian Temple. “Building Mediators from Components.” (To appear) Proceedings of the International Symposium on Distributed Objects and Applications (DOA’99). Edinburgh, Scotland. September 5-6, 1999.
Past Research
• Souk nets (preliminary work)
  • Development of an initial analytic/conceptual framework for component-based database integration (since April 1999).
  • William J. McIver, Jr., Karim Keddara, Christian Och, Roger King, Clarence A. Ellis, John Todd, Nathan Getrich, Richard M. Osborne, Brian Temple. “An Overview of Souk Nets: A Component-based Paradigm for Data Source Integration.” (To appear) The Seventh International Workshop on Database Programming Languages (DBPL 1999). Kinloch Rannoch, Scotland. September 1-3, 1999.
Purdue-Tivoli Partnership: Exploiting Purdue’s Technological Prowess
W. Kent Fuchs, Head & Professor, Electrical & Computer Engineering
Dependable Distributed & Mobile Computing
Long Term Objectives
• The objectives of my research are:
  • Rapid recovery from failures
    • Hardware and software faults
    • Clusters of networks
    • Mobile notebooks and hand-held devices
  • Accurate and preventive diagnosis of faults
Impact
[Figure: wired network with mobile support stations and mobile hosts, spanning homogeneous, heterogeneous, and mobile environments]
• Highly reliable computation and communication in changing environments
Research Methodology
• RENEW: Recoverable Networks of Workstations
  • Simple application development
  • Good performance
  • Transparent fault recovery
  • Rapid prototyping of new fault-tolerance (FT) techniques
  • Standard benchmarks
  • Representative environments
[Figure: application processes P1-P4 on a network of workstations, serving application users and users requiring dependable computing]
[Figure: RENEW architecture. Application; MPI; checkpoint & recovery protocols; message-passing module; process and job management; fault detection; checkpoint server; operating system on computing nodes and file servers; Ethernet, ATM interconnect]
[Figure: execution time (sec); note: checkpoint period 5 min]
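The basic mechanism underneath the checkpoint protocols that RENEW is meant to host is periodic checkpointing with rollback: state is saved every period (the figure note above uses 5 minutes), and a restarted process resumes from the last saved state, redoing only the work done since then. The sketch below shows that mechanism for a single Python process. It is not RENEW or MPI code, it neither coordinates checkpoints across processes nor logs messages, and `Checkpointer`, `run`, and the file layout are hypothetical.

```python
import os
import pickle
import tempfile
import time


class Checkpointer:
    """Saves application state at most every `period_sec` seconds.

    Each checkpoint is written to a temporary file and then atomically
    renamed with os.replace(), so a crash mid-write keeps the previous one.
    """

    def __init__(self, path, period_sec):
        self.path = path
        self.period_sec = period_sec
        self._last = time.monotonic()

    def maybe_checkpoint(self, state, force=False):
        now = time.monotonic()
        if not force and now - self._last < self.period_sec:
            return False
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(self.path)))
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self._last = now
        return True

    def restore(self, default):
        try:
            with open(self.path, "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            return default


def run(ckpt, steps, crash_at=None):
    """Toy computation (sum of 0..steps-1) that resumes from the last checkpoint."""
    state = ckpt.restore({"i": 0, "total": 0})
    while state["i"] < steps:
        if state["i"] == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"]
        state["i"] += 1
        # The run is far shorter than the 5-minute period, so the demo also
        # forces a checkpoint every 10 iterations.
        ckpt.maybe_checkpoint(state, force=state["i"] % 10 == 0)
    return state["total"]


path = os.path.join(tempfile.mkdtemp(), "app.ckpt")
try:
    run(Checkpointer(path, period_sec=300), steps=100, crash_at=57)
except RuntimeError:
    pass  # the process "fails" after its i=50 checkpoint
result = run(Checkpointer(path, period_sec=300), steps=100)  # restarted process
print(result)  # 4950; only iterations 50..56 were redone
```

The period is the usual trade-off: a shorter period means less lost work after a failure but more time spent writing checkpoints during failure-free execution.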
Program   Backup (sec)   Local Disk (sec)   Remote Disk (sec)   Checkpoint Size (KB)
btree     3.87e-4        1.541              5.668               58653
flops20   3.85e-4        0.007              0.020               52
nsieve    3.83e-4        0.007              0.018               52
swim      3.86e-4        0.378              1.401               14500
tfftdp    3.85e-4        1.677              6.033               65595
tomcatv   3.83e-4        0.065              0.263               2135
mgrid     4.34e-4        0.219              0.857               7517
• PREACHES (Portable Recovery and Checkpointing in Heterogeneous Systems)
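PREACHES targets recovery across heterogeneous machines, where a raw memory image taken on one architecture cannot simply be reloaded on another. One ingredient of portable checkpointing is writing state in a machine-independent encoding. The hedged sketch below does this for a tiny, hypothetical application state (an iteration counter, a residual, and an integer array) using Python's `struct` module with network byte order and standard sizes. It illustrates only the data-encoding aspect, not how PREACHES itself captures and restores process state; the `CKP1` layout is invented for the example.

```python
import struct

# Machine-independent layout (an assumption for this sketch):
#   magic "CKP1" | u32 iteration | f64 residual | u32 n | n x i64 values
# '!' means big-endian, standard sizes, and no padding, so the bytes decode
# identically on little- and big-endian hosts with different native widths.
HEADER = struct.Struct("!4sIdI")


def encode_checkpoint(iteration, residual, values):
    header = HEADER.pack(b"CKP1", iteration, residual, len(values))
    return header + struct.pack(f"!{len(values)}q", *values)


def decode_checkpoint(blob):
    magic, iteration, residual, n = HEADER.unpack_from(blob, 0)
    if magic != b"CKP1":
        raise ValueError("not a checkpoint in the expected format")
    values = list(struct.unpack_from(f"!{n}q", blob, HEADER.size))
    return iteration, residual, values


blob = encode_checkpoint(12, 0.5, [1, -2, 3])
print(len(blob))                # 44 (20-byte header + 3 * 8 bytes)
print(decode_checkpoint(blob))  # (12, 0.5, [1, -2, 3])
```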
Research Plan
[Figure: mobile host checkpointing across mobile support stations MSS1, MSS2, and MSS3]
• Recoverable Mobile Distributed Systems
  • High availability and reliability
  • Power and bandwidth conservation
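For recoverable mobile systems, and in line with the message-logging work listed under Past Research, recovery can take roughly this shape: a mobile support station (MSS) logs each message before delivering it to a mobile host, the host's checkpoint records the last sequence number it processed, and after a failure the host rolls back while the current MSS replays the later messages. When the host moves to a new cell, the log follows it to the new MSS. This is a hedged, single-threaded sketch of generic receiver-side message logging with a log handoff; the classes, sequence numbering, and handoff policy are assumptions, not the protocol of the cited paper, and log garbage collection is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class MobileHost:
    name: str
    state: list = field(default_factory=list)  # messages processed so far
    last_seq: int = 0                          # last sequence number delivered

    def deliver(self, seq, msg):
        self.state.append(msg)
        self.last_seq = seq

    def checkpoint(self):
        return list(self.state), self.last_seq

    def rollback(self, ckpt):
        self.state, self.last_seq = list(ckpt[0]), ckpt[1]


@dataclass
class SupportStation:
    """Mobile support station (MSS) that logs each message before delivery."""
    name: str
    log: dict = field(default_factory=dict)  # host name -> [(seq, msg), ...]

    def send(self, host, msg):
        entries = self.log.setdefault(host.name, [])
        seq = entries[-1][0] + 1 if entries else 1
        entries.append((seq, msg))  # log first (pessimistic) ...
        host.deliver(seq, msg)      # ... then deliver

    def handoff(self, host, new_station):
        # The host enters a new cell: its message log moves to the new MSS.
        new_station.log[host.name] = self.log.pop(host.name, [])

    def recover(self, host, ckpt):
        # Roll back to the checkpoint, then replay logged messages after it.
        host.rollback(ckpt)
        for seq, msg in self.log.get(host.name, []):
            if seq > host.last_seq:
                host.deliver(seq, msg)


mh = MobileHost("mh1")
mss1, mss2 = SupportStation("MSS1"), SupportStation("MSS2")
mss1.send(mh, "a")
ckpt = mh.checkpoint()          # taken after message 1
mss1.send(mh, "b")
mss1.handoff(mh, mss2)          # mh moves from MSS1's cell to MSS2's
mss2.send(mh, "c")
mh.state, mh.last_seq = [], 0   # simulated failure: volatile state lost
mss2.recover(mh, ckpt)
print(mh.state)  # ['a', 'b', 'c']
```

Keeping the log at the support station rather than on the mobile host reflects the usual assumption that the host's own storage and wireless link are the less reliable resources.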
Past Research
• RENEW (Recoverable Network of Workstations)
  • N. Neves and W. K. Fuchs, “RENEW: A Tool for Fast and Efficient Implementation of Checkpoint Protocols,” IEEE Fault-Tolerant Computing Symposium, pp. 58-67, June 1998.
• PREACHES (Portable Recovery and Checkpointing in Heterogeneous Systems)
  • K.-F. Ssu and W. K. Fuchs, “PREACHES: Portable Recovery and Checkpointing in Heterogeneous Systems,” IEEE Fault-Tolerant Computing Symposium, pp. 38-47, June 1998.
• RAMs (Recoverable Mobile Systems)
  • B. Yao, K.-F. Ssu, and W. K. Fuchs, “Message Logging in Mobile Computing,” IEEE Fault-Tolerant Computing Symposium, pp. 294-301, June 1999.
Purdue-Tivoli Partnership: Exploiting Purdue’s Technological Prowess
Shimon Y. Nof, Professor of Industrial Engineering
Design of Middleware Protocols for e-Business Interactions
Long Term Objectives
• The objectives of our research are to:
  • Develop a set of collaborative workflow protocols for guiding and optimizing the performance of e-business interactions in heterogeneous, autonomous, and distributed environments (e.g., networks of ERPs, help desks)
  • Develop a complex problem-solving scheme/protocol via interactions among distributed knowledge-based systems
Long Term Objectives (continued)
• Design recommendations for knowledge-based protocols customized to the needs of particular enterprises and markets
• Design an executable-specification tool for protocol development, which will translate the interaction and flow definitions of particular protocols into executable code, subject to the needs of the users and the organization
Impact
• In the emerging global e-business market, effective service availability (e.g., 24x7; Tivoli’s Service Desk) is a key to success. The collaborative work protocol is a task administration protocol that will provide:
  • Automation of interactions that include decision activities, i.e., automation of the process to provide service at minimum cost and maximum quality
  • Based on our experimental results, we will identify the specifications and working environments of protocols, so that the right protocol can be designed/selected for the right situation
  • In complex decision-making processes, human interactions are required (e.g., help desk applications). The protocol will reduce decision-making time/cost by extracting the right information
Research Methodology
• The research will employ a new version of TIE (Teamwork Integration Evaluation), developed previously with NSF support. TIE’s purposes:
  • Compute performance measures of protocols:
    • Completion time (e.g., transaction processing, negotiation)
    • Penalty measures (e.g., number of aborted (timed-out) connections)
    • Message queue of each party
    • Relative cost-quality model of the service system
  • Model the interactions among parties in both synchronous and asynchronous modes. Because TIE is built on MPI, it provides true parallelism analysis of the interaction behavior.
Research Methodology (continued)
• With MPI, TIE can run both on parallel machines such as the Paragon and Origin2000 and on networks of computers (e.g., Suns, Windows NT PCs)
• To compare protocols’ performance, experiments will be conducted under different environments with a variable number of participants and levels of service demand
• Use the traditional e-business protocol “Contract Net” as the baseline protocol
• Use IBM’s Situation Manager to coordinate conditional, triggered actions
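Since the Contract Net protocol is to serve as the baseline, a compact sketch of one negotiation round may help: the manager announces a task, each contractor either bids or declines, and the manager awards the task to the best-scoring bid and rejects the rest. The sketch is sequential Python rather than MPI, the help-desk contractors and costs are hypothetical, and the pluggable `score` function is only an illustrative nod to a cost-quality trade-off. The returned message list is the kind of trace from which measures such as message counts could be computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Bid:
    contractor: str
    cost: float
    completion_time: float


@dataclass
class Contractor:
    name: str
    bid_fn: Callable[[dict], Optional[Bid]]  # returns None to decline


def contract_net(task, contractors: List[Contractor], score=lambda b: b.cost):
    """One Contract Net round: announce, collect bids, award best, reject rest.

    Returns (winning bid or None, list of protocol messages exchanged).
    """
    messages = [("announce", c.name, task["id"]) for c in contractors]
    bids = []
    for c in contractors:
        bid = c.bid_fn(task)
        if bid is not None:
            bids.append(bid)
            messages.append(("bid", c.name, bid.cost))
    if not bids:
        return None, messages
    winner = min(bids, key=score)
    for b in bids:
        messages.append(("award" if b is winner else "reject", b.contractor, task["id"]))
    return winner, messages


task = {"id": "T1", "kind": "password-reset"}
desks = [
    Contractor("desk_a", lambda t: Bid("desk_a", cost=40.0, completion_time=2.0)),
    Contractor("desk_b", lambda t: Bid("desk_b", cost=25.0, completion_time=6.0)),
    Contractor("desk_c", lambda t: None),  # busy, declines to bid
]
winner, log = contract_net(task, desks)
print(winner.contractor, len(log))  # desk_b 7
# Weighting completion time changes the award: 40+5*2=50 beats 25+5*6=55.
fast, _ = contract_net(task, desks, score=lambda b: b.cost + 5 * b.completion_time)
print(fast.contractor)  # desk_a
```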
Research Plan for Three Years
• Develop specifications of protocols for e-business requirements, based on previous research (4 months)
• Modify TIE for protocol evaluation; conduct protocol performance experiments and analysis (8 months)
• Develop a TIE description language for general modeling purposes; implement it for the target application (12 months)
• Develop a TIE conversion program for executable protocol code; apply it to the target application (12 months)
• DELIVERABLES: protocol models, TIE, language, converter
• BUDGET: advisor + 2 students @ $50,000/year
Past Research
• DPIEM (Distributed Parallel Integration Evaluation Model): organizing/reorganizing resources among distributed networked organizations, based on the parallelism theory of computing & communication (Ceroni and Nof, 1999, Research Memo No. 99-04, School of IE, Purdue University)
• ABMS (Agent-based Manufacturing System): a general model of cooperation & collaboration among autonomous agents, resources, and tasks. Shows the need to use a workflow protocol to coordinate agents’ tasks (Huang and Nof, 1998, Int’l Journal of Production Research)
Past Research (continued)
• DAF-Net (Data Activity Flow Net) and AIMIS (Agent-based Integration Model of Information Systems): a collaboration scheme and coordinated execution for distributed, heterogeneous CIM data activities (Kim and Nof, 1998, Int’l Journal of Industrial Engineering)
• Active database coordination of multiple CIM databases: monitors events/situations of interest and, when given conditions are met, triggers an appropriate action (Etzion, Dori and Nof, 1995, Int’l Journal of CIM)
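The active-database behaviour described above (monitor events, check conditions, trigger actions) is commonly expressed as Event-Condition-Action (ECA) rules, and the "conditional, triggered actions" mentioned for IBM’s Situation Manager have a similar shape. A minimal in-memory sketch follows; the `ActiveStore` class, the inventory keys, and the reorder threshold are hypothetical, and it deliberately omits coupling modes, rule priorities, and the cascading-rule termination that a real active DBMS must handle.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Event-Condition-Action rule: on `event`, if `condition` holds, run `action`."""
    event: str
    condition: Callable[[dict, dict], bool]
    action: Callable[[dict, dict], None]


class ActiveStore:
    """Toy key-value 'database' that signals an event on every write."""

    def __init__(self):
        self.db = {}
        self.rules = []
        self.fired = []  # names of actions that ran, for inspection

    def on(self, rule):
        self.rules.append(rule)

    def set(self, key, value):
        self.db[key] = value
        self.signal("update", {"key": key, "value": value})

    def signal(self, event, data):
        for rule in self.rules:
            if rule.event == event and rule.condition(self.db, data):
                rule.action(self.db, data)
                self.fired.append(rule.action.__name__)


def reorder(db, data):
    # Hypothetical action: record a purchase order for the low-stock item.
    db.setdefault("orders", []).append(data["key"])


store = ActiveStore()
store.on(Rule("update",
              lambda db, d: d["key"].startswith("qty:") and d["value"] < 10,
              reorder))
store.set("qty:bolt", 50)   # condition false: no action
store.set("qty:bolt", 4)    # condition true: reorder fires
print(store.db["orders"], store.fired)  # ['qty:bolt'] ['reorder']
```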
Concluding Thought
• “Many companies view each negotiation as a separate situation, but companies that take a more coordinated approach are making better deals and forging stronger relationships” (Harvard Business Review, May-June 1999)
• Analogy for our research: we can significantly improve the performance of interactions among participants if an effective, customized protocol is used to coordinate the interactions needed in particular environments