BeeSpace Software Plans, Design, and Development
Outline
• Goals
• Context
• Approach
• Software Process
• Functionality
• Design
• Implementation Details
• Future Prospects
Project Goals & Parameters
• “This project will analyze social behavior… using Apis mellifera as the model organism.”
• Goal: support research and analysis of the Western honey bee.
• Using “biology research (that) will generate a unique database of gene expressions…” and “microarray experiments (that will) utilize the recently sequenced genome, supported by state-of-the-art statistics.”
• Goal: support the application of biological methods and techniques for exploratory analysis.
• Using “informatics research (that) will develop an interactive environment to analyze all information sources relevant to bee social behavior.”
• Goal: support the application of language processing methods for exploratory analysis.
• “The BeeSpace environment will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing.” (Ref: http://www.beespace.uiuc.edu/)
• Goal: support dual analysis methodologies via an integrated analysis environment.
• Parameter: five years to complete the project, including research, development, deployment, outreach, and documentation.
• Parameter: annual milestones and workshops are expected.
Context
• There is a vast body of biomedical and genomic literature containing valuable knowledge and research results.
• Implication: It is too much for humans to process, and it is not in a machine-ready format for reasoning-based systems.
• Novel language processing techniques exist, but they have been applied primarily in niche applications.
• Implication: Emerging technologies (NLP, TM, etc.) can provide the backbone for a strategic solution, but their risks must be mitigated through controlled development cycles.
• Numerous bioinformatics data-processing tools exist, but they are currently isolated from one another.
• Implication: Opportunities exist for interoperability among disparate systems, but success hinges on standardization.
• The web is seeing an increase in smaller, highly focused communities-of-interest.
• Implication: Opportunities exist for supporting the creation and management of localized “knowledge-spaces”.
Context – Related Tools & Projects
• 3rd Millennium Inc. – “…development of an integration framework for genomic, gene expression, and interaction data (protein-protein as well as protein-DNA) from multiple sources and model organisms that can enable the display of the relationships between biochemical objects into the context of biological pathways and networks.”
• iHOP – Information Hyperlinked Over Proteins: supports lookup and summarization of genes/proteins. “In general more than 90% of all active relations between proteins in the literature are expressed syntactically as ‘protein verb protein’”. Ref.
• IntAct Database – “IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.”
• Entrez eUtils – a web services (SOAP) interface for programmatically querying and interacting with NCBI databases.
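Of the tools listed, Entrez eUtils is the one meant to be driven directly from code. As a minimal sketch (Python is chosen here for illustration only), the following builds an ESearch query URL and pulls record IDs out of the XML reply. It uses the plain URL-based E-utilities endpoint rather than the SOAP interface named on the slide; the helper names are hypothetical, and the request itself is not sent.

```python
# Sketch: build an NCBI E-utilities ESearch request and read the returned IDs.
# Uses the URL-based E-utilities endpoint, not the SOAP interface named on the
# slide; sending the request (network access) is left to the caller.
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Return an ESearch URL that searches database `db` for `term`."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return EUTILS_BASE + "esearch.fcgi?" + query


def parse_esearch_ids(xml_text: str) -> list[str]:
    """Extract record IDs from an ESearch XML response (<IdList><Id>...</Id>)."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.findall("./IdList/Id") if elem.text]


if __name__ == "__main__":
    url = build_esearch_url("Apis mellifera AND social behavior", retmax=10)
    print(url)
    # With network access, the request could be sent like this:
    #   from urllib.request import urlopen
    #   with urlopen(url) as response:
    #       print(parse_esearch_ids(response.read().decode("utf-8")))
```

Keeping URL construction and response parsing in separate functions lets the parser be exercised on saved responses without touching the network.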
Software Process: System Development Life Cycle (SDLC)
• Identify project goals and critical success factors.
• Investigate current methodologies and tools that overlap functionally or by domain with the project objectives.
• Research the applicability of novel analysis techniques for extracting deeply embedded and stratified knowledge structures.
• Build an integrated software suite that allows interactive analysis and augmentation of rich data sets.
• Test and deploy the software to focused user groups.
• Document and publish research results.
• Repeat the above process for continuous quality improvement.
Functionality
• Should be a web-based system supporting lightweight GUI components and having minimal end-user requirements.
• Should accommodate user-directed query-by-navigation (QBN) of the “concept space”.
• Should extract and normalize concepts as “equivalence classes” of things with highly similar meaning.
• Should recognize and denote entities.
• Should allow the user to drill down, drill up, and drill across the concept space, e.g. text-to-concept, concept-to-concept, concept-to-theme, and the reverse directions.
• Should allow the user to perform encyclopedia-style lookup of entities.
• Should provide hooks for tie-in to 3rd-party bioinformatics tools.
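The “equivalence classes” requirement maps naturally onto a union-find (disjoint-set) structure. The sketch below assumes that pairs of terms with highly similar meaning are already supplied (for example by a synonym list or a similarity model, neither of which the slides specify); the class and method names are hypothetical.

```python
# Sketch: merge terms judged to have highly similar meaning into concept
# "equivalence classes" with a union-find (disjoint-set) structure.
# Assumption: the similar-term pairs come from elsewhere (a synonym list or a
# similarity model); how they are found is not shown here.
from collections import defaultdict


class ConceptClasses:
    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, term: str) -> str:
        """Return the class representative for `term`, registering it if new."""
        self._parent.setdefault(term, term)
        while self._parent[term] != term:
            # Path halving: point each visited node at its grandparent.
            self._parent[term] = self._parent[self._parent[term]]
            term = self._parent[term]
        return term

    def merge(self, a: str, b: str) -> None:
        """Record that terms `a` and `b` denote the same concept."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a == root_b:
            return
        # The alphabetically smaller root wins, so every class is labeled by
        # its alphabetically smallest term regardless of merge order.
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        self._parent[root_b] = root_a

    def normalize(self, term: str) -> str:
        """Map a surface term to its concept label (unknown terms map to themselves)."""
        return self._find(term)

    def classes(self) -> list[set[str]]:
        """Return every equivalence class as a set of terms."""
        groups: dict[str, set[str]] = defaultdict(set)
        for term in list(self._parent):
            groups[self._find(term)].add(term)
        return list(groups.values())
```

For example, after merge("foraging", "forager behavior") and merge("forager behavior", "food collection"), normalize("foraging") returns "food collection". Labeling a class by its alphabetically smallest term only keeps results deterministic; a real system would more likely pick a preferred label from a curated vocabulary.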
Design Principles
• Maintainability
• Portability
• Extensibility
• Efficiency
• Organization
• Interoperability
• Configurability
• Ease of use
• Trustworthiness
• “Quality without a Name”
References: “Code Complete”, 2nd ed. (Steve McConnell); “Pattern-Oriented Software Architecture”, Volume 1 (Buschmann et al.).
Implementation Details
The current system is being constructed as follows:
• The application (v1.0) is being developed as a web-based application.
• Design Decision: The interface is built on lightweight technologies (e.g. HTML, DHTML, and JavaScript). Typical web-application challenges, such as session management and security, still need to be addressed.
• The output of the data processing pipeline is a set of indices and annotated data files on which the client application depends.
• Design Decision: There is a clear separation of concerns between server-side processing and the client-side interface. XML is used as the data interchange format between software components.
• The pipeline is composed of independent software components, but these components need to be interconnected.
• Design Decision: Components are invoked as executables with defined interfaces.
• Some components need to store their data aggregations persistently, and other components may need access to this data.
• Design Decision: Currently each component handles this problem independently. A better long-term solution is to extract this concern and address it globally, for example with an ORDBMS.
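Two of these decisions, invoking components as executables and exchanging XML between them, can be illustrated together. The following is a minimal sketch only: the stdin/stdout convention, the <annotations>/<entity> layout, and the function names are assumptions for illustration, not the actual BeeSpace interfaces. A tiny inline script stands in for a real pipeline component.

```python
# Sketch: call a pipeline component as an executable and parse its XML output.
# Hypothetical convention (not from BeeSpace): the component reads plain text on
# stdin and writes <annotations><entity type="...">...</entity></annotations>.
import subprocess
import sys
import xml.etree.ElementTree as ET


def run_component(cmd: list[str], text: str, timeout: float = 60.0) -> ET.Element:
    """Run one component, feed it `text` on stdin, and parse the XML it prints."""
    result = subprocess.run(
        cmd,
        input=text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return ET.fromstring(result.stdout)


def entities(root: ET.Element) -> list[tuple[str, str]]:
    """Return (type, surface text) pairs from an <annotations> document."""
    return [(e.get("type", ""), e.text or "") for e in root.iter("entity")]


# Stand-in component for demonstration: tags every whitespace-separated word.
DEMO_COMPONENT = (
    "import sys\n"
    "from xml.sax.saxutils import escape\n"
    "words = sys.stdin.read().split()\n"
    "print('<annotations>' + ''.join("
    "'<entity type=\"word\">' + escape(w) + '</entity>' for w in words"
    ") + '</annotations>')\n"
)

if __name__ == "__main__":
    root = run_component([sys.executable, "-c", DEMO_COMPONENT], "honey bee genome")
    print(entities(root))
```

Run as a script, this prints [('word', 'honey'), ('word', 'bee'), ('word', 'genome')]. Because each component is only a command plus an agreed XML shape, components can be written in different languages and swapped independently, which is the separation of concerns the slide describes.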
Future Implementation Details
• Support both a web interface (HTML, CSS, DHTML, JavaScript) and a full-featured GUI client (a Java Web Start application).
• Use a consistent Java implementation for portability, maintainability, rapid application development (RAD), etc.
• Incorporate a DBMS for consistent handling of persistent storage.
• Add library extensions for communication between distributed, heterogeneous applications (possibly KIF, the Knowledge Interchange Format).
• Optimize data processing and communication.
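Moving persistence out of the individual components and into one DBMS could look roughly like the sketch below. SQLite (Python's built-in sqlite3 module) is used purely as a stand-in; it is not the ORDBMS the slides mention, and the table layout and AggregateStore name are hypothetical.

```python
# Sketch: one shared persistence layer instead of per-component storage.
# SQLite (Python's built-in sqlite3 module) stands in for the eventual DBMS;
# the table layout and the AggregateStore name are hypothetical.
import sqlite3
from typing import Optional


class AggregateStore:
    """Stores named aggregations per component; any component can read them."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS aggregates ("
                " component TEXT NOT NULL,"
                " name TEXT NOT NULL,"
                " value TEXT NOT NULL,"
                " PRIMARY KEY (component, name))"
            )

    def put(self, component: str, name: str, value: str) -> None:
        """Insert or overwrite one aggregation owned by `component`."""
        with self._conn:  # commits on success, rolls back on an exception
            self._conn.execute(
                "INSERT OR REPLACE INTO aggregates (component, name, value)"
                " VALUES (?, ?, ?)",
                (component, name, value),
            )

    def get(self, component: str, name: str) -> Optional[str]:
        """Return the stored value, or None if nothing was stored."""
        row = self._conn.execute(
            "SELECT value FROM aggregates WHERE component = ? AND name = ?",
            (component, name),
        ).fetchone()
        return row[0] if row else None

    def close(self) -> None:
        self._conn.close()
```

The composite primary key keeps each component's data in its own namespace while still letting any component read another's aggregations, which addresses the “other components may need access” concern from the implementation notes.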
Future Prospects
• Generalize the system so that it is NOT domain-specific and can be readily applied to other domains.
• Allow persistent sessions and sharing of knowledge-spaces among communities-of-interest.
• Support a visual query system (VQS) interface and/or a query-by-example (QBE) interface.
• Support all kinds of hypothesis generation: deduction, abduction, and induction.
• Support personalized annotations. (Open question: what constitutes a “good” knowledge-representation (KR) structure: clarity, logic, expressiveness?)
• Smooth the integration between the BeeSpace Navigator and the myriad web-based tools.
• Support n-ary, semantically rich relations rather than only dyadic ones.
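The difference between dyadic relations (the ‘protein verb protein’ pattern cited for iHOP) and n-ary relations can be made concrete with a small data-structure sketch. The class names, role names, and the choice of reification (one relation node with a binary link per role, a common pattern for storing n-ary relations in triple-based systems) are illustrative assumptions, not a BeeSpace design.

```python
# Sketch: dyadic ("protein verb protein") relations versus n-ary relations whose
# participants are kept as named roles. Class and role names are hypothetical.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DyadicRelation:
    """A binary relation: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


@dataclass
class NaryRelation:
    """A relation with any number of role-labeled participants."""
    predicate: str
    roles: dict[str, str] = field(default_factory=dict)

    def to_dyadic(self, subject_role: str, object_role: str) -> DyadicRelation:
        """Project onto two roles; every other role (context) is lost."""
        return DyadicRelation(
            self.roles[subject_role], self.predicate, self.roles[object_role]
        )

    def reify(self, rel_id: str) -> list[DyadicRelation]:
        """Encode losslessly as binary triples hanging off a relation node."""
        triples = [DyadicRelation(rel_id, "type", self.predicate)]
        for role, value in sorted(self.roles.items()):
            triples.append(DyadicRelation(rel_id, role, value))
        return triples
```

With placeholder names, NaryRelation("binds", {"agent": "ProteinA", "target": "ProteinB", "condition": "ConditionX"}) projects to the dyadic ProteinA binds ProteinB but drops the condition, whereas reify("rel1") keeps all three roles as four binary triples.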