Representation of Ontology Annotation Information in Grid Computing. Prospectus, November 18, 2004. David A. Gaitros, Department of Computer Science, Florida State University
Overview • Background on Annotation • The Problem • Research Goals • Research Objectives • Projected Accomplishments • Projected Activities • Graphical Annotation • Generic Annotation • Biological Database Problems • Proposed Web Services Implementation • Morphbank Database Schema • Expected Challenges • Master's Thesis/Projects • Conclusion
Background on Annotation • Scientific Annotation Middleware (SAM) • “We are creating a Scientific Annotation Middleware (SAM) system that will provide researchers and developers with the capabilities necessary to manage the complexity resulting from the collaborative, cross-disciplinary, compute-intensive research. SAM will include components and services that enable researchers, applications, problem solving environments (PSE) and software agents to create metadata and annotations about data objects and the semantic relationships between them. Human access to the middleware will be through a researcher’s notebook interface available via desktop computers and PDA devices.” Source: http://collaboratory.emsl.pnl.gov/docs/collab/sam/samprojoverview.html
Background on Annotation • Garlic – IBM “Garlic is a project being developed by members of the database group in Computer Science. The goal of Garlic is to enable large-scale multimedia information systems: large scale in that they involve lots of data with multimedia taken as broadly as possible to mean data of many types. We are particularly concerned about situations in which there is enough data of sufficiently specialized types that users have already made decisions about how to manage it, and have stored it in separate repositories that are specifically adapted to data of that type.” Source: http://www.almaden.ibm.com/cs/garlic/
Background on Annotation • The Garlic Approach [Figure: Garlic architecture; labels: Query tool, C++ API, Garlic Schema, Object Oriented Middleware, Metadata, Image Wrapper, Relational Wrapper, Document Wrapper, RDBMS, Document Store, Image Store] Source: http://www.almaden.ibm.com/cs/garlic/
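The Garlic approach places a wrapper in front of each repository (image store, RDBMS, document store) beneath a common object-oriented middleware. The Python sketch below is a hypothetical illustration of that wrapper idea only, not Garlic's actual C++ API: the `Wrapper` protocol, its `find` method, the `Middleware` class, and the in-memory stores are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Item:
    """One result expressed in the middleware's common schema (hypothetical)."""
    source: str
    key: str
    description: str


class Wrapper(Protocol):
    """Interface that every repository wrapper implements (hypothetical)."""
    def find(self, keyword: str) -> Iterable[Item]: ...


class RelationalWrapper:
    """Wraps rows of a relational table, modeled here as (key, text) tuples."""
    def __init__(self, rows: list[tuple[str, str]]) -> None:
        self.rows = rows

    def find(self, keyword: str) -> Iterable[Item]:
        for key, text in self.rows:
            if keyword.lower() in text.lower():
                yield Item("rdbms", key, text)


class ImageWrapper:
    """Wraps an image store that only knows a tag list per file."""
    def __init__(self, tags: dict[str, list[str]]) -> None:
        self.tags = tags

    def find(self, keyword: str) -> Iterable[Item]:
        for filename, words in self.tags.items():
            if keyword.lower() in (w.lower() for w in words):
                yield Item("image store", filename, ", ".join(words))


class Middleware:
    """Sends one query to every wrapper and merges the answers."""
    def __init__(self, wrappers: list[Wrapper]) -> None:
        self.wrappers = wrappers

    def query(self, keyword: str) -> list[Item]:
        return [item for w in self.wrappers for item in w.find(keyword)]


garlic = Middleware([
    RelationalWrapper([("S1", "Ant specimen, head"), ("S2", "Beetle specimen")]),
    ImageWrapper({"img001.tif": ["ant", "head", "dorsal"]}),
])
for item in garlic.query("ant"):
    print(item.source, item.key)
```

Adding a document store would mean writing one more class with a `find` method; the middleware and its callers would not change, which is the point of the wrapper layer.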
Background on Annotation • Data Annotation in Collaborative Research Environments • Michael Gertz, Department of Computer Science, University of California at Davis, Concept-based data annotation techniques for scientific databases • “It is well accepted that the creation, management, and utilization of different forms of metadata play a major role in realizing information systems infrastructure that are able to provide a rich data query, sharing, and management techniques.” • “We claim there is still a major gap between the creation of such semantic rich structures and the usage of these structures to actually enrich various forms of data.”
Background on Annotation • Concept Based Data Annotations [Figure: concepts (base concepts and relationship-type concepts) connected through data annotation to Web-accessible data: scientific data at Site A and scientific data at Site B] Source: Dr. Michael Gertz, UC Davis
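Concept-based annotation links shared concepts to data held at different sites, so that a query against one concept can reach all of them. The Python sketch below is a minimal in-memory model of that idea under our own assumptions: the `ConceptAnnotations` class, the concept names, the `is_a` relation, and the site URLs are hypothetical, and relationship-type concepts are modeled as simple (subject, relation, object) triples.

```python
from collections import defaultdict


class ConceptAnnotations:
    """Links base concepts to data objects held at different sites (hypothetical)."""

    def __init__(self) -> None:
        self.annotations: dict[str, set[str]] = defaultdict(set)  # concept -> data URIs
        self.relations: set[tuple[str, str, str]] = set()          # (concept, relation, concept)

    def annotate(self, concept: str, data_uri: str) -> None:
        self.annotations[concept].add(data_uri)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.relations.add((subject, relation, obj))

    def narrower(self, concept: str, relation: str = "is_a") -> set[str]:
        """The concept plus every concept that reaches it through `relation`."""
        found, frontier = {concept}, [concept]
        while frontier:
            current = frontier.pop()
            for s, r, o in self.relations:
                if r == relation and o == current and s not in found:
                    found.add(s)
                    frontier.append(s)
        return found

    def find(self, concept: str) -> set[str]:
        """All data annotated with the concept or with any narrower concept."""
        return {uri for c in self.narrower(concept) for uri in self.annotations[c]}


index = ConceptAnnotations()
index.relate("mandible", "is_a", "mouthpart")
index.annotate("mandible", "http://site-a.example/img/17")
index.annotate("mouthpart", "http://site-b.example/data/42")
print(sorted(index.find("mouthpart")))
```

A search for the broader concept returns data from both sites, even though the Site A item was annotated only with the narrower concept.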
The Problem • The discovery of information relies on the ability of scientists to find and access the correct data • Grids and grid computing have emerged as an increasingly common means of sharing large amounts of information among collaborating organizations • Searches of annotation metadata are still limited because most database and grid applications still use ad hoc data storage and retrieval techniques • Searches still rely on a scientist's intimate knowledge of data location, data format, and how to use specific applications • No existing annotation tool satisfies the requirements of the biological community
Research Goals • Improve the ability of biological researchers to search annotated databases for information they need to support their research or findings • Suggest that such improvements can be applied to other scientific applications
Research Objectives • Examine current methods of annotation • Categorize general features of annotation • Define systematic techniques that can be applied to current ad hoc annotation methods
Projected Accomplishments • Identification of the functional areas within the data grid community • Definition of a relationship model that applies to all scientific annotation • Development of a transformation model whereby any annotation can be expressed as an object (data + operations)
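One way to read "data + operations" is as an ordinary class: the annotation's data are its fields, and the operations are methods that act on it without touching the annotated image. The Python sketch below is a hypothetical illustration, not the proposed transformation model; the field names, the convention of storing the location as fractions of the image size, and the `scaled_to` and `matches` methods are assumptions. Storing relative coordinates is one way to avoid annotations that do not scale with the image, a limitation noted for I-Note.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Annotation = data (fields) + operations (methods). Hypothetical model."""
    target_id: str      # image or other addressable object being annotated
    author: str
    text: str
    x: float            # point location as a fraction of image width (0..1)
    y: float            # point location as a fraction of image height (0..1)
    keywords: set[str] = field(default_factory=set)

    def scaled_to(self, width: int, height: int) -> tuple[int, int]:
        """Pixel position of the annotation on a rendering of the given size."""
        return round(self.x * width), round(self.y * height)

    def matches(self, term: str) -> bool:
        """True if the term is one of the keywords or occurs in the free text."""
        term = term.lower()
        return term in {k.lower() for k in self.keywords} or term in self.text.lower()


note = Annotation("image:1024", "curator", "Left mandible, three teeth", 0.25, 0.5, {"mandible"})
print(note.scaled_to(800, 600))    # (200, 300)
print(note.scaled_to(1600, 1200))  # (400, 600)
```

The same stored annotation lands on the same feature at either resolution, and the image file itself is never modified.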
Projected Activities • The initial plan is to use the new MorphBank database to prove the concept • Develop a new MorphBank schema • Develop a new MorphBank website • Develop a reliable and more capable multi-annotation software tool to replace the I-Note 1.0 annotation package • Develop the methods and schemas that will allow scientists to extract different annotations from biological images and other objects
Graphical Annotation • There may be more than one image or object associated with a specimen • No practical upper limit can be defined • Standards are still being defined • Each image or object may hold several pieces of information • No practical upper limit can be defined • Automating annotation is still in the early stages • Searching the images themselves for data is not feasible in large database systems • Searching large free-text entries is also inefficient
Graphical Annotation (cont.) • Initially used the I-NOTE software to define the requirements for a new piece of software that works with Morphbank on Windows XP/Linux • At a minimum, the new tool will be able to annotate any addressable object in Morphbank, to show that annotations can be mixed
Morphology Publication Example Riccardi, Annotation Nov 5, 2004
Example of Extensible Annotation Riccardi, Annotation Nov 5, 2004
Limitations of I-Note Software • Currently not supported • The University of Virginia has cut funding for the project • Was developed by university programmers • Works only on the Windows 95 platform • Code is not maintainable; it was developed in a Java development environment • The development project was not documented • Could not attach other objects or documents • Only worked on certain graphic images • Annotations did not scale with the image • Annotations were overlay images and had to be stored as full images • Could not address multiple objects
Generic Annotation • Need to develop a method to store different annotations as objects • Need to develop a method to search different annotations for similar or associated information • Replace ad hoc queries with more systematic methods • Develop a higher-level ontology for annotations • Need to determine the minimum amount of information needed to represent and access an annotation object
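Replacing ad hoc string searches with a systematic method could start with something as simple as an inverted index over annotation terms, so that a lookup does not scan every free-text entry, plus a check for annotations that share a target object. This Python sketch is a minimal illustration under our own assumptions: annotations are plain dictionaries with hypothetical `id`, `text`, and `targets` keys, and tokenization is a naive lowercase word split.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Naive tokenizer: lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(annotations: list[dict]) -> dict[str, set[str]]:
    """Map each term to the ids of the annotations that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for ann in annotations:
        for term in tokenize(ann["text"]):
            index[term].add(ann["id"])
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Ids of annotations containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


def related(annotations: list[dict], ann_id: str) -> set[str]:
    """Ids of other annotations that share at least one target object."""
    targets = {a["id"]: set(a["targets"]) for a in annotations}
    mine = targets[ann_id]
    return {other for other, t in targets.items() if other != ann_id and t & mine}


anns = [
    {"id": "a1", "text": "Mandible with three teeth", "targets": ["img1"]},
    {"id": "a2", "text": "Mandible, two teeth", "targets": ["img2"]},
    {"id": "a3", "text": "Antenna segment count", "targets": ["img1", "img3"]},
]
idx = build_index(anns)
print(sorted(search(idx, "mandible teeth")))  # ['a1', 'a2']
print(sorted(related(anns, "a1")))            # ['a3']
```

A production system would add stemming and a controlled vocabulary, which is where a higher-level annotation ontology would come in.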
Generic Annotation • General Requirements • Platform and architecture independent • Stand-alone application that can also function as a web service • Consider both server-side and client-side applications • Information must be exchanged using web service features such as XML documents • Annotation on images must: • Allow multiple annotations per image/object • Not alter the original image/object • Include references to points and areas • Include text, graphics, and voice • Include the ability to make general annotation remarks • Be able to associate multiple objects, including other annotations, with an annotation
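These requirements suggest that an annotation travels as its own XML document that references the image instead of being burned into it. The sketch below builds such a document with Python's standard `xml.etree.ElementTree`; the element and attribute names (`annotation`, `target`, `point`, `area`, `remark`) are hypothetical and not a Morphbank format. It covers multiple targets (one of them another annotation), a point reference, an area reference, and a general text remark; graphics and voice are left out for brevity.

```python
import xml.etree.ElementTree as ET


def annotation_xml(ann_id: str, author: str, targets: list[str],
                   point: tuple[int, int], area: tuple[int, int, int, int],
                   remark: str) -> str:
    """Serialize one annotation; the annotated image itself is only referenced."""
    root = ET.Element("annotation", id=ann_id, author=author)
    for ref in targets:  # images, other objects, or other annotations
        ET.SubElement(root, "target", ref=ref)
    x, y = point
    ET.SubElement(root, "point", x=str(x), y=str(y))
    left, top, width, height = area
    ET.SubElement(root, "area", x=str(left), y=str(top),
                  width=str(width), height=str(height))
    ET.SubElement(root, "remark").text = remark
    return ET.tostring(root, encoding="unicode")


doc = annotation_xml("ann-7", "curator", ["image:1024", "annotation:ann-3"],
                     (120, 88), (100, 60, 40, 50), "Mandible; compare with ann-3")
print(doc)

# Reading it back: a client on any platform only needs an XML parser.
parsed = ET.fromstring(doc)
print([t.get("ref") for t in parsed.findall("target")])  # ['image:1024', 'annotation:ann-3']
```

Because the document only points at `image:1024`, any number of such annotations can coexist for one image without altering the original file.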
MorphBank Annotation [Figure: annotation components; labels: Morphbank XML, Morphbank Viewer, Morphbank Browser, Annotation Applet, XML, MorphBank RDBMS, Image Files]
Biological Database Problems • Taxonomy terms and definitions are not universally defined • Any database system would have to accommodate different taxonomic structures • The Darwin Core standard is not sufficient to solve this problem • Each biological study group develops its own character codes and states • There is no standardization • Any database system would have to accommodate different character codes and states • There is currently not enough justification for the different biological communities to develop tight integration standards
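Because each study group defines its own character codes and states, a shared database cannot assume one global code list. One minimal way to accommodate this, sketched in Python under our own assumptions, is to key every character by (group, code) and record explicit equivalences between groups only where someone has asserted them; the `CharacterCatalog` class, group names, and codes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterCatalog:
    """Characters and states namespaced by the study group that defined them."""
    # (group, character code) -> {state code: state description}
    states: dict[tuple[str, str], dict[str, str]] = field(default_factory=dict)
    # explicit equivalences between characters defined by different groups
    crosswalk: dict[tuple[str, str], set[tuple[str, str]]] = field(default_factory=dict)

    def define(self, group: str, code: str, states: dict[str, str]) -> None:
        self.states[(group, code)] = dict(states)

    def declare_equivalent(self, a: tuple[str, str], b: tuple[str, str]) -> None:
        self.crosswalk.setdefault(a, set()).add(b)
        self.crosswalk.setdefault(b, set()).add(a)

    def equivalents(self, key: tuple[str, str]) -> set[tuple[str, str]]:
        """Characters from other groups declared equivalent to `key`."""
        return set(self.crosswalk.get(key, set()))


catalog = CharacterCatalog()
catalog.define("ant-group", "C12", {"0": "mandible without teeth", "1": "mandible toothed"})
catalog.define("wasp-group", "M3", {"a": "toothed", "b": "edentate"})
catalog.declare_equivalent(("ant-group", "C12"), ("wasp-group", "M3"))
print(catalog.equivalents(("ant-group", "C12")))  # {('wasp-group', 'M3')}
```

Each group keeps its own codes and state labels; integration happens only through the crosswalk, which fits a setting where tight integration standards are not yet justified.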
Proposed MorphBank WebServices [Figure: layered web services architecture, based upon the Earth Systems Grid (ESG) model; labels: World Browse, Insertion and Update, Biological Data Analysis, Search & Discovery, Administration, Data Display; High Level Web Services; Annotation Discovery, Metadata Annotation, User Validation & Security, Annotation Query, Biological Query, Annotation Aggregation, Data Validation, Annotation Data Display, Biological Data Display, Bio Data Discovery; Core Web Services; Web Services Access (update, insert, delete, query); Service Translation Library; Metadata Holdings; Other Bio DB, Character State Catalog, Image XML Files, MorphBank XML Files, Image Files, MorphBank DB]
MorphBank Website [Figure: site map; labels: Intro Screen, Info/Help, WEB/DB Administration, Login, Restricted User, World Browse, Add, Update, Delete, Annotate, RU/Browse, Browse, DS3, DS2, DS1, Working Data Set, Under Review, World Read]
Specimen Table
#
# Table structure for table 'specimen'
# (the '0000-00-00' defaults need a sql_mode without NO_ZERO_DATE on MySQL 5.7 and later)
#
CREATE TABLE specimen (
  MorphBankSpecimenID int(32) NOT NULL AUTO_INCREMENT,
  CatalogNumber varchar(128) NOT NULL,
  DateLastModified date NOT NULL default '0000-00-00',
  InstitutionCode varchar(128),
  CollectionCode varchar(128),
  ScientificName varchar(128),
  BasisOfRecord char(1),
  TSN int(32),
  CollectionNumber varchar(128),
  FieldNumber varchar(128),
  CollectorName varchar(128),
  DateCollected date NOT NULL default '0000-00-00',
  TimeofDate time,
  ContinentOcean varchar(128),
  Country varchar(56),
  StateProvince varchar(56),
  County varchar(56),
  Locality varchar(56),
  Latitude double,
  Longitude double,
  CoordinatePrecision int(8),
  MinimumElevation int(32),
  MaximumElevation int(32),
  MinimumDepth int(32),
  MaximumDepth int(32),
  Sex varchar(8),
  PreparationType varchar(255),
  IndividualCount int(32),
  PreviousCatalogNumber varchar(128),
  RelationshipType varchar(128),
  RelatedCatalogItem varchar(128),
  DevelopmentalStage varchar(128),
  Notes varchar(255),
  PRIMARY KEY (MorphBankSpecimenID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Image Table
#
# Table structure for table 'image'
#
CREATE TABLE image (
  ImageID int(32) NOT NULL AUTO_INCREMENT,
  MorphBankSpecimenID int(32),
  ViewNumber int(32),
  ImageScale varchar(64),
  XDimensionPixels int(32) NOT NULL,
  YDimensionPixels int(32) NOT NULL,
  ResolutionInPixelsPerInch int(32) NOT NULL,
  OriginalFileName varchar(255) NOT NULL,
  Magnification varchar(128),
  ImageFileType varchar(128),
  PRIMARY KEY (ImageID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Viewtable
#
# Table structure for table 'viewtable'
#
CREATE TABLE viewtable (
  ViewNumber int(32) NOT NULL,
  ImagingTechnique varchar(128),
  ImagingPreparationTechnique varchar(128),
  SpecimenPart varchar(128),
  ViewAngle varchar(128),
  Sex varchar(8),
  DevelopmentalStage varchar(128),
  PRIMARY KEY (ViewNumber)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Objectannotation Table
#
# Table structure for table 'imageannotation'
#
CREATE TABLE imageannotation (
  AccessionNumber int(32),
  ImageAnnotationSeqNo int(32) NOT NULL AUTO_INCREMENT,
  CatalogNumber varchar(128) NOT NULL,
  AnnotationLocX int(32),
  AnnotationLocY int(32),
  AnnotationRadius int(32),
  AnnotationTypeID int(32),
  PhylogeneticCharacterID int(32),
  PhylogeneticCharacterStateID int(32),
  AnnotationAuthor varchar(128),
  AnnotationDate date default '0000-00-00',
  ImageID int(32),
  AnnotationObject varchar(255),
  PRIMARY KEY (ImageAnnotationSeqNo)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
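The `imageannotation` table stores each graphical annotation as a location (`AnnotationLocX`, `AnnotationLocY`) and a radius on an image, separately from the image file. To show how such rows could be used, the sketch below loads a simplified copy of the table into an in-memory SQLite database through Python's `sqlite3` module (not MySQL; types are simplified and only a subset of columns is kept) and asks which annotations cover a clicked pixel. Reading the location and radius as a circle, treating `AnnotationObject` as a text label, and the sample rows are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified SQLite version of the imageannotation table (subset of columns).
conn.execute("""
    CREATE TABLE imageannotation (
        ImageAnnotationSeqNo INTEGER PRIMARY KEY AUTOINCREMENT,
        ImageID              INTEGER NOT NULL,
        AnnotationLocX       INTEGER,
        AnnotationLocY       INTEGER,
        AnnotationRadius     INTEGER,
        AnnotationAuthor     TEXT,
        AnnotationObject     TEXT
    )""")
conn.executemany(
    "INSERT INTO imageannotation (ImageID, AnnotationLocX, AnnotationLocY,"
    " AnnotationRadius, AnnotationAuthor, AnnotationObject) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 100, 20, "curator", "left mandible"),
        (1, 300, 120, 15, "curator", "antenna base"),
        (2, 100, 100, 50, "student", "eye"),
    ],
)


def annotations_at(image_id: int, x: int, y: int) -> list[str]:
    """Annotations on one image whose circle contains the pixel (x, y)."""
    rows = conn.execute(
        """SELECT AnnotationObject FROM imageannotation
           WHERE ImageID = ?
             AND (AnnotationLocX - ?) * (AnnotationLocX - ?)
               + (AnnotationLocY - ?) * (AnnotationLocY - ?)
               <= AnnotationRadius * AnnotationRadius
           ORDER BY ImageAnnotationSeqNo""",
        (image_id, x, x, y, y),
    )
    return [r[0] for r in rows]


print(annotations_at(1, 110, 105))  # ['left mandible']
print(annotations_at(1, 200, 200))  # []
```

Comparing squared distances avoids a square root in SQL; the same query works unchanged in MySQL.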
AnnotationType Table
#
# Table structure for table 'annotationtype'
#
CREATE TABLE annotationtype (
  annotationtypeID int(32) NOT NULL AUTO_INCREMENT,
  annotationtitle varchar(25),
  keywords varchar(255),
  description varchar(128),
  PRIMARY KEY (annotationtypeID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
PhylogeneticCode Table
#
# Table structure for table 'phylogeneticcharacter'
#
CREATE TABLE phylogeneticcharacter (
  PhylogeneticCharID int(32) NOT NULL AUTO_INCREMENT,
  CharacterNumber int(32),
  PublicationID int(32),
  TSN int(32),
  CharacterDescription varchar(128),
  ViewID int(32),
  Sex varchar(8),
  Stage varchar(128),
  SimilarEntries varchar(128),
  RelatedCharacterID int(32),
  RelationType varchar(128),
  SuggestedTaxonRange varchar(128),
  PRIMARY KEY (PhylogeneticCharID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Phylogeneticstate Table
#
# Table structure for table 'phylogeneticstate'
#
CREATE TABLE phylogeneticstate (
  StateID int(32) NOT NULL AUTO_INCREMENT,
  phylogeneticcharID int(32) NOT NULL,
  Description varchar(128),
  ImageID int(32),
  AnnotationSequenceNumber int(32),
  PRIMARY KEY (StateID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
SpecimenPhyChar Table
#
# Table structure for table 'SpecimenPhyChar'
#
CREATE TABLE SpecimenPhyChar (
  SpecimenPhyCharID int(32) NOT NULL AUTO_INCREMENT,
  SpecimenID int(32) NOT NULL,
  PhylogeneticCharID int(32),
  ImageID int(32),
  ImageAnnotationSeqNo int(32),
  PRIMARY KEY (SpecimenPhyCharID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
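`SpecimenPhyChar` acts as a link table: each row ties a specimen to a phylogenetic character and, optionally, to the image and image annotation that document it. The sketch below uses a simplified in-memory SQLite copy (Python `sqlite3`; only the needed columns are kept, and the catalog numbers, characters, and annotations are invented sample data) to show the kind of join this enables: listing the characters recorded for one specimen together with the annotation that illustrates each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specimen (MorphBankSpecimenID INTEGER PRIMARY KEY,
                           CatalogNumber TEXT NOT NULL);
    CREATE TABLE phylogeneticcharacter (PhylogeneticCharID INTEGER PRIMARY KEY,
                                        CharacterDescription TEXT);
    CREATE TABLE imageannotation (ImageAnnotationSeqNo INTEGER PRIMARY KEY,
                                  ImageID INTEGER, AnnotationObject TEXT);
    CREATE TABLE SpecimenPhyChar (SpecimenPhyCharID INTEGER PRIMARY KEY,
                                  SpecimenID INTEGER NOT NULL,
                                  PhylogeneticCharID INTEGER,
                                  ImageID INTEGER,
                                  ImageAnnotationSeqNo INTEGER);

    -- Invented sample data.
    INSERT INTO specimen VALUES (1, 'CAT-0001'), (2, 'CAT-0002');
    INSERT INTO phylogeneticcharacter VALUES (10, 'Mandible dentition'),
                                             (11, 'Antennal segments');
    INSERT INTO imageannotation VALUES (100, 5, 'left mandible'), (101, 5, 'antenna base');
    INSERT INTO SpecimenPhyChar VALUES (1000, 1, 10, 5, 100), (1001, 1, 11, 5, NULL),
                                       (1002, 2, 10, NULL, NULL);
""")

# Characters recorded for one specimen, with the illustrating annotation if any.
QUERY = """
    SELECT s.CatalogNumber, c.CharacterDescription, a.AnnotationObject
    FROM SpecimenPhyChar AS sp
    JOIN specimen AS s ON s.MorphBankSpecimenID = sp.SpecimenID
    JOIN phylogeneticcharacter AS c ON c.PhylogeneticCharID = sp.PhylogeneticCharID
    LEFT JOIN imageannotation AS a ON a.ImageAnnotationSeqNo = sp.ImageAnnotationSeqNo
    WHERE s.CatalogNumber = ?
    ORDER BY c.PhylogeneticCharID
"""
for row in conn.execute(QUERY, ("CAT-0001",)):
    print(row)
```

For CAT-0001 this returns both characters; the LEFT JOIN keeps the character that has no linked annotation and reports `None` in that column.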
Publication Table
#
# Table structure for table 'publicationtable'
#
CREATE TABLE publicationtable (
  PublicationID int(32) NOT NULL AUTO_INCREMENT,
  PublicationAuthor varchar(128),
  PublicationYear char(4),
  PublicationJournal varchar(128),
  PublicationTitle varchar(128),
  PublicationPagesFrom int(32),
  PublicationPagesTo int(32),
  PRIMARY KEY (PublicationID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
UserTable
#
# Table structure for table 'usertable'
#
CREATE TABLE usertable (
  UserID int(32) NOT NULL AUTO_INCREMENT,
  Level int(8),
  UIN int(8),
  PIN int(16),
  Name varchar(128),
  Email varchar(128),
  Affiliation varchar(128),
  Address varchar(255),
  Country varchar(128),
  GroupID int(32),
  PRIMARY KEY (UserID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
GroupTable
#
# Table structure for table 'grouptable'
#
CREATE TABLE grouptable (
  GroupID int(32) NOT NULL,
  GroupName varchar(128) NOT NULL,
  User int(32),
  PRIMARY KEY (GroupID)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Expected Challenges • The effort is contingent upon development of a reliable annotation toolset • Development of a generic biological schema • Integration of web services with the new MorphBank system and other biological database systems • Obtaining consensus among the different participants on basic biology ontology issues • Possible use of a general biological thesaurus
Master's Thesis/Projects • MorphBank Requirements Analysis (Thesis/Project) • MorphBank Module Implementation (Project) • MorphBank Security (Thesis/Project) • MorphBank Mirror Site Implementation (Thesis/Project) • MorphBank Operational Site Procedures (Project)
Master's Thesis/Projects • Biological Image eXchangE System (BIXES) • A method and associated software to allow heterogeneous biological image database systems to exchange images and metadata (project/thesis) • Biological Image Search technique (School of Computation Sciences research project/thesis)
Conclusion • More efficient search on large scientific data systems • Demonstrate that this application works for biological databases • Show that it is feasible for any scientific application • Provide a new, supported annotation tool set that can be used across the web