This progress report provides an overview of microarrays and gene expression, as well as the motivations and ongoing work for developing a knowledge management system for microarray data management.
Marco Brandizi • Corso di Dott. in Informatica, Univ. Milano Bicocca • XIX Ciclo • Progress Report • Feb 2005
Agenda • Microarrays and Gene Expression overview • A Knowledge Management System for uA data management • Motivations • What to model and where to start from • First elaborations • Ongoing work and future
Genes Machine (figure labels: gene, DNA, mRNA, protein, Cell/Life)
Microarrays Data Mgmt Issues • Exp. data vs. seq. data: • Context dependent (living system, exp. conditions) • Lack of a standard unit of measure • Several normalization methods • Multiple platforms and methods • No standard for data annotation • Vocabularies and terminology coherence • Details about: experiment, source, protocols, exp. conditions
Microarrays Data Mgmt Issues / 2 • Evidence about data quality • What to store? • Raw images • Computed values • Normalized values • How to find data • Systems aware of complex vocabularies (ontologies) • Data mining and exp. comparison tools • Data access control
Knowledge management... what? • Genes • Textual annotations, literature • Interactions, pathways • Gene collections (functional families, clusters) • Experiment and Experimental Conditions • Keyword/ontology based searches • Tested conditions searches • Expression Values • Navigation • Same transcriptome/trend/correlation/pattern
Knowledge management... what? • Chips • Keyword searches • Annotations about chip quality, protocols to be used, etc. • People • “Is expert in ...” • “Works with ...” • “Is studying ...” • Their ranking is X (based on publications, user preferences, etc.)
Knowledge management... what? “Does IL-2 regulate something and under what conditions?” Interactions of gene: IL2 Note added by Norman on Dec 10, 2003, deduced by text, not confirmed [see original text]: IL2 --------> UPREGULATES -----------> IL10, IL12 | ON --> Cells --> Type: DC UNDER CONDITION: LPS Stimulus
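The note on this slide could be stored as a structured record so that the question "does IL-2 regulate something and under what conditions?" becomes a simple lookup. The following is only a sketch: the class name `Interaction`, its field names and the helper `regulated_by` are assumptions made for illustration and are not part of the presented system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interaction:
    """Hypothetical record for one annotated gene interaction."""
    source: str        # regulating gene
    relation: str      # e.g. "UPREGULATES"
    targets: tuple     # regulated genes
    cell_type: str     # cells in which the interaction was observed
    condition: str     # experimental condition
    author: str        # who added the note
    noted_on: date     # when the note was added
    confirmed: bool    # False when only deduced from text


# The example note from the slide, encoded with the hypothetical record.
note = Interaction(
    source="IL2",
    relation="UPREGULATES",
    targets=("IL10", "IL12"),
    cell_type="DC",
    condition="LPS Stimulus",
    author="Norman",
    noted_on=date(2003, 12, 10),
    confirmed=False,
)


def regulated_by(gene, notes):
    """Return (target, relation, condition) for every note whose source is `gene`."""
    return [
        (target, n.relation, n.condition)
        for n in notes
        if n.source == gene
        for target in n.targets
    ]
```

Calling `regulated_by("IL2", [note])` lists IL10 and IL12 together with the relation and the LPS condition; the `confirmed` flag keeps track of the "deduced by text, not confirmed" remark.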
What to do first? • Gene Expression Formal Model • Focused on GE measures • Oriented to the “closing the loop” goal • Several things to start from • Ontologies and Inference Systems • Similar models already defined • Other similar systems
Defining a GE Model • Starting point: Ontologies and Inference Systems • XML->RDF->...->OWL, and related tools (ex.: Protégé, Racer, Jena) • Logics, particularly Description Logic • Inferential Systems and Languages (ex.: Prolog)
Defining a GE Model • Starting point: Similar models already defined • "Modeling Gene Expression", Proceedings of NETTAB/2004, www.loa-cnr.it, a model of GE in Description Logic, but without a focus on microarrays and expression intensities
Defining a GE Model • Starting point: Similar models already defined • Very similar to previous work, but with tools for annotation/querying of microarray chips • However, it does not seem focused on the annotation of data, assays, etc.
Defining a GE Model • Starting point: Other similar systems • Synapsia by Agilent, very similar, but not focused on uAs • Hybrow, www.hybrow.org, computer-aided hypothesis evaluation • The Notebook Project, www.notebook.org, a bio-KMS based on SOAP and P2P • Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C. (2004), From actions to suggestions: supporting the work of biologists through laboratory notebooks
Defining GE Model • Gene Expression Formal Model • Basic elements: genes, hybridizations, experiments
Defining GE Model • Gene Expression Formal Model • Basic elements: annotated sets
Entities Grouping • Cluster of DataSet ::= Cluster of DataSet | GeneCluster of DataSet.GeneSet | HybCluster of DataSet.HybSet • Cluster of Entity ::= Cluster of Entity | Set of Entity • EntityCollection ::= Cluster of DataSet | Cluster of Entity • All entities in a cluster are of the same type. Ex.: a cluster of genes contains hierarchically grouped sets of genes, and only genes. • NOTE: the grammar used here is VERY informal!
Entities Grouping • GeneSet ::= Set of Gene • HybSet ::= Set of Hybridization • Set of X ::= { x : x IS-A X } • Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }
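The informal grammar for sets and clusters can be read as a small recursive data structure: a cluster is either a plain set of entities or a group of sub-clusters, and every member must have the same type. The sketch below is one possible reading, not the author's implementation; `Gene`, `Hybridization`, `set_of`, `is_singleton`, `Cluster` and `members` are names assumed here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    public_id: str


@dataclass(frozen=True)
class Hybridization:
    name: str


def set_of(entity_type, items):
    """Set of X ::= { x : x IS-A X }; rejects members of any other type."""
    result = frozenset(items)
    for x in result:
        if not isinstance(x, entity_type):
            raise TypeError(f"{x!r} is not a {entity_type.__name__}")
    return result


def is_singleton(s):
    """Singleton ( C ): a set with exactly one member (#S = 1)."""
    return len(s) == 1


@dataclass(frozen=True)
class Cluster:
    """Cluster of Entity ::= Cluster of Entity | Set of Entity.

    Each child is either another Cluster or a frozenset of entities.
    """
    children: tuple


def members(node):
    """Flatten a cluster tree into the set of all entities it contains."""
    if isinstance(node, frozenset):
        return node
    result = frozenset()
    for child in node.children:
        result |= members(child)
    return result
```

A flat set is the degenerate case of a cluster, which matches the grammar's `Set of Entity` alternative: `members` simply returns it unchanged.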
Annotations • Annotation ::= EntityCollection => AnnotationSet • Annotations attach useful information to Gene Expression data so that it can be tracked.
Annotations/Basics Annotation ( EmptySet ) ::= EmptySet Annotation ( Singleton ( Entity e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )
Annotations/Basics • BaseAnnotation ( Any ) ::= • To be decided; the first idea is a set of: • Name/Value/Type, and Description as in MAGE • External Reference, with URI, or attachment • Graph attachment, "vectoring" values, ex: PCA with components values, scatter plots wit • Annotation Author • Annotation Date • Security/Access references • Similar to the classes Extendable, Describable, Identifiable of MAGE-OM • Entity annotates another Entity, ex.: Exp author
Annotations/Basics • Attributes ( Entity e ) ::= • Set of < attrib, value, type > for each declared attribute of Entity • Attributes may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of the attribute • Annotation ( GeneSet GS ) ::= BaseAnnotation ( GS ) U Annotation ( g ) : g BELONGS GS U BiologicalAnnotation ( GS )
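The annotation rules on the last slides (empty set, singleton, gene set) compose recursively. A minimal sketch of that recursion follows, under several assumptions: annotations are modelled as sets of `(name, value, type)` triples, `BaseAnnotation` and `BiologicalAnnotation` are stood in for by plain dictionaries, and all names (`Gene`, `attributes`, `annotate_gene_set`) are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Gene:
    public_id: str
    symbol: str


def attributes(entity):
    """Attributes ( e ): one <attrib, value, type> triple per declared field."""
    triples = set()
    for f in fields(entity):
        value = getattr(entity, f.name)
        triples.add((f.name, value, type(value).__name__))
    return frozenset(triples)


def annotate_gene_set(gene_set, base, biological):
    """Annotation ( GS ) following the slide rules.

    `base` maps an entity or a gene set to its BaseAnnotation triples;
    `biological` maps a gene set to its BiologicalAnnotation triples.
    Both are stand-ins for whatever store the real system would use.
    """
    if not gene_set:
        # Annotation ( EmptySet ) ::= EmptySet
        return frozenset()
    if len(gene_set) == 1:
        # Annotation ( Singleton ( e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )
        (gene,) = gene_set
        return attributes(gene) | base.get(gene, frozenset())
    # BaseAnnotation ( GS ) U Annotation ( g ) for each g U BiologicalAnnotation ( GS )
    result = base.get(gene_set, frozenset()) | biological.get(gene_set, frozenset())
    for gene in gene_set:
        result |= annotate_gene_set(frozenset([gene]), base, biological)
    return result
```

Because every annotation is a set, the union in the rule maps directly onto `|`, and an annotation shared by several genes appears only once in the result.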
Annotations/Biological Ann. • BiologicalAnnotation ( GS ) ::= • Tags the gene set with its biological meaning, i.e. the reason why the genes have been grouped • Ex.: • belonging to the functional family of apoptosis • in the KEGG pathway about IL-2 • under GO ID #10234
Annotations/Data Sets • Annotation ( Cluster of DataSet ds ) ::= • BaseAnnotation ( ds ) U Annotation ( < all entities in ds > ) • Meaning of clustering • Clustering method / algorithm • Algorithm annotations, ex.: parameter values • Cluster includes the case of a flat set (not a tree), and sub-cases: • gene/hybs filtering ( genes have been filtered in from another data set ) • values transformation ( normalization, PCA, average over replicates )
Annotations/Examples • Types of annotations / searches: • Generic • <attribute> LIKE <pattern> • <value> BETWEEN ( <lo>, <hi> ) • <author> IS author • Genes • public_id LIKE pattern • REGULATION ( g1, g2, ... gn ) • g REGULATES | DOWN_REGULATES | UP_REGULATES | PROMOTES | INHIBITS ( g1, g2, ... gn ) • geneX IN_PATHWAY ( p )
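The generic search operators could be prototyped as plain predicates over annotation values. The sketch below assumes SQL-style semantics for `LIKE` (`%` matches any run of characters, `_` matches exactly one) and an inclusive `BETWEEN`; the slides do not specify either, and the function names are hypothetical.

```python
import re


def like(value, pattern):
    """<attribute> LIKE <pattern>, assuming SQL wildcards % and _."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def between(value, lo, hi):
    """<value> BETWEEN ( <lo>, <hi> ), assumed inclusive at both ends."""
    return lo <= value <= hi


def search(records, predicate):
    """Return the records (any objects) for which the predicate holds."""
    return [r for r in records if predicate(r)]
```

For example, `search(["IL2", "IL10", "TNF"], lambda s: like(s, "IL%"))` keeps the two interleukin symbols and drops the third.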
Annotations/Examples • DataSet • geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds • hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds • Not necessarily computed; it may simply be annotated. • CORRELATION ( dSet1, dSet2 ... dSetN, value ) • annotates the correlation between expression values
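When SIMILAR_PROFILE is computed rather than annotated by hand, one option is to compare the mean expression profiles of the two sets with the Pearson correlation. This is only one possible definition: the choice of Pearson, the averaging step and the 0.9 threshold are all assumptions made for this sketch, not something stated in the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def mean_profile(profiles):
    """Average several profiles (one list of values per gene) point by point."""
    return [sum(column) / len(column) for column in zip(*profiles)]


def similar_profile(set1, set2, threshold=0.9):
    """Hypothetical SIMILAR_PROFILE check: correlated mean profiles."""
    return pearson(mean_profile(set1), mean_profile(set2)) >= threshold
```

The value returned by `pearson` is also what a `CORRELATION ( dSet1, dSet2, value )` annotation could record for a pair of data sets.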
Annotations/Time Courses • Time profiles may be shifted relative to each other, or one gene may be up while another is down. • geneSet1 SIMILAR_TIME_PROFILE geneSet2, a particular case of SIMILAR_PROFILE • geneSet1 TIME_AFTER geneSet2 ... • geneSet1 TIME_BEFORE geneSet2 ... • geneSet1 TIME_SHIFT geneSet2 ... • geneSet1 TIME_OPPOSED geneSet2 • geneSet1 TIME_OPPOSED_SHIFT geneSet2
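If these time-course relations were to be suggested automatically, a lagged correlation is one way to tell a shifted profile from an opposed one. The sketch below is an assumption-laden illustration: the function names, the Pearson measure, the `max_lag` window and the 0.9 threshold are all hypothetical, and how the sign of the lag maps to TIME_AFTER or TIME_BEFORE is left as a convention to be decided.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def overlap(xs, ys, lag):
    """Pair xs[t] with ys[t + lag] over the time points both profiles cover."""
    if lag >= 0:
        return xs[:len(xs) - lag], ys[lag:]
    return xs[-lag:], ys[:len(ys) + lag]


def classify(xs, ys, max_lag=2, threshold=0.9):
    """Suggest a relation name and lag for two equally long time profiles."""
    r0 = pearson(xs, ys)
    if r0 >= threshold:
        return ("SIMILAR_TIME_PROFILE", 0)
    if r0 <= -threshold:
        return ("TIME_OPPOSED", 0)
    best_lag, best_r = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        a, b = overlap(xs, ys, lag)
        if lag == 0 or len(a) < 3:
            continue  # lag 0 was handled above; too few points is unreliable
        r = pearson(a, b)
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    if best_r >= threshold:
        return ("TIME_SHIFT", best_lag)
    if best_r <= -threshold:
        return ("TIME_OPPOSED_SHIFT", best_lag)
    return None
```

A positive lag here means the second profile reproduces the first one that many time points later. As with SIMILAR_PROFILE, such a result would only be a suggestion; the relation can still be annotated by hand.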
Annotations/Comparisons • “+/-” graphs: common graphs of the gene interactions that become evident from comparison experiments. • Modeled via the previously shown constructs.
Annotations/Set Graphs • Observing Euler-Venn diagrams is very common. • Modeled via set theory operations.
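The regions of a two-set Euler-Venn diagram map directly onto set difference and intersection. A minimal sketch with Python's built-in sets follows; the function name `venn_regions` and the gene lists in the usage note are made up for illustration.

```python
def venn_regions(a, b):
    """Split two gene sets into the three regions of a two-set Venn diagram."""
    a, b = set(a), set(b)
    return {
        "only_a": a - b,   # genes found only in the first set
        "both": a & b,     # genes shared by the two sets
        "only_b": b - a,   # genes found only in the second set
    }
```

For instance, `venn_regions({"IL2", "IL10", "IL12"}, {"IL10", "IL12", "TNF"})` places IL10 and IL12 in the shared region, and each region can then be annotated like any other gene set.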