Marco Brandizi, PhD Program in Computer Science, Univ. Milano Bicocca, 19th Cycle: Progress Report


  1. Marco Brandizi • PhD Program in Computer Science, Univ. Milano Bicocca • 19th Cycle • Progress Report • Feb 2005

  2. Agenda • Microarrays and Gene Expression overview • A Knowledge Management System for uA (microarray) data management • Motivations • What to model and where to start from • First elaborations • Ongoing work and future

  3. Gene Expression and Microarrays

  4. The Gene Machine (diagram: gene DNA → mRNA → protein → cell/life)

  5. Microarray Data / Details

  6. Microarray Data

  7. Microarray Data Mgmt Issues • Expression data vs. sequence data: • Context-dependent (living system, exp. conditions) • Lack of a standard unit of measure • Several normalization methods • Multiple platforms and methods • No standard for data annotation • Coherence of vocabularies and terminology • Details about: experiment, source, protocols, exp. conditions

  8. Microarray Data Mgmt Issues / 2 • Evidence about data quality • What to store? • Raw images • Computed values • Normalized values • How to find data • Systems aware of complex vocabularies (ontologies) • Data mining and exp. comparison tools • Data access control

  9. Issues => MIAME/MAGE

  10. MIAME Experiment Modelling

  11. GCA DB

  12. GCA DB

  13. GCA DB

  14. Need for a KMS for uA data management

  15. The uA Experiment Cycle

  16. “Closing the loop”

  17. “Closing the loop”

  18. uA KMS: What to model?

  19. Knowledge management... what? • Genes • Textual annotations, literature • Interactions, pathways • Gene collections (functional families, clusters) • Experiments and Experimental Conditions • Keyword/ontology-based searches • Searches on tested conditions • Expression Values • Navigation • Same transcriptome/trend/correlation/pattern

  20. Knowledge management... what? • Chips • Keyword searches • Annotations about chip quality, protocols to be used, etc. • People • “Is expert in ...” • “Works with ...” • “Is studying ...” • “Their ranking is X” (based on publications, user preferences, etc.)

  21. Knowledge management... what? “Does IL-2 regulate something, and under what conditions?” Interactions of gene: IL2. Note added by Norman on Dec 10, 2003, deduced from text, not confirmed [see original text]: IL2 --------> UPREGULATES -----------> IL10, IL12 | ON --> Cells --> Type: DC | UNDER CONDITION: LPS stimulus
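The IL-2 example above is essentially a statement with context (cell type, condition) and provenance (author, date, confirmation status). A minimal Python sketch of how such a note might be stored and queried follows; the `Interaction` record and the `regulated_by` function are hypothetical names, not part of the system described in these slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    """A gene-interaction statement with context and provenance (hypothetical schema)."""
    subject: str             # regulating gene, e.g. "IL2"
    relation: str            # e.g. "UPREGULATES", "DOWN_REGULATES"
    objects: tuple           # regulated genes
    cell_type: str = ""      # cells the statement refers to
    condition: str = ""      # experimental condition
    author: str = ""         # who added the note
    date: str = ""           # when the note was added (ISO 8601)
    confirmed: bool = False  # False: deduced from text, not experimentally confirmed

def regulated_by(statements, gene):
    """Answer 'Does <gene> regulate something, and under what conditions?'"""
    return [(s.relation, s.objects, s.cell_type, s.condition)
            for s in statements
            if s.subject == gene and s.relation.endswith("REGULATES")]

notes = [
    Interaction("IL2", "UPREGULATES", ("IL10", "IL12"),
                cell_type="DC", condition="LPS stimulus",
                author="Norman", date="2003-12-10", confirmed=False),
]

print(regulated_by(notes, "IL2"))
# [('UPREGULATES', ('IL10', 'IL12'), 'DC', 'LPS stimulus')]
```

Keeping `confirmed` as an explicit field lets a query distinguish text-mined claims from validated ones, which is the distinction the slide's note draws.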

  22. Knowledge management... what?

  23. Knowledge management... what?

  24. Knowledge management... what?

  25. uA KMS: Where to start from?

  26. What to do first? • Gene Expression Formal Model • Focused on GE measures • Oriented to the “closing the loop” goal • Several things to start from • Ontologies and Inference Systems • Similar existing models • Other similar systems

  27. Defining a GE Model • Start Point: Ontologies and Inference Systems • XML -> RDF -> ... -> OWL, and related tools (e.g., Protégé, Racer, Jena) • Logics, particularly Description Logic • Inference systems and languages (e.g., Prolog)

  28. Defining a GE Model • Start Point: Similar existing models • "Modeling Gene Expression", Proceedings of NETTAB 2004, www.loa-cnr.it: a Description Logic model of GE, but without focus on microarrays and expression intensities

  29. Defining a GE Model • Start Point: Similar existing models • Very similar to the previous work, but with tools for annotating/querying microarray chips • However, it does not seem focused on the annotation of data, assays, etc.

  30. Defining GE Model • Start point: Other similar systems • Synapsia by Agilent: very similar, but not focused on uAs • Hybrow, www.hybrow.org: a system for computer-aided hypothesis evaluation • The Notebook Project, www.notebook.org: a bio-KMS based on SOAP and P2P • Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C. (2004), From actions to suggestions: supporting the work of biologists through laboratory notebooks

  31. Defining GE Model • Start point: Other similar systems

  32. uA KMS: Toward a GE Model

  33. Defining GE Model • Gene Expression Formal Model • Basic elements: genes, hybridizations, experiments

  34. Defining GE Model • Gene Expression Formal Model • Basic elements: annotated sets

  35. Defining GE Model • Gene Expression Formal Model • Basic elements: annotated sets

  36. Gene Expression Entities

  37. Entities Grouping • Cluster of DataSet ::= Cluster of DataSet | GeneCluster of DataSet.GeneSet | HybCluster of DataSet.HybSet • Cluster of Entity ::= Cluster of Entity | Set of Entity • EntityCollection ::= Cluster of DataSet | Cluster of Entity • All entities in a cluster are of the same type. E.g., a cluster of genes contains hierarchically grouped sets of genes, and only genes. • NOTE: the grammar used here is VERY informal!

  38. Entities Grouping • GeneSet ::= Set of Gene • HybSet ::= Set of Hybridization • Set of X ::= { x : x IS-A X } • Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }
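The grouping grammar can be read as typed, possibly nested sets. The sketch below is only an illustration under assumptions: `Gene`, `Hybridization`, `set_of`, `singleton` and `Cluster` are hypothetical names, and Python's `isinstance` stands in for the IS-A relation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    public_id: str

@dataclass(frozen=True)
class Hybridization:
    name: str

def set_of(kind, items):
    """Set of X ::= { x : x IS-A X }; isinstance plays the role of IS-A."""
    items = frozenset(items)
    if not all(isinstance(x, kind) for x in items):
        raise TypeError(f"all elements must be of type {kind.__name__}")
    return items

def singleton(kind, item):
    """Builds one member of Singleton(C): a Set of C with exactly one element."""
    return set_of(kind, [item])

@dataclass
class Cluster:
    """Hierarchical grouping: children are sub-clusters or leaf sets, all of one entity type."""
    kind: type
    children: list

    def __post_init__(self):
        for child in self.children:
            if isinstance(child, Cluster):
                if child.kind is not self.kind:
                    raise TypeError("sub-cluster holds a different entity type")
            else:
                set_of(self.kind, child)  # raises TypeError on mixed-type leaf sets

    def members(self):
        """All entities in the cluster, flattened over the hierarchy."""
        out = set()
        for child in self.children:
            out |= child.members() if isinstance(child, Cluster) else set(child)
        return out

# Usage: a two-level cluster of genes (identifiers are illustrative)
il2, il10, il12 = Gene("IL2"), Gene("IL10"), Gene("IL12")
tree = Cluster(Gene, [singleton(Gene, il2),
                      Cluster(Gene, [set_of(Gene, [il10, il12])])])
print(len(tree.members()))  # 3
```

Checking the element type when a cluster is built enforces the "only genes in a gene cluster" rule at construction time rather than at query time.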

  39. Annotations • Annotation ::= EntityCollection => AnnotationSet • Annotations let Gene Expression data be tracked together with useful information.

  40. Annotations/Basics • Annotation ( EmptySet ) ::= EmptySet • Annotation ( Singleton ( Entity e ) ) ::= Attributes ( e ) ∪ BaseAnnotation ( e )

  41. Annotations/Basics • BaseAnnotation ( Any ) ::= • To be decided; a first idea is a set of: • Name/Value/Type, and Description, as in MAGE • External reference, with URI or attachment • Graph attachment, "vectoring" values, e.g.: PCA with component values, scatter plots with … • Annotation author • Annotation date • Security/access references • Similar to the MAGE-OM classes Extendable, Describable, Identifiable • An Entity annotating another Entity, e.g.: an experiment's author

  42. Annotations/Basics • Attributes ( Entity e ) ::= • Set of < attrib, value, type > for each declared attribute of the Entity • Attributes may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of each attribute • Annotation ( GeneSet GS ) ::= BaseAnnotation ( GS ) ∪ ⋃ { Annotation ( g ) : g BELONGS GS } ∪ BiologicalAnnotation ( GS )
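As a sketch of how these annotation rules compose, the code below treats every annotation as a set of <name, value, type> triples and builds Annotation(Singleton(e)) and Annotation(GeneSet) by set union. Using dataclass fields in place of JavaBean getters, and the two dictionaries holding manually entered annotations, are assumptions made for illustration; all function names are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Gene:
    public_id: str
    symbol: str

def attributes(entity):
    """Attributes(e): one <attrib, value, type> triple per declared field.
    Dataclass fields stand in for JavaBean-style properties (an assumption)."""
    triples = set()
    for f in fields(entity):
        value = getattr(entity, f.name)
        triples.add((f.name, value, type(value).__name__))
    return triples

# Hypothetical stores of manually entered annotations, keyed by the annotated object.
base_annotations = {}        # BaseAnnotation: free <name, value, type> records
biological_annotations = {}  # BiologicalAnnotation: why a gene set was grouped

def annotation_of_entity(e):
    """Annotation(Singleton(e)) ::= Attributes(e) ∪ BaseAnnotation(e)"""
    return attributes(e) | base_annotations.get(e, set())

def annotation_of_gene_set(gs):
    """Annotation(GS) ::= BaseAnnotation(GS) ∪ (union of Annotation(g), g in GS)
    ∪ BiologicalAnnotation(GS). An unannotated empty GS yields the empty set."""
    result = set(base_annotations.get(gs, set()))
    for g in gs:
        result |= annotation_of_entity(g)
    return result | biological_annotations.get(gs, set())

# Usage with illustrative identifiers
gs = frozenset({Gene("g001", "IL2"), Gene("g002", "IL10")})
biological_annotations[gs] = {("pathway", "KEGG pathway about IL-2", "str")}
ann = annotation_of_gene_set(gs)
print(len(ann))  # 5: two attribute triples per gene plus one biological annotation
```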

  43. Annotations/Biological Ann. • BiologicalAnnotation ( GS ) ::= • Allows tagging the gene set with the biological reason why its genes have been grouped • E.g.: • belonging to the apoptosis functional family • in the KEGG pathway about IL-2 • under GO ID #10234

  44. Annotations/Data Sets • Annotation ( Cluster of DataSet ds ) ::= • BaseAnnotation ( ds ) ∪ Annotation ( < all entities in ds > ) • Meaning of the clustering • Clustering method/algorithm • Algorithm annotations, e.g.: parameter values • A cluster includes the case of a flat set (not a tree), and the sub-cases: • gene/hyb filtering (genes have been filtered in from another data set) • value transformation (normalization, PCA, average on replicas)
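Filtering and value transformations produce new data sets whose annotations should record how they were derived. A minimal sketch under assumed names (`DataSet`, `derive`, `filter_genes` and `average_replicas` are hypothetical), in which each derived data set keeps a link to its source plus a note on the operation and its parameters:

```python
from dataclasses import dataclass, field

@dataclass
class DataSet:
    """Expression matrix values[gene][hyb]; annotations record how it was produced."""
    values: dict
    annotations: list = field(default_factory=list)
    parent: "DataSet | None" = None

    def derive(self, operation, params, new_values):
        """Create a derived data set annotated with the operation that produced it."""
        return DataSet(new_values, [{"operation": operation, "parameters": params}],
                       parent=self)

    def history(self):
        """All annotations from the original data set down to this one."""
        chain = self.parent.history() if self.parent else []
        return chain + self.annotations

def filter_genes(ds, min_value):
    """Gene filtering: keep genes whose maximum value reaches min_value."""
    kept = {g: row for g, row in ds.values.items() if max(row.values()) >= min_value}
    return ds.derive("filter", {"min_value": min_value}, kept)

def average_replicas(ds, groups):
    """Value transformation: average replicate hybridizations.
    groups maps a new column name to the names of its replicates."""
    avg = {g: {name: sum(row[h] for h in reps) / len(reps)
               for name, reps in groups.items()}
           for g, row in ds.values.items()}
    return ds.derive("average on replicas", {"groups": groups}, avg)

# Usage with illustrative values
raw = DataSet({"IL2": {"h1": 1.0, "h2": 3.0}, "TNF": {"h1": 0.1, "h2": 0.2}},
              [{"operation": "load"}])
avg = average_replicas(filter_genes(raw, 1.0), {"LPS": ["h1", "h2"]})
print(avg.values)                                   # {'IL2': {'LPS': 2.0}}
print([n["operation"] for n in avg.history()])
```

Keeping the chain of parents, rather than copying every annotation forward, means the full derivation history can be recovered from any data set in the chain.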

  45. Annotations/Examples • Types of annotations / searches: • Generic • <attribute> LIKE <pattern> • <value> BETWEEN ( <lo>, <hi> ) • <author> IS author • Genes • public_id LIKE pattern • REGULATION ( g1, g2, ... gn ) • g1 REGULATES | DOWN_REGULATES | UP_REGULATES | PROMOTE | INHIBITS ( g1, g2, ... gn ) • geneX IN_PATHWAY ( p )
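The generic search operators map naturally onto simple predicates. The sketch below assumes annotation records shaped as <subject, attribute, value, author> (a hypothetical layout, as are the function names) and interprets LIKE with SQL conventions, where `%` matches any run of characters and `_` matches exactly one.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """One annotation: <subject, attribute, value> plus its author (hypothetical layout)."""
    subject: str
    attribute: str
    value: object
    author: str

def like(text, pattern):
    """SQL-style LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, str(text), re.DOTALL) is not None

def between(value, lo, hi):
    """<value> BETWEEN ( <lo>, <hi> ), bounds included."""
    return lo <= value <= hi

def search(records, attribute=None, pattern=None, lo=None, hi=None, author=None):
    """Combine the generic operators; criteria left as None are ignored."""
    hits = []
    for r in records:
        if attribute is not None and not like(r.attribute, attribute):
            continue
        if pattern is not None and not like(r.value, pattern):
            continue
        if lo is not None and hi is not None:
            if not isinstance(r.value, (int, float)) or not between(r.value, lo, hi):
                continue
        if author is not None and r.author != author:
            continue
        hits.append(r)
    return hits

# Illustrative records
records = [
    Record("g001", "public_id", "IL2", "Norman"),
    Record("g001", "fold_change", 2.5, "Norman"),
    Record("g002", "public_id", "TNF", "anonymous"),
]
print([r.value for r in search(records, attribute="public_id", pattern="IL%")])  # ['IL2']
print(len(search(records, attribute="fold_change", lo=2, hi=3)))               # 1
```

Gene-specific operators such as REGULATES or IN_PATHWAY could be added the same way, as predicates over relation records.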

  46. Annotations/Examples • DataSet • geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds • hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds • Not necessarily computed; these relations may simply be annotated. • CORRELATION ( dSet1, dSet2 ... dSetN, value ) • annotates the correlation among expression values
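SIMILAR_PROFILE and CORRELATION may be annotated by hand, but they can also be computed. One possible computation, sketched below, compares the mean profiles of two gene sets using Pearson correlation; the 0.8 threshold, the tuple layout of the annotations and all function names are assumptions, not values from the slides.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def mean_profile(gene_set, data):
    """Average profile of a gene set; data maps gene -> values across hybridizations."""
    rows = [data[g] for g in gene_set]
    return [sum(col) / len(rows) for col in zip(*rows)]

def correlation_annotations(name1, set1, name2, set2, data, threshold=0.8):
    """Emit CORRELATION(set1, set2, value) and, at or above the assumed threshold,
    set1 SIMILAR_PROFILE set2."""
    r = pearson(mean_profile(set1, data), mean_profile(set2, data))
    notes = [("CORRELATION", name1, name2, round(r, 3))]
    if r >= threshold:
        notes.append(("SIMILAR_PROFILE", name1, name2))
    return notes

# Illustrative data: three genes measured on four hybridizations
data = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
print(correlation_annotations("A", {"g1"}, "B", {"g2"}, data))
```

Pearson correlation is only one choice of similarity; a rank-based measure could be swapped in without changing the annotation format.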

  47. Annotations/Time Courses Time profiles may be shifted relative to each other, or it may happen that when one gene is up another is down.

  48. Annotations/Time Courses Time profiles may be shifted relative to each other, or it may happen that when one gene is up another is down. • geneSet1 SIMILAR_TIME_PROFILE geneSet2, a particular case of SIMILAR_PROFILE • geneSet1 TIME_AFTER geneSet2 ... • geneSet1 TIME_BEFORE geneSet2 ... • geneSet1 TIME_SHIFT geneSet2 ... • geneSet1 TIME_OPPOSED geneSet2 • geneSet1 TIME_OPPOSED_SHIFT geneSet2
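One way to derive these time-course relations automatically is lagged correlation: slide one profile against the other and keep the lag with the strongest correlation. The sketch below makes several assumptions: profiles have equal length and aligned time points, a shift is reported as TIME_BEFORE/TIME_AFTER together with the lag size, and the 0.9 threshold and maximum lag of 2 are arbitrary choices. `time_relation` and its helpers are hypothetical names.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat segment: treated as uncorrelated
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def lagged(a, b, k):
    """Pair a[t] with b[t + k]: k > 0 means b lags behind a by k time points."""
    if k >= 0:
        return a[:len(a) - k], b[k:]
    return a[-k:], b[:len(b) + k]

def time_relation(a, b, max_lag=2, threshold=0.9):
    """Relation of profile a to profile b, or None if no strong relation is found."""
    best_k, best_r = 0, 0.0
    for k in range(-max_lag, max_lag + 1):
        x, y = lagged(a, b, k)
        if len(x) < 3:
            continue  # too few overlapping points
        r = pearson(x, y)
        if abs(r) > abs(best_r):
            best_k, best_r = k, r
    if best_r >= threshold:
        if best_k == 0:
            return "SIMILAR_TIME_PROFILE", 0
        return ("TIME_BEFORE" if best_k > 0 else "TIME_AFTER"), abs(best_k)
    if best_r <= -threshold:
        return ("TIME_OPPOSED" if best_k == 0 else "TIME_OPPOSED_SHIFT"), abs(best_k)
    return None

# Illustrative profiles: b is a delayed by two time points
a = [0, 1, 4, 9, 4, 1, 0, 0]
b = [0, 0, 0, 1, 4, 9, 4, 1]
print(time_relation(a, b))                 # ('TIME_BEFORE', 2)
print(time_relation(a, [-v for v in a]))   # ('TIME_OPPOSED', 0)
```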

  49. Annotations/Comparisons “+/-” graphs: graphs of gene interactions that emerge from comparison experiments are common. They are modeled via the constructs shown previously.

  50. Annotations/Set Graphs Observing Euler-Venn diagrams is very common. They are modeled via set theory operations.
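The Euler-Venn regions can be computed directly with set operations. In this sketch the named gene sets and their contents are hypothetical; `venn_regions` keys each region by the names of the sets that contain its genes.

```python
def venn_regions(named_sets):
    """Split the union of named gene sets into Euler-Venn regions: each region is
    keyed by the frozenset of names of the sets containing exactly those genes."""
    regions = {}
    for gene in set().union(*named_sets.values()):
        key = frozenset(name for name, members in named_sets.items() if gene in members)
        regions.setdefault(key, set()).add(gene)
    return regions

# Hypothetical up-regulated gene sets from two conditions
up = {"LPS": {"IL2", "IL10", "IL12"}, "control": {"IL10", "TNF"}}
print(up["LPS"] & up["control"])   # intersection: {'IL10'}
print(up["LPS"] - up["control"])   # genes up only under LPS
for key, genes in venn_regions(up).items():
    print(sorted(key), sorted(genes))
```

Each region is itself a GeneSet, so it can carry its own annotations like any other entity collection.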
