Compatible text, visual and mathematical representations for biological process ontologies
Nigam Shah
Penn State University
Ontologies in Molecular Biology
• An ontology is a formal way of representing knowledge.
• In an ontology, concepts are described both by their meaning and by their relationships to each other.*
• Gene Ontology
• 43 open ontologies under OBO
• First name ‘things’, then name ‘relations’.
• If we specify the ‘logic’ for combining ‘things’ and ‘relations’, we can write hypotheses about biological processes in a formal manner and evaluate them for consistency with existing information.
* Bard and Rhee, Nature Reviews Genetics, Vol 5, March 2004, pg 213
Hypotheses and Events
A hypothesis about a biological process is a statement about relationships within a biological system.
Example: Protein P induces transcription of gene X.
We define an ‘event’ as a relationship between two biological entities, which we call ‘agents’.
Testing events
Protein P induces transcription of gene X
Implicit claims (that can be made explicit):
• P is a transcription factor.
• P is a transcriptional activator.
• P is localized to the nucleus.
• P can bind to the promoter of gene X.
[Diagram: P at the promoter of gene X, inside the nucleus]
Hypothesis Ontology
• Expressive enough to describe the galactose system at a coarse level of detail.
• Compatible with other ontology efforts, e.g. GO, so that GO annotations can be used directly in HyBrow.
• We have also developed a grammar for writing hypotheses using events from this ontology.
Grammar for a hypothesis
A hypothesis consists of at least one event stream.
An event stream is a sequence of one or more events or event streams with logical joints (or operators) between them.
An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also has a physical location that denotes ‘where’ the event happened, the genetic context of the organism, and the associated experimental perturbations when the event happened.
A logical joint is the conjunction between two event streams.
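The grammar above can be sketched as a small set of Python data structures. This is only an illustration under stated assumptions: the class names, the field names other than agent_a and agent_b, and the count_events helper are hypothetical and are not taken from HyBrow.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Event:
    """One event: exactly one agent_a, one operator, one agent_b, plus context."""
    agent_a: str
    operator: str              # the relationship between the two agents
    agent_b: str
    location: str              # 'where' the event happened
    genetic_context: str       # genetic context of the organism
    perturbations: tuple = ()  # associated experimental perturbations


@dataclass
class EventStream:
    """One or more events or nested streams, with a logical joint between neighbours."""
    items: List[Union[Event, "EventStream"]]
    joints: List[str] = field(default_factory=list)

    def __post_init__(self):
        if len(self.items) < 1:
            raise ValueError("an event stream needs at least one event or stream")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("need exactly one logical joint between adjacent items")


@dataclass
class Hypothesis:
    """A hypothesis consists of at least one event stream."""
    streams: List[EventStream]

    def __post_init__(self):
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


def count_events(node) -> int:
    """Count the leaf events under an event, an event stream, or a hypothesis."""
    if isinstance(node, Event):
        return 1
    children = node.streams if isinstance(node, Hypothesis) else node.items
    return sum(count_events(child) for child in children)
```

Putting the cardinality checks in the constructors means a malformed hypothesis is rejected when it is built, before any evaluation against prior knowledge is attempted.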
Making Hypotheses with increasing ‘formality’
Controlled Vocabulary, Formal Language, Context-Free Grammar
A biological event is any occurrence for which we gather experimental data. Hypotheses make testable statements about combinations of biological events.
The mathematical representation
We have developed a formal language and grammar for representing a hypothesis as a sequence of events. We use ‘constraints and rules’ to decide whether a hypothesis is a valid production of the language.
http://conferences.computer.org/bioinformatics/CSB2003/SectA.html#Poster9
Constraints and Rules
• Consistency of a hypothesis with prior knowledge is evaluated by applying constraints and rules.
• A constraint is a statement specifying the evidence that contradicts or supports an event.
• E.g. a protein must be in the nucleus to bind to a promoter.
• A rule comprises the ‘steps’ for deciding whether a constraint is satisfied or violated.
Binds_to_promoter [P, g] : Annotation constraints
  if cellular location of P is not nucleus, give a penalty.
  if biological process is not transcription, give a penalty.
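A minimal sketch of how the Binds_to_promoter rule might be coded follows. The annotation store, the unit penalty value, and the function name are assumptions made for illustration. The slide does not say how penalties are weighted or combined, and the sketch assumes that both annotation checks concern the protein P.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation store: agent -> annotation field -> values.
ANNOTATIONS: Dict[str, Dict[str, List[str]]] = {
    "P": {"cellular_location": ["nucleus"], "biological_process": ["transcription"]},
    "Q": {"cellular_location": ["cytoplasm"], "biological_process": ["transport"]},
}

PENALTY = 1  # assumed unit penalty; the slides do not give a value


def binds_to_promoter_rule(
    protein: str,
    gene: str,
    annotations: Dict[str, Dict[str, List[str]]] = ANNOTATIONS,
) -> Tuple[int, List[str]]:
    """Return (total penalty, reasons) for the event Binds_to_promoter[protein, gene].

    The gene argument is kept to mirror the [P, g] signature on the slide;
    it is not used because both checks here are assumed to concern the protein.
    """
    ann = annotations.get(protein, {})
    penalty = 0
    reasons: List[str] = []
    # Constraint: a protein must be in the nucleus to bind to a promoter.
    if "nucleus" not in ann.get("cellular_location", []):
        penalty += PENALTY
        reasons.append(f"{protein}: cellular location is not nucleus")
    # Constraint: the annotated biological process should be transcription.
    if "transcription" not in ann.get("biological_process", []):
        penalty += PENALTY
        reasons.append(f"{protein}: biological process is not transcription")
    return penalty, reasons
```

Under these assumptions, binds_to_promoter_rule("P", "X") gives no penalty, while the same call for "Q" collects one penalty per violated constraint. An agent with no annotations at all is penalised on both counts, which is one possible design choice and not necessarily the one HyBrow makes.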
Visual language representation
Uses a formal visual language:
• Direct composition of hypotheses in a format akin to reaction pathway diagrams
• Translatable to other representation forms
Other notations:
• Cook Notation -- BioD
• Kohn Notation
Multiple ‘views’ of the ontology
• Once we have an ontology for hypotheses, it can be represented as:
  • Text files that users type.
  • Formal constructs that can be evaluated for validity in a formal manner.
  • Files that are ‘browsed’ using special programs.
• Having such equivalent formats allows us to perform computer-aided hypothesis evaluation.
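To illustrate what ‘equivalent formats’ could mean in practice, the sketch below renders one event both as a typed text line and as XML, then reads the XML back. The XML element names and the dictionary keys are hypothetical. The slides do not define an XML schema, so this only shows that a lossless round trip between views is possible.

```python
import xml.etree.ElementTree as ET

# Field order used by both views (names are assumptions, not HyBrow's).
FIELDS = ["agent_a", "operator", "agent_b", "genetic_context", "location"]

event = {
    "agent_a": "Gal3p",
    "operator": "Binds_to_promoter",
    "agent_b": "gal1",
    "genetic_context": "wt",
    "location": "nuc",
}


def to_text(ev: dict) -> str:
    """Render the event as a line a user could type."""
    return (
        f"{ev['agent_a']} {ev['operator']} {ev['agent_b']} "
        f"in {ev['genetic_context']} in {ev['location']}"
    )


def to_xml(ev: dict) -> str:
    """Render the same event as an XML string with one child element per field."""
    root = ET.Element("event")
    for key in FIELDS:
        ET.SubElement(root, key).text = ev[key]
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_string: str) -> dict:
    """Read the XML view back into the dictionary form."""
    root = ET.fromstring(xml_string)
    return {child.tag: child.text for child in root}
```

Because from_xml(to_xml(event)) returns the original dictionary, either view can serve as the stored form, and the other can be generated on demand.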
Multiple equivalent representations
Biological process described in a formal language:
ev0 = Gal2p transports galactose in mem in wt
ev1 = galactose activate Gal3p in wt in cyt
ev2 = Gal3p Binds_to_promoter gal1 in wt in nuc
ev3 = Gal3p induce gal1 in presence_of galactose in wt in nuc
hy1 = (ev0+ev1) and (ev2+ev3)
XML format?
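The event lines above are regular enough to be parsed mechanically. The following is a rough sketch, not the HyBrow parser. It assumes that mem, cyt and nuc are location tokens, that wt is a genetic-context token, and that any other ‘in ...’ clause is an extra condition. The vocabularies, function name and dictionary keys are hypothetical, and the hy1 line is deliberately left unparsed because the slides do not spell out the semantics of ‘+’.

```python
# Assumed vocabularies, taken only from the tokens that appear on the slide.
LOCATIONS = {"mem", "cyt", "nuc"}
GENETIC_CONTEXTS = {"wt"}


def parse_event(line: str) -> dict:
    """Parse 'evN = agent_a operator agent_b in ... in ...' into a dictionary."""
    name, sep, body = line.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in event line: {line!r}")
    # The first chunk is the core triple; the remaining chunks are 'in' clauses.
    head, *modifiers = [part.strip() for part in body.split(" in ")]
    tokens = head.split()
    if len(tokens) != 3:
        raise ValueError(f"expected 'agent_a operator agent_b', got: {head!r}")
    agent_a, operator, agent_b = tokens
    event = {
        "name": name.strip(),
        "agent_a": agent_a,
        "operator": operator,
        "agent_b": agent_b,
        "location": None,
        "genetic_context": None,
        "conditions": [],
    }
    # The slide lists location and genetic context in either order,
    # so classify each clause by vocabulary instead of by position.
    for mod in modifiers:
        if mod in LOCATIONS:
            event["location"] = mod
        elif mod in GENETIC_CONTEXTS:
            event["genetic_context"] = mod
        else:
            event["conditions"].append(mod)
    return event
```

For example, parsing the ev3 line yields Gal3p as agent_a, induce as the operator, gal1 as agent_b, nuc as the location, wt as the genetic context, and ‘presence_of galactose’ as a single extra condition.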
Credits
Stephen Racunas sar147@psu.edu
Nina Fedoroff (Mentor) nvf1@psu.edu
More on the project website, www.hybrow.org, and on Aug 1st at 11:10 AM.