Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs

ENCODE Gene Prediction Workshop - EGASP/2005 Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs Sarah Djebali, Franck Delaplace, Hugues Roest Crollius

Human experts generate high quality gene annotations • Human experts generate reference gene annotations • automating human expertise could provide highly specific • gene models • What do human experts do? • Human experts combine biological objects using heuristic rules • Both biological objects and heuristic rules evolve with time

Exogean: a highly flexible method that automates human expertise • Exogean is a generic framework based ondirected acyclic • coloured multigraphs (DACMs)made to allow the integration • of any set of heuristic rules to any set of resources • In Exogean DACMs: • Nodes are biological objects (protein or mRNA alignments, …etc) • Multiple edges between nodes are relations between objects • In terms of DACMs the human expert annotation protocol • corresponds to building, reading and reducing DACMs

CDS Identification Information Collection Filter Filter - Blat - Spidey - Blast … etc Exogean main steps Reduction Reduction Reduction DACM1 DACM2 DACM3 Single Type Multi Molecule Clustering Multi Type Multi Molecule Clustering Single Molecule Clustering output Final gene models with multiple transcripts Exogean core: DACM expert annotation Protein and mRNA alignments called HSPs

h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15 h16 h17 h18 Example: several mRNAs and proteins have been aligned to a specific locus • rmi = mRNA • molecule rm1 • pmj= protein • molecule rm2 • hk= mRNA or protein HSP rm3 pm1 pm2 pm3

Building and reducing DACM1 = the Single Molecule Clustering

mRNA, protein HSPs Level1 transcript models Level2 transcript models Level3 transcript models DACM2 building + reduction DACM3 building + reduction DACM1 building + reduction DACM expert annotation Each DACM reduction produces more complexe transcript models

M1 M2 DACM3 reduction produces final transcript models Mi = final multi type multi molecule transcript model in which Exogean searches for a CDS 1 3 3 2 2 2 1 1 3 2 2 1

FN FN TP FN HAVANA TP FP FP Method_X Evaluation method

Specificity on the 44 ENCODE regions

Sensitivity on the 44 ENCODE regions

Evaluation method • TP = True Positive : each HAVANA CDS matched exactly by at least one CDS from method_X is counted as TP • FP = False Positive : a virtual HAVANA CDS is defined as a method_X CDS that does not match exactly a HAVANA CDS and is counted as FP • FN = False Negative : each HAVANA CDS that is not matched exactly by at least one method_X CDS is counted as FN

r1 r2 r3 h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 p4 h14 h16 h17 h18 DACM1 reduction produces level1 transcript models • ri = mRNA level1 • transcript model • pj= protein level1 • transcript model p1 p2 p3 h15

Building and reducing DACM2 = the Single Type Multi Molecule Clustering

R1 (r1,r3) 2 2 1 1 1 R2 (r2,r3) 2 1 1 1 P1 (p1,p3) 1 2 1 1 P2 (p2) 1 1 P3 (p4) 1 1 DACM2 reduction produces level2 transcript models • Ri = mRNA level2 • transcript model • Pj = protein level2 • transcript model

Building and reducing DACM3 = the Multi Type Multi Molecule Clustering

Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs

Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs

Presentation Transcript

Annotating Gene Products to the GO

3. Genome Annotation: Gene Prediction

Gene Structure Annotation

Introduction to the Gene Ontology and GO Annotation Resources

Introduction to the Gene Ontology and GO Annotation Resources

ENIGMA : comparative, consensus gene structure annotation

Bayesian Networks for Modeling Gene Expression Data

SSSP in DAGs (directed acyclic graphs)

Directed Acyclic Graphs: A tool to incorporate uncertainty in steelhead

Ensembl Genome Annotation Overview

The use of the concepts of evolutionary biology in genome annotation.

The Gene Ontology Annotation (GOA) Database and enhancement of GO annotations through InterPro2GO

Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs

3. Genome Annotation: Gene Prediction (II)

Vlad: A Visual Annotation Display Tool

Gene annotation: Nitrogen transporter families and N metabolism

Improved Shortest Path Algorithms for Nearly Acyclic Directed Graphs

Robots and Automatic Genome Annotation

GOA: Looking after GO annotations

Operationally Optimal VERTEX-BASED SHAPE CODING

Corpus annotation

Annotation-Based Access Control