170 likes | 314 Views
ENCODE Gene Prediction Workshop - EGASP/2005. Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs. Sarah Djebali, Franck Delaplace, Hugues Roest Crollius. Human experts generate high quality gene annotations.
E N D
ENCODE Gene Prediction Workshop - EGASP/2005 Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs Sarah Djebali, Franck Delaplace, Hugues Roest Crollius
Human experts generate high quality gene annotations • Human experts generate reference gene annotations • automating human expertise could provide highly specific • gene models • What do human experts do? • Human experts combine biological objects using heuristic rules • Both biological objects and heuristic rules evolve with time
Exogean: a highly flexible method that automates human expertise • Exogean is a generic framework based ondirected acyclic • coloured multigraphs (DACMs)made to allow the integration • of any set of heuristic rules to any set of resources • In Exogean DACMs: • Nodes are biological objects (protein or mRNA alignments, …etc) • Multiple edges between nodes are relations between objects • In terms of DACMs the human expert annotation protocol • corresponds to building, reading and reducing DACMs
CDS Identification Information Collection Filter Filter - Blat - Spidey - Blast … etc Exogean main steps Reduction Reduction Reduction DACM1 DACM2 DACM3 Single Type Multi Molecule Clustering Multi Type Multi Molecule Clustering Single Molecule Clustering output Final gene models with multiple transcripts Exogean core: DACM expert annotation Protein and mRNA alignments called HSPs
h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15 h16 h17 h18 Example: several mRNAs and proteins have been aligned to a specific locus • rmi = mRNA • molecule rm1 • pmj= protein • molecule rm2 • hk= mRNA or protein HSP rm3 pm1 pm2 pm3
Building and reducing DACM1 = the Single Molecule Clustering
mRNA, protein HSPs Level1 transcript models Level2 transcript models Level3 transcript models DACM2 building + reduction DACM3 building + reduction DACM1 building + reduction DACM expert annotation Each DACM reduction produces more complexe transcript models
M1 M2 DACM3 reduction produces final transcript models Mi = final multi type multi molecule transcript model in which Exogean searches for a CDS 1 3 3 2 2 2 1 1 3 2 2 1
FN FN TP FN HAVANA TP FP FP Method_X Evaluation method
Evaluation method • TP = True Positive : each HAVANA CDS matched exactly by at least one CDS from method_X is counted as TP • FP = False Positive : a virtual HAVANA CDS is defined as a method_X CDS that does not match exactly a HAVANA CDS and is counted as FP • FN = False Negative : each HAVANA CDS that is not matched exactly by at least one method_X CDS is counted as FN
r1 r2 r3 h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 p4 h14 h16 h17 h18 DACM1 reduction produces level1 transcript models • ri = mRNA level1 • transcript model • pj= protein level1 • transcript model p1 p2 p3 h15
Building and reducing DACM2 = the Single Type Multi Molecule Clustering
R1 (r1,r3) 2 2 1 1 1 R2 (r2,r3) 2 1 1 1 P1 (p1,p3) 1 2 1 1 P2 (p2) 1 1 P3 (p4) 1 1 DACM2 reduction produces level2 transcript models • Ri = mRNA level2 • transcript model • Pj = protein level2 • transcript model
Building and reducing DACM3 = the Multi Type Multi Molecule Clustering