1 / 17

Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs

Exogean is a flexible method that automates human expert gene annotations by integrating heuristic rules and biological objects using directed acyclic multigraphs (DACMs). It generates highly specific gene models by building and reducing DACMs. This framework improves gene prediction and can be applied to any set of heuristic rules and resources.

olenh
Download Presentation

Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ENCODE Gene Prediction Workshop - EGASP/2005 Exogean: an expert gene annotation framework based on directed acyclic coloured multigraphs Sarah Djebali, Franck Delaplace, Hugues Roest Crollius

  2. Human experts generate high quality gene annotations • Human experts generate reference gene annotations • automating human expertise could provide highly specific • gene models • What do human experts do? • Human experts combine biological objects using heuristic rules • Both biological objects and heuristic rules evolve with time

  3. Exogean: a highly flexible method that automates human expertise • Exogean is a generic framework based ondirected acyclic • coloured multigraphs (DACMs)made to allow the integration • of any set of heuristic rules to any set of resources • In Exogean DACMs: • Nodes are biological objects (protein or mRNA alignments, …etc) • Multiple edges between nodes are relations between objects • In terms of DACMs the human expert annotation protocol • corresponds to building, reading and reducing DACMs

  4. CDS Identification Information Collection Filter Filter - Blat - Spidey - Blast … etc Exogean main steps Reduction Reduction Reduction DACM1 DACM2 DACM3 Single Type Multi Molecule Clustering Multi Type Multi Molecule Clustering Single Molecule Clustering output Final gene models with multiple transcripts Exogean core: DACM expert annotation Protein and mRNA alignments called HSPs

  5. h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 h14 h15 h16 h17 h18 Example: several mRNAs and proteins have been aligned to a specific locus • rmi = mRNA • molecule rm1 • pmj= protein • molecule rm2 • hk= mRNA or protein HSP rm3 pm1 pm2 pm3

  6. Building and reducing DACM1 = the Single Molecule Clustering

  7. mRNA, protein HSPs Level1 transcript models Level2 transcript models Level3 transcript models DACM2 building + reduction DACM3 building + reduction DACM1 building + reduction DACM expert annotation Each DACM reduction produces more complexe transcript models

  8. M1 M2 DACM3 reduction produces final transcript models Mi = final multi type multi molecule transcript model in which Exogean searches for a CDS 1 3 3 2 2 2 1 1 3 2 2 1

  9. FN FN TP FN HAVANA TP FP FP Method_X Evaluation method

  10. Specificity on the 44 ENCODE regions

  11. Sensitivity on the 44 ENCODE regions

  12. Evaluation method • TP = True Positive : each HAVANA CDS matched exactly by at least one CDS from method_X is counted as TP • FP = False Positive : a virtual HAVANA CDS is defined as a method_X CDS that does not match exactly a HAVANA CDS and is counted as FP • FN = False Negative : each HAVANA CDS that is not matched exactly by at least one method_X CDS is counted as FN

  13. r1 r2 r3 h1 h2 h3 h4 h5 h6 h7 h8 h9 h10 h11 h12 h13 p4 h14 h16 h17 h18 DACM1 reduction produces level1 transcript models • ri = mRNA level1 • transcript model • pj= protein level1 • transcript model p1 p2 p3 h15

  14. Building and reducing DACM2 = the Single Type Multi Molecule Clustering

  15. R1 (r1,r3) 2 2 1 1 1 R2 (r2,r3) 2 1 1 1 P1 (p1,p3) 1 2 1 1 P2 (p2) 1 1 P3 (p4) 1 1 DACM2 reduction produces level2 transcript models • Ri = mRNA level2 • transcript model • Pj = protein level2 • transcript model

  16. Building and reducing DACM3 = the Multi Type Multi Molecule Clustering

More Related