ACE & RACE: annotation of complex/combinatorial expressions
Self-introduction: Andrey Zinovyev. M.Sc. in theoretical physics (1997). Programming, industrial information systems (C++, Delphi). Ph.D. in computer science (2001), Method of elastic maps and applications in bioinformatics. Web-services development (Java, JSP). Senior postdoctoral fellow at IHES, France. http://www.ihes.fr/~zinovyev or type “zinovyev” in Google
Plan of the talk
• ACE framework: introduction, what we have
• What will be in RACE?
• ACE software: C++ code, web-application
• Plans for ACE and RACE
• Computational environment
Genome as database: everything is annotation. Genomes: human, chimp, mouse, rat. Annotation layers: gene annotation; probability profiles (TF1, TF2) in b.ace; RNA structures in r.ace; microarrays in m.ace. Common format for annotation files (binary p-files). Sequence: ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT
Genome preprocessing: compile once, run everywhere. Sequence: ATGCGTGCAAATGCTCTTTGTGTAACGTGTCGACGTACGTGTGTAACGTGCGACGTACGT. Annotation tools: ace.annotate, ace.RNAtools, arc, ace.map. Databases: r.ace (potential RNA structures, splicing sites); b.ace (potential TF binding sites); m.ace (gene expression data); c.ace (chromatin structure and dynamics). Analysis tools: ace.enhance, ace.cluster, ace.display, ace.dyCr, ace.stat
Structure space: the truth is out there. Set of annotations. Multidimensional combinatorial space of all possible structures appearing in a scanning window
ace.enhance: be more abstract. Accessing and masking structure space. Method_01, Method_02, …, Method_11: ace.enhance expression (heuristic mask) => ace.enhance annotation • view in genome browser (ace.display) • compare with experiment (cross-annotation) (ace.dyCr) • construct more abstract space and apply ace.enhance further
b.ace: Transfac release (TF1, TF2) + genome release => ace.annotate => b.ace (~1.2 Tbyte)
ace.enhance
Enhance methods:
• Fixed spacing of sites
• Fixed order of sites
• Fixed strand orientation of sites
• Multiple copies of site
• Minimal spacing of sites
• Maximal spacing of sites
• Variable, defined spacing between sites
• Minimal p-value for weight matrix
• Maximal p-value for weight matrix
• Bias weight-matrix
Example expression: M1&&M2||M3||M4||M5 …
+ ace.cluster: simplified version of enhance for detecting clusters of repetitions of one motif
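The idea of a combinatorial query over annotated sites in a scanning window can be sketched in a few lines of Python. This is only an illustration, not the ace.enhance implementation or its query syntax: the function names (`scan_windows`, `has`, `all_of`, `any_of`, `in_order`) are hypothetical, the hit positions are made up, and reading `&&` as co-occurrence and `||` as alternative is an assumption.

```python
# Minimal sketch of a combinatorial window query (hypothetical names, not the ACE API).
# A hit is a (position, site_name) pair, as an annotation step might produce.

def has(name):
    """Predicate: the site occurs at least once in the window."""
    return lambda names: name in names

def all_of(*preds):
    """Assumed reading of '&&': every sub-condition holds in the window."""
    return lambda names: all(p(names) for p in preds)

def any_of(*preds):
    """Assumed reading of '||': at least one sub-condition holds."""
    return lambda names: any(p(names) for p in preds)

def in_order(*sequence):
    """Predicate: the sites occur in the given left-to-right order."""
    def check(names):
        remaining = iter(names)
        # 'site in remaining' consumes the iterator up to the first match,
        # so this tests whether 'sequence' is a subsequence of 'names'.
        return all(site in remaining for site in sequence)
    return check

def scan_windows(hits, window, predicate):
    """Anchor a window of 'window' bp at each hit and keep windows that satisfy the predicate."""
    hits = sorted(hits)
    matches = []
    for i, (start, _) in enumerate(hits):
        names = []
        for pos, name in hits[i:]:
            if pos - start >= window:
                break
            names.append(name)
        if predicate(names):
            matches.append((start, names))
    return matches

# Illustrative data only: positions are invented for the example.
example_hits = [(100, "PU.1"), (120, "rarHS"), (130, "CEBP"), (400, "rarHS"), (410, "cMyb")]
query = any_of(all_of(has("PU.1"), has("rarHS")), all_of(has("CEBP"), has("cMyb")))
matches = scan_windows(example_hits, 50, query)
```

Constraints such as fixed order or multiple copies then become additional predicates composed with `all_of`, which is one simple way to keep each enhance method independent of the others.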
Example 1: 4 transcription factors, chr14 of UCSC_HG15. ace.annotate => rarHS: 659,631 hits; cMyb: 1,647,505 hits; CEBP: 1,189,196 hits; PU.1: 472,383 hits. ace.enhance expression, window 50bp: PU.1 &&rarHS — rarHS || rarHS — rarHS &&CEBP<cMyb 8 ** 11 ** Result: 102 hits. [Figure: panels 14.1, 14.2, 14.3 with 5’ and 3’ strand labels]
Example 2: clusters of motifs, chr14. jfl_im = TAGAGA, TAGAGT, TAGGGA, TAGGGT. ace.annotate => 183,389 hits. ace.enhance expression, window 300bp: jfl_im, 10 copies. Result: 51 hits in 5 groups
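The cluster query above (at least N copies of one motif family inside a window) can be illustrated with a short Python sketch. It is not the ace.cluster code: the helper names `find_hits` and `find_clusters` are hypothetical, only the forward strand is scanned, and a window is measured between motif start positions, which is a simplifying assumption.

```python
# Minimal sketch of motif-cluster detection (hypothetical helpers, not ace.cluster).
JFL_IM = ("TAGAGA", "TAGAGT", "TAGGGA", "TAGGGT")  # motif family from the slide

def find_hits(sequence, motifs):
    """Return sorted start positions of all motif occurrences, overlaps included."""
    hits = []
    for motif in motifs:
        start = sequence.find(motif)
        while start != -1:
            hits.append(start)
            start = sequence.find(motif, start + 1)
    return sorted(hits)

def find_clusters(hits, window, min_copies):
    """Return (first_hit, last_hit) spans where at least min_copies hits
    start within 'window' bp; overlapping spans are merged into one group."""
    clusters = []
    for i in range(len(hits) - min_copies + 1):
        j = i + min_copies - 1
        if hits[j] - hits[i] < window:
            if clusters and hits[i] <= clusters[-1][1]:
                # Overlaps the previous span: extend it instead of starting a new group.
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], hits[j]))
            else:
                clusters.append((hits[i], hits[j]))
    return clusters

# Toy sequence, invented for illustration: two short runs of motifs separated by filler.
toy = "TAGAGA" * 3 + "C" * 100 + "TAGGGT" * 2
toy_hits = find_hits(toy, JFL_IM)
toy_clusters = find_clusters(toy_hits, window=10, min_copies=2)
```

On real data the same call would use `window=300` and `min_copies=10`, matching the parameters on the slide; merging overlapping spans is what turns many qualifying windows into a small number of groups.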
ACE C++ tools
• aceLib: wraps system-dependent code; generic programming for code reusability
• ace.annotate – probability based annotations and motifs search
• ace.enhance – accessing (masking) structure space: combinatorial query language
• ace.cluster – extracting clusters of repetitions: simplified version of enhance
• ace.dyCr – first step in structure space analysis: dynamic cross-annotation
• ace.stat – statistical significance analysis
Plans with ACE: principal problem. False-positive rate. ace.stat: statistical model of random noise; maximum entropy principle; significance analysis
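To make the false-positive problem concrete, the following Python sketch estimates how many motif hits a random sequence would produce and how surprising a cluster of k hits in one window is. It is not the ace.stat model: it assumes independent, identically distributed bases and a Poisson approximation that ignores overlaps between hits, and all function names are hypothetical.

```python
import math

# Assumed background model: independent bases with uniform frequencies.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def motif_set_probability(motifs, base_freq=UNIFORM):
    """Probability that a given position starts one of the motifs.
    Summing is exact when all motifs are distinct and of equal length."""
    return sum(math.prod(base_freq[b] for b in motif) for motif in set(motifs))

def expected_hits(motifs, window, base_freq=UNIFORM):
    """Expected number of forward-strand hits in a window of 'window' bp."""
    length = len(next(iter(motifs)))
    positions = max(window - length + 1, 0)
    return positions * motif_set_probability(motifs, base_freq)

def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam): chance of k or more hits by noise alone."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below

# Motif family and window size taken from Example 2; the model itself is an assumption.
jfl_im = ("TAGAGA", "TAGAGT", "TAGGGA", "TAGGGT")
lam = expected_hits(jfl_im, window=300)
p_cluster = poisson_tail(lam, 10)
```

Under these assumptions the expected count per 300bp window is well below one, so ten copies in a single window would be very unlikely by chance; a realistic noise model would need genome-specific base composition and repeat structure.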
Plans with ACE: visualizing structure space. Creating 2D maps of structure space; data visualization, dimension reduction
Plans with ACE: integrating m.ace. Components: ace.eva, ace.net, ace.map, m.ace
Plans with ACE: model of chromatin structure and dynamics. Silencing structures in space; chromatin state profiles; immunoprecipitation experiments; arc; c.ace
Plans with ACE: comparative genomics. genome1, genome2
Installation of b.ace in Lille: http://ace.ibl.fr. 1.2 Tbyte PowerVault storage; Dell PowerEdge server
Installation of RACE in Sherbrooke (golf): LISA DB; UCSC local; ace; UCSC browser; r.ace DB; G browser
Distributed environment: database synchronization protocol. New genome release: where? Sites: b.ace (Lille, France); LISA (Sherbrooke, Canada); r.ace (Sherbrooke, Canada); m.ace (INSERM, Paris); c.ace (IHES, Paris); public dbs
RACE: platform for integration. ace.annotate: find simple motifs (loops, hairpins); ace.RNAtools: pluggable algorithms; p-files (r.ace database); ace.display, ace.stat, ace.dyCr, ace.enhance: pluggable methods
ACE team
• ace team leader: Arndt Benecke, IHES
• ace.uit, ace C++: Andrey Zinovyev, IHES
• aceLib, ace C++: Thomas Bücher, Inst.Neur.
• ace.map: Sebastian Noth, INSERM
• ace.stat: Richard Madden, UdSh
• arc: Graham Smith, IHES