CEINGE G. Paolella DBBM CESMG
University Campus INTERNET CSI CEINGE
Research and Services in Bioinformatics CAPRI Image restoration and analysis Comparative Genomics Francesco Salvatore 0503
Research subjects
• Comparative genomics
• DG-CST
• KinWeb
• Non-coding RNAs
  • Bacterial
  • Eukaryotic
• Cell motility
KinWeb DB [Figure: panels (a) to (e)]
Three genes [Figure: genes I, II, III with CSTs a, b, c; domain labels: Ig-I, Ig-II, Ig-III, TM, Tyr Kinase, Ser-Thr Kinase]
Pipeline
Selection of homologous chromosome regions from the human and mouse genomes.
Masking of repetitive elements with RepeatMasker, to reduce the noise inevitably introduced by repeated sequences.
Comparison of the selected regions using BLASTZ, a program based on a local similarity algorithm.
Selection of the definitive set of CSTs based on specified thresholds (identity >= 70%; length >= 100 bp) using StrongHits.
Insertion of the selected CSTs into the DB and extensive annotation for:
- Type (i.e. intergenic, exonic, etc.) according to Ensembl
- Coding capability according to Ensembl
- Distances from other genes and coding regions
- Log Score, calculated according to the UCSC comparison of the human and mouse genomes
Further analysis of the dataset, looking for subpopulations sharing specific characteristics, using different programs, such as:
- BLAST of CSTs vs EST, human and other species genomes
- Program for calculation of the CPS score (Coding Potential Score)
- RNA structure prediction programs
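The threshold step of the pipeline (identity >= 70%; length >= 100 bp) can be illustrated with a minimal Python sketch. This is not the StrongHits implementation: the `Alignment` record, its field names and the `select_csts` function are hypothetical, chosen only to show the filter.

```python
from dataclasses import dataclass

# Thresholds quoted on the slide.
MIN_IDENTITY = 70.0   # percent identity
MIN_LENGTH = 100      # base pairs


@dataclass(frozen=True)
class Alignment:
    """One human-mouse local alignment (hypothetical record layout)."""
    chrom: str
    start: int        # 0-based start on the human sequence (assumption)
    end: int          # exclusive end (assumption)
    identity: float   # percent identity, 0-100

    @property
    def length(self) -> int:
        return self.end - self.start


def select_csts(alignments, min_identity=MIN_IDENTITY, min_length=MIN_LENGTH):
    """Keep only the alignments that satisfy both thresholds."""
    return [
        a for a in alignments
        if a.identity >= min_identity and a.length >= min_length
    ]


hits = [
    Alignment("chr1", 1000, 1100, 70.0),   # exactly on both thresholds: kept
    Alignment("chr1", 5000, 5080, 95.0),   # too short: dropped
    Alignment("chr2", 200, 600, 65.5),     # identity too low: dropped
    Alignment("chr3", 0, 250, 88.2),       # kept
]
csts = select_csts(hits)
```

Both comparisons are inclusive, matching the ">=" written on the slide.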
Non-coding RNAs [Figure: roles of non-coding RNAs along the flow DNA, transcription, mRNA, translation, proteins. Labels: imprinting (H19, AIR); X inactivation (XIST); chromatin structure dynamics; small RNAs; DNA demethylation (KHPS1a ncRNA); transcription/maturation; snoRNA; self-splicing intron; snRNA; tRNA; rRNA; antisense; miRNA; reverse transcription]
Position in the genome [Figure; axis label: Position]
Secondary structures [Figure: labels PFOLD and RNAz, P = 0.99]
Cluster
4 x 14 x 2 = 112 processors, 2.8 GHz
4 x 14 x 2 = 112 GB RAM
2 GB/s per card, 4 GB/s aggregate
Bioinformatics services for research already active
• About 100 databases of biological interest accessible through SRS (nucleotide sequences, genomes, mutations, hereditary diseases, enzymes, etc.)
• Integrated system for the analysis of biological data, with over 150 programs for sequence analysis, evolutionary models, mutation studies, proteins, etc.
• Databases built within research projects (DG-CST, KinWEB, etc.)
• Systems for managing experimental data (biological samples, sequences, microscopy images, etc.)
Research and Services in Bioinformatics CAPRI Research and services Image restoration and analysis Comparative Genomics
Services: who has access?
• CEINGE
• DBBM
• IIGB
• BIOGEM
• Faculty of Medicine
• Faculty of Biotechnology
• Other Faculties
• Public (limited access)
Services organization [Diagram labels: WEB SERVER; CAPRI, SRS, ENSEMBL, PISE, Other; Fasta, Emboss, Blast; User Data; Primary remote databases; DB]
CAPRI
CAPRI workflow
Various operations in a row: Complement -> Translation -> Isoelectric point of the resulting protein.
[Diagram labels: DNA, Complement, Translation, Isoelectric point]
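The chain Complement -> Translation -> Isoelectric point can be sketched in plain Python. This is only an illustration of chaining steps, not CAPRI code. The function names are hypothetical, "Complement" is assumed here to mean the reverse complement, translation uses the standard genetic code, and the pKa values are illustrative textbook-style figures.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Illustrative pKa values (assumption, not taken from the slides).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def reverse_complement(dna: str) -> str:
    """Complement each base and reverse the strand."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate frame 1 up to the first stop codon; unknown codons give X."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def net_charge(protein: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    pos = [PKA_N_TERM] + [PKA_POSITIVE[a] for a in protein if a in PKA_POSITIVE]
    neg = [PKA_C_TERM] + [PKA_NEGATIVE[a] for a in protein if a in PKA_NEGATIVE]
    return (sum(1 / (1 + 10 ** (ph - pk)) for pk in pos)
            - sum(1 / (1 + 10 ** (pk - ph)) for pk in neg))


def isoelectric_point(protein: str) -> float:
    """Find the pH where the net charge is zero, by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def run_workflow(value, steps):
    """Feed the output of each step into the next one."""
    for step in steps:
        value = step(value)
    return value


pi_value = run_workflow("TTATTTCAT",
                        [reverse_complement, translate, isoelectric_point])
```

Each step takes the previous step's output as its only input, which is what lets operations be composed in a row.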
CAPRI architecture
Legend (relations between objects): use; inheritance; program execution; data transfer; temporal relation.
[Diagram labels: CLIENT; SERVER; CAPRI CGI; Menu Table; Base Obj.; Tasks Obj.; Program Objects; Plugin Objects (Pise, CLI, CURL, SOAP, JEMBOSS); Simple Programs; Disk Buffering; Programs: Phylip, ClustalW, Genscan, HMMer, EMBOSS, FASTA, BLAST; Server disks]
Distributed execution
For each user request, a process is launched on a different node.
[Diagram: three Access Servers connected to the Cluster Nodes]
Cluster activity
1 – Run a command (sent over http to the web application server)
2 – Request a node IP
3 – Request the status of the cluster
4 – Search for the best resource and return the corresponding node IP
5 – Launch the command on the node
6 – Return the result
[Diagram labels: Web application server; Broker; Cluster Manager; DB server (relational DB); Cluster]
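The six-step exchange between the web application server, the broker and the cluster can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual broker: `NodeStatus`, `Broker`, `run_command` and the "lowest load wins" rule are all hypothetical, since the slide does not say how the best resource is chosen.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Status of one cluster node as reported by the cluster manager (hypothetical)."""
    ip: str
    load: float        # assumed metric: lower means less busy
    online: bool = True


class Broker:
    """Picks a node for each request (steps 2 to 4)."""

    def __init__(self, cluster_status):
        # cluster_status: callable returning a list of NodeStatus (step 3).
        self._cluster_status = cluster_status

    def request_node(self) -> str:
        nodes = [n for n in self._cluster_status() if n.online]
        if not nodes:
            raise RuntimeError("no cluster node available")
        # Step 4: "best resource" is assumed to be the least loaded node.
        return min(nodes, key=lambda n: n.load).ip


def run_command(command, broker, launch):
    """Steps 1, 5 and 6 as seen from the web application server."""
    node_ip = broker.request_node()      # step 2: request a node IP
    return launch(node_ip, command)      # steps 5 and 6: launch, return result


# Stand-ins for the cluster manager and the remote launcher.
def fake_status():
    return [
        NodeStatus("10.0.0.1", load=0.9),
        NodeStatus("10.0.0.2", load=0.1, online=False),
        NodeStatus("10.0.0.3", load=0.4),
    ]


def fake_launch(ip, command):
    return f"{command} ran on {ip}"


result = run_command("blastall", Broker(fake_status), fake_launch)
```

Passing the status source and the launcher as callables keeps the selection logic independent of how the cluster is actually queried or reached.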
Broker [Diagram labels: Broker; DB; Grid; virtual nodes; nodes]
Image archival and management
RESEARCH PROJECT
* Project title
* Experiment name
* Author, group, group leader, etc.
* Cell line
* Culture conditions
* Fixation and inclusion methods, stainings, etc.
* Objective
* Focus position
* Stage position x/y
* Exposure time
* Resolution, etc.
[Diagram labels: WEB INTERFACE; DB]
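The metadata listed above could be held in a record like the one below before being stored in the DB. This is a sketch only: the class names, field types, units and the grouping into project, sample and acquisition are assumptions, and "Objective" is read here as the microscope objective.

```python
from dataclasses import dataclass, asdict


@dataclass
class Project:
    title: str
    experiment: str
    author: str
    group: str
    group_leader: str


@dataclass
class Sample:
    cell_line: str
    culture_conditions: str
    fixation: str          # fixation and inclusion method
    stainings: tuple = ()


@dataclass
class Acquisition:
    objective: str         # assumed to be the microscope objective
    focus_position: float  # units are an assumption (micrometres)
    stage_x: float
    stage_y: float
    exposure_ms: float     # units are an assumption (milliseconds)
    resolution: str


@dataclass
class ImageRecord:
    """One archived image with its descriptive metadata (hypothetical layout)."""
    filename: str
    project: Project
    sample: Sample
    acquisition: Acquisition

    def to_row(self) -> dict:
        """Flatten to a plain nested dict, ready for a DB layer or JSON."""
        return asdict(self)


record = ImageRecord(
    filename="example_0001.tif",
    project=Project("Example project", "Example experiment",
                    "A. Author", "Example group", "L. Leader"),
    sample=Sample("example cell line", "example medium", "example fixation",
                  ("actin",)),
    acquisition=Acquisition("40x", 12.5, 100.0, 250.0, 80.0, "1024x1024"),
)
row = record.to_row()
```

All values in the example record are placeholders.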
IPROC [Screenshots: timelapse at 6 positions; timelapse actin; wound healing; timelapse 2; adhesion; actin staining]
IPROC architecture [Diagram labels: data + images; proc-steps; page; iPage; iPane (x3); area; image; Gateway; HPC on Cluster nodes]
Distributed execution of parallel requests
A tool can require the execution of multiple, simultaneous processes.
[Diagram: three Access Servers connected to the Cluster Nodes]
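A tool that needs several simultaneous processes could fan its sub-tasks out over the cluster nodes roughly as below. This is a sketch under assumptions: the round-robin assignment, the `run_parallel` function and the `launch` callable are hypothetical, and local threads stand in for the remote calls.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def run_parallel(tasks, nodes, launch):
    """Run every task at the same time, assigning nodes in round-robin order.

    launch(node, task) is a caller-supplied function that executes one task
    on one node and returns its result. Results keep the order of `tasks`.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    assignments = list(zip(cycle(nodes), tasks))
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(launch, node, task) for node, task in assignments]
        return [f.result() for f in futures]


def fake_launch(node, task):
    # Stand-in for a remote execution on a cluster node.
    return f"{node}:{task}"


results = run_parallel(["tile-1", "tile-2", "tile-3"], ["n1", "n2"], fake_launch)
```

Collecting the futures in submission order means the caller gets results aligned with its task list, whichever node finishes first.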
What software may be linked
- PHP internal routines (basic drawing, processing)
- ImageMagick (more advanced processing)
- Image converters
- Special tools (PDL, deconvolution)
- Tools developed in-house (cell tracking)
- ......
Advantages
• Convenient graphic interface
• Access to a vast library of image processing steps
• No specific interface requirements
• Remote processing on parallel hardware
• Support for a large number of concurrent users
• System independent (works on Mac, PC, Linux, etc.)
• No need to install: a browser is enough.