BiGCaT Bioinformatics: Hunting strategy of the bigcat
BiGCaT, bridge between two universities • TU/e: Ideas & Experience in Data Handling • Universiteit Maastricht: Patients, Experiments, Arrays and Loads of Data • BiGCaT
Major Research Fields • Nutritional & Environmental Research • Cardiovascular Research • BiGCaT
What are we looking for? Different conditions show different levels of gene expression for specific genes
Differences in gene expression? • Between e.g.: • healthy and sick • different stages of disease progression • different stages of healing • failed and successful treatment • more and less vulnerable individuals • Shows: • important pathways and receptors • which then can be influenced
The transfer of information from DNA to protein. From: Alberts et al., Molecular Biology of the Cell, 3rd edn.
Gene expression measurement: DNA → mRNA → protein. Functional genomics/transcriptomics: Changes in mRNA • Gene expression microarrays • Suppression subtraction libraries. Proteomics: Changes in protein levels • 2D gel electrophoresis • Antibody arrays
Gene expression arrays. Macroarrays: absolute radioactive signal. Validation. Microarrays: relative fluorescence signals. Identification.
Layout of a microarray experiment • Get the cells • Isolate RNA • Make fluorescent cDNA • Hybridize • Laser read out • Analyze image
The cat and its prey: the data. Comprises: • Known cDNA sequences (not known genes!) on the array = reporters • Data sets typically contain 20,000 image spot intensity values in 2 colors • One experiment often contains multiple data points for every reporter (e.g. times or treatments) • Each data point can (should) consist of multiple arrays. Bioinformatics should translate this into useful biological information
Hunting Comprises: • Analyze reporters • Data pretreatment • Finding patterns in expression • Evaluate biological significance of those patterns
Reporter analysis • Reporter sequence must be known (can be sequenced using digest electrophoresis). • Look up the sequence in genome databases (e.g. Genbank/EMBL or Swissprot) • Will often find other RNA experiments (ESTs) or just chromosome location.
Blast reporters against what? • Nucleotide databases (EMBL/Genbank). Disadvantages: many hits, best hit on clone, we actually want function (i.e. protein) • Nucleotide clusters (Unigene). Disadvantage: still no function • Protein databases (Swissprot+trEMBL). Disadvantages: non-coding sequence not found, frameshifts in clones
Two implemented solutions • Start with Unigene (from Blastn or platform provider), mine using SRS (direct, through PDB, through PIR) -> Swissprot/trEMBL • Use dedicated EMBL-Swissprot X-linked DB (Blast against EMBL subset get Swissprot/trEMBL)
Scotland - Holland: 1-0? Check Affymetrix reporter sequences. • Each reporter consists of 16 25-mer probes. • Blast against ENSEMBL genes (takes 1 month on UK grid). • Use for cross-species analysis • Adapt RMA statistical analysis in Bioconductor
Next slide shows data of one single actual microarray • Normalized expression shown for both channels. • Each reporter is shown with a single dot. • Red dots are controls • Note the GEM barcode (QC) • Note the slight error in linear normalization (low expressed genes are higher in Cy5 channel)
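The normalization error noted above can be checked numerically. The following is a minimal sketch, assuming "linear normalization" means scaling one channel by a single global factor so that the channel medians match; the intensity values and function names are hypothetical and only illustrate the effect.

```python
import math
from statistics import median


def linear_normalize(cy3, cy5):
    """Scale Cy5 by one global factor so both channel medians match."""
    factor = median(cy3) / median(cy5)
    return [value * factor for value in cy5]


def median_log_ratio(cy3, cy5):
    """Median log2(Cy5/Cy3) over the given spots."""
    return median(math.log2(r / g) for g, r in zip(cy3, cy5))


# Hypothetical intensities, ordered from low to high expression.
cy3 = [100.0, 120.0, 150.0, 2000.0, 4000.0, 8000.0]
cy5 = [180.0, 200.0, 240.0, 2600.0, 5000.0, 10000.0]

cy5_norm = linear_normalize(cy3, cy5)

# Compare the low-intensity half with the high-intensity half.
low = median_log_ratio(cy3[:3], cy5_norm[:3])
high = median_log_ratio(cy3[3:], cy5_norm[3:])
print(f"low-intensity median log2 ratio:  {low:+.2f}")
print(f"high-intensity median log2 ratio: {high:+.2f}")
```

A positive value for the low-intensity spots next to a negative value for the high-intensity spots means a single scaling factor cannot remove the bias, which is the pattern described on the slide.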
Next slide shows same data after processing • Controls removed • Bad spots (<40% average area) removed • Low signals (<2.5 Signal/Background) removed • All reporters with <1.7 fold change removed (only changing spots shown)
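The processing steps above can be sketched as one filter. This is a minimal illustration, not the actual pipeline: the `Spot` fields, the use of a single signal/background value per spot, and computing the average area over all spots (controls included) are assumptions. The thresholds are the ones listed on the slide.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    reporter: str
    is_control: bool
    area: float                  # spot area in pixels
    signal_to_background: float  # assumed: one S/B value per spot
    ratio: float                 # normalized Cy5/Cy3 ratio


def fold_change(ratio):
    """Symmetric fold change: 0.5 and 2.0 both count as 2-fold."""
    return ratio if ratio >= 1 else 1 / ratio


def filter_spots(spots, min_area_fraction=0.4, min_sb=2.5, min_fold=1.7):
    """Keep non-control, well-formed, bright, changing spots."""
    mean_area = sum(s.area for s in spots) / len(spots)
    return [
        s for s in spots
        if not s.is_control
        and s.area >= min_area_fraction * mean_area
        and s.signal_to_background >= min_sb
        and fold_change(s.ratio) >= min_fold
    ]


# Hypothetical spots, one per rejection reason plus two that pass.
spots = [
    Spot("r1", False, 100, 5.0, 2.0),     # kept: up-regulated
    Spot("ctrl", True, 100, 10.0, 3.0),   # removed: control
    Spot("r2", False, 20, 5.0, 2.5),      # removed: bad spot (small area)
    Spot("r3", False, 100, 1.5, 3.0),     # removed: low signal
    Spot("r4", False, 100, 6.0, 1.2),     # removed: fold change too small
    Spot("r5", False, 100, 4.0, 0.5),     # kept: down-regulated
]

kept = [s.reporter for s in filter_spots(spots)]
print(kept)
```

Treating fold change symmetrically keeps down-regulated reporters as well as up-regulated ones; whether the original analysis did the same is an assumption.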
Final slide shows information for one single reporter • This signifies one single spot • It is a known gene: a UDP glucuronyltransferase • Raw data and fold change are shown
Secondary Analyses • Gene clustering (find genes that behave equally) • Cluster evaluation (what do we see in clusters …) • Physiological evaluation (for arrays, proteomics, clusters) • Understand the regulation
Clustering: find genes with same pattern. [Figure: left panel, expression level versus time; right panel, T2 signal versus T1 signal.] Left-hand picture shows expression patterns for 2 genes (these should probably end up in the same cluster). Right-hand picture shows the expression vector for one gene for the first 2 dimensions. Can be normalized by amplitude (circle) or relatively (square).
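The two normalizations can be sketched as follows. This is one possible reading of the slide, not a confirmed one: "by amplitude (circle)" is taken as dividing by the Euclidean length, which places every vector on the unit circle, and "relatively (square)" is taken as dividing by the largest absolute component, which places every vector on the edge of the unit square. The function names and values are hypothetical.

```python
import math


def normalize_amplitude(vector):
    """Divide by the Euclidean length (vector must not be all zeros)."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def normalize_relative(vector):
    """Divide by the largest absolute component (must not be all zeros)."""
    peak = max(abs(x) for x in vector)
    return [x / peak for x in vector]


# Two hypothetical genes with the same pattern at different amplitudes:
# signal at T1 and T2.
gene_a = [3.0, 4.0]
gene_b = [6.0, 8.0]

for name, normalize in [("amplitude", normalize_amplitude),
                        ("relative", normalize_relative)]:
    print(name, normalize(gene_a), normalize(gene_b))
```

After either normalization the two genes map to the same point, so a distance-based clustering would place them in the same cluster regardless of their absolute expression level.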
Cluster evaluation • Group genes (function, pathway, regulations etc.) • Find groups in patterns using visualization tools and automatic detection. • Should lead to results like: “This experiment shows that a large number of apoptosis genes are up-regulated during the early stage after treatment, probably meaning that cells are dying.”
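One common way to automate the statement "a large number of apoptosis genes" is a hypergeometric over-representation test. The slides do not say which statistic was used, so the sketch below is an assumption, and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, genes_in_group, cluster_size, hits):
    """Probability of seeing at least `hits` group members in a cluster
    of `cluster_size` genes drawn at random from `total_genes`, of which
    `genes_in_group` belong to the functional group."""
    upper = min(genes_in_group, cluster_size)
    favourable = sum(
        comb(genes_in_group, k)
        * comb(total_genes - genes_in_group, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total_genes, cluster_size)


# Hypothetical numbers: 1000 reporters on the array, 50 annotated as
# apoptosis genes, a cluster of 40 up-regulated reporters containing
# 10 apoptosis genes.
p = enrichment_p_value(total_genes=1000, genes_in_group=50,
                       cluster_size=40, hits=10)
print(f"p = {p:.3g}")
```

A small p-value suggests the group is over-represented in the cluster. When many groups are tested at once, a multiple-testing correction would be needed on top of this.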
Example of GenMAPP results: Manual lookup on a MAPP
Understanding regulation. The main idea: co-regulated genes could have common regulatory pathways. The basic approach: annotate transcription factor binding sites using Transfac and use them for supervised clustering. The problem: each gene has hundreds of transcription factor binding sites. Solution? Use syntenic regions via rVista (work in progress with Rick Dixon).
Understanding QTLs. Get blood pressure QTLs from ENSEMBL/CFG Wellcome group. Look up functional pathways and GO annotations using GenMAPP: a virtual experiment that assumes all genes in a QTL are changing. Create a new blood pressure MAPP and confront this with real blood pressure/heart failure microarray data. Work in progress: TU/e MDP3 group.
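The "virtual experiment" can be sketched as a set operation: mark every gene inside the QTL interval as changing, then compare that set with the genes that change on a real array. The gene names, coordinates, and QTL boundaries below are hypothetical placeholders, and the function name is an assumption.

```python
def genes_in_qtl(gene_positions, chromosome, start, end):
    """Return the names of all genes located inside the QTL interval."""
    return {
        name
        for name, (chrom, position) in gene_positions.items()
        if chrom == chromosome and start <= position <= end
    }


# Hypothetical gene positions: name -> (chromosome, base-pair position).
gene_positions = {
    "geneA": ("1", 1_500_000),
    "geneB": ("1", 2_200_000),
    "geneC": ("1", 9_000_000),
    "geneD": ("2", 1_800_000),
}

# Virtual experiment: assume every gene in the QTL is changing.
virtual_changed = genes_in_qtl(gene_positions, "1", 1_000_000, 3_000_000)

# Hypothetical list of genes that changed on a real microarray.
array_changed = {"geneB", "geneC"}

# Genes supported by both the QTL and the real expression data.
supported = virtual_changed & array_changed
print(sorted(virtual_changed), sorted(supported))
```

The `virtual_changed` set is what would be loaded into a pathway tool as if it were array data; the intersection with real data narrows the QTL to candidates that also respond in the experiment.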
People involved
BiGCaT Maastricht: Rachel van Haaften (IOP), Edwin ter Voert (BMT), Joris Korbeeck (BMT/UM), Willem Ligtenberg (IOP), Stan Gaj (tUL), Chris Evelo.
TU/e: Peter Hilbers, Huub ten Eijkelder, Patrick van Brakel, lots of students.
CARIM: Yigal Pinto, Umesh Sharma, Blanche Schroen, Matthijs Blankesteijn, Jos Smits, Jo de Mey, Danielle Curfs, Kitty Cleutjens, Natasja Kisters, Esther Lutgens, Birgit Faber, Petra Eurlings, Ann-Pascalle Bijnens, Mat Daemen, Frank Stassen, Marc van Bilssen, Marten Hoffker.
NUTRIM: Wim Saris, Freddy Troost, Johan Renes, Simone van Breda.
GROW: Daisy vd Schaft, Chamindie Puyandeera.
IOP Nutrigenomics: Milka Sokolovic, Theo Hackvoort, Meike Bunger, Guido Hooiveld, Michael Müller, Lisa Gilhuis-Pedersen, Antoine van Kampen, Edwin Mariman, Wout Lamers, Nicole Franssen, Jaap Keijer.
CFG Wellcome group: Neil Hanlon (Glasgow), Gontran Zepeda (Edinburgh), Rick Dixon (Leicester), Sheetal Patel (London).
Paris leptin group: Soraya Taleb, Rafaelle Cancello, Nathalie Courtin, Carine Clement.
Organon: Jan Klomp, Rene van Schaik.
BioAsp: Marc Laarhoven.