Expression Analysis Platforms
Friday's Class 4:00-5:00 140 SH Work in the Laboratory of Signal Processing of the EESC / USP revolves around modeling biological systems. Our current lines of research are diagnosing speech pathology, ultrasound signal processing, and bioinformatics (particularly phylogeny). All of these lines involve clustering algorithms; the clustering work seeks to determine the natural number of groups and to validate the clustering algorithms. Several techniques have been applied to genomic databases, among them resampling, analysis of missing data, and the use of a priori information. We are now focusing on probabilistic models and validation of structural models. The studies use data generated by electrophoresis of species with agricultural applications, provided by Embrapa (www.embrapa.br). We are currently working with 3 doctoral students on linking phenotypic and genotypic information for varieties of corn.
Other Courses • Intro to Informatics (in CS) • Intro to Bioinformatics (51:121) • provides a first exposure to some available computational techniques and resources • however, the emphasis is on utilization • In this course (51:123) -- I try to emphasize tools and techniques that you would use to go about developing your own computational resources (software, systems, tools, etc). • Computational Methods in Molecular Biology (51:122 -- Casavant, Bair) • advanced topics
Bioinformatics Certificate • Offered by the Graduate College (MS/PhD) • http://informatics.grad.uiowa.edu/bioinformatics/
Final Exam • 25 questions – mostly short answer/T/F • 1 paper • 1 genome sequencing • 2 Ensembl • 1 references • 1 array • 2 programming • 1 pattern matching • 2 expression • 3 other • 3 p-genes • 1 Blast/Blat • 5 Hash questions • 1 N-W • 1 sequencing
Outline • What is expression • Platforms • ChIP on Chip • Gene expression • Exon arrays • Tiling arrays • SNP chips • Applied S/W for Expression "Library" -- OTDB • Alternative Splicing • Association Study Example -- AMD • How to analyze
What is expression? Gene expression mRNA - transcription - microRNAs Protein - translation
A Typical Experiment Case vs. Control Ex. Retina cells +/- 7-keto-cholesterol 3x redundancy Look for differentially expressed genes t-test, ANOVA fold-change Result --> set of genes
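A minimal sketch of the case vs. control comparison above, assuming 3 replicates per condition and using only the Python standard library. The gene names, intensity values, and both cutoffs are hypothetical. Welch's unequal-variance t statistic stands in for the "t-test" on the slide, which does not say which variant is meant.

```python
import math
import statistics

def log2_fold_change(case, control):
    """log2 of the ratio of mean case intensity to mean control intensity."""
    return math.log2(statistics.mean(case) / statistics.mean(control))

def welch_t(case, control):
    """Welch's t statistic (does not assume equal variances)."""
    var_case = statistics.variance(case)        # sample variance (n - 1)
    var_control = statistics.variance(control)
    std_err = math.sqrt(var_case / len(case) + var_control / len(control))
    return (statistics.mean(case) - statistics.mean(control)) / std_err

def call_differential(expression, min_abs_log2fc=1.0, min_abs_t=4.0):
    """Return genes passing both cutoffs (cutoff values are illustrative only)."""
    hits = []
    for gene, (case, control) in expression.items():
        lfc = log2_fold_change(case, control)
        t = welch_t(case, control)
        if abs(lfc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            hits.append(gene)
    return hits

# Hypothetical intensities: (treated replicates, untreated replicates)
expression = {
    "GENE_A": ([820.0, 790.0, 805.0], [400.0, 410.0, 395.0]),
    "GENE_B": ([510.0, 495.0, 505.0], [500.0, 515.0, 490.0]),
}

result = call_differential(expression)  # the "set of genes"
```

In practice the t statistic would be converted to a p-value and corrected for the thousands of genes tested in parallel; the fixed cutoff here only keeps the sketch short.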
But there’s so much more… • Differential expression of genes • Time-courses • Alternative splicing • ChIP-on-chip • High-density SNP genotyping • Using chips to select genomic fragments for re-sequencing • Additional annotation/analyses
Definition of Microarray • What is a gene expression array? • “A microarray is a small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology” - Schena 2003
Source: www.bioteach.ubc.ca/MolecularBiology/microarray/ graphics by Jiang Long
Advantages to Arrays • A single array permits monitoring thousands of genes in parallel • Provides information at genomic scale • Reveals gene function and gene interactions • Identifies relationship between genetic and biochemical pathways • Identifies traits associated with multigenic origins • Caveat - further modifications may occur • Post-transcriptional • Translational • Protein
Microarray Research • Ubiquitous in biology & agriculture research • Interdisciplinary • Biology • Computer Science • Statistics • Experiments require teams of individuals • Analysis presents many obstacles that need to be overcome
Statement of the Problem • Obstacles impeding the analysis process • Analysis is complex, with multiple steps • Requires expertise from multiple disciplines • Bio - understand underlying biology • Stats - normalization & statistical measures • Comp Sci - programmatic solutions, computational resources • Necessity for a centralized analysis system • Robust • Extensible • Portable
Platforms Gene Expression Arrays Exon Arrays Tiling Arrays SNP chips Vendors: Affymetrix, NimbleGen, Agilent, others
An Aside • State-of-the-art sequencing technology + microarray == ? • 454-, pyro-, pyrophosphate sequencing
Gene Chip + Sequencing 454, pyro- or pyrophosphate sequencing Genome sequencing in microfabricated high-density picolitre reactors, Margulies et al., Nature, 2005 Nature 2007
Sequence Capture http://www.nimblegen.com/products/seqcap/index.html
Gene Expression Arrays Traditional method, typically provides one or more probes that interrogate the expression level of a gene. U133Plus2 - 54,000 probe sets
Exon Arrays Target each exon of a gene individually 1,400,000 probe sets Different levels of confidence/quality 300,000 exons from full-length mRNAs 880,000+ exons from gene predictions 500,000+ “control” exons Available for human, mouse and rat
Tiling Array http://www.affymetrix.com/products/arrays/specific/human_tiling.affx
Tiling Arrays Covering the entire genome with probes Probes every 35 bp across the genome 7-14 chips (depending on the application) … or can focus on a specific area 10,000 bp proximal promoter of every gene 1 chip
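The chip counts above follow from simple arithmetic, sketched here under two assumptions that are not on the slide: a genome of about 3 billion bp and about 20,000 genes. The result is an upper bound, since real tiling designs typically skip repetitive sequence.

```python
import math

def probes_needed(region_bp, spacing_bp=35):
    """Number of probes to tile a region with one probe every spacing_bp."""
    return math.ceil(region_bp / spacing_bp)

GENOME_BP = 3_000_000_000   # assumption: approximate human genome size
N_GENES = 20_000            # assumption: approximate gene count
PROMOTER_BP = 10_000        # from the slide: 10,000 bp proximal promoter

whole_genome_probes = probes_needed(GENOME_BP)
promoter_probes = probes_needed(N_GENES * PROMOTER_BP)

# Probes per chip if the whole-genome set is split over 7 or 14 chips
per_chip_7 = whole_genome_probes / 7
per_chip_14 = whole_genome_probes / 14
```

Under these assumptions the whole genome needs roughly 86 million probes, or about 6 to 12 million per chip across 14 to 7 chips, while the promoter-only design needs under 6 million and so plausibly fits on a single chip.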
Tiling Arrays - Applications Applications expression protein-DNA interaction DNA modifications methylation acetylation Anywhere in the genome!
ENCODE Project Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project Nature, V 447, June 14, 2007
Transcript Connectivity • protein-coding loci are more transcriptionally complex than previously thought • 19% of pseudogenes transcribed • genes had, on average, 10 different transcriptional start sites
ChIP on Chip http://www.chem.agilent.com/Scripts/generic.asp?lpage=37461&indcol=N&prodcol=N
SNP chips SNPs - single nucleotide polymorphisms Affymetrix 6.0 Array • 906,600 SNPs • 946,000 non-polymorphic ("monomorphic") copy-number probes Applications: Linkage Association Studies Changes in Copy Number (deletions/duplications)
Association
Populations: Unaffected vs. Affected
Allele frequencies:
         Unaffected      Affected
         A1     A2       A1     A2
SNP 1    0.74   0.26     0.75   0.25
SNP 2    0.70   0.30     0.10   0.90
Power increases with more samples, and more SNPs
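One common way to test such allele frequencies is a chi-square test on the 2x2 table of allele counts, sketched below. The slide gives only frequencies, so the sample size (200 individuals, i.e. 400 alleles, per group) is a hypothetical value chosen for illustration.

```python
def allele_counts(freq_a1, n_alleles):
    """Convert an A1 allele frequency into (A1, A2) counts."""
    a1 = round(freq_a1 * n_alleles)
    return a1, n_alleles - a1

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def allelic_test(freq_unaffected, freq_affected, n_alleles_per_group):
    """Chi-square statistic comparing A1 frequency between the two groups."""
    a, b = allele_counts(freq_unaffected, n_alleles_per_group)
    c, d = allele_counts(freq_affected, n_alleles_per_group)
    return chi_square_2x2(a, b, c, d)

N_ALLELES = 400  # hypothetical: 200 diploid individuals per group

snp1 = allelic_test(0.74, 0.75, N_ALLELES)
snp2 = allelic_test(0.70, 0.10, N_ALLELES)

CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 df, alpha = 0.05
```

With these counts SNP 1 falls well below the critical value and SNP 2 far above it. For fixed frequencies the statistic grows in proportion to the sample size, which is one way to see why power increases with more samples; testing many SNPs at once also calls for a much stricter threshold than the single-test value used here.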
Alternative Splicing in the Eye GOAL: To identify the splicing variants expressed in retina, retinal pigment epithelia, and optic nerve head. (3x biological replicates) Motivation: To guide/focus screening efforts to those exons that are expressed. In collaboration with Rob Mullins
Ocular Tissue Expression Database Survey of 10 ocular tissues GOAL: catalog which genes are expressed across ocular tissues of specific interest In collaboration with Abe Clark at Alcon
AMD Association Study GOAL: Identify the major susceptibility regions for age-related macular degeneration. Several regions have been reported • How many susceptibility regions are there? Genotyping 400 AMD patients and controls with high-density SNP chips 400,000,000 genotypes
How to analyze the data? First step is acquiring the data! Normalization Analysis
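The slide does not say which normalization is meant. As an illustration only, the sketch below uses quantile normalization, one common choice for expression arrays, which forces every chip to share the same intensity distribution. The input values are hypothetical and ties are broken by position rather than averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists (one per chip).

    Each value is replaced by the mean, across chips, of the values that
    hold the same rank within their own chip.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank across chips
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized

# Hypothetical intensities for 3 probes on 2 chips
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_chips = quantile_normalize(chips)
```

After this step both chips contain the same set of values, each probe keeps its rank within its own chip, and downstream tests compare like with like.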