Modeling Functional Genomics Datasets CVM8890-101. Lesson 1 13 June 2007 Bindu Nanduri. Lesson 1: Data to Biological sense. What we are trying to achieve. Introduction to functional genomics modeling strategies. Transcriptomics and Proteomics.
Modeling Functional Genomics Datasets CVM8890-101 Lesson 1 13 June 2007 Bindu Nanduri
Lesson 1: Data to Biological sense. What we are trying to achieve. Introduction to functional genomics modeling strategies.
Why study gene expression changes? Transcription is the predominant form of regulation.
Northern Blots Mol Vis. 1996 Nov 4;2:11
Microarrays
Basic concept: reverse Northern blot on a large scale.
High throughput: hybridize control and experimental samples simultaneously using distinct fluorescent dyes; many assays can be carried out in parallel.
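As a rough illustration of how the two dye channels are compared, the sketch below computes a log2 ratio of experimental to control intensity for each spot. The gene names, intensity values, and the pseudocount of 1 are assumptions made for this example only; real pipelines also apply background correction and normalization, which are not shown.

```python
import math

def log2_ratio(experimental, control, pseudocount=1.0):
    """Return log2(experimental / control) for one spot.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((experimental + pseudocount) / (control + pseudocount))

# Hypothetical intensities per spot: (experimental dye, control dye)
spots = {
    "geneA": (1599.0, 399.0),  # brighter in the experimental channel
    "geneB": (499.0, 499.0),   # equal in both channels
    "geneC": (99.0, 799.0),    # brighter in the control channel
}

ratios = {gene: log2_ratio(e, c) for gene, (e, c) in spots.items()}

for gene, value in ratios.items():
    print(gene, round(value, 2))
```

A positive value suggests higher signal in the experimental sample, a negative value higher signal in the control, and a value near zero no change.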
Affymetrix oligo array design: 25-mer probes (11 to 16), usually from the most 3′ region of the transcript (near the poly-A tail), often the UTR • http://www.affymetrix.com
Genomic Tiling Array Design: multiple probes tiled along the genome sequence (5′ to 3′); center-to-center resolution 38 bp.
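To make the center-to-center spacing concrete, here is a minimal sketch that lays probes along a sequence at a fixed step. The 38 bp step comes from the slide; the 25 bp probe length is an assumption borrowed from the Affymetrix design above, and the function name is hypothetical.

```python
def tile_probes(sequence_length, probe_len=25, step=38):
    """Return (start, end) coordinates of probes tiled along a sequence.

    Coordinates are 0-based with an exclusive end. Because every probe has
    the same length, a fixed step between starts equals the
    center-to-center distance.
    """
    last_start = sequence_length - probe_len
    return [(start, start + probe_len) for start in range(0, last_start + 1, step)]

probes = tile_probes(200)
print(probes)
# With 25 bp probes and a 38 bp step, consecutive probes leave a 13 bp gap.
gap = 38 - 25
```

Under these assumptions the probes do not overlap; a step smaller than the probe length would give overlapping tiles instead.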
Is mRNA level = protein level? Is there a correlation?
Comparison of protein levels (MS, 2D gels) and RNA levels (SAGE) for 156 genes in yeast:
• mRNA levels unchanged, but protein levels varied by up to 20X
• protein levels unchanged, but mRNA levels varied by up to 30X
• Highly expressed mRNAs correlate well with protein levels
Gygi et al. (1999) Mol. Cell. Biol.
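One common way to ask whether mRNA and protein levels track each other is a rank correlation. The sketch below implements Spearman's rank correlation from scratch; it is not the analysis used by Gygi et al., and the abundance values are invented for illustration.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical abundances for five genes (arbitrary units)
mrna = [1.2, 3.5, 0.8, 10.0, 2.1]
protein = [300.0, 150.0, 90.0, 5000.0, 800.0]
print(round(spearman(mrna, protein), 3))
```

A value near 1 would indicate that genes with higher mRNA levels also tend to have higher protein levels; values near 0 indicate little monotonic relationship.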
Expressed Sequence Tags
ESTs are pieces of DNA sequence (usually 200 to 500 nt) generated by sequencing either one or both ends of an expressed gene.
They are bits of DNA that represent genes expressed in certain cells, tissues, or organs from different organisms, and can be useful "tags" to fish a gene out of a portion of chromosomal DNA by matching base pairs.
http://www.ncbi.nlm.nih.gov/About/primer/est.html
EST Sequence Clustering
A gene can be expressed as mRNA many, many times, so ESTs derived from this mRNA may be redundant: there may be many identical, or similar, copies of the same EST.
This redundancy and overlap mean that when someone searches dbEST for a particular EST, they may retrieve a long list of tags, many of which may represent the same gene.
The UniGene database automatically partitions GenBank sequences into a non-redundant set of gene-oriented clusters.
http://www.ncbi.nlm.nih.gov/About/primer/est.html
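The idea of collapsing redundant ESTs into gene-oriented clusters can be sketched with a toy rule: two ESTs join the same cluster if they share at least one exact k-mer. This is not UniGene's actual procedure; the sequences, the value of k, and the function names are assumptions for illustration, and reverse complements and sequencing errors are ignored.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster_ests(ests, k=8):
    """Group EST names into clusters linked by shared exact k-mers."""
    parent = {name: name for name in ests}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> name of the first EST that contained it
    for name, seq in ests.items():
        for kmer in kmers(seq.upper(), k):
            if kmer in first_seen:
                parent[find(name)] = find(first_seen[kmer])
            else:
                first_seen[kmer] = name

    clusters = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical ESTs: est1 and est2 overlap, est3 is unrelated
ests = {
    "est1": "ACGTACGTTAGC",
    "est2": "ACGTTAGCAATGG",
    "est3": "TTTTGGGGCCCCAAAA",
}
print(cluster_ests(ests))
```

Because links are transitive, two ESTs from opposite ends of a transcript can end up in one cluster as long as a chain of overlapping ESTs connects them.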
ESTs: EST mapping to the genome, annotation, differential expression
Transcriptome: clustering, differential expression analysis
Proteome: differential expression analysis
Multiple data analysis platforms (proteomics, transcriptomics, EST analysis) → LIST of elements
Modeling Function
Modeling function requires:
• knowing the components of the system (structural annotation)
• knowing what these components do and how they interact (functional annotation)
Clustering
Similar expression patterns = similar regulation? Clustering algorithms help us identify patterns in complex data.
Key goal: identify co-regulated groups of genes
• Hierarchical clustering
• K-means clustering
• Self-organizing feature maps
• Principal component analysis
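Of the methods listed, K-means is the simplest to sketch. The version below is a minimal illustration, assuming Euclidean distance, a fixed number of iterations, and the first k profiles as starting centroids; the gene names and expression values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (labels, centroids).

    Centroids start at the first k points (an assumption of this sketch;
    random restarts are common in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical log-ratio profiles across three conditions
genes = ["g1", "g2", "g3", "g4"]
profiles = [
    [2.0, 2.1, 1.9],     # induced
    [-2.0, -1.8, -2.2],  # repressed
    [1.8, 2.2, 2.0],     # induced
    [-1.9, -2.1, -2.0],  # repressed
]
labels, centroids = kmeans(profiles, k=2)
print(dict(zip(genes, labels)))
```

Genes that land in the same cluster have similar expression patterns, which makes them candidates for co-regulation, though shared expression alone does not prove shared regulation.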
Proteomics
Qualitative: total number of identified proteins, data intersections
Quantitative: changes in protein expression
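Both views of a proteomics dataset can be illustrated with a few lines of set arithmetic and a log2 fold change. The protein identifiers and abundance values (for example, spectral counts) below are hypothetical, and no statistical testing is shown.

```python
import math

# Hypothetical abundance per identified protein in each condition
control = {"P1": 12.0, "P2": 10.0, "P3": 8.0, "P4": 3.0}
treated = {"P2": 40.0, "P3": 8.0, "P5": 6.0}

# Qualitative view: which proteins were identified where
identified_control = set(control)
identified_treated = set(treated)
shared = identified_control & identified_treated
only_control = identified_control - identified_treated
only_treated = identified_treated - identified_control
total_identified = len(identified_control | identified_treated)

# Quantitative view: log2 fold change for proteins seen in both conditions
log2_fold_change = {
    protein: math.log2(treated[protein] / control[protein])
    for protein in sorted(shared)
}

print(total_identified, sorted(shared), log2_fold_change)
```

Proteins found in only one condition cannot be given a ratio this way and are usually reported separately.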
Use GO for:
• Grouping gene products by biological function
• Determining which classes of gene products are over-represented or under-represented
• Focusing on particular biological pathways and functions (hypothesis-driven data interrogation)
• Relating a protein's location to its function
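Over-representation of a GO term is often assessed with a one-sided hypergeometric test. The sketch below is one such test, not a method prescribed by the course; the function name and all counts are assumptions, and no multiple-testing correction is applied.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 1000 genes on the array, 50 annotated to the term,
# 40 differentially expressed genes, 8 of which carry the term.
p_value = hypergeom_enrichment_p(population=1000, annotated=50, sample=40, hits=8)
print(p_value)
```

With these counts only about 2 annotated genes would be expected by chance (40 × 50 / 1000), so observing 8 gives a small p-value. Testing under-representation would use the lower tail instead.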
Course Overview
• Introduction to functional annotation. Orthologs and homologs; clusters of orthologous genes (COGs) and the gene ontology (GO); and how to find what functional annotation is available.
• Tools for functional annotation. Accessing functional data; computational strategies to obtain more complete functional annotation; the AgBase GO annotation pipeline.
• Introduction to pathways analysis. Theory and strategies for pathway analysis modeling in different species, and tools for pathway analysis.
• Functional genomics modeling: prokaryotic and eukaryotic examples.
Some Useful Links
• http://www.genomesonline.org/ (comprehensive access to information regarding complete and ongoing genome projects around the world)
• http://www.geneontology.org/ (provides a controlled vocabulary to describe gene and gene product attributes in any organism)
• http://pir.georgetown.edu/ (integrated protein informatics resource for genomics and proteomics)
• http://www.pir.uniprot.org/ (protein database)
• http://mips.gsf.de/ (maintains a set of generic databases as well as the systematic comparative analysis of microbial, fungal, and plant genomes)
• http://www.ncbi.nlm.nih.gov/ (comprehensive resource for public databases, literature and tools)
• http://www.ebi.ac.uk/ensembl/ (system that maintains automatic annotation of large eukaryotic genomes)
• http://expasy.org/ (expert protein analysis system)
• http://www.biocyc.org/ (BioCyc is a collection of 260 Pathway/Genome Databases: metabolic pathways)
• http://www.genome.jp/kegg/ ("biological systems" database integrating both molecular building block information and higher-level systemic information)
Some Useful Links
• http://pfgrc.tigr.org/index.shtml (functional genomics studies on a variety of pathogens for which genomic sequence information is currently, or will soon be, available)
• http://www.tigr.org/ (comprehensive resource for microbial genomics)
• http://www.cs.ualberta.ca/~bioinfo/PA/ (high-throughput proteome annotations)
• http://garnet.arabidopsis.org.uk/systems_biology_tools.htm (Arabidopsis resources)
• http://www.systems-biology.org/002/ (systems biology portal)
• http://www.ebi.ac.uk/biomodels/ (mathematical models of biological interest)
• http://www.genmapp.org/current_databases.html (species-specific collections of genes and annotation)
• http://bioinfo.bgu.ac.il/bsu/microarrays/links/ (microarray analysis resources)
• http://david.abcc.ncifcrf.gov/ (Database for Annotation, Visualization and Integrated Discovery)
• http://www.animalgenome.org/pigs/community/links.html (swine genetics community)
Some Useful Links
• http://www.biocarta.com/FeaturedProducts/index.asp (pathways and tools for analysis)
• http://www.genecards.org/index.shtml (database of human genes that includes automatically-mined genomic, proteomic and transcriptomic information, as well as orthologies, disease relationships, SNPs, gene expression, gene function, and service links for ordering assays and antibodies)
• http://www.proteomecommons.org/ (proteomics tools)
• http://harvester.embl.de/
• http://bioinformatics.org/ (open access institute)
• http://www.ihop-net.org/UniPub/iHOP/ (a network of genes and proteins extends through the scientific literature)
• http://www1.jcsg.org/psat/help/document.html (comparative analysis of protein sequence)
• http://orthomcl.cbil.upenn.edu/cgi-bin/OrthoMclWeb.cgi (genome-scale algorithm for grouping ortholog protein sequences)
• http://www.pathogenomics.ca/ortholuge/ (ortholog prediction program)
• http://www.gene-regulation.com/pub/databases.html (transcription factor database)
Some Useful Links
• http://www.reactome.org/ (curated knowledgebase of biological pathways)
• http://www.biochemweb.org/systems.shtml (The Virtual Library of Biochemistry, Molecular Biology and Cell Biology)
• http://genome-www.stanford.edu/ (Stanford genomic resources)
• http://www.softberry.com/berry.phtml (collection of tools for annotation and analysis of sequences)
• http://sosui.proteome.bio.tuat.ac.jp/sosuiframe0E.html (prediction of transmembrane domains in proteins)
• http://www.psort.org/psortb/ (subcellular localization predictions)
• http://www.ch.embnet.org/software/TMPRED_form.html (prediction of membrane-spanning regions and their orientation)
• http://www.agbase.msstate.edu/ (functional analysis of agricultural plant and animal gene products)