1 / 56

Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph

Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor. Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu. Biological Setup. Every cell in the human body contains the entire human genome: 3.3 Gb or ~30K genes.

paul2
Download Presentation

Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Gene Annotation in Genomics Experiments With a Focus on Tools in BioConductor Carlo Colantuoni & Rafael Irizarry April 19, 2006 ccolantu@jhsph.edu

  2. Biological Setup Every cell in the human body contains the entire human genome: 3.3 Gb or ~30K genes. The investigation of gene expression is meaningful because different cells, in different environments, doing different jobs express different genes. Tasks necessary for gene expression analysis: Define what a gene is. Identify genes in a sea of genomic DNA where <3% of DNA is contained in genes. Design and implement probes that will effectively assay expression of ALL (most? many?) genes simultaneously. Cross-reference these probes.

  3. Cellular Biology, Gene Expression, and Microarray Analysis DNA RNA Protein

  4. Gene: Protein coding unit of genomic DNA with an mRNA intermediate. ~30K genes Sequence is a Necessity DNA Probe protein coding START STOP mRNA AAAAA 5’ UTR 3’ UTR Genomic DNA 3.3 Gb

  5. Protein-coding genes are not easy to find - gene density is low, and exons are interrupted by introns. From Genomic DNA to mRNA Transcripts EXONS INTRONS ~30K >30K Alternative splicing Alternative start & stop sites in same RNA molecule RNA editing & SNPs Transcript coverage Homology to other transcripts Hybridization dynamics 3’ bias

  6. Unsurpassed as source of expressed sequence Sequence Quality! Redundancy! Completeness? Chaos?!?

  7. From Genomic DNA to mRNA Transcripts ~30K >30K >>30K

  8. Transcript-Based Gene-Centered Information

  9. Design of Gene Expression Probes Content: UniGene, Incyte, Celera Expressed vs. Genomic Source: cDNA libraries, clone collections, oligos Cross-referencing of array probes (across platforms): Sequence <> GenBank <> UniGene <> HomoloGene Possible mis-referencing: Genomic GenBank Acc.#’s Referenced ID has more NT’s than probe Old DB builds DB or table errors – copying and pasting 30K rows in excel … Using RefSeq’s can help.

  10. From Genomic DNA to mRNA Transcripts

  11. From Genomic DNA to mRNA Transcripts

  12. http://www.ncbi.nlm.nih.gov/Entrez/

  13. Functional Annotation of Lists of Genes KEGG PFAM SWISS-PROT GO DRAGON DAVID BioConductor

  14. Analysis of Functional Gene Groups

  15. One of the largest challenges in analyzing genomic data is associating the experimental data with the available metadata, e.g. sequence, gene annotation, chromosomal maps, literature. • The annotate and AnnBuilder packages provides some tools for carrying this out. • Using AnnBuilder. It is possible to build associations with specific gene lists, eg. hgu95apackage for Affymetrix HGU95A GeneChips. • The annotate package maps to GenBank accession number, LocusLink LocusID, gene symbol, gene name, UniGene cluster, chromosome, cytoband, physical distance (bp), orientation, Gene Ontology Consortium (GO), PubMed PMID.

  16. Analysis of Functional Gene Groups

  17. Functional Gene/Protein Networks DIP BIND MINT HPRD PubGene Predicted Protein Interactions

  18. Analysis of Gene Networks

  19. 9606 is the Taxonomy ID for Homo Sapiens

  20. Predicted Human Protein Interactions

  21. Predicted Human Protein Interactions Used high-throughput protein interaction experiments from fly, worm, and yeast to predict human protein interactions. Human protein interaction is predicted if both proteins in an interaction pair from other organism have high sequence homology to human proteins. >70K Hs interactions predicted >6K Hs genes

  22. Analysis of Gene Networks

  23. Thanks to … Rafael Irizarry Scott Zeger Jonathan Pevsner Carlo Colantuoni Clinical Brain Disorders Branch, NIMH, NIH Dept. Biostatistics, JHSPH ccolantu@jhsph.edu colantuc@intra.nimh.nih.gov

  24. NCBI Web Links http://www.ncbi.nlm.nih.gov http://www.ncbi.nlm.nih.gov/Entrez/ http://www.ncbi.nih.gov/Genbank/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide http://www.ncbi.nlm.nih.gov/dbEST/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene http://www.ncbi.nlm.nih.gov/LocusLink/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed http://www.ncbi.nlm.nih.gov/PubMed/ http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp http://www.ncbi.nlm.nih.gov/SNP/ http://eutils.ncbi.nlm.nih.gov/entrez/query/static/advancedentrez.html http://www.ncbi.nlm.nih.gov/geo/ http://www.ncbi.nlm.nih.gov/RefSeq/ FTP: ftp://ftp.ncbi.nlm.nih.gov/ ftp://ftp.ncbi.nlm.nih.gov/repository/UniGene ftp://ftp.ncbi.nih.gov/pub/HomoloGene/

  25. PROTEIN: http://us.expasy.org/ ftp://us.expasy.org/ http://www.sanger.ac.uk/Software/Pfam/ http://www.sanger.ac.uk/Software/Pfam/ftp.shtml http://smart.embl-heidelberg.de/ http://www.ebi.ac.uk/interpro/ http://us.expasy.org/prosite/ ftp://us.expasy.org/databases/prosite/ NUCLEOTIDE: http://genome.ucsc.edu/ http://www.embl-heidelberg.de/ http://www.ensembl.org/ http://www.ebi.ac.uk/ http://www.gdb.org/ http://bioinfo.weizmann.ac.il/cards/index.html http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl PATHWAYS and NETWORKS: http://www.genome.ad.jp/kegg/ ftp://ftp.genome.ad.jp/pub/kegg/ (http://www.genome.ad.jp/anonftp/) http://dip.doe-mbi.ucla.edu http://dip.doe-mbi.ucla.edu/dip/Download.cgi http://www.blueprint.org/bind/ http://www.blueprint.org/bind/bind_downloads.html http://160.80.34.4/mint/index.php http://160.80.34.4/mint/release/main.php http://www.hprd.org/ http://www.hprd.org/FAQ?selectedtab=DOWNLOAD+REQUESTS http://www.pubgene.org/ (also .com) More Web Links http://www.bioconductor.org/ http://apps1.niaid.nih.gov/david/ http://www.geneontology.org/ http://discover.nci.nih.gov/gominer/index.jsp http://pubmatrix.grc.nia.nih.gov/ http://pevsnerlab.kennedykrieger.org/dragon.htm

  26. SAVAGE: Detection of More Subtle Functionally Related Groups of Gene Expression Changes

  27. KEGG EXP#1 PFAM Swiss-Prot ~40K annotations Differential Expression of Functional Gene Groups within One Experiment DRAGON SAVAGE 30K 10K ~3K

  28. BioDB EXP#1 EXP#4 EXP#3 EXP#2 Differential Expression of a Single Functional Gene Group Across Multiple Experiments DRAGON SAVAGE

  29. 3 X 2 1 EXP#1 EXP#1 5 4 4 3 EXP#2 EXP#2 5 X 1 2 Similar Differential Expression Patterns Across Multiple Experiments p value <0.1 0.0 CN CN CN ALL CN CN The distribution of gene expression values for each gene group in each sample is plotted as a single point in low dimensional space. This is achieved using Principal Components Analysis along with Non-Metric Multi-Dimensional Scaling.

  30. PING: Detection of Differential Expression in Functional Networks of Proteins

  31. Interaction Networks in Gene Expression Data

  32. Large Protein Interaction Network Network Regulated in Sample #1

  33. Large Protein Interaction Network Network Regulated in Sample #2 Network Regulated in Sample #1

  34. Large Protein Interaction Network Network Regulated in Sample #3 Network Regulated in Sample #1 Network Regulated in Sample #2

  35. Large Protein Interaction Network Network of Interest PING Network Regulated in Sample #1 Network Regulated in Sample #2 Network Regulated in Sample #3

More Related