DNA, Gene, and Genome. Translating Machinery for Genetic Information. Transcription factors. mRNA levels . Automated DNA Sequencing. Data Increase (from NCBI web site). Partial Display of Human Draft Sequence (Nature, 2001). Human Genome Map at NCBI.
Transcription factors mRNA levels
60-70 KDa Protein interacting with prostate cancer suppressor MGALRPTLLPPSLPLLLLLMLGMGCWAREVLVPEGPLYRVAGTAVSISCNVTGYEGPAQQNFEWFLYRPEAPDTALGIVSTKDTQFSYAVFKSRVVAGEVQVQRLQGDAVVLKIARLQAQDQGIYECTPSTDTRYLGSYSGKVELRVLPDVLQVSAAPPGPRGRQAPTSPPRMTVHEGQELALGCLARTSTQKHTHLAVSFGRSVPEAPVGRSTLQEVVGIRSDLAVEAGAPYAERLAAGELRLGKEGTDRYRMVVGGAQAGDAGTYHCTAAEWIQDPDGSWAQIAEKRAVLAHVDVQTLSSQLAVTVGPGERRIGPGEPLELLCNVSGALPPAGRHAAYSVGWEMAPAGAPGPGRLVAQLDTEGVGSLGPGYEGRHIAMEKVASRTYRLRLEAARPGDAGTYRCLAKAYVRGSGTRLREAASARSRPLPVHVREEGVVLEAVAWLAGGTVYRGETASLLCNISVRGGPPGLRLAASWWVERPEDGELSSVPAQLVGGVGQDGVAELGVRPGGGPVSVELVGPRSHRLRLHSLGPEDEGVYHCAPSAWVQHADYSWYQAGSARSGPVTVYPYMHALDTLFVPLLVGTGVALVTGATVLGTITCCFMKRLRKR
Molecular biology databases
• Sequence databases
• Annotated
• Low-annotation
• Specialized
• Structural databases
• Motif databases
• Genome databases
• Proteome databases
• RNA expression
• Literature
• Populations
• Mutations
• Polymorphisms
• Organisms
• Pathways
Mutations/polymorphisms Promoters ESTs Genome maps Tissues and cells DNA motifs DNA sequences RNA expression Molecular Phylogeny Substrates Transcription Factors Protein sequences Metabolic pathways Protein structures Protein motifs Gene Family
Database formats
• Relational databases
  • GDB, GSDB, MGD, etc.
  • Vendors: Sybase, Oracle, etc.
• Flat-file databases
  • GenBank, SWISS-PROT, etc.
• Object-oriented databases
  • ACeDB, AtDB, etc.
Molecular biology data types Mouse chromosome X from the Mouse Genome Informatics project http://www.informatics.jax.org/ Organisms Genome maps
Molecular biology data types Organisms Genome maps DNA sequences RNA sequences ...AATGGTACCGATGACCTGGAGCTTGGTTCGA...
Molecular biology data types Organisms Genome maps DNA sequences RNA sequences Protein sequences ...TRLRPLLALLALWPPPPARAFVNQHLCGSHLVEA...
Molecular biology data types Organisms Genome maps DNA sequences RNA structures RNA sequences Protein sequences Protein structures PDB entry 1CIS P.Osmark, P.Sorensen, F.M.Poulsen
Molecular biology data types Organisms Genome maps DNA motifs DNA sequences RNA expression RNA structures RNA sequences Protein sequences Protein structures Protein motifs
DNA microarrays measure variations in RNA levels
The full yeast genome on a chip
Red dots: genes whose RNA level increased
Green dots: genes whose RNA level decreased
DeRisi et al., Science 278:680
http://cmgm.Stanford.EDU/pbrown/
Substrates for High Throughput Arrays
• Nylon membrane: single label (P33)
• Glass slides: dual label (Cy3, Cy5)
• GeneChip: single label (biotin, streptavidin)
GeneChip® Probe Arrays
• GeneChip probe array: 1.28 cm, >200,000 different complementary probes
• Hybridized probe cell: 24 µm, millions of copies of a specific oligonucleotide probe
• Single stranded, labeled RNA target hybridizes to the oligonucleotide probe
• Image of hybridized probe array
GeneChip® Expression Array Design
• Gene sequence (5´ to 3´) covered by multiple oligo probes
• Probes designed to be Perfect Match
• Probes designed to be Mismatch
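The perfect match / mismatch pairing can be sketched in a few lines. This is only an illustration: the 25-base probe length, the choice of the central base as the mismatch position, and the function names are assumptions that do not appear on the slide, and strand orientation is ignored.

```python
# Illustrative sketch only: probe length and mismatch position are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def perfect_match_probes(gene, probe_len=25, step=None):
    """Cut probe-length windows from a gene sequence (non-overlapping by default)."""
    step = step or probe_len
    return [gene[i:i + probe_len]
            for i in range(0, len(gene) - probe_len + 1, step)]


def mismatch_probe(pm):
    """Copy a perfect-match probe, replacing its central base with the complement."""
    mid = len(pm) // 2
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


def probe_pairs(gene, probe_len=25, step=None):
    """Return (perfect match, mismatch) pairs for every window of the gene."""
    return [(pm, mismatch_probe(pm))
            for pm in perfect_match_probes(gene, probe_len, step)]


if __name__ == "__main__":
    # Sequence fragment taken from the "DNA sequences" slide.
    fragment = "AATGGTACCGATGACCTGGAGCTTGGTTCGA"
    for pm, mm in probe_pairs(fragment, probe_len=25, step=3):
        print(pm, mm)
```

Comparing the signal of each perfect-match probe with its mismatch partner is what lets cross-hybridization be told apart from specific binding.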
Procedures for Target Preparation
• Cells
• Poly (A)+ / Total RNA
• cDNA
• IVT (Biotin-UTP, Biotin-CTP): labeled transcript
• Fragment (heat, Mg2+): labeled fragments
• Hybridize (16 hours)
• Wash & Stain
• Scan
NSF Soybean Functional Genomics Steve Clough / Vodkin Lab Printing Arrays on 50 slides
Cells from condition A Cells from condition B mRNA Label Dye 1 Label Dye 2 cDNA Mix NSF / U of Illinois Microarray Workshop -Steve Clough / Vodkin Lab equal over under Ratio of expression of genes from two sources Total or
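A rough sketch of how the "equal / over / under" call could be made from the two dye intensities of one spot. The log2 scale and the two-fold threshold are assumptions chosen for illustration; the slide does not state a cutoff.

```python
import math


def log2_ratio(dye2, dye1):
    """Log2 of the Dye 2 / Dye 1 intensity ratio for one spot."""
    return math.log2(dye2 / dye1)


def call_expression(dye2, dye1, threshold=1.0):
    """Label a spot 'over', 'under' or 'equal'.

    threshold is in log2 units; 1.0 (a two-fold change) is an assumed cutoff.
    """
    m = log2_ratio(dye2, dye1)
    if m >= threshold:
        return "over"
    if m <= -threshold:
        return "under"
    return "equal"


if __name__ == "__main__":
    # Hypothetical intensities for three spots: (Dye 2, Dye 1).
    for dye2, dye1 in [(800, 200), (200, 800), (500, 500)]:
        print(dye2, dye1, call_expression(dye2, dye1))
```

Working on the log scale makes a two-fold increase and a two-fold decrease symmetric around zero.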
NSF Soybean Functional Genomics Steve Clough / Vodkin Lab GSI Lumonics
Cattle and Soy Controls
Spots: Beta Actin, PKG, HPRT, Beta 2 microglobulin, Rubisco, AB binding protein, Major latex protein homologue (MSG)
Array of cattle and soy spiking controls. 50 µg of cattle brain total RNA was labeled with Cy3 (green). 1 µl each of in vitro transcribed soy Rubisco (5 ng), AB binding protein (0.5 ng) and MSG (0.05 ng) were labeled with Cy5. The two labeled samples were cohybridized on superamine slides (Telechem, Inc.). To the right of each set of spots are five negative controls (water).
Fetal Spleen (Cy3) vs. Adult Spleen (Cy5)
Spots labeled in both images: IgM, MYLK, IgM heavy chain, COL1A2
GenePix Image Analysis Software
Placenta vs. Brain – 3800 Cattle Placenta Array (Cy3, Cy5)
Microarray Data Process
• Experimental Design
• Image Analysis – raw data
• Normalization – “clean” data
• Data Filtering – informative data
• Model building
• Data Mining (clustering, pattern recognition, etc.)
• Validation
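The normalization and filtering steps can be sketched as follows. Global median centering of log2 ratios is one common normalization and is used here as an assumption; the slides do not say which method was applied. Gene names, intensities and the cutoff are hypothetical.

```python
import math
from statistics import median


def log_ratios(spots):
    """spots maps gene -> (dye1, dye2) raw intensities from image analysis."""
    return {gene: math.log2(dye2 / dye1) for gene, (dye1, dye2) in spots.items()}


def median_normalize(ratios):
    """Shift all log ratios so that their median is zero ("clean" data)."""
    centre = median(ratios.values())
    return {gene: m - centre for gene, m in ratios.items()}


def filter_informative(normalized, cutoff):
    """Keep only genes whose normalized log ratio is at least cutoff in size."""
    return {gene: m for gene, m in normalized.items() if abs(m) >= cutoff}


if __name__ == "__main__":
    # Hypothetical raw data: (dye1, dye2) per gene.
    raw = {
        "g1": (100, 200),
        "g2": (100, 200),
        "g3": (100, 1600),
        "g4": (400, 200),
        "g5": (100, 200),
    }
    clean = median_normalize(log_ratios(raw))
    print(filter_informative(clean, cutoff=1.0))
```

Centering on the median assumes that most genes do not change between the two samples, so a dye bias shared by all spots is removed before filtering.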
Scatterplot of Normalized Data: Fetal vs. Adult
Thresholds marked on the plot: < -0.3 and > 0.3
Complexity Levels of Microarray Experiments:
• Compare genes in a control situation versus a treatment situation
  • Example: Is the level of expression (up-regulated or down-regulated) significantly different in the two situations? (drug design application)
  • Methods: t-test, Bayesian approach
• Find multiple genes that share common functionalities
  • Example: Which related genes depend on each other?
  • Methods: Clustering (hierarchical, k-means, self-organizing maps, neural network, support vector machines)
• Infer the underlying gene and protein networks that are responsible for the patterns and functional pathways observed
  • Example: What is the gene regulation at system level?
  • Directions: mining regulatory regions, modeling regulatory networks on a global scale
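For the first level (control versus treatment), a per-gene t statistic might be computed roughly as below. Welch's unequal-variance form is an assumption, since the slide only says "t-test", and the gene names and values are hypothetical. Turning the statistic into a p-value needs the t distribution, which is left out here.

```python
import math
from statistics import mean, variance


def welch_t(control, treated):
    """Welch's t statistic for one gene (sample variances, unequal-variance form)."""
    se = math.sqrt(variance(control) / len(control)
                   + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


def rank_by_t(expression):
    """expression maps gene -> (control values, treated values).

    Returns (gene, t) pairs sorted by decreasing absolute t.
    """
    scores = [(gene, welch_t(c, t)) for gene, (c, t) in expression.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical log-expression replicates.
    data = {
        "geneA": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
        "geneB": ([1.0, 2.0, 3.0], [1.5, 2.0, 3.5]),
    }
    for gene, t in rank_by_t(data):
        print(gene, round(t, 3))
```

With thousands of genes tested at once, some correction for multiple testing would normally follow; that step is outside this sketch.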
Clustering to extract genes that are tightly co-expressed.
Statistical filters used: genes called present (Presence Call in Affymetrix) in drug-treated samples; ANOVA p<0.02 between groups.
Red indicates increased expression and green indicates decreased expression (Log(fold change)).
Genesight 3 (Biodiscovery Software, www.biodiscovery.com)
Columns: NO DRUG, 1nM Drug, 1 mM Drug
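A small sketch of grouping co-expressed genes. It links genes whose profiles have a Pearson correlation above a cutoff and takes the connected groups, which amounts to single-linkage clustering cut at that cutoff. This is a simplified stand-in, not the method used by Genesight; the cutoff, gene names and values are hypothetical, and every profile is assumed to vary across conditions.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_clusters(profiles, min_r=0.9):
    """Group genes linked by correlation >= min_r (single linkage)."""
    genes = list(profiles)
    cluster_of = {gene: i for i, gene in enumerate(genes)}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if pearson(profiles[a], profiles[b]) >= min_r:
                old, new = cluster_of[b], cluster_of[a]
                # Merge the whole cluster containing b into the cluster of a.
                for g in genes:
                    if cluster_of[g] == old:
                        cluster_of[g] = new
    clusters = {}
    for g in genes:
        clusters.setdefault(cluster_of[g], []).append(g)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical log(fold change) across: no drug, 1nM drug, 1 mM drug.
    profiles = {
        "geneA": [0.0, 1.0, 2.0],
        "geneB": [0.1, 1.1, 2.3],
        "geneC": [2.0, 1.0, 0.0],
        "geneD": [2.2, 0.9, 0.1],
    }
    print(coexpression_clusters(profiles))
```

Correlation compares the shape of a profile across conditions rather than its absolute level, which suits the goal of finding genes that rise and fall together.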
Statistical filters used: genes called present (Presence Call in Affymetrix) in the absence of drug; ANOVA p<0.02 between groups.
Columns: NO DRUG, 1nM Drug, 1 mM Drug
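The ANOVA filter rests on a per-gene F statistic across the three treatment groups. The sketch below computes that statistic with the standard library only; converting it to a p-value for the p<0.02 cutoff needs the F distribution (for example via `scipy.stats.f_oneway`, if SciPy is available). The replicate values are hypothetical.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    groups is a list of lists, one list of replicate values per condition.
    """
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Hypothetical expression values: no drug, 1nM drug, 1 mM drug.
    no_drug = [1.0, 2.0, 3.0]
    low_dose = [2.0, 3.0, 4.0]
    high_dose = [5.0, 6.0, 7.0]
    print(anova_f([no_drug, low_dose, high_dose]))
```

A large F means the differences between treatment groups are large relative to the scatter among replicates within a group.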
Gene Expression Profile of Aging and Its Retardation by Caloric Restriction Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch, Tomas A. Prolla
Data Mining Methods
• Classification, Regression (Predictive Modeling)
• Clustering (Segmentation)
• Association Discovery (Summarization)
• Change and deviation detection
• Dependency Modeling
• Information Visualization