
CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Proce


swaantje

Presentation Transcript


  1. CS491JH: Data Mining in Bioinformatics • Introduction to Microarray Technology • Technology Background • Data Processing Procedure • Characteristics of Data • Data integration and Data mining

  2. Substrates for High Throughput Arrays • Nylon Membrane: single label (P33) • Glass Slides: dual label (Cy3, Cy5) • GeneChip: single label (biotin-streptavidin)

  3. GeneChip® Probe Arrays • GeneChip Probe Array (1.28 cm): >200,000 different complementary probes • Hybridized Probe Cell (24 µm): millions of copies of a specific oligonucleotide probe • Single-stranded, labeled RNA target • Oligonucleotide probe • Image of Hybridized Probe Array

  4. GeneChip® Expression Array Design • Gene Sequence (3´) • Multiple oligo probes • Probes designed to be Perfect Match • Probes designed to be Mismatch

  5. Procedures for Target Preparation • Cells → Poly (A)+ / Total RNA → cDNA → IVT (Biotin-UTP, Biotin-CTP) → Labeled transcript → Fragment (heat, Mg2+) → Labeled fragments → Hybridize (16 hours) → Wash & Stain → Scan

  6. Microarray Technology

  7. NSF Soybean Functional Genomics Steve Clough / Vodkin Lab Printing Arrays on 50 slides

  8. Cells from condition A and cells from condition B → Total or mRNA → cDNA (Label Dye 1 / Label Dye 2) → Mix → Ratio of expression of genes from two sources: equal, over, under • NSF / U of Illinois Microarray Workshop, Steve Clough / Vodkin Lab
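The "ratio of expression of genes from two sources" on this slide can be illustrated with a minimal sketch. The intensities, gene names, and the two-fold cutoff below are hypothetical values chosen for illustration; the slides do not state a specific threshold.

```python
import math

def log2_ratio(dye1, dye2):
    """Return log2(dye2 / dye1) for one spot's two channel intensities."""
    return math.log2(dye2 / dye1)

def classify(ratio_log2, fold=2.0):
    """Label a spot as over, under, or equal given a fold-change cutoff.

    The default cutoff of 2-fold is an assumption, not a value from the slides.
    """
    cutoff = math.log2(fold)
    if ratio_log2 >= cutoff:
        return "over"
    if ratio_log2 <= -cutoff:
        return "under"
    return "equal"

# Hypothetical background-corrected intensities: (condition A, condition B).
spots = {
    "gene1": (500.0, 2000.0),
    "gene2": (1200.0, 1100.0),
    "gene3": (900.0, 150.0),
}

calls = {gene: classify(log2_ratio(a, b)) for gene, (a, b) in spots.items()}
print(calls)
```

Working on a log scale makes over- and under-expression symmetric around zero, which is why the same cutoff can be applied in both directions.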

  9. NSF Soybean Functional Genomics Steve Clough / Vodkin Lab GSI Lumonics

  10. Cattle and Soy Controls • Beta Actin • PKG • HPRT • Beta 2 microglobulin • Rubisco • AB binding protein • Major latex protein homologue (MSG) • Array of cattle and soy spiking controls. 50 µg of cattle brain total RNA was labeled with Cy3 (green). 1 µl each of in vitro transcribed soy Rubisco (5 ng), AB binding protein (0.5 ng), and MSG (0.05 ng) was labeled with Cy5. The two labeled samples were cohybridized on superamine slides (Telechem, Inc.). To the right of each set of spots are five negative controls (water).

  11. Fetal Spleen (Cy3) and Adult Spleen (Cy5) • Labeled spots: IgM, MYLK, IgM heavy chain, COL1A2

  12. GenePix Image Analysis Software • Placenta vs. Brain – 3800 Cattle Placenta Array • Cy3, Cy5

  13. Microarray Data Process • Experimental Design • Image Analysis – raw data • Normalization – “clean” data • Data Filtering – informative data • Model building • Data Mining (clustering, pattern recognition, etc.) • Validation
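As an illustration of the "Data Filtering – informative data" step, the sketch below keeps only genes with enough measured points and at least one sizeable change. The function name `filter_genes`, the thresholds, and the sample data are all hypothetical; the slides do not specify filtering criteria.

```python
def filter_genes(log_ratios, min_valid=3, min_abs=1.0):
    """Keep genes with enough measured points and at least one large change.

    log_ratios maps gene name -> list of log ratios, with None marking a
    missing or flagged spot. Both thresholds are illustrative assumptions.
    """
    kept = {}
    for gene, values in log_ratios.items():
        valid = [v for v in values if v is not None]
        if len(valid) < min_valid:
            continue  # too many missing spots to trust this gene
        if max(abs(v) for v in valid) < min_abs:
            continue  # never changes appreciably, so it carries little signal
        kept[gene] = values
    return kept

# Hypothetical normalized log ratios across four arrays.
data = {
    "geneA": [0.1, -0.2, 0.05, 0.0],
    "geneB": [1.8, 2.1, None, 1.5],
    "geneC": [None, None, 2.5, None],
}

informative = filter_genes(data)
print(sorted(informative))
```

Filtering after normalization, as the slide orders the steps, means the cutoffs are applied to values that are already comparable across arrays.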

  14. Scatterplot of Normalized Data Fetal Adult

  15. <-0.3 >0.3

  16. Characteristics of Data • Data can be viewed as an N×M matrix (N >> M) • N is the number of genes • M is the number of data points for each gene • Or as an N×(M+K) matrix, where K is the number of features describing each gene (genome location, functional description, metabolic pathway, etc.)
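One way to picture the N×M matrix and its N×(M+K) extension is a small container that stores measurements and per-gene features side by side. This is only a sketch: the class `ExpressionData`, its field names, and the sample genes and annotations are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ExpressionData:
    genes: list       # N gene identifiers (row labels)
    conditions: list  # M data points per gene (column labels)
    values: list      # N rows, each a list of M measurements
    features: dict = field(default_factory=dict)  # gene -> dict of K features

    def shape(self):
        """Return (N, M)."""
        return len(self.genes), len(self.conditions)

    def row(self, gene):
        """Return the M measurements followed by the gene's K feature values."""
        i = self.genes.index(gene)
        return self.values[i] + list(self.features.get(gene, {}).values())

# Hypothetical data: N = 3 genes, M = 2 data points, K = 2 features.
data = ExpressionData(
    genes=["g1", "g2", "g3"],
    conditions=["fetal", "adult"],
    values=[[0.5, 1.2], [-0.3, 0.1], [2.0, -1.1]],
    features={"g1": {"location": "chr1", "pathway": "cell cycle"}},
)

print(data.shape())
print(data.row("g1"))
```

In real experiments N is in the thousands while M is often a handful of arrays, which is the N >> M situation the slide points out.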

  17. Model for Data Analysis • Gene Expression is a Dynamic Process • Each Microarray Experiment is a snapshot of the process • Basic biological knowledge is needed to build a model • For Example: • Assumption – In most experiments, only a small set of genes (100s/1000s) is affected significantly.
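The assumption on this slide is what makes simple global normalization reasonable: if most genes are unaffected, the typical log ratio on an array should be zero. The sketch below applies median centering, a common technique chosen here for illustration; the slides do not say which normalization method the course used, and the sample values are hypothetical.

```python
import statistics

def median_center(log_ratios):
    """Shift one array's log ratios so that their median is zero.

    Relies on the assumption that most genes are unchanged, so the median
    reflects a dye or labeling bias rather than biology.
    """
    m = statistics.median(log_ratios)
    return [v - m for v in log_ratios]

# Hypothetical log ratios from one array: most sit near 0.5 (a global bias),
# while two genes differ strongly.
raw = [0.4, 0.5, 0.6, 0.5, 3.0, 0.45, -1.5]
normalized = median_center(raw)
print(normalized)
```

The median is used rather than the mean so that the few strongly changed genes do not pull the correction away from the unchanged majority.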

  18. Data Mining • Need for Data Mining: data volumes are too large for traditional analysis methods; large number of records and high-dimensional data; only a small portion of data is analyzed; decision support process becomes more complex • Functions of Data Mining: use the data to build predictors – prediction, classification, deviation detection, segmentation; generate more sophisticated summaries and reports to aid understanding of the data – find clusters, partitions in data

  19. Data Mining Methods • Classification, Regression (Predictive Modeling) • Clustering (Segmentation) • Association Discovery (Summarization) • Change and deviation detection • Dependency Modeling • Information Visualization

  20. Clustered display of data from time course of serum stimulation of primary human fibroblasts • Clusters: Cholesterol Biosynthesis; Cell Cycle; Immediate Early Response; Signaling and Angiogenesis; Wound Healing and Tissue Remodeling • Eisen et al., Proc. Natl. Acad. Sci. USA 95 (1998), p. 14865
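A clustered display like this one groups genes whose expression profiles rise and fall together over the time course. The sketch below shows one simple way to do that: agglomerative clustering that repeatedly merges the two clusters with the highest mean pairwise Pearson correlation. It is an illustrative variant, not a reproduction of the cited paper's procedure, and the gene names and profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order performed.

    profiles maps gene name -> expression profile. Similarity between two
    clusters is the mean pairwise Pearson correlation of their members.
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [pearson(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], round(score, 3)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical time-course profiles: g1/g2 rise together, g3/g4 fall together.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [3.9, 3.2, 1.8, 1.1],
}

merges = average_linkage(profiles)
for left, right, score in merges:
    print(left, right, score)
```

The order of merges defines a tree, and ordering the rows of the data matrix by that tree is what places co-regulated genes next to each other in the colored display.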

  21. Self Organizing Maps

  22. Molecular Classification of Cancer

  23. Gene Expression Profile of Aging and Its Retardation by Caloric Restriction Cheol-Koo Lee, Roger G. Klopp, Richard Weindruch, Tomas A. Prolla

  24. Expression Landscape of cell-cycle regulated genes in yeast

  25. Multi-dimension data visualization
