
Lab 9 The Cistrome

Lab 9: The Cistrome. March 28, 2012. Daniel Fernandez, Alejandro Quiroz. 1st ACT: Information theory correction, Motif finding, The Genome Browser, Homework help Q1, Q2. INTERLUDE: Electronic music with DJ Cistrome (10 min). 2nd ACT: Dah Cistrome, MA2C, Homework help Q3.



  1. Lab 9: The Cistrome. March 28, 2012. Daniel Fernandez, Alejandro Quiroz

  2. 1st ACT: Information theory correction, Motif finding, The Genome Browser, Homework help Q1, Q2. INTERLUDE: Electronic music with DJ Cistrome (10 min). 2nd ACT: Dah Cistrome, MA2C, Homework help Q3

  3. Information Theory

  4. Information Theory: For a noiseless channel, the amount of information transmitted is the same as the entropy (or uncertainty) associated with the source. Entropy is maximized when the source can produce n possible outcomes, all with equal probability (1/n); the entropy is then log2(n) bits. Biologists borrowed this concept to characterize the amount of uncertainty associated with a motif represented as a PWM. But your TF got confused… see why!
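To make the definition concrete, here is a minimal sketch of Shannon entropy, H = -sum p log2(p), for the base distribution at a single position. The function name is our own; the probabilities are listed in the order A, C, G, T.

```python
import math

def shannon_entropy(probs):
    """Entropy, in bits, of a discrete distribution given as probabilities."""
    # Outcomes with p = 0 are skipped: p * log2(p) tends to 0 as p -> 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Probabilities for A, C, G, T at a single position.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 (maximum: log2(4))
print(shannon_entropy([0.5, 0.5, 0.0, 0.0]))      # 1.0 (two equally likely bases)
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0 (no uncertainty)
```

The uniform case reaches log2(n) with n = 4, matching the slide. A perfectly conserved motif position has zero entropy.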

  5. Information Theory [Figure: source → channel → destination diagram, labelled ENTROPY and INFORMATION, with the nucleotides A, T, C, G as the source symbols.]

  6. Information Theory: But what happens when we want to compare the uncertainty of two sources, i.e., compare two probability distributions such as the background sequence model and the motif PWM? We use RELATIVE ENTROPY, also called KULLBACK-LEIBLER DIVERGENCE or INFORMATION CONTENT.
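As a minimal sketch, the information content of PWM column i against a background distribution q is the relative entropy IC_i = sum over bases b of p_ib * log2(p_ib / q_b). The motif's total information content is the sum over columns. The function names and the toy three-column PWM below are illustrative assumptions. Each PWM column lists probabilities in the order A, C, G, T.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    p: motif column probabilities; q: background probabilities (same base order).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def motif_information_content(pwm, background):
    """Total information content: sum of per-column relative entropies."""
    return sum(relative_entropy(col, background) for col in pwm)

background = [0.25, 0.25, 0.25, 0.25]  # uniform A, C, G, T
pwm = [                                 # hypothetical 3-column motif
    [1.0, 0.0, 0.0, 0.0],               # always A
    [0.5, 0.0, 0.0, 0.5],               # A or T
    [0.25, 0.25, 0.25, 0.25],           # same as background
]
print([relative_entropy(col, background) for col in pwm])  # [2.0, 1.0, 0.0]
print(motif_information_content(pwm, background))          # 3.0
```

With a uniform background, IC equals log2(4) minus the column entropy. With a skewed genome composition, the two no longer coincide. This is why relative entropy, not entropy alone, measures how much a motif stands out from its background.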

  7. Motif Example I: Prokaryotic Co-expression. Objective: find the binding sites that control the regulation of co-expressed genes in Mycobacterium tuberculosis. File: mt.fasta. Note: we assume that the genes are co-expressed because they are under the control of the same transcription factor(s), and we use Gibbs sampling to try to identify the putative binding motif of this factor (or factors).
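The core idea of Gibbs sampling for motifs can be sketched in a few lines. This is a basic site sampler, not BioProspector's implementation. It assumes exactly one site of fixed width w per sequence, sequences containing only A/C/G/T, a uniform background, and no phase-shift moves. All names and the toy sequences are hypothetical. The sampler repeatedly removes one sequence, builds a profile from the others, and resamples that sequence's site in proportion to its likelihood ratio.

```python
import random

BASES = "ACGT"

def profile(sites, w, pseudo=1.0):
    """Per-column base frequencies of aligned sites, with pseudocounts."""
    cols = []
    for j in range(w):
        counts = {b: pseudo for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        cols.append({b: counts[b] / total for b in BASES})
    return cols

def site_weight(kmer, prof, bg=0.25):
    """Likelihood ratio of a k-mer under the motif profile vs a uniform background."""
    weight = 1.0
    for j, b in enumerate(kmer):
        weight *= prof[j][b] / bg
    return weight

def gibbs_motif(seqs, w, iters=2000, seed=0):
    """Site-sampling Gibbs sampler: one motif occurrence of width w per sequence.

    Returns the sampled start position of the site in each sequence.
    """
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))
        # Build the profile from every sequence except the one being resampled.
        others = [s[p:p + w] for k, (s, p) in enumerate(zip(seqs, pos)) if k != i]
        prof = profile(others, w)
        seq = seqs[i]
        weights = [site_weight(seq[p:p + w], prof) for p in range(len(seq) - w + 1)]
        # Draw a new start position for sequence i in proportion to its weight.
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos

# Hypothetical toy sequences, each with a planted TTGACA.
seqs = ["GCTTGACATAGC", "ATTTGACAGGCA", "TTGACACCGTAT", "CGGATTGACAAT"]
starts = gibbs_motif(seqs, w=6)
print([s[p:p + 6] for s, p in zip(seqs, starts)])
```

Because the method is stochastic, real tools run it several times from different seeds and keep the best-scoring motif.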

  8. Motif Example I: Prokaryotic Co-expression. The motif parameters are chosen to capture the features of binding sites for a classic bacterial helix-turn-helix (HTH) transcription factor. HTH-type TFs are typically symmetric homodimers, so they bind symmetric (palindromic) DNA sites. Furthermore, the two HTH regions of the dimeric TF typically contact bases in two adjacent major grooves of the DNA. The two halves of the palindromic binding site therefore span well over 10 bases (roughly the number of bases per helical turn of B-form DNA). The bases contacted by a TF are not necessarily contiguous. We therefore use fragmentation to let the Gibbs sampler ignore positions that do not participate in the protein-DNA interaction and are therefore not conserved as part of the binding site. To see this in a real structure, go to http://melolab.org/pdidb/web/content/home and search for 1lmb.
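Palindromic sites and fragmentation can be illustrated with a small check. A palindromic site reads the same as its reverse complement. Ignoring non-contacted positions means leaving them out of that comparison. The "1"/"0" mask string is a hypothetical device for this sketch and is not the parameter syntax of any Gibbs sampler. The example site is also made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindromic(site, mask=None):
    """True if site matches its reverse complement at every position the mask keeps.

    mask: hypothetical string of '1' (contacted, compared) and '0' (ignored), same
    length as site and itself symmetric; None compares every position.
    """
    rc = reverse_complement(site)
    mask = mask or "1" * len(site)
    return all(a == b for a, b, keep in zip(site, rc, mask) if keep == "1")

# Hypothetical site: TGTCA half-site, 3-base spacer, TGACA (its reverse complement).
site = "TGTCAGCATGACA"
print(is_palindromic(site))                   # False: the spacer is not symmetric
print(is_palindromic(site, "1111100011111"))  # True once the spacer is ignored
```

The unconserved spacer breaks strict symmetry. Masking it out recovers the palindrome formed by the two half-sites.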

  9. Motif Example I: Prokaryotic Co-expression. Tools: BioProspector (http://ai.stanford.edu/~xsliu/BioProspector/) and WebLogo (http://weblogo.berkeley.edu/logo.cgi)
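A sequence logo displays the information from the previous slides graphically. In a logo, each column's stack height is log2(4) minus the column entropy, and each letter gets a share of the stack proportional to its frequency. The sketch below computes those heights from aligned sites. It assumes a uniform background and omits the small-sample correction that WebLogo applies. The function name and the sites are hypothetical.

```python
import math
from collections import Counter

def logo_heights(sites):
    """Letter heights (bits) per column of a DNA logo, without small-sample correction."""
    heights = []
    for column in zip(*sites):
        n = len(column)
        freqs = {b: c / n for b, c in Counter(column).items()}
        entropy = sum(-f * math.log2(f) for f in freqs.values())
        stack = 2.0 - entropy  # log2(4) - H: total information in this column
        heights.append({b: f * stack for b, f in freqs.items()})
    return heights

# Hypothetical aligned binding sites.
sites = ["TTGACA", "TTGACT", "TTGACA", "TAGACA"]
for i, col in enumerate(logo_heights(sites), start=1):
    print(i, {b: round(h, 2) for b, h in sorted(col.items())})
```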

  10. DNA as Hereditary Material

  11. Central Dogma of Molecular Biology: Gene Expression, Splicing
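The flow of the central dogma (DNA → pre-mRNA → spliced mRNA → protein) can be traced on a toy gene. This is a sketch under simplifying assumptions: the coding strand is given directly, intron coordinates are supplied by hand (real splicing depends on splice-site signals and the spliceosome), and translation starts at the first base. The gene, with a canonical GT...AG intron, is hypothetical. The codon table is the standard genetic code.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(coding_strand):
    """RNA has the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")

def splice(pre_mrna, introns):
    """Remove introns given as (start, end) 0-based half-open intervals."""
    pieces, prev = [], 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[prev:start])
        prev = end
    pieces.append(pre_mrna[prev:])
    return "".join(pieces)

def translate(mrna):
    """Translate codons from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

gene = "ATGGCC" + "GTAAGTCTAG" + "TTTTAA"  # hypothetical exon 1 + intron + exon 2
mrna = splice(transcribe(gene), [(6, 16)])
print(mrna)             # AUGGCCUUUUAA
print(translate(mrna))  # MAF
```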

  12. The Human Genome Project • The goal is to understand the human genome and its role in health and disease. • “The true payoff from the HGP will be the ability to better diagnose, treat and prevent disease.” (Francis Collins, Director of the HGP and NHGRI)

  13. Sequencing and Assembly • The sequence existed as millions of clones of small fragments. • Finding overlaps and assembling the fragments into “contigs” was a huge challenge. Annotation • What does it all mean? • Where are the genes? • What do they do? Thousands of researchers from 20 centers worked on the HGP.
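The overlap-and-merge step can be illustrated with a toy greedy assembler. It repeatedly joins the two fragments with the longest suffix-prefix overlap. This is a teaching sketch only. It assumes error-free reads from one strand, no repeats, and a minimum overlap of 3 bases. The reads are hypothetical, and real HGP assembly pipelines were far more sophisticated.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap into a contig."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs

reads = ["ATGGCGTAC", "CGTACGGAT", "GGATTCAGC"]  # hypothetical overlapping reads
print(greedy_assemble(reads))  # ['ATGGCGTACGGATTCAGC']
```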

  14. UCSC Genome Browser • http://genome.ucsc.edu/

  15. Basic Features: • Species, assemblies • Genome Browser • Gene Sorter • Sequence search (BLAT). Advanced Features: • Coordinate conversion • Custom tracks • Table Browser
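A custom track can be as simple as a `track` line followed by BED records. Each BED record contains a chromosome, a 0-based start, an end that is excluded from the interval, and optionally a name. The helper below is a minimal sketch that writes such text for upload. The helper name, track name, and the single interval are hypothetical. Only the basic `name` and `description` track attributes are used.

```python
def bed_custom_track(name, description, intervals):
    """Build the text of a BED custom track for the UCSC Genome Browser.

    intervals: (chrom, start, end, label) tuples; start is 0-based, end is exclusive.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in intervals:
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"

# Hypothetical single-record track.
print(bed_custom_track("myPeaks", "Hypothetical peaks", [("chr7", 70231247, 70232000, "peak1")]))
```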

  16. UCSC Genome Browser: a suite of tools for viewing and mining genomic data.

  17. Organization of Genomic Data

  18. Genome Gateway: start page, basic search

  19. Overview of the browser

  20. The browser

  21. The browser

  22. The browser

  23. The browser

  24. Genome Gateway: start page, basic search. Search by chromosome/region, gene, cytogenetic coordinates, phenotype of interest, or keywords (e.g., zinc fingers, kinase), and choose the genome version. Try the following example: Autism. How many UCSC genes are located on chromosome X? How many RefSeq genes are associated with autism? Pick the gene AUTS2 (uc011keg.1) at chr7:70231248-70257884.
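Positions such as chr7:70231248-70257884 are displayed in 1-based, fully closed coordinates. BED files and many tables use 0-based starts with excluded ends. The parser below is a minimal sketch that also accepts the comma-separated form the browser often shows. The function name is our own.

```python
import re

def parse_position(position):
    """Parse 'chrom:start-end' (1-based, inclusive); commas in numbers are allowed."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position)
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))

chrom, start, end = parse_position("chr7:70231248-70257884")
print(end - start + 1)          # 26637 bases in the interval
print((chrom, start - 1, end))  # ('chr7', 70231247, 70257884) as a 0-based BED interval
```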

  25. [Figure: Genome Browser view showing the base position ruler and gene annotation tracks.] Tracks! This is where we obtain gene annotation information.

  26. UCSC Table Browser • Retrieve the data associated with a track in text format • Calculate intersections between tracks • Retrieve the DNA sequence covered by a track
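To see what intersecting two tracks means, consider a base-by-base intersection of two sorted interval lists on one chromosome. This is one of several intersection modes the Table Browser offers. The sketch assumes 0-based intervals that exclude their end coordinate, with no overlaps within each track. The example intervals are hypothetical.

```python
def intersect(track_a, track_b):
    """Overlapping pieces of two sorted, non-overlapping (start, end) interval lists."""
    i = j = 0
    result = []
    while i < len(track_a) and j < len(track_b):
        start = max(track_a[i][0], track_b[j][0])
        end = min(track_a[i][1], track_b[j][1])
        if start < end:
            result.append((start, end))
        # The interval that ends first cannot overlap anything later in the other track.
        if track_a[i][1] < track_b[j][1]:
            i += 1
        else:
            j += 1
    return result

exons = [(100, 200), (300, 400), (500, 600)]  # hypothetical track A
peaks = [(150, 320), (390, 510)]              # hypothetical track B
print(intersect(exons, peaks))  # [(150, 200), (300, 320), (390, 400), (500, 510)]
```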

  27. Homework help Q2 • How many RefSeq genes have more than 15 exons on human chromosome 1? • How many genes on chromosome 22, on the positive strand, are associated with a disease in the OMIM database?
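Once a RefSeq table has been exported as tab-separated text, the first question reduces to filtering rows. This sketch assumes the export's header line starts with "#" and includes the columns `chrom`, `exonCount` and `name2` (gene symbol), as in the refGene table. Check your own export's header before relying on it. The toy rows below are hypothetical. The function reports both transcript rows and distinct gene symbols, because one gene can have several RefSeq transcripts.

```python
import csv
import io

def count_refseq(table_text, chrom="chr1", more_than=15):
    """Return (transcripts, distinct gene symbols) on chrom with more than `more_than` exons."""
    rows = csv.DictReader(io.StringIO(table_text.lstrip("#")), delimiter="\t")
    hits = [r for r in rows if r["chrom"] == chrom and int(r["exonCount"]) > more_than]
    return len(hits), len({r["name2"] for r in hits})

# Hypothetical export with a reduced set of columns.
toy = (
    "#name\tchrom\tstrand\texonCount\tname2\n"
    "txA1\tchr1\t+\t20\tGENEA\n"
    "txA2\tchr1\t+\t18\tGENEA\n"
    "txB1\tchr1\t-\t4\tGENEB\n"
    "txC1\tchr2\t+\t30\tGENEC\n"
)
print(count_refseq(toy))  # (2, 1): two transcripts, one distinct gene symbol
```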

  28. The Cistrome: Understanding Genetic Regulation • CisTrOme stands for Cis-acting regulatory elements searched across (Trans) the whole genOme. • Visit and register at http://cistrome.org/ • The objective is to map the binding regions of a transcription factor across (trans) the genome. This helps us understand the regulatory mechanisms that act on the same chromosome as the regulated gene (cis).

  29. Types of Data and Peak-Calling Methods • ChIP-chip data (ChIP on chip): Affymetrix one-color arrays, analyzed with MAT (Model-based Analysis of Tiling arrays); NimbleGen two-color arrays, analyzed with MA2C (Model-based Analysis of 2-Color arrays) • ChIP-seq data (ChIP plus next-generation sequencing): sequencing data (Illumina, Roche 454), analyzed with MACS (Model-based Analysis of ChIP-Seq)
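Model-based peak callers ask how surprising a read count is under a background model. MACS, for example, models read counts with a Poisson distribution and uses a local background estimate. The sketch below shows only the basic test, P(X ≥ k) for a single genome-wide λ. It does not implement MACS, and it omits read shifting, local λ and multiple-testing correction. The read depth, genome size and window size in the usage lines are hypothetical.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): chance of k or more reads from background alone."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) computed from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

# Hypothetical experiment: expected reads per window if reads were spread uniformly.
total_reads, genome_size, window = 10_000_000, 3_000_000_000, 200
lam = total_reads * window / genome_size
print(lam, poisson_sf(10, lam))  # tiny p-value: 10 reads in one window is unlikely by chance
```

The subtraction 1 - cdf loses precision for very small p-values. Production tools compute the upper tail directly or work in log space.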

  30. MA2C: Homework help Q3. Model-based Analysis of 2-Color arrays • http://liulab.dfci.harvard.edu/MA2C/MA2C.htm • Installation: you need Java Runtime Environment (JRE) 5.0 or higher, which you can download from http://java.sun.com • Download MA2C.zip and uncompress it. • Windows: run MA2C\dist\MA2C.bat • Otherwise, open a terminal, go to MA2C/dist/ and run the command java -Xmx600m -jar MA2C.jar (or just double-click MA2C.jar)

  31. MA2C: Data Normalization • Download the SDC3 zip file from the homework. • Uncompress it and open MA2C. • Load SampleKeyIVtoX.txt as the sample key. • Select your control group (IP channel). • Go to the normalization tab and normalize your data; the default parameters are fine.
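As a generic illustration of what normalizing two-color ChIP-chip data involves, the sketch computes per-probe log2(IP / control) ratios. It then centers them so that the typical probe has a ratio of 0. This is not MA2C's normalization algorithm, which is more involved and is described in its documentation. The pseudocount and the intensities are hypothetical.

```python
import math
import statistics

def log_ratios(ip, control, pseudo=1.0):
    """Per-probe log2(IP / control) ratios; the pseudocount avoids log(0)."""
    return [math.log2((a + pseudo) / (b + pseudo)) for a, b in zip(ip, control)]

def median_center(values):
    """Shift values so their median is 0 (a simple, generic normalization)."""
    m = statistics.median(values)
    return [v - m for v in values]

ip = [1200, 950, 4100, 1010, 880]    # hypothetical IP-channel intensities
control = [600, 500, 520, 480, 450]  # hypothetical control-channel intensities
print([round(v, 2) for v in median_center(log_ratios(ip, control))])
```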

  32. MA2C: Peak Finding • Go to the peak-detection tab. • Adjust the parameters as needed. • Select find peaks. • Voilà! The results are written to the MA2C_output folder.
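Peak finding on tiling arrays typically smooths the normalized log ratios along the chromosome and reports runs of probes above a threshold. The sketch below does this with a sliding-window median. It is a generic illustration, not MA2C's peak-detection statistic, and the bandwidth and threshold values are hypothetical. It assumes probe positions on one chromosome, sorted in ascending order.

```python
import statistics

def find_peaks(positions, scores, bandwidth=300, threshold=1.0):
    """Call enriched regions from normalized probe log ratios along one chromosome.

    Each probe gets the median score of all probes within +/- bandwidth bp;
    consecutive probes whose window median reaches threshold form one region.
    """
    n = len(positions)
    window_scores = []
    lo = hi = 0
    for x in positions:
        while positions[lo] < x - bandwidth:
            lo += 1
        while hi < n and positions[hi] <= x + bandwidth:
            hi += 1
        window_scores.append(statistics.median(scores[lo:hi]))
    regions, start = [], None
    for i, s in enumerate(window_scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            regions.append((positions[start], positions[i - 1]))
            start = None
    if start is not None:
        regions.append((positions[start], positions[-1]))
    return regions

positions = list(range(0, 1000, 100))  # hypothetical probe coordinates
scores = [0.1, -0.2, 0.0, 2.1, 2.5, 1.9, 0.3, -0.1, 0.2, 0.0]
print(find_peaks(positions, scores, bandwidth=100))  # [(300, 500)]
```

The median keeps a single outlier probe from creating or splitting a peak on its own.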
