Cufflinks

Cufflinks Matt Paisner, Hua He, Steve Smith and Brian Lovett

The Vision • RNAseq can be used for transcript discovery and abundance estimation • What’s missing: algorithms which aren’t restricted by prior gene annotations (which are often incomplete) and account for alternative transcription and splicing. • Hence, Cufflinks.

The Need • Evidence of ambiguous assignment of isoforms. TSS site/promoter changes and splice site changes were found previously by the authors • Longer reads and pair end reads do not do enough

The Biology • General assumption of randomization of reads • Central Dogma • Transcription Start Site (TSS) • Splice site • Isoform

Central Dogma and Regulation

Splicing • Thisblahblahblahblahblahblahisblahblahimportant • Thisblahblahblahblahblahblahisblahblahimportant • “This” “is” “important” - Exons • “blah” - Introns (Intrusions)

Major Change 1

Major Change 2

Why it matters: Isoforms • Not only different sizes, but different shapes • Shape determines function • Isoforms would map to the same section of the genome: undetected without Cufflinks • Separating transcripts into isoforms elucidates a more realistic representation of what is happening

TopHat Mapping short reads Trapnellet. al,Bioinformatics, 2009

TopHat • No genome reference annotations are needed • The output of TopHat is the input of Cufflinks. • Input: Reads and genome • Output: Read mappings • Short reads present computational challenges • BOWTIE

How does TopHat Work?! Big Idea: “Exon Inference”!!

Step 1: Initial Mapping via Bowtie • Group 1: Mapped Reads (Segments) • Group 2: Initially Unmapped (IUM)Reads • possibly intron-spanning read • Based on Group 1, we want to get intron-spanning reads from Group 2 Reference Mapped Reads

Step 2: Generate Putative Exons

Step 3: Look for Potential Splice Signals Putative Exons

Step 4: Seed-and-Extend

Cufflinks Isoform/Transcript Detection and Quantification Trapnellet al, Nature Biotech, 2010

Step 5: Identify Compatible Reads Two reads are compatible if their overlap contains the exact same implied introns (or none). If two reads are not compatible they are incompatible.

Step 6: Less BIOLOGY, and NOW it is the time for some GRAPH THEORIES……. “We emphasize that the definition of a transcription locus is not biological……” - Authors

Step 6: Create Overlap Graph Connect compatible reads in order Create a DAG

A path in this graph corresponds to a transcript isoform

Theory • Solving minimum pathcover (isoforms) in the overlap graph implies the fewest transcriptsnecessary to explain the reads. • Solve minimumpath cover by ﬁnding largest set of individual reads such that no two are compatible. • According to Dilworth Thereom, find a maximum matching in a bipartitegraph

Step 8: Convert a DAG into a Bipartite Graph

Step 9: Looking for Maximum Matching inside a bipartite graph via Bipartite Matching Algorithm BIPARTITE-MATCHING Algorithm: Add augmenting path via BFS, repeatedly adding the paths into the matching until none can be added.

A path in this graph corresponds to a transcript isoform

Projective normalization underestimates expression isoform a isoform b project all isoforms into genome coordinates • R reads total, r reads for the gene: • ra for isoform a • rb for isoform b so but

How should expression levels be estimated? • A-B are distinguished by the presence of splice junction (a) or (b). • A-C are distinguished by the presence of splice junction (a) and change in UTR • B-C are distinguished by the presence of splice junction (b) and change in UTR (a) (b)

How should expression levels be estimated? • Longer transcripts contain more reads. • Reads that could have originated from multiple transcripts are informative. • Relative abundance estimation requires “discriminatory reads”. (a) (b)

A model for RNA-Seq • r = Transcript proportions for assignment of reads to transcripts • L = Likelihood of this assignment • R = all reads

A model for RNA-Seq • T = All transcripts

A model for RNA-Seq • Define: • Expected possible positions for an arbitrary fragment in Transcript t • F(i) = pr(random fragment has length i) • l(t) = Full length of transcript t

A model for RNA-Seq

A model for RNA-Seq • It (r) = Implied length of r’s fragment if r is assigned to transcript t • Recall: F(i) = pr(fragment length = i)

Projective normalization underestimates expression isoform a isoform b project all isoforms into genome coordinates • R reads total, r reads for the gene: • ra for isoform a • rb for isoform b so but

A model for RNA-Seq • Now we have a maximum likelihood function in terms of r, the distribution of reads among transcripts. • Non-negative linear model

Inference with the sequencing model • Maximum likelihood function is concave - optimization using the EM algorithm. • Asymptotic MLE theory leads to a covariance matrix for the estimator in the form of the inverse of the observed Fisher information matrix • Importance sampling from the posterior distribution used for estimating the abundances from the posterior expectation, and 95% confidence intervals for the estimates. • This approach extends the log linear model of H. Jiang and W. Wong, Bioinformatics 2009 to a linear model for paired end reads. • For more background see Li et al., Bioinformatics, 2010 and Bullard et al., BMC Bioinformatics, 2010.

Utility of Cufflinks • mRNA as proxy for gene expression & action • Control points • transcriptional vs • post transcriptional • Does isoform-level discovery & quantification matter? • Apparently, yes • Putatively discovered about 12K new isoforms while recovering about 13K known • Plus other stuff…

The skeletal myogenesistranscriptomeRNA-Seq (2x75bp GAIIx) along time course of mouse C2C12 differentiation 168 hours 120 hours -24 hours 60 hours myotube myoctyte • 123,575,666reads • 84,369,078 reads • 140,384,062reads • 82,138,212reads • 66,541,668alignments • 103,681,081alignments • 47,431,271alignments • 89,162,512alignments • 10,754,363to junctions • 19,194,697to junctions • 9,015,806to junctions • 17,449,848to junctions • 58,008transfrags • 69,716transfrags • 55,241transfrags • 63,664transfrags fusion differentiation (starting at 0 hours) Illustration based on: Ohtake et al, J. Cell Sci., 2006; 119:3822-3832 Slide courtesy of Hector CorradaBravo

Validation

Projective normalization underestimates expression isoform a isoform b project all isoforms into genome coordinates • R reads total, r reads for the gene: • ra for isoform a • rb for isoform b so but Slide courtesy of Hector CorradaBravo

Discovery is necessary for accurate abundance estimates Slide courtesy of Hector CorradaBravo

Some Questions… • Do isoforms of a given gene have interesting temporal patterns? • Increasing, decreasing, more complex… • What does this mean biologically? • What about transcriptional versus post transcriptional regulation? • Differential transcription • Differential splicing

Dynamics of Myc expression Slide courtesy of Hector CorradaBravo

Overloading Metric using Jensen-Shannon Divergence Average Entropy Entropy of Average Metric: One-sided t-test under the null hypothesis that there is no difference in abundance; Type I errors controlled with Benjamini-Hotchbergcorrection (FDR)

Regulatory Overloading # Genes (FDR < 0.05) Differential splicing Fibronectin Tropomyosin 1 Mef2d … 231 17 Fhl3 Fhl1 Myl1 … 101 Differential TSS preference

Dynamics of Myc expression d( , ) Slide courtesy of Hector CorradaBravo

New TSS = New Points of Regulation Whatwould a “collapsed” RNA-seq alignment look like? Microarray? TSS=Transcription Start Site

Cufflinks

Cufflinks

Presentation Transcript

Cufflinks Market 2019 Forecasts for Global

Swarovski Crystals Designer Cufflinks for mens

Shop Online Stainless Steel Cufflinks Jewelry

Latest Wedding Cufflinks for Men - Frenchcuffed.com

Buy Cufflinks Online | Satya Paul

Personalised Photo Engraved Cufflinks

Men's Cufflinks - An Elegant History

Buy Unique Cufflinks

Professional Cufflinks for Mens

Cufflinks online shopping

Menâ€™s Cufflinks

Menâ€™s Cufflinks Online

Cycling Cufflinks

Custom Made Personalised Cufflinks

Silk Tie on sale | Silk Knot Cufflinks

Cycling cufflinks

Buy Branded Cufflinks Online | William Penn

Cycling cufflinks

Designer Cufflinks

Dog Breed Cufflinks

Wimbledon Cufflinks Can Now Produce Personalised Cufflinks in Any Design

Custom Cufflinks