Count data analysis in the small RNA sequencing

Count data analysis in the small RNA sequencing. Cai Tao, NIBS. Count data in high-throughput sequencing. Read counts are the most straightforward information from deep sequencing technologies such as small RNA, methylation, and DGE. Count and relative frequency.

Presentation Transcript


  1. Count data analysis in the small RNA sequencing. Cai Tao, NIBS

  2. Count data in high-throughput sequencing • Read counts are the most straightforward information from deep sequencing technologies such as small RNA, methylation, and DGE (digital gene expression)

  3. Count and relative frequency

  4. Count and relative frequency

  5. Correspondence analysis • Describing and interpreting data • Relative frequencies • Chi-square distance • Weighted for the mass • Distance for Clustering
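The chi-square distance listed on slide 5 can be illustrated with a small standard-library Python sketch. It is not the `ca` R package used in the workshop. The toy table and the function names are assumptions made for illustration, and every row and column is assumed to have a nonzero total.

```python
from math import sqrt

def profiles_and_masses(table):
    """Return row profiles (relative frequencies), row masses and column masses."""
    total = sum(sum(row) for row in table)
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(col) / total for col in zip(*table)]
    profiles = [[x / sum(row) for x in row] for row in table]
    return profiles, row_mass, col_mass

def chi_square_distance(p, q, col_mass):
    """Distance between two row profiles, each column weighted by 1 / column mass."""
    return sqrt(sum((a - b) ** 2 / c for a, b, c in zip(p, q, col_mass)))

# Hypothetical counts: rows are small RNAs, columns are two libraries.
table = [[10, 30], [20, 60], [30, 10]]
profiles, row_mass, col_mass = profiles_and_masses(table)
d01 = chi_square_distance(profiles[0], profiles[1], col_mass)  # same profile
d02 = chi_square_distance(profiles[0], profiles[2], col_mass)  # opposite profile
```

Rows 0 and 1 differ in depth but share the same relative frequencies, so their distance is zero. This is why profiles, rather than raw counts, are the natural input for clustering here.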

  6. Entropy and mutual information • Entropy • H = -Σ_i p_i log(p_i) • Mutual information • I(x, y) = H(x) + H(y) - H(x, y)
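A minimal Python sketch of the two formulas on slide 6, applied to a contingency table of counts. The workshop itself lists the `entropy` R package, so this is only an illustration. The tables are made-up examples, and probabilities are plain relative frequencies with natural logarithms.

```python
from math import log

def entropy(probs):
    """Shannon entropy H = -sum(p * log(p)), skipping zero cells."""
    return -sum(p * log(p) for p in probs if p > 0)

def mutual_information(table):
    """I(x, y) = H(x) + H(y) - H(x, y) from a table of joint counts."""
    total = sum(sum(row) for row in table)
    joint = [x / total for row in table for x in row]
    px = [sum(row) / total for row in table]
    py = [sum(col) / total for col in zip(*table)]
    return entropy(px) + entropy(py) - entropy(joint)

# Hypothetical tables: rows and columns are independent in the first,
# perfectly associated in the second.
mi_independent = mutual_information([[10, 20], [20, 40]])
mi_dependent = mutual_information([[10, 0], [0, 10]])
```

For the independent table the mutual information is zero up to rounding. For the perfectly associated table it equals log(2), the entropy of either margin.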

  7. Statistical comparison • Goodness of fit (chisq.test) • Fisher's exact test (fisher.test) • Likelihood ratio test • 1 - pchisq(-2 * log(likelihood ratio), df)
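To make the Fisher exact test on slide 7 concrete, here is a sketch for a 2x2 table using only the Python standard library. It follows the usual two-sided definition, summing the hypergeometric probabilities of all tables that are no more likely than the observed one. It is intended for toy-sized counts only. The function name and example table are assumptions, and in the workshop the test is run with `fisher.test` in R.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

# Hypothetical example: 3 vs 1 reads of one tag, 1 vs 3 reads of another.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With these counts, the tables at least as extreme as the observed one have total probability 34/70, so there is no evidence of a difference.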

  8. Statistical comparison • Expected value of Y and variance of Y, given twice on the slide (the formulas were images and are not recovered in this transcript)

  9. Small RNA counts

  10. Questions raised by the NB (negative binomial) distribution • How to properly estimate the variance of the counts? • Using local regression • How to statistically compare count data A and B? • The p-value of a pair of observed count sums (A, B) is the sum of all probabilities p(a, b) less than or equal to p(A, B), over all pairs with a + b = A + B • The DESeq package performs this calculation
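The p-value rule on slide 10 can be sketched in a few lines of Python. This is a simplified illustration, not the DESeq implementation: the means and dispersions are passed in directly rather than estimated (DESeq derives them from size factors and a fitted mean-variance relationship). The parameterization variance = mu + disp * mu^2 and all names and example values are assumptions.

```python
from math import exp, lgamma, log

def nb_logpmf(k, mu, disp):
    """Log NB probability of count k, with mean mu and variance mu + disp * mu**2."""
    r = 1.0 / disp
    return (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
            + r * log(r / (r + mu)) + k * log(mu / (r + mu)))

def nb_exact_pvalue(a, b, mu_a, mu_b, disp_a, disp_b):
    """Sum p(i, total - i) over all splits no more likely than the observed (a, b)."""
    total = a + b
    probs = [exp(nb_logpmf(i, mu_a, disp_a) + nb_logpmf(total - i, mu_b, disp_b))
             for i in range(total + 1)]
    p_obs = probs[a]
    # Conditioning on the overall sum: normalise by the total over all splits.
    extreme = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return extreme / sum(probs)

# Hypothetical values: equal expected means under the null hypothesis.
p_balanced = nb_exact_pvalue(10, 10, 10.0, 10.0, 0.1, 0.1)
p_skewed = nb_exact_pvalue(0, 40, 20.0, 20.0, 0.1, 0.1)
```

A balanced split is the most likely outcome under equal means, so its p-value is 1, while the split 0 versus 40 is far in the tail.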

  11. Comparison with other NB method

  12. Regression • Poisson regression • log(E[Y]) ~ covariate + treatment + log(offset) • Likelihood ratio test (LRT) • Negative binomial regression • glmFit in the edgeR package • Likelihood ratio test (LRT)
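For the simplest case of slide 12, a Poisson model with a single two-level treatment factor and library size as the offset, the maximum-likelihood rates have a closed form, so the likelihood ratio test can be sketched without a GLM fitter. The function names and counts below are illustrative assumptions, and each group is assumed to contain at least one read. A model with extra covariates, or a negative binomial fit, needs an iterative fit such as the one edgeR provides.

```python
from math import erfc, lgamma, log, sqrt

def poisson_loglik(counts, lib_sizes, rate):
    """Poisson log-likelihood with mean = rate * library size (log offset)."""
    return sum(y * log(rate * n) - rate * n - lgamma(y + 1)
               for y, n in zip(counts, lib_sizes))

def poisson_lrt(counts_a, libs_a, counts_b, libs_b):
    """LRT of one shared rate (null) against separate rates per group (df = 1)."""
    rate0 = (sum(counts_a) + sum(counts_b)) / (sum(libs_a) + sum(libs_b))
    rate_a = sum(counts_a) / sum(libs_a)
    rate_b = sum(counts_b) / sum(libs_b)
    ll0 = (poisson_loglik(counts_a, libs_a, rate0)
           + poisson_loglik(counts_b, libs_b, rate0))
    ll1 = (poisson_loglik(counts_a, libs_a, rate_a)
           + poisson_loglik(counts_b, libs_b, rate_b))
    stat = max(0.0, 2.0 * (ll1 - ll0))
    # Upper tail of a chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2.0))

# Hypothetical libraries of 1 and 2 million reads in each condition.
libs = [1_000_000, 2_000_000]
stat_same, p_same = poisson_lrt([10, 20], libs, [10, 20], libs)
stat_diff, p_diff = poisson_lrt([100, 200], libs, [10, 20], libs)
```

The statistic is twice the difference in log-likelihood between the two models, compared against a chi-square distribution with one degree of freedom.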

  13. The modern RNA world Costa FF, Gene 357, 83

  14. Small RNA sequencing method

  15. Arabidopsis endogenous small RNAs “TRENDS in Plant Science” Vol.11 No.9

  16. miRNA definition in plants • Precise excision from the stem of a stem-loop precursor • miRNA/miRNA* structure • Good match, with a limited number of mismatches • Deep sequencing is very helpful • Plant Cell. 2008, 20(12):3186-90

  17. miRBase annotation

  18. miRNA analysis pipeline • Remove the adapters • Produce unique tags • Map to the miRBase hairpins and revise the miRBase sequences where needed • Count miRNAs • Visualization • Comparison

  19. Pre-installed software • EMBOSS • Vienna RNA Package • Bowtie • R packages (entropy, ca) • Bioconductor packages (edgeR, baySeq, DESeq)

  20. Data preprocessing
      • Find the FASTQ file and convert it to a FASTA file
        perl fastq2fasta.pl 01.txt > 01.raw.fasta
      • Use vectorstrip from the EMBOSS package to remove the adapter
        vectorstrip -sequence 01.raw.fasta Yes -vectorsfile vector -mismatch 20 -besthits -outfile 01.vectorstrip -outseq 01.fasta
      • Collapse the file to unique small RNAs
        perl sRNA.pl 01.fasta > 01.sRNA
        perl checkuniqseq.pl 01.sRNA > 01.sRNA.uniq
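The Perl scripts named on slide 20 are not shown in the slides, so the following Python sketch only illustrates what the conversion and collapsing steps might look like. It assumes strict 4-line FASTQ records with adapters already removed. The `tagN_xCOUNT` header format and all function names are hypothetical.

```python
from collections import Counter

def fastq_to_sequences(lines):
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()

def collapse_reads(sequences):
    """Collapse identical reads into unique tags with their read counts."""
    return Counter(sequences)

def to_fasta(tag_counts):
    """Format unique tags as FASTA lines, most abundant tag first."""
    lines = []
    for rank, (seq, count) in enumerate(tag_counts.most_common(), start=1):
        lines.append(f">tag{rank}_x{count}")  # hypothetical header format
        lines.append(seq)
    return lines

# Toy input: three reads, two of them identical.
fastq = ["@r1", "ACGT", "+", "IIII",
         "@r2", "TTGA", "+", "IIII",
         "@r3", "ACGT", "+", "IIII"]
tags = collapse_reads(fastq_to_sequences(fastq))
fasta_lines = to_fasta(tags)
```

Keeping the read count in the header lets later steps map each unique sequence once and still recover its abundance.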

  21. Mapping
      • Map to the whole genome with Bowtie and extract the perfectly matched small RNAs
        bowtie a_thaliana -f 01.sRNA.uniq 01.uniq.map -a -v 0 -p 4 --al 01.uniq.match --un 01.uniq.unmatch
      • Map to the miRBase hairpins with Bowtie
        bowtie miRbase -f 01.uniq.match 01.uniq.miRbase.map -a -v 0 -p 4 --al 01.uniq.miRbase.match --un 01.uniq.miRbase.unmatch
      • Check the mapping in the miRBase hairpins (Vienna RNA Package needed)
        perl check_hairpin.pl miRbase.fna2 01.uniq.miRbase.map miRbase.pos

  22. Count the miRNAs
      • Provide the revised miRNA table with positions
        miRbase_revised.pos
      • Count each of the miRNAs
        perl count_miRNA.pl miRbase_revised.pos 01.uniq.miRbase.map
      • Provide the RPM (reads per million) counts
        Count × 10^6 / library size
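The RPM formula on slide 22 is a one-line normalization. The sketch below applies it to a dictionary of counts. The miRNA counts and the library size are made-up values, and defaulting the library size to the sum of the supplied counts is an assumption, since the slides do not say whether library size means all reads or only mapped reads.

```python
def rpm(counts, library_size=None):
    """Reads per million: count * 10**6 / library size."""
    if library_size is None:
        # Assumption: fall back to the total of the counts supplied.
        library_size = sum(counts.values())
    return {name: count * 1e6 / library_size for name, count in counts.items()}

# Hypothetical counts from a library of 2 million reads.
normalized = rpm({"miR156": 50, "miR166": 150}, library_size=2_000_000)
```

Dividing by library size makes counts from libraries of different depth comparable before visualization.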

  23. Visualization • Find the miRNA count tables here • CA_demo.xlsx • Visualization demo (workshop.R) • Analyze the tissue-specific miRNAs • Do the CA (correspondence analysis) • Do the clustering

  24. Statistical comparison • Comparing the samples from embryos and leaves using edgeR, DESeq, baySeq, and regression (workshop2.R)
