
LPHIG Bioinformatics of SFS Genomics Center Program Projects

Project leader: Chun-Yuan Huang1 • Members: Charles Joseph Murphy1, Aurash Mohaimani2 • PIs: Peter J. Tonellato1,2,3, Rebecca Klaper4 • 1. Zilber School of Public Health, University of Wisconsin at Milwaukee, Milwaukee, WI


Presentation Transcript


  1. LPHIG Bioinformatics of SFS Genomics Center Program Projects • Project leader: Chun-Yuan Huang1 • Members: Charles Joseph Murphy1, Aurash Mohaimani2 • PIs: Peter J. Tonellato1,2,3, Rebecca Klaper4 • 1. Zilber School of Public Health, University of Wisconsin at Milwaukee, Milwaukee, WI 2. Medical Informatics Program, University of Wisconsin at Milwaukee, Milwaukee, WI 3. Center for Biomedical Informatics, Harvard Medical School, Boston, MA 4. Great Lakes Genomics Center, School of Freshwater Sciences, University of Wisconsin, Milwaukee, WI

  2. SFS Genomics Center Program Projects • Project 1: Biomarkers of Reproduction Staging of Sturgeon (Acipenser fulvescens) • Project 2: Daphnia magna Gene Expression under Nanomaterial Exposure

  3. Background: conservation of sturgeon populations • Sturgeon appeared in the fossil record 200 million years ago and have undergone remarkably little morphological change, indicating exceptionally slow evolution and earning them informal status as living fossils. • Sturgeon have recently become prized for their meat, eggs (caviar), and oil. • Sturgeon are exceptionally vulnerable to overfishing, and restoring their populations is complicated by their slow reproductive cycle. Sturgeon exhibit delayed sexual maturity (between 10 and 30 years of age), infrequent spawning (every few years), and sexual monomorphism.

  4. Background: sturgeon sex determination • Searches for sex-specific markers using DNA-based techniques such as RAPD and AFLP have failed [1]. • Two candidate sturgeon sex-determining genes, dmrt1 (human homolog: doublesex and Mab-3 related transcription factor 1) and tra-1, were identified using next-generation 454 sequencing and de novo assembly of gonad transcriptomes [2]. • Sturgeon undergoing male differentiation express high levels (by qPCR) of Sertoli cell factors (dmrt1, sox9), of genes involved in androgen production and reception (cyp17a1, star, and ar), and of lh [3]. [1] Keyvanshokooh S, Gharaei A. A review of sex determination and searches for sex-specific markers in sturgeon. Aquac Res. 2010 Aug;41(9):e1–e7. [2] Hale MC, Jackson JR, Dewoody JA. Discovery and evaluation of candidate sex-determining genes and xenobiotics in the gonads of lake sturgeon (Acipenser fulvescens). Genetica. 2010 Jul;138(7):745-56. [3] Berbejillo J, et al. Expression and phylogeny of candidate genes for sex differentiation in a primitive fish species, the Siberian sturgeon, Acipenser baerii. Mol Reprod Dev. 2012 Aug;79(8):504-16.

  5. Background: sturgeon reproduction stages • Determining reproduction stage would help assess gonadal maturity, supporting sturgeon reproduction and population conservation. • Reproduction stages are defined by the DNR [1] as follows: • Stage 1 • Stage 2 • Stage 3 • Stage 4 [1] N.A.

  6. Biomarkers of Reproduction Staging of Sturgeon Primary goals: • Use RNA-seq to compare multiple sexual stages and identify genes uniquely expressed in each stage that could serve as biomarkers for determining reproduction stage. • Use the resulting RNA-seq gene annotations to identify proteins in our proteomics data from the blood of sturgeon at various stages. • Use the same annotations to examine evolutionary questions about how the sturgeon genome relates to those of more recently evolved fish species.

  7. Sturgeon • RNA-Seq: gene expression biomarkers for sexual stages • Annotate proteomics data from blood samples • Phylogenetic analysis

  8. Preliminary Considerations • No reference genome is available for lake sturgeon. • Ongoing and finished fish genomes: • Pufferfish (Tetraodon nigroviridis) • Fugu (Japanese pufferfish) • Stickleback (Gasterosteus aculeatus) • Coelacanth (Indonesian), Coelacanth (South African) • Tilapia (family Cichlidae) Genome Project, which includes Nile tilapia (Oreochromis niloticus), Astatotilapia burtoni, Pundamilia nyererei, Malawi zebra, and Neolamprologus brichardi • Zebrafish • Salmon • Catfish • Medaka (ricefish) • Lamprey (Lampetra fluviatilis) • Dogfish (Scyliorhinus canicula) • Southern platyfish (Xiphophorus maculatus) • Poeciliid fish (Xiphophorus maculatus) • Spotted gar (Lepisosteus oculatus)

  9. Preliminary Considerations (cont.) • Sturgeon are one of the oldest families of bony fish (ray-finned fish, class Actinopterygii) in existence and are quite distant from the other fishes that have been sequenced. • Based on Near et al. [1], sturgeon appear closer to gar than to the other fishes. • A spotted gar transcriptome (RNA-Seq) was constructed by Amores et al. [2] and is available from DDBJ [3]; its draft genome assembly is available from the Broad Institute [4]. [1] Near TJ, et al. Resolution of ray-finned fish phylogeny and timing of diversification. Proc Natl Acad Sci U S A. 2012 Aug 21;109(34):13698-703. [2] Amores A, et al. Genome evolution and meiotic maps by massively parallel DNA sequencing: spotted gar, an outgroup for the teleost genome duplication. Genetics. 2011 Aug;188(4):799-808. [3] https://trace.ddbj.nig.ac.jp/DRASearch/submission?acc=SRA026509 [4] ftp://ftp.broadinstitute.org/pub/assemblies/fish/spottedGar/

  10. Slide adapted from “Leveraging Trinity for de novo transcriptome assembly and analysis,” 2012 CSHL workshop, Brian Haas, Broad Institute

  11. Overview of the de novo transcriptome assembly strategy Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011 Sep 7;12(10):671-82.

  12. Survey on de novo transcriptome assembly methods Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011 Sep 7;12(10):671-82.

  13. Sample Description • Sturgeon liver samples: three biological replicates each of female reproduction stages 1 and 2 and male reproduction stages 1 and 2. • 100-bp paired-end reads from an Illumina HiSeq 2500. • 915,602,572 total reads (457,801,286 read pairs). • The Purdue Genomics Center performed the de novo transcriptome assembly using Trinity. • The reads were then mapped to the contigs with Bowtie and sorted by coordinate, yielding a BAM file for each sample. • For each sample, the mapped reads were also counted per contig, together with each contig's length (bp), the homolog found by BLAST searching the contig against NCBI, and that homolog's GO terms and GenBank IDs: http://www.genomics.purdue.edu/%7Ecore/projects/Klaper/
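The per-contig read counting described above can be illustrated with a minimal sketch that tallies primary, mapped alignments per contig from SAM text (the uncompressed form of BAM). This is not the Purdue Genomics Center's pipeline: the helper name is hypothetical, and counting each mate of a pair separately is an assumption. Counted that way, a fully mapped pair contributes two reads, which matches how the slide's total (915,602,572 reads) is twice the number of read pairs (457,801,286). In practice, `samtools idxstats` reports per-reference mapped counts for an indexed BAM directly.

```python
from collections import Counter

# SAM FLAG bits (from the SAM format specification)
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_per_contig(sam_lines):
    """Tally primary, mapped alignments per reference contig.

    Hypothetical helper: expects tab-separated SAM records; header lines
    start with '@'. Each mate of a pair is counted as one read.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank and header lines
        fields = line.rstrip("\n").split("\t")
        flag, contig = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep only primary alignments of mapped reads
        counts[contig] += 1
    return counts
```

For example, a properly paired read whose two mates both align to the same contig adds 2 to that contig's count, while unmapped and secondary records are ignored.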

  14. Sample Description Note: F1L: female stage 1, F2: female stage 2, M1: male stage 1, M2: male stage 2. Samples are all prepared from sturgeon liver (L).

  15. Trinity de novo reconstruction of transcriptomes: overview (Purdue Genomics Center)

  16. Example of data quality (FastQC) • The data are of high quality. Unfiltered reads:
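FastQC's per-base quality plot summarizes Phred scores by read position. The sketch below computes the mean score at each position, assuming unwrapped 4-line FASTQ records and Phred+33 (Sanger/Illumina 1.8+) encoding. The function names are hypothetical, and FastQC itself reports more than a mean (for example, quartiles per position).

```python
def quality_strings(fastq_text):
    """Return the quality line of each record (assumes 4-line, unwrapped FASTQ)."""
    lines = fastq_text.strip().splitlines()
    return lines[3::4]


def per_position_mean_quality(quals, offset=33):
    """Mean Phred score at each read position; reads may differ in length."""
    totals, counts = [], []
    for qual in quals:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read reaching this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # Phred+33 decoding
            counts[i] += 1
    return [t / n for t, n in zip(totals, counts)]
```

With Phred+33, the character `I` decodes to 40 and `#` to 2, so a position where one read has `I` and another has `#` averages 21.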

  17. Slide adapted from “Leveraging Trinity for de novo transcriptome assembly and analysis,” 2012 CSHL workshop, Brian Haas, Broad Institute

  18. Analysis Plan A (the fast-track version) • Use the contigs, read counts, and homologs (BLAST results for the contigs) provided by the Purdue Genomics Center • Differential expression and biomarker discovery • Estimate contig abundance using Bowtie (done at Purdue University) • Statistical analysis of significantly differentially expressed (DE) contigs using edgeR: • Identify DE contigs/biomarkers between reproduction stages (F1 vs. F2; M1 vs. M2) • Identify DE contigs/biomarkers between sexes (F1 vs. M1; F2 vs. M2) • Functional annotation by Trinotate (a module in the Trinity package) • Pathway analysis by Ingenuity Pathway Analysis (IPA) • GO term enrichment analysis by the Database for Annotation, Visualization and Integrated Discovery (DAVID) • Gene Set Enrichment Analysis (GSEA) • Identify proteins in the proteomics data obtained from sturgeon blood at various stages • Comparative genomic study: evolutionary aspects of the sturgeon genome as it relates to other fish species

  19. Sturgeon F1 vs. F2 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 309 contigs

  20. Sturgeon M1 vs. M2 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 229 contigs

  21. Sturgeon F1 vs. M1 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 616 contigs

  22. Sturgeon F2 vs. M2 DE contigs (isoforms) analysis by bowtie-edgeR Red: FDR<0.05, total 1543 contigs
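The "FDR < 0.05" cutoff on these slides refers to multiple-testing-adjusted p-values. Assuming the FDR column is the Benjamini–Hochberg adjustment (the default `adjust.method` of edgeR's `topTags`), the sketch below shows how raw p-values become FDR values and how contigs passing the cutoff would be selected. The helper names are hypothetical; this reimplements the BH procedure for illustration rather than calling edgeR.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(ids, pvalues, fdr=0.05):
    """IDs whose BH-adjusted p-value falls below the FDR threshold."""
    q = benjamini_hochberg(pvalues)
    return [name for name, qi in zip(ids, q) if qi < fdr]
```

For p-values `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are approximately `[0.02, 0.04, 0.04, 0.02]`, so with a 0.03 threshold only the first and last items pass.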

  23. Analysis Plan A (the fast-track version) • Use the contigs, read counts, and homologs (BLAST results for the contigs) provided by the Purdue Genomics Center • Differential expression and biomarker discovery • Estimate contig abundance using RSEM • Statistical analysis of significantly differentially expressed (DE) contigs using edgeR: • Identify DE contigs/biomarkers between reproduction stages (F1 vs. F2; M1 vs. M2) • Identify DE contigs/biomarkers between sexes (F1 vs. M1; F2 vs. M2) • Functional annotation by Trinotate (a module in the Trinity package) • Pathway analysis by Ingenuity Pathway Analysis (IPA) • GO term enrichment analysis by the Database for Annotation, Visualization and Integrated Discovery (DAVID) • Gene Set Enrichment Analysis (GSEA) • Identify proteins in the proteomics data obtained from sturgeon blood at various stages • Comparative genomic study: evolutionary aspects of the sturgeon genome as it relates to other fish species
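RSEM reports both expected read counts and length-normalized abundances such as TPM (transcripts per million). The sketch below shows the standard TPM formula, TPM_i = 10^6 × (c_i / l_i) / Σ_j (c_j / l_j), where c is a transcript's read count and l its effective length. The function name is hypothetical and this is not RSEM's code. Because edgeR models counts, it is likely the expected counts, not TPM, that feed the edgeR step in this plan; that is an assumption here.

```python
def tpm(counts, effective_lengths):
    """Transcripts per million from read counts and effective lengths (bp).

    Transcripts with a non-positive effective length get a rate of zero.
    """
    rates = [c / l if l > 0 else 0.0 for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [1e6 * r / total for r in rates]
```

Two transcripts with equal counts but a twofold difference in length get a twofold difference in TPM, which is why TPM rather than raw counts is used to compare abundance across transcripts.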

  24. Sturgeon F1 vs. F2 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 27 contigs

  25. Sturgeon F1 vs. F2: 27 DE contigs (isoforms) sorted by logFoldChange (total 27 contigs with FDR < 0.05)
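Sorting DE contigs by log fold change can be illustrated with a deliberately simplified sketch: normalize counts to counts per million (CPM), take log2 of the ratio of group means with a pseudocount, and rank. This is not edgeR's estimate, which comes from a negative binomial model with normalization factors. The pseudocount of 1, the direction (second group relative to first), and the function names are all assumptions for illustration.

```python
import math


def cpm(counts, library_size):
    """Counts per million for one sample."""
    return [c / library_size * 1e6 for c in counts]


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of mean CPM in group B over group A (direction is an assumption)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


def rank_by_log_fc(log_fc_by_contig):
    """Sort (contig, logFC) pairs from most up- to most down-regulated."""
    return sorted(log_fc_by_contig.items(), key=lambda kv: kv[1], reverse=True)
```

A contig averaging 3 CPM in one group and 7 CPM in the other gives log2((7 + 1) / (3 + 1)) = 1, that is, a twofold change after the pseudocount.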

  26. Sturgeon M1 vs. M2 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 4 contigs

  27. Sturgeon M1 vs. M2: 4 DE contigs (isoforms) (total 4 contigs with FDR < 0.05)

  28. Sturgeon F1 vs. M1 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 0 contigs

  29. Sturgeon F2 vs. M2 DE contigs (isoforms) analysis by RSEM-edgeR Red: FDR<0.05, total 70 contigs

  30. Sturgeon F2 vs. M2: 70 DE contigs (isoforms) sorted by logFoldChange (total 70 contigs with FDR < 0.05)

  31. Sturgeon F2 vs. M2: 70 DE contigs (isoforms) sorted by logFoldChange (total 70 contigs with FDR < 0.05) (continued)

  32. Sturgeon F1 vs. F2 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 5 components
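Moving from isoform-level (contig) to component-level (gene) analysis requires grouping isoforms. The sketch below assumes Trinity isoform IDs of the form `comp221487_c2_seq3` and treats everything before the trailing `_seqN` as the gene ID; both the ID convention and the helper names are assumptions. RSEM can also report gene-level estimates directly when given a transcript-to-gene map. Summing pools reads that are otherwise split across isoforms, which is one reason isoform-level and gene-level DE counts can differ.

```python
import re
from collections import defaultdict

# Assumption: the gene ("component") ID is the isoform ID minus its _seqN suffix.
SEQ_SUFFIX = re.compile(r"_seq\d+$")


def gene_id(isoform_id):
    """Strip the trailing _seqN isoform suffix."""
    return SEQ_SUFFIX.sub("", isoform_id)


def aggregate_to_genes(isoform_counts):
    """Sum isoform-level counts into gene-level counts."""
    genes = defaultdict(float)
    for isoform, count in isoform_counts.items():
        genes[gene_id(isoform)] += count
    return dict(genes)
```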

  33. Sturgeon F1 vs. F2: 5 DE components (genes) (total 5 components with FDR < 0.05)

  34. Sturgeon M1 vs. M2 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 7 components

  35. Sturgeon M1 vs. M2: 7 DE components (genes) (total 7 components with FDR < 0.05)

  36. Sturgeon F1 vs. M1 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 10 components

  37. Sturgeon F1 vs. M1: 10 DE components (genes) (total 10 components with FDR < 0.05)

  38. Sturgeon F2 vs. M2 DE components (genes) analysis by RSEM-edgeR Red: FDR<0.05, total 49 components

  39. Sturgeon F2 vs. M2: 49 DE components (genes) sorted by logFoldChange (total 49 components with FDR < 0.05)

  40. Sturgeon F2 vs. M2: 49 DE components (genes) sorted by logFoldChange (total 49 components with FDR < 0.05) (continued)

  41. Analysis Plan A (the fast-track version) • Use the contigs, read counts, and homologs (BLAST results for the contigs) provided by the Purdue Genomics Center • Differential expression and biomarker discovery • Estimate contig abundance using RSEM • Statistical analysis of significantly differentially expressed (DE) contigs using edgeR: • Identify DE contigs/biomarkers between reproduction stages (F1 vs. F2; M1 vs. M2) • Identify DE contigs/biomarkers between sexes (F1 vs. M1; F2 vs. M2) • Functional annotation by Trinotate (a module in the Trinity package) • Pathway analysis by Ingenuity Pathway Analysis (IPA) • GO term enrichment analysis by the Database for Annotation, Visualization and Integrated Discovery (DAVID) • Gene Set Enrichment Analysis (GSEA) • Identify proteins in the proteomics data obtained from sturgeon blood at various stages • Comparative genomic study: evolutionary aspects of the sturgeon genome as it relates to other fish species

  42. Functional annotation by Trinotate • Trinotate is a comprehensive annotation suite designed for automatic functional annotation of de novo transcriptome assemblies created with the Trinity assembly program. • Trinotate uses a number of well-referenced methods for functional annotation, including: • Prediction of the most likely longest-ORF peptide candidates from the contigs of the Trinity assembly (TransDecoder) • Homology search against known sequence data (NCBI BLASTP) • Protein domain identification (HMMER/PFAM) • Signal peptide and transmembrane domain prediction (SignalP/TMHMM) • Comparison to currently curated annotation databases (EMBL UniProt, eggNOG, and GO/pathway databases)
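The first step above, finding the longest open reading frame in each contig, can be sketched as a six-frame scan for complete ATG-to-stop ORFs. This is a simplified illustration, not TransDecoder: TransDecoder applies additional filtering and scoring, whereas this sketch only looks at length. The function names and the 100-codon default minimum are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT left as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq, min_codons=100):
    """Longest complete ATG...stop ORF over six frames.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on
    the given strand (minus-strand coordinates refer to the reverse
    complement), or None if no ORF reaches min_codons (stop codon included).
    """
    best = None
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG in this open stretch
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (i + 3 - start) // 3
                    if n_codons >= min_codons and (best is None or n_codons > best[0]):
                        best = (n_codons, strand, start, i + 3)
                    start = None  # look for the next ORF in this frame
    return None if best is None else best[1:]
```

Keeping the first ATG of each open stretch (rather than any internal ATG) is what makes each candidate the longest ORF ending at that stop codon.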

  43. Functional annotation by Trinotate • All 446,408 Trinity de novo assembled isoforms were subjected to Trinotate analysis. • A single contig can yield more than one peptide record from blastp. • As a result, Trinotate annotation of the 446,408 contigs produced 478,700 peptide records.
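Because one contig can map to several peptide records, downstream steps often need the records grouped by contig. The sketch below groups hypothetical `(contig_id, peptide_id, evalue)` tuples and, as one possible choice, keeps the record with the smallest BLAST e-value per contig. The tuple layout, the peptide IDs, and the best-hit rule are assumptions, not Trinotate's report format or policy.

```python
from collections import defaultdict


def group_by_contig(records):
    """Group (contig_id, peptide_id, evalue) tuples by contig (hypothetical layout)."""
    by_contig = defaultdict(list)
    for contig, peptide, evalue in records:
        by_contig[contig].append((peptide, evalue))
    return dict(by_contig)


def best_hit_per_contig(records):
    """Keep the peptide record with the smallest BLAST e-value for each contig."""
    return {contig: min(hits, key=lambda hit: hit[1])
            for contig, hits in group_by_contig(records).items()}
```

Three records spread over two contigs illustrate the same one-to-many relationship as 478,700 records over 446,408 contigs.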

  44. Example of two records for one contig (comp221487_c2_seq3)

  45. Validation of contig comp221487_c2_seq3 blastp result

  46. Validation of contig comp221487_c2_seq3 blastp result

  47. Validation of contig comp219629_c1_seq1 blastp result

  48. Validation of contig comp219629_c1_seq1 blastp result

  49. Analysis Plan A (the fast-track version) • Use the contigs, read counts, and homologs (BLAST results for the contigs) provided by the Purdue Genomics Center • Differential expression and biomarker discovery • Estimate contig abundance using RSEM • Statistical analysis of significantly differentially expressed (DE) contigs using edgeR: • Identify DE contigs/biomarkers between reproduction stages (F1 vs. F2; M1 vs. M2) • Identify DE contigs/biomarkers between sexes (F1 vs. M1; F2 vs. M2) • Functional annotation by Trinotate (a module in the Trinity package) • Pathway analysis by Ingenuity Pathway Analysis (IPA) • GO term enrichment analysis by the Database for Annotation, Visualization and Integrated Discovery (DAVID) • Gene Set Enrichment Analysis (GSEA) • Identify proteins in the proteomics data obtained from sturgeon blood at various stages • Comparative genomic study: evolutionary aspects of the sturgeon genome as it relates to other fish species

  50. Pathway analysis by Ingenuity Pathway Analysis (IPA)
