Explore gene expression differences across C. elegans life stages using advanced bioinformatics methods. Analyze transcript-read datasets, interpret differential gene expression, and validate the results for biological relevance.
BIF-30806 Group Project Group (A)rabidopsis: David Nieuwenhuijse, Matthew Price, Qianqian Zhang, Thijs Slijkhuis
Species: Caenorhabditis elegans • Nematode worm • Genome of ~100 Mbp (completed 2002) • ~20,000 genes
Project choice: Advanced Project • Investigation of differences in gene expression across multiple conditions
Datasets to use • We will use four conditions, corresponding to four life stages of the organism (L2, L3, L4 & YA, young adult) • For each life stage, 2-3 datasets (runs) of transcript reads are available in the NCBI SRA (Sequence Read Archive) online database • A reference genome is also required
Dataset preparation • .sra files are first converted to .fastq files with fastq-dump • The .fastq run files are merged with a command-line script (cat) into a single .fastq file per life stage • The reference genome was taken from the Ensembl database, after a reference genome from WormBase failed to work
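The preparation steps above can be sketched in Python as follows. This is a minimal sketch, not the group's actual script. It only builds the fastq-dump call (using its `--outdir` option) rather than running it. The merge step streams each run file into one output file, which is the same byte-for-byte result as `cat`. The function names `fastq_dump_command` and `merge_fastq` are hypothetical.

```python
import shutil
from pathlib import Path


def fastq_dump_command(sra_path, outdir="."):
    """Build (but do not run) a fastq-dump call converting one .sra file to .fastq."""
    # --outdir sets the directory where the resulting .fastq file is written.
    return ["fastq-dump", "--outdir", str(outdir), str(sra_path)]


def merge_fastq(run_files, merged_path):
    """Concatenate per-run .fastq files into one file, like `cat a.fastq b.fastq > merged.fastq`."""
    merged_path = Path(merged_path)
    with merged_path.open("wb") as out:
        for run_file in run_files:
            with open(run_file, "rb") as src:
                # Copy in chunks so large read files are never loaded fully into memory.
                shutil.copyfileobj(src, out)
    return merged_path
```

For example, `merge_fastq(["L2_run1.fastq", "L2_run2.fastq"], "L2.fastq")` (hypothetical file names) mirrors `cat L2_run1.fastq L2_run2.fastq > L2.fastq`. Like `cat`, it assumes every run file ends with a newline so that no FASTQ record is split across the join.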
Pipeline Overview • Inputs: transcript reads (.fastq file), reference genome and .gtf annotation file • TopHat program: reads are splice-aligned to the genome • Cufflinks program: reconstructs the transcriptome, giving a complete quantified transcriptome per life stage (4 files) • Cuffmerge program: produces a merged transcriptome file • Cuffdiff program: computes differential gene expression
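The pipeline steps can be sketched as a dry run that builds each command line in order without executing anything. This is a sketch under stated assumptions, not the group's actual commands:

- The index basename, genome FASTA, annotation file, directory layout and the function name `tuxedo_commands` are hypothetical.
- Only common options are used: TopHat `-G`/`-o`, Cufflinks `-o`, Cuffmerge `-g`/`-s`/`-o` and Cuffdiff `-o`/`-L`.
- Thread counts and bias-correction options are left out.

```python
from pathlib import Path

STAGES = ["L2", "L3", "L4", "YA"]


def tuxedo_commands(genome_index, genome_fasta, annotation_gtf, fastq_dir=".", work_dir="."):
    """Return TopHat -> Cufflinks -> Cuffmerge -> Cuffdiff calls as argument lists.

    Nothing is executed here. Also returns the text of the assembly list that
    Cuffmerge expects (one transcripts.gtf path per line).
    """
    fastq_dir, work_dir = Path(fastq_dir), Path(work_dir)
    commands, bams, assemblies = [], [], []
    for stage in STAGES:
        tophat_out = work_dir / f"tophat_{stage}"
        cufflinks_out = work_dir / f"cufflinks_{stage}"
        # 1. Splice-aware alignment of this stage's merged reads to the genome index.
        commands.append(["tophat", "-G", str(annotation_gtf), "-o", str(tophat_out),
                         str(genome_index), str(fastq_dir / f"{stage}.fastq")])
        bam = tophat_out / "accepted_hits.bam"
        bams.append(str(bam))
        # 2. Transcriptome reconstruction and quantification for this stage.
        commands.append(["cufflinks", "-o", str(cufflinks_out), str(bam)])
        assemblies.append(str(cufflinks_out / "transcripts.gtf"))
    # 3. Merge the four per-stage assemblies, guided by the reference annotation.
    assembly_list = work_dir / "assemblies.txt"
    merged_out = work_dir / "merged_asm"
    commands.append(["cuffmerge", "-g", str(annotation_gtf), "-s", str(genome_fasta),
                     "-o", str(merged_out), str(assembly_list)])
    # 4. Test for differential expression between the four labelled life stages.
    commands.append(["cuffdiff", "-o", str(work_dir / "diff_out"), "-L", ",".join(STAGES),
                     str(merged_out / "merged.gtf"), *bams])
    return commands, "\n".join(assemblies) + "\n"
```

To run the pipeline, write the returned assembly list to `assemblies.txt`, then pass each command in order to `subprocess.run(cmd, check=True)`. Because the runs are merged per stage, each condition here has a single BAM. Cuffdiff also accepts comma-separated replicate BAMs per condition if the runs are aligned separately.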
Data Validation • Run the pipeline on another, closely related organism to check for comparable results? • Do the biological explanations of the gene expression changes make sense given the conditions (life stages)?
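Checking whether the results make biological sense starts from the list of genes Cuffdiff reports as significant for each pair of life stages. The sketch below pulls that list out of Cuffdiff's tab-separated `gene_exp.diff` output. It assumes the standard column names (`gene`, `sample_1`, `sample_2`, `log2(fold_change)`, `significant`), which you should confirm against the Cuffdiff version used. The function name `significant_genes` is hypothetical.

```python
import csv
import io


def significant_genes(diff_text, sample_1, sample_2):
    """Return (gene, log2 fold change) pairs Cuffdiff flags as significant between two samples."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    hits = []
    for row in reader:
        same_pair = row["sample_1"] == sample_1 and row["sample_2"] == sample_2
        if same_pair and row["significant"] == "yes":
            # float() also accepts "inf"/"-inf", which appear when one value is zero.
            hits.append((row["gene"], float(row["log2(fold_change)"])))
    # Sort by absolute fold change so the strongest candidates are reviewed first.
    hits.sort(key=lambda hit: abs(hit[1]), reverse=True)
    return hits
```

A typical call would be `significant_genes(Path("diff_out/gene_exp.diff").read_text(), "L2", "L3")`. The output directory matches the hypothetical layout used earlier.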