Public data - available for projects
• 6 data sets:
  • Human Tissues
  • Leukemia
  • Spike-in
  • FARO compendium
  • Yeast Cell Cycle
  • Yeast Rosetta
• Find one yourself!?
Human Tissue Atlas
• 79 human tissues (each in duplicate)
• Among the tissues are:
  • brain samples
  • heart and liver samples
  • fetal samples
• 22215 probe sets (HGU133A subset)
• Preprocessed data:
  • normalized
  • expression index calculated
• Previously used to investigate global trends in transcription and its chromosomal organization, and to evaluate gene predictions
• Su et al., 2004 http://symatlas.gnf.org/SymAtlas/
Leukemia Data
• 43 T-cell ALL arrays; a subset of 35 patients received one of two different treatments, with the following outcomes:
  1. Relapsed: 9 in total
     • 6 hematologic relapses
     • 1 secondary AML
     • 2 other relapses
  2. Recovered: 26
• Platform: Affymetrix HGU133A
• The data set has previously been used for classification problems
• For more information: Yeoh et al., Cancer Cell, 2002 http://www.stjuderesearch.org/data/ALL1/
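Since the leukemia data have been used for classification, it is worth checking the class balance before building a classifier. A minimal sketch that tallies the outcomes listed above and collapses them into relapse vs. recovered; the label strings and the helper name `binary_labels` are hypothetical, and the distributed files may encode outcomes differently.

```python
from collections import Counter

# Outcome counts as listed on the slide; label strings are hypothetical.
outcomes = (["heme_relapse"] * 6 + ["second_aml"] * 1
            + ["other_relapse"] * 2 + ["recovered"] * 26)

counts = Counter(outcomes)
relapsed = len(outcomes) - counts["recovered"]


def binary_labels(labels):
    """Collapse detailed outcomes into relapse (1) vs. recovered (0)."""
    return [0 if lab == "recovered" else 1 for lab in labels]


if __name__ == "__main__":
    print(len(outcomes), relapsed)               # 35 9
    print(round(relapsed / len(outcomes), 3))    # 0.257
```

With only 9 relapses out of 35 patients, a classifier that always predicts "recovered" is already about 74% accurate, so accuracy alone is a weak measure on this data.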
Spike-In Dataset
• Subset of the HGU95 spike-in Latin square data
• A "normal" sample to which transcripts that hybridize to 14 probe sets are spiked in at known concentrations
• 2 series of concentrations: each probe set is spiked in at two different concentrations (pM), one per series
• 12 replicates for each series: four replicates on each of three GeneChip batches (24 GeneChip CEL files in total)
• Previous usage: benchmark data set for preprocessing methods
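The replicate structure of the spike-in data can be written out explicitly, which also shows why it works as a benchmark: the known concentrations fix the true log2 ratio between the two series, and a preprocessing method can be judged by how closely its observed fold changes match it. A minimal sketch; the series labels "A"/"B" and all function names are hypothetical, and the real CEL file names and concentrations are not given here.

```python
import math
from itertools import product
from statistics import mean

SERIES = ("A", "B")        # hypothetical labels for the two concentration series
BATCHES = (1, 2, 3)        # three GeneChip batches
REPLICATES = (1, 2, 3, 4)  # four replicates per batch

# One entry per CEL file: (series, batch, replicate) -> 2 * 3 * 4 = 24 arrays
design = list(product(SERIES, BATCHES, REPLICATES))


def arrays_in_series(series):
    """All arrays belonging to one concentration series (12 each)."""
    return [d for d in design if d[0] == series]


def log2_fold_change(values_a, values_b):
    """Observed log2 ratio of mean intensity, series B vs. series A."""
    return math.log2(mean(values_b) / mean(values_a))


def expected_log2_ratio(conc_a_pm, conc_b_pm):
    """True log2 ratio implied by the known spike-in concentrations (pM)."""
    return math.log2(conc_b_pm / conc_a_pm)


if __name__ == "__main__":
    print(len(design), len(arrays_in_series("A")))  # 24 12
```

For a spiked probe set, the gap between `log2_fold_change(...)` on preprocessed intensities and `expected_log2_ratio(...)` is one simple accuracy measure.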
FARO compendium
• Gene expression signatures for 242 experimental factors
• Derived from 1700 Affymetrix ATH1 GeneChips
• Gene expression signature: list of significantly differentially expressed genes
• Factors addressed: all public ATH1 Arabidopsis data (as of 2007), covering mutants, stress, chemical and hormonal treatments, tissues, etc.
• Organism: Arabidopsis thaliana
• The data have been processed; available data:
  • Ranked gene lists for the 242 factors
  • Fold changes
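The FARO data come both as ranked gene lists and as fold changes. To illustrate how the two can relate, and how the signatures of two factors might be compared, here is a sketch that assumes genes are ranked by absolute log2 fold change and that signatures are compared by top-k overlap. The actual FARO ranking statistic may differ, and the gene IDs and function names are hypothetical.

```python
import math


def rank_genes(fold_changes):
    """Rank genes by absolute log2 fold change, largest first (assumed criterion).

    fold_changes: dict mapping gene ID -> fold change (ratio, > 0).
    """
    return sorted(fold_changes,
                  key=lambda g: abs(math.log2(fold_changes[g])),
                  reverse=True)


def top_k_overlap(ranked_a, ranked_b, k):
    """Fraction of genes shared between the top-k of two ranked lists."""
    return len(set(ranked_a[:k]) & set(ranked_b[:k])) / k


if __name__ == "__main__":
    fc = {"g1": 4.0, "g2": 0.125, "g3": 1.1, "g4": 2.0}  # toy values
    print(rank_genes(fc))  # ['g2', 'g1', 'g4', 'g3']
```

Using the absolute log ratio puts strongly repressed genes (here `g2`, 8-fold down) on the same footing as induced ones.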
Yeast Cell Cycle Data
• The experiment:
  • Three time series, in which samples were taken from a synchronized yeast cell culture as it progressed through the cell cycle
  • Three different synchronization methods were used to arrest the cell cycle:
    • two temperature-sensitive mutant strains (Cdc15 and Cdc28) that cannot progress through the cell cycle at high temperature
    • arrest with alpha mating factor, followed by its rapid removal from the culture, which releases the cells from arrest
• Aim of the original studies:
  • to determine the genes whose expression fluctuates during the cell cycle
  • to characterize when in the cell cycle these genes are expressed and repressed
• The data set:
  • three separate files of normalized, preprocessed data
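Both aims of the original studies can be approached by projecting each gene's time series onto a sinusoid with the cell-cycle period: the magnitude of the projection scores how strongly the gene fluctuates, and its phase estimates when in the cycle the gene peaks. A minimal Fourier-style sketch; this scoring scheme, the function names, and the period and sampling times in the usage example are assumptions for illustration, and the three real series differ in sampling and period.

```python
import math


def _fourier_components(times, values, period):
    """Cosine and sine projections of the mean-centred series at one period."""
    m = sum(values) / len(values)
    omega = 2 * math.pi / period
    a = sum((v - m) * math.cos(omega * t) for t, v in zip(times, values))
    b = sum((v - m) * math.sin(omega * t) for t, v in zip(times, values))
    return a, b


def periodicity_score(times, values, period):
    """Amplitude of the periodic component; near 0 for non-cycling genes."""
    a, b = _fourier_components(times, values, period)
    return math.hypot(a, b)


def peak_phase(times, values, period):
    """Fraction of the cycle (0..1) at which the fitted sinusoid peaks."""
    a, b = _fourier_components(times, values, period)
    return (math.atan2(b, a) / (2 * math.pi)) % 1.0


if __name__ == "__main__":
    times = [10 * k for k in range(12)]  # assumed: every 10 min, 2 cycles of 60 min
    gene = [math.cos(2 * math.pi * (t - 15) / 60) for t in times]
    print(round(peak_phase(times, gene, 60), 3))  # 0.25
```

Ranking genes by `periodicity_score` addresses the first aim; grouping the high scorers by `peak_phase` addresses the second.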
Yeast Rosetta Compendium
• A compendium of expression profiles:
  • 276 deletion mutants (69 of which were ORFs of unknown function at the time)
  • 11 tetracycline-regulatable alleles of essential genes
  • 13 compound treatments
• Data:
  • P-values and log ratios
  • generated by comparison with 63 control experiments
• Originally used to identify gene clusters and to characterize unknown ORFs and drug targets
• For more information: Hughes et al., Cell, 2000
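Because each Rosetta profile provides a p-value and a log ratio per gene, two basic operations are selecting a profile's significant genes and correlating profiles, for example to match an uncharacterized deletion mutant to mutants or drug treatments with known function. A minimal sketch; the dictionary layout (gene → (log ratio, p-value)), the 0.01 cutoff and the function names are assumptions, not the format of the distributed files.

```python
import math


def significant_genes(profile, p_cutoff=0.01):
    """Genes whose p-value is below the cutoff (cutoff is an assumption)."""
    return {g for g, (_log_ratio, p) in profile.items() if p < p_cutoff}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def profile_similarity(p1, p2):
    """Correlation of log ratios over the genes present in both profiles."""
    genes = sorted(p1.keys() & p2.keys())
    return pearson([p1[g][0] for g in genes], [p2[g][0] for g in genes])


if __name__ == "__main__":
    mutant = {"g1": (0.5, 0.001), "g2": (-0.3, 0.2), "g3": (0.1, 0.005)}  # toy data
    print(sorted(significant_genes(mutant)))  # ['g1', 'g3']
```

Correlating log ratios rather than significant-gene sets keeps the information from weaker but consistent responses.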
Data Set Overview
• FARO: 242 experiments; data given as rank and fold change
Practical stuff
• Where: http://www.cbs.dtu.dk/chipcourse/Data.sets/
• When:
  • Thursday, January 10th: data set discussion 1 (Human tissue, Spike-in, Cell cycle)
  • Monday, January 14th: data set discussion 2 (Leukemia, FARO compendium, Rosetta compendium)
  • Wednesday, January 16th (15:00): deadline for handing in the problem formulation (1/2 page)