Time line and procedures for datasets
BCBC Pre-retreat Workshop
Tyson’s Corner, VA
May 11, 2011
Topics to cover
• Timeline for a dataset from contact to web site
• Policies to follow and documents to use
• Ten questions about your dataset
• Creating a MAGE-TAB document with us
• Seeing your dataset on the Beta Cell web site
• A tool you can use for MAGE-TAB: Annotare
Datasets to contact us about
• Your deliverables
  • Microarray experiments
  • High-throughput sequencing experiments (RNA-seq, ChIP-seq, FAIRE-seq, etc.)
  • RT-PCR screens
  • Other deliverables: we can discuss how to integrate
• Other key datasets
  • From your lab but from different funding
  • From the literature
Steps to get a study into Beta Cell
• Contact us. Let us know what is coming and when, so we can schedule working with you.
• Fill out the Ten Questions. When we get this from you, we can generate an initial spreadsheet (MAGE-TAB) for you to complete.
• Fill out the highlighted areas of the MAGE-TAB. We will go back and forth with you on details to get it right.
• Send us your data. We will set up an FTP account for you. Send us the raw data (e.g., Affymetrix CEL files, FASTQ sequence reads) and the processed data that the conclusions are based upon.
• Set a release schedule. We will load the dataset and incorporate it into queries and web pages as appropriate. We need to set when to release to the BCBC and to the general public.
• We can also submit your data to ArrayExpress or, if desired, GEO.
• View/query your dataset. Beta Cell has releases every 3 to 4 months.
Timeline
• Completion of MAGE-TAB:
  • Requires back and forth between the CC and the contact person in the investigator’s lab
  • Time to completion depends on the responsiveness of that contact person
  • Until the MAGE-TAB is completed, data loading cannot occur
• Data loading:
  • Once the MAGE-TAB is completed and all necessary files have been delivered, the time to load the data depends on the size of your study
  • For a typical study, data loading takes a few weeks
  • Missing files will delay the process
• Keep in mind that when you contact us to submit a study, you will be put in a queue, and the process of getting your study into Beta Cell Genomics will start once you reach the top of the queue
• Studies that are meant to be viewable on the BCBC website (either by the general public or by BCBC investigators only) have priority over private studies; that is, a study which is to be kept private will be placed lower in the queue
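The queueing rule on this slide can be pictured as a priority queue. The sketch below is only an illustration, not the CC's actual scheduling system: the `Submission` class, the two priority levels, and the study names are all hypothetical. It assumes that website-visible studies sort ahead of private ones and that ties are broken by order of contact.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels: the lower value is served first.
VISIBLE, PRIVATE = 0, 1

_arrival = count()  # tie-breaker: earlier contact is served first


@dataclass(order=True)
class Submission:
    priority: int
    arrival: int = field(default_factory=lambda: next(_arrival))
    name: str = field(default="", compare=False)


def enqueue(queue, name, visible_on_website):
    """Add a study; website-visible studies outrank private ones."""
    priority = VISIBLE if visible_on_website else PRIVATE
    heapq.heappush(queue, Submission(priority=priority, name=name))


def processing_order(queue):
    """Pop every study in the order it would reach the top of the queue."""
    order = []
    while queue:
        order.append(heapq.heappop(queue).name)
    return order


queue = []
enqueue(queue, "private study A", visible_on_website=False)
enqueue(queue, "BCBC-visible study B", visible_on_website=True)
enqueue(queue, "public study C", visible_on_website=True)
print(processing_order(queue))
```

Under these assumptions the private study is processed last even though it was submitted first.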
Policies to follow and documents to use
• Ten Questions about your dataset
  • Available as a BCBC miscellaneous resource
  • http://www.betacell.org/resources/data/miscellaneous/
• Bioinformatics/Epigenomics Working Group
  • RNA-seq and ChIP-seq recommendations
  • Includes checklists for data and information to provide
  • Mike Snyder will provide an overview and discuss
Meeting Deliverables
• For a study to be considered fully “delivered”, the following is required on the investigator’s part:
  • Provide answers to the initial 10 questions and all necessary data files
  • Respond to all inquiries needed to generate an accurate MAGE-TAB
  • Allow your study to be visible (at least to other BCBC investigators) on the Beta Cell website
MGED Standards
• What information is needed for a microarray experiment?
  • MIAME: Minimum Information About a Microarray Experiment. Brazma et al., Nature Genetics 2001
• How do you “code up” microarray data?
  • MAGE-OM: MicroArray Gene Expression Object Model. Spellman et al., Genome Biology 2002
  • MAGE-TAB. Rayner et al., BMC Bioinformatics 2006
• What words do you use to describe a microarray experiment?
  • MO: MGED Ontology. Whetzel et al., Bioinformatics 2006
MIAME in a nutshell (à la Alvis Brazma)
[Figure: MIAME components. Labels: Sample, RNA extract, labelled nucleic acid, hybridisation, array, Microarray, Array design, genes, Protocol, normalization, integration, Gene expression data matrix, Experiment]
Stoeckert et al., Drug Discovery Today: TARGETS 2004
Sequencing is replacing array technology
[Figure: the MIAME diagram (labels: Sample, RNA extract, labelled nucleic acid, hybridisation, array, Microarray, Array design, genes, Protocol, normalization, integration, Gene expression data matrix, Experiment), with one "labelled nucleic acid" replaced by "nucleic acid" and sequence reads shown alongside the arrays]
Example reads (FASTQ):
@HWI-EAS266_0011:8:1:6:969#0/1
GTTTGCCNGTGTGTACGCTACCCCCTTCTTGTGTGTGTGTGTCT
+HWI-EAS266_0011:8:1:6:969#0/1
_abb`a[DZ`aabaa_a`b]___^^aa_`aa_a^a[\\aZTZVY
@HWI-EAS266_0011:8:1:7:1688#0/1
AAGATGANGGCAGGGTGCAAGATGGCAGGATGCAAGATGGCAGG
+HWI-EAS266_0011:8:1:7:1688#0/1
a`^ab`^D\a]a`b``b_bbbaabb^abaa``^a_^_aa\]_VR
@HWI-EAS266_0011:8:1:7:593#0/1
CAGTTCANTTCTCAGCACCACACTGGGATGCTCACACATGCCTG
+HWI-EAS266_0011:8:1:7:593#0/1
abbbb_VD[bbbba_`bbbbbbbbbbbaa_`bbaabaabb_aa_
@HWI-EAS266_0011:8:1:7:139#0/1
CATGGGGNATAATTGCAATCCCCGATCCCCATCACGAATGGGGT
+HWI-EAS266_0011:8:1:7:139#0/1
aab`[^YDY]Z\baa`aabaaaa`aa`a]aa```\aY]^\]ZVX
@HWI-EAS266_0011:8:1:7:1390#0/1
GAATAATNGAATAGGACCGCGGTTCTATTTTGTTGGTTTTCGGA
+HWI-EAS266_0011:8:1:7:1390#0/1
_U^b_`]D\__a_a`S```Y[a__]a\aa_`]`aTVZ__\HYVX
@HWI-EAS266_0011:8:1:7:1663#0/1
TGATGTTNGTGGCAATAATGGGGGTAGCGGCAATGGTGGCGGGG
+HWI-EAS266_0011:8:1:7:1663#0/1
a`[_X]\DQTZ[^YYa[[aXV[PZUUYSYBBBBBBBBBBBBBBB
Sequencing is replacing array technology: ChIP-Seq, MeDIP-Seq, etc.
[Figure: the same diagram, now with one "RNA extract" replaced by "Chromatin, DNA extract" and without the "Gene expression data matrix" label. The six FASTQ reads shown are identical to those on the previous slide.]
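The FASTQ reads shown on the sequencing slides follow a four-line pattern: an `@` header, the sequence, a `+` separator line, and a quality string of the same length as the sequence. The sketch below reads that pattern. The names `FastqRecord`, `parse_fastq`, and `quality_scores` are hypothetical, and the quality offset of 64 is an assumption (typical of older Illumina pipelines); the slides do not state which encoding was used.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ entries: @id, sequence, +[id], quality."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(cleaned), 4):
        header, sequence, separator, quality = cleaned[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record at non-blank line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], sequence, quality)


def quality_scores(quality: str, offset: int = 64) -> List[int]:
    """Decode a quality string; offset=64 is an ASSUMPTION, not from the slides."""
    return [ord(ch) - offset for ch in quality]


# First read from the slide; a raw string keeps the backslashes literal.
example = [
    "@HWI-EAS266_0011:8:1:6:969#0/1",
    "GTTTGCCNGTGTGTACGCTACCCCCTTCTTGTGTGTGTGTGTCT",
    "+HWI-EAS266_0011:8:1:6:969#0/1",
    r"_abb`a[DZ`aabaa_a`b]___^^aa_`aa_a^a[\\aZTZVY",
]

for record in parse_fastq(example):
    scores = quality_scores(record.quality)
    print(record.read_id, len(record.sequence), min(scores), max(scores))
```

Parsing by position rather than by leading character matters here, because `@` and `+` are also valid quality characters.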
From MGED to FGED
• What information is needed for an HTS experiment?
  • MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
• How do you “code up” functional genomics data?
  • MAGE-TAB can still be utilized
• What words do you use to describe a functional genomics experiment?
  • OBI: Ontology for Biomedical Investigations, incorporates MO
MAGE-TAB Format
What is MAGE-TAB?
• A simple spreadsheet view consisting of 2 files:
  • IDF: describing the experiment design, contact details, variables, and protocols
  • SDRF: a spreadsheet with columns that describe samples, annotations, protocol references, assays, and data
• Linked data files (e.g. CEL files) are referenced by the SDRF
Where can I get MAGE-TAB from?
• ~10,000 MAGE-TAB files are available from ArrayExpress (includes GEO-derived and ArrayExpress data)
• caArray also provides MAGE-TAB files for download
Who is using MAGE-TAB?
• Bioconductor
• GenePattern
• MeV
• and Beta Cell Genomics!
IDF file for E-TABM-34 IDF = Investigation Description Format
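An IDF is a tab-delimited file in which each row starts with a tag and is followed by one or more values. A minimal sketch of reading one is shown here. The function name `parse_idf` is hypothetical, and the example content is invented for illustration; it is not the real E-TABM-34 file. The tag names are ones I understand to be standard in MAGE-TAB, but check them against the specification before relying on them.

```python
import csv
import io
from typing import Dict, List


def parse_idf(text: str) -> Dict[str, List[str]]:
    """Read an IDF: each row is a tag followed by tab-separated values."""
    fields: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        tag = row[0].strip()
        fields[tag] = [value for value in row[1:] if value != ""]
    return fields


# Hypothetical IDF content for illustration only.
EXAMPLE_IDF = (
    "Investigation Title\tExample islet expression study\n"
    "Experimental Design\tgenetic_modification_design\n"
    "Person Last Name\tDoe\tRoe\n"
    "Person First Name\tJane\tRichard\n"
    "Protocol Name\tP-EXTRACT\tP-LABEL\tP-HYB\n"
    "SDRF File\texample.sdrf.txt\n"
)

idf = parse_idf(EXAMPLE_IDF)
print(idf["Investigation Title"][0])
print(list(zip(idf["Person First Name"], idf["Person Last Name"])))
print(idf["SDRF File"])
```

Values in the same column position belong together, which is why the first and last names can be paired with `zip`.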
SDRF file for E-TABM-34 SDRF = Sample and Data Relationship Format
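An SDRF is a tab-delimited table with one header row and one row per sample path. Because headings such as `Protocol REF` can appear more than once, a sketch like the one below keeps rows as lists and looks columns up by position. The helper names `read_sdrf` and `column_values` are hypothetical, the sample and file names are made up, and the headings follow MAGE-TAB conventions as I understand them.

```python
import csv
import io
from typing import List, Tuple


def read_sdrf(text: str) -> Tuple[List[str], List[List[str]]]:
    """Return (header, rows). Headings such as 'Protocol REF' may repeat,
    so rows stay as lists instead of dicts."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    return rows[0], rows[1:]


def column_values(header: List[str], rows: List[List[str]], name: str) -> List[str]:
    """Collect non-empty values from every column whose heading equals `name`."""
    indexes = [i for i, h in enumerate(header) if h.strip() == name]
    return [row[i] for row in rows for i in indexes if i < len(row) and row[i]]


# Hypothetical SDRF content; names are invented for illustration.
EXAMPLE_SDRF = (
    "Source Name\tCharacteristics[OrganismPart]\tProtocol REF\tExtract Name\t"
    "Protocol REF\tHybridization Name\tArray Data File\n"
    "mouse1\tpancreas\tP-EXTRACT\tmouse1_rna\tP-HYB\thyb1\tmouse1.CEL\n"
    "mouse2\tpancreas\tP-EXTRACT\tmouse2_rna\tP-HYB\thyb2\tmouse2.CEL\n"
)

header, rows = read_sdrf(EXAMPLE_SDRF)
print(column_values(header, rows, "Array Data File"))
print(column_values(header, rows, "Protocol REF"))
```

Listing the `Array Data File` column this way gives the set of raw files that should accompany the MAGE-TAB when the data are sent.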
IDF A microarray expression study
Following 1 sample: bench component
[Figure legend: black border = biomaterials; red border = treatments. Visible label: OrganismPart]
In-silico component
[Figure steps: image acquisition; feature extraction; summarization (feature extraction II) and quantile normalization]
SDRF Let’s focus on the highlighted row
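Reading a single SDRF row from left to right reproduces the path of one sample: each material, assay, or data file is reached by the protocols listed just before it. The sketch below illustrates that reading under stated assumptions. The function `trace_row` is hypothetical, the header and row are invented rather than taken from the slide, and the rule "columns ending in Name or File are nodes" is a simplification of the format.

```python
from typing import List, Tuple


def trace_row(header: List[str], row: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Walk one SDRF row left to right.

    Returns (column heading, value, protocols applied to reach it) for each
    material, assay, or data-file column; attribute columns are skipped.
    """
    steps = []
    pending_protocols: List[str] = []
    for heading, value in zip(header, row):
        heading = heading.strip()
        if heading == "Protocol REF":
            pending_protocols.append(value)
        elif heading.endswith(" Name") or heading.endswith(" File"):
            steps.append((heading, value, pending_protocols))
            pending_protocols = []
    return steps


# Hypothetical header and row; names are illustrative, not from a real study.
header = ["Source Name", "Characteristics[OrganismPart]", "Protocol REF",
          "Extract Name", "Protocol REF", "Labeled Extract Name", "Label",
          "Protocol REF", "Hybridization Name", "Array Data File",
          "Protocol REF", "Derived Array Data File"]
row = ["mouse1", "pancreas", "P-EXTRACT",
       "mouse1_rna", "P-LABEL", "mouse1_rna_biotin", "biotin",
       "P-HYB", "hyb1", "mouse1.CEL",
       "P-NORM", "mouse1_normalized.txt"]

for heading, value, protocols in trace_row(header, row):
    via = f" (via {', '.join(protocols)})" if protocols else ""
    print(f"{heading}: {value}{via}")
```

The early steps correspond to the bench component and the data-file steps to the in-silico component.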
Loading and Analyzing the Data
• Image and .CEL files are archived and their location stored in the database
• Raw and processed data loaded into the database
• Downstream analyses (e.g. differential expression) are performed, generating gene lists
• Analysis results loaded into the database
IDF A ChIP-Seq study
In-silico Component
Steps: cluster generation, image acquisition, sequencing, alignment, peak calling
Files and names shown in the figure:
• Ptf1a_s5_seq.txt, s5_eland.txt, Ptf1a_s5
• Ptf1a_s4_seq.txt, s4_eland.txt, Ptf1a_s4
• Input_s8_seq.txt, s8_eland.txt, Input_s8
• Rbpjl_s6_seq.txt, s6_eland.txt, Rbpjl_s6
• Input_s2_seq.txt, s2_eland.txt, Input_s2
• Rbpjl_s4_seq.txt, s4_eland.txt, Rbpjl_s4
• Peak sets: Ptf1a_peaks, Rbpjl_peaks
Annotare: an open source standalone MAGE-TAB editor
Shankar R, Parkinson H, Burdett T, Hastings E, Liu J, Miller M, Srinivasa R, White J, Brazma A, Sherlock G, Stoeckert CJ Jr, Ball CA. Annotare - a tool for annotating high-throughput biomedical investigations and resulting data. Bioinformatics. 2010 Aug 23.
Annotare is an annotation tool for high-throughput gene expression experiments in MAGE-TAB format. Researchers can describe their investigations with the investigators’ contact details, experimental design, protocols that were employed, references to publications, details of biological samples, arrays, and experimental data produced in the investigation.
Annotare Features
• Intuitive graphical user interface forms for editing
• Ontology support: an inbuilt ontology and web services connectivity to BioPortal
• Searchable standard templates
• Design wizard
• Validation module
• Mac and Windows support
http://code.google.com/p/annotare/
Annotare Demo
• File Gallery: three different ways to get started
• Looking at an existing MAGE-TAB
• Form versus sheet view
• Using a template
• Using the wizard