This case study explores the use of automated workflows in two settings: closing and characterising a genomic gap in the region deleted in Williams-Beuren Syndrome, a gene deletion disorder, and annotating candidate genes for Graves Disease, an autoimmune disease. The annotation pipeline uses bioinformatics tools to analyze microarray data and extract information about genes and their protein products, enabling faster and more reliable annotation.
eScience Case Studies Using Taverna Dr. Georgina Moulton The University of Manchester (georgina.moulton@manchester.ac.uk) (on behalf of the myGRID team)
Traditional Bioinformatics
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
Requirements • Automation • Reliability • Repeatability • Little programming skill required • Works on distributed resources
Multi-disciplinary • ~37000 downloads • Ranked 210 on SourceForge • Users in US, Singapore, UK, Europe, Australia • Systems biology • Proteomics • Gene/protein annotation • Microarray data analysis • Medical image analysis • Heart simulations • High throughput screening • Phenotypical studies • Plants, Mouse, Human • Astronomy • Aerospace • Dilbert Cartoons
Williams-Beuren Syndrome (WBS) • Contiguous sporadic gene deletion disorder • 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis • Haploinsufficiency of the region results in the phenotype • Multisystem phenotype: muscular, nervous, circulatory systems • Characteristic facial features • Unique cognitive profile • Mental retardation (IQ 40-100, mean ~60, ‘normal’ mean ~100) • Outgoing personality, friendly nature, ‘charming’
Williams-Beuren Syndrome Microdeletion [Figure: physical map of the ~1.5 Mb region at 7q11.23 on chromosome 7 (~155 Mb). Labels: patient deletions (WBS, SVAS); Block A, Block B, Block C (A-cen, B-cen, C-cen, C-mid, B-mid, A-mid, B-tel, A-tel, C-tel); CTA-315H11, ‘Gap’, CTB-51J22 on the physical map; GTF2IRD2P, FKBP6T, POM121, GTF2IP, NOLR1, NCF1P, PMS2L, STAG3; WBSCR1/EIF4H, WBSCR5/LAB, GTF2IRD1, WBSCR21, WBSCR22, WBSCR18, WBSCR14, GTF2IRD2, POM121, NOLR1, BAZ1B, BCL7B, FKBP6, GTF2I, CLDN3, CLDN4, CYLN2, STX1A, LIMK1, NCF1, TBL2, RFC2, FZD9, ELN] Refs: Eichler E, Clark R & She X. An Assessment of the Sequence Gaps: Unfinished Business in a Finished Human Genome. Nature Reviews Genetics (2004) 5:345-354 • Hillier L et al. The DNA Sequence of Human Chromosome 7. Nature (2003) 424:157-164
Filling a genomic gap in silico • Two steps to filling the genomic gap: (1) identify new, overlapping sequence of interest; (2) characterise the new sequence at nucleotide and amino acid level • Several issues arise if we do this the traditional way: • It must be repeated frequently, because information is rapidly added to public databases • It is time consuming and mundane • It does not always produce results • A huge amount of interrelated data is produced
The Williams Workflows • A: Identification of overlapping sequence • B: Characterisation of nucleotide sequence • C: Characterisation of protein sequence
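As a rough illustration of what stages B and C mean in practice, the Python sketch below computes one nucleotide-level property (GC content) and one protein-level product (a translation in reading frame 1 using the standard genetic code). This is a stand-in under stated assumptions, not the workflows themselves: the slides do not list the services the workflows call, and the function names here are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (a simple nucleotide-level characterisation)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def translate(seq: str) -> str:
    """Translate frame 1 up to the first stop codon; unknown codons become 'X'."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def characterise(seq: str) -> dict:
    """Bundle nucleotide-level and protein-level results for one sequence."""
    return {
        "length": len(seq),
        "gc_content": gc_content(seq),
        "protein": translate(seq),
    }


print(characterise("atggcctgataa"))
```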
The Biological Results • Four workflow cycles totalling ~10 hours • The gap was correctly closed and all known features identified • 314,004 bp extension • All nine known genes identified (40/45 exons identified) [Figure: map of the extended region. Labels: CTA-315H11, CTB-51J22; RP11-148M21, RP11-731K22, RP11-622P13; WBSCR21, WBSCR27, WBSCR24, WBSCR18, WBSCR22, WBSCR28, STX1A, CLDN3, CLDN4; WBSCR14, ELN]
Case Study – Graves Disease • Autoimmune disease that causes hyperthyroidism • Antibodies to the thyrotropin receptor result in constitutive activation of the receptor and increased levels of thyroid hormone • Original myGrid Case Study Ref: Li P, Hayward K, Jennings C, Owen K, Oinn T, Stevens R, Pearce S and Wipat A (2004) Association of variations in NFKBIE with Graves' disease using classical and myGrid methodologies. UK e-Science All Hands Meeting 2004
Graves Disease The experiment: • Analysing microarray data to identify genes differentially expressed between Graves Disease patients and healthy controls • Characterising these genes (and any proteins encoded by them) in an annotation pipeline • Starting from an Affymetrix probeset identifier, extract information about the genes encoded in that region • For each gene, evidence is extracted from other data sources to potentially support it as a candidate for disease involvement
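The first and third steps of the experiment can be sketched in Python as follows. This is a minimal illustration that assumes a plain log2 fold-change cut-off on mean expression; the slides do not state which statistic was used, and a real analysis would normally add a statistical test with multiple-testing correction. The probeset identifiers, gene names, expression values and the `PROBESET_TO_GENE` lookup are all invented for the example.

```python
from math import log2
from statistics import mean


def differentially_expressed(patients, controls, min_abs_log2_fc=1.0):
    """Return {probeset: log2 fold change} for probesets passing the cut-off.

    `patients` and `controls` map a probeset ID to a list of positive
    expression values, one value per sample.
    """
    hits = {}
    for probeset, patient_values in patients.items():
        control_values = controls.get(probeset)
        if not control_values:
            continue  # probeset not measured in controls: skip it
        log2_fc = log2(mean(patient_values) / mean(control_values))
        if abs(log2_fc) >= min_abs_log2_fc:
            hits[probeset] = log2_fc
    return hits


# Hypothetical data: identifiers and values are placeholders, not real results.
patients = {"probe_A": [400, 420, 380], "probe_B": [100, 110, 90], "probe_C": [50, 50, 50]}
controls = {"probe_A": [100, 100, 100], "probe_B": [100, 100, 100], "probe_C": [200, 200, 200]}

# Hypothetical probeset-to-gene lookup standing in for an annotation source.
PROBESET_TO_GENE = {"probe_A": "GENE_1", "probe_B": "GENE_2", "probe_C": "GENE_3"}

hits = differentially_expressed(patients, controls)
candidate_genes = sorted(PROBESET_TO_GENE[p] for p in hits if p in PROBESET_TO_GENE)
print(hits)
print(candidate_genes)
```

The resulting gene list is the input to the annotation stage, where supporting evidence is gathered for each candidate.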
Annotation Pipeline Evidence includes: • SNPs in coding and non-coding regions • Protein products • Protein structure and functional features • Metabolic Pathways • Gene Ontology terms
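One way to picture the pipeline's output is a per-gene record with one field per evidence type listed above. The Python sketch below is an assumed data structure for illustration only; the class name `GeneEvidence`, its field names and the placeholder values are hypothetical and are not taken from the pipeline.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneEvidence:
    """Evidence collected for one candidate gene (hypothetical layout)."""
    gene: str
    snps: List[str] = field(default_factory=list)              # coding and non-coding SNPs
    protein_products: List[str] = field(default_factory=list)
    protein_features: List[str] = field(default_factory=list)  # structure and functional features
    pathways: List[str] = field(default_factory=list)          # metabolic pathways
    go_terms: List[str] = field(default_factory=list)          # Gene Ontology terms

    def supported_by(self) -> List[str]:
        """Names of the evidence categories that contain at least one item."""
        categories = {
            "SNPs": self.snps,
            "Protein products": self.protein_products,
            "Protein structure and functional features": self.protein_features,
            "Metabolic pathways": self.pathways,
            "Gene Ontology terms": self.go_terms,
        }
        return [name for name, items in categories.items() if items]


# Placeholder values, not real annotations.
record = GeneEvidence("GENE_1", snps=["snp_1"], go_terms=["term_1"])
print(record.supported_by())
```

Keeping the categories separate makes it easy to see at a glance which kinds of evidence support a candidate and which are still missing.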