myGrid: putting the scientist at the centre. A case study investigating Williams-Beuren Syndrome.
The scientist’s (Hannah’s) problem
[Figure: physical map of chromosome 7 (Chr 7, ~155 Mb) at 7q11.23, with clones CTA-315H11 and CTB-51J22 on either side of a ‘Gap’ and a ~1.5 Mb scale label, set against a background of raw nucleotide sequence]
• Identify new, overlapping sequence of interest
• Characterise the new sequence at nucleotide and amino acid level
• Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan
The Williams Workflows
• A: Identification of overlapping sequence
• B: Characterisation of nucleotide sequence
• C: Characterisation of protein sequence
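The three stages above can be pictured as a chain of steps in which each stage's output feeds the next. The sketch below only illustrates that idea in Python; it is not Taverna's workflow format. The function names, the dictionary-based record and the strictly linear A, B, C ordering are all assumptions made for the example, and the stage bodies are placeholders rather than real service calls.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Stage = Tuple[str, Callable[[Record], Record]]


def identify_overlapping_sequence(record: Record) -> Record:
    # Placeholder for stage A; a real version would call a similarity search service.
    return {**record, "A": "overlapping sequence identified"}


def characterise_nucleotide_sequence(record: Record) -> Record:
    # Placeholder for stage B; a real version would call nucleotide analysis services.
    return {**record, "B": "nucleotide sequence characterised"}


def characterise_protein_sequence(record: Record) -> Record:
    # Placeholder for stage C; a real version would call protein analysis services.
    return {**record, "C": "protein sequence characterised"}


def run_workflow(stages: List[Stage], record: Record) -> Tuple[Record, List[str]]:
    """Run the stages in order and keep a log of which stages ran."""
    log: List[str] = []
    for name, stage in stages:
        record = stage(record)
        log.append(name)
    return record, log


WILLIAMS_STAGES: List[Stage] = [
    ("A", identify_overlapping_sequence),
    ("B", characterise_nucleotide_sequence),
    ("C", characterise_protein_sequence),
]

result, log = run_workflow(WILLIAMS_STAGES, {"query": "CTA-315H11"})
```

Because the stage list is plain data, a stage can be swapped or reordered without touching the runner, which is the kind of substitution the workflow approach is meant to make easy.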
The myGrid Information Model: Annotation & argumentation
[Figure: graph linking data items to the context in which they were produced]
• Entities: nucleotide_sequence, project, organisation, experiment definition, group, person, workflow definition, workflow invocation, service description, service invocation
• Relationships: masked_sequence_of, part_of, rdf:type, author, works_for, invocation_of, BLAST_Report, similar_sequences_to, run_for, run_during, described_by, created_by, filtered_version_of
• Data items: urn:lsid:taverna:datathing:13, urn:lsid:taverna:datathing:15

Example sequence shown in the figure:
>gi|19747251|gb|AC005089.3| Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
AAGCTTTTCTGGCACTGTTTCCTTCTTCCTGATAACCAGAGAAGGAAAAGATCTCCATTTTACAGATGAG
GAAACAGGCTCAGAGAGGTCAAGGCTCTGGCTCAAGGTCACACAGCCTGGGAACGGCAAAGCTGATATTC
AAACCCAAGCATCTTGGCTCCAAAGCCCTGGTTTCTGTTCCCACTACTGTCAGTGACCTTGGCAAGCCCT
GTCCTCCTCCGGGCTTCACTCTGCACACCTGTAACCTGGGGTTAAATGGGCTCACCTGGACTGTTGAGCG

Example BLAST report shown in the figure (GI, accession, score, description):
19747251  AC005089.3   831   Homo sapiens BAC clone CTA-315H11 from 7, complete sequence
15145617  AC073846.6   815   Homo sapiens BAC clone RP11-622P13 from 7, complete sequence
15384807  AL365366.20  46.1  Human DNA sequence from clone RP11-553N16 on chromosome 1, complete sequence
7717376   AL163282.2   44.1  Homo sapiens chromosome 21 segment HS21C082
16304790  AL133523.5   44.1  Human chromosome 14 DNA sequence BAC R-775G15 of library RPCI-11 from chromosome 14 of Homo sapiens (Human), complete sequence
34367431  BX648272.1   44.1  Homo sapiens mRNA; cDNA DKFZp686G08119 (from clone DKFZp686G08119)
5629923   AC007298.17  44.1  Homo sapiens 12q22 BAC RPCI11-256L6 (Roswell Park Cancer Institute Human BAC Library) complete sequence
34533695  AK126986.1   44.1  Homo sapiens cDNA FLJ45040 fis, clone BRAWH3020486
20377057  AC069363.10  44.1  Homo sapiens chromosome 17, clone RP11-104J23, complete sequence
4191263   AL031674.1   44.1  Human DNA sequence from clone RP4-715N11 on chromosome 20q13.1-13.2 Contains two putative novel genes, ESTs, STSs and GSSs, complete sequence
17977487  AC093690.5   44.1  Homo sapiens BAC clone RP11-731I19 from 2, complete sequence
17048246  AC012568.7   44.1  Homo sapiens chromosome 15, clone RP11-342M21, complete sequence
14485328  AL355339.7   44.1  Human DNA sequence from clone RP11-461K13 on chromosome 10, complete sequence
5757554   AC007074.2   44.1  Homo sapiens PAC clone RP3-368G6 from X, complete sequence
4176355   AC005509.1   44.1  Homo sapiens chromosome 4 clone B200N5 map 4q25, complete sequence
2829108   AF042090.1   44.1  Homo sapiens chromosome 21q22.3 PAC 171F15, complete sequence
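The BLAST hit lines and the filtered_version_of relationship on this slide suggest how a derived result can be tied back to its source. The Python sketch below parses three of the hit lines from the slide, keeps the strong hits, and records one provenance statement as a plain tuple. The column meanings (GI, accession, score), the score threshold of 100, the helper names and the "urn:lsid:example:..." identifier are assumptions for illustration, as is treating urn:lsid:taverna:datathing:15 as the unfiltered report. myGrid's own storage format is not shown in the source.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    gi: str
    accession: str
    score: float
    description: str


def parse_hit(line: str) -> Hit:
    """Split one summary line into GI, accession, score and free-text description."""
    gi, accession, score, description = line.split(maxsplit=3)
    return Hit(gi, accession, float(score), description)


def filter_hits(hits: List[Hit], min_score: float) -> List[Hit]:
    """Keep only hits whose score reaches the (assumed) threshold."""
    return [hit for hit in hits if hit.score >= min_score]


# Three hit lines copied from the slide.
RAW_LINES = [
    "19747251 AC005089.3 831 Homo sapiens BAC clone CTA-315H11 from 7, complete sequence",
    "15145617 AC073846.6 815 Homo sapiens BAC clone RP11-622P13 from 7, complete sequence",
    "15384807 AL365366.20 46.1 Human DNA sequence from clone RP11-553N16 on chromosome 1, complete sequence",
]

hits = [parse_hit(line) for line in RAW_LINES]
strong_hits = filter_hits(hits, min_score=100.0)  # threshold is an assumption

# One provenance statement as a (subject, predicate, object) tuple.
# The subject identifier is hypothetical; the object is assumed to be the full report.
provenance: List[Tuple[str, str, str]] = [
    ("urn:lsid:example:datathing:filtered", "filtered_version_of", "urn:lsid:taverna:datathing:15"),
]
```

Keeping the statement separate from the data means the filtered list can be shared or moved while the link to its source report stays recoverable.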
Using workflows and web services
• Automation
• Capturing processes in an explicit manner
• Tedium! Computers don’t get bored, distracted, hungry or impatient
• Saves repeated time and effort
• Modification, maintenance, substitution and personalisation
• Easy to share, explain, relocate, reuse and build
• Available to a wider audience: you don’t need to be a coder, you just need to know how to do bioinformatics
• Releases scientists and bioinformaticians to do other work
• Record
• Provenance: what the data is like, where it came from, its quality
• Management of data (LSID: Life Science Identifiers)
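Identifiers such as urn:lsid:taverna:datathing:13 follow the general LSID layout of urn:lsid:authority:namespace:object, with an optional trailing revision. The Python sketch below splits an identifier into those parts. The class and function names are hypothetical, and the check is deliberately minimal: it does not validate the allowed characters in each field.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split an LSID into authority, namespace, object and optional revision."""
    parts = text.split(":")
    has_prefix = len(parts) >= 2 and parts[0].lower() == "urn" and parts[1].lower() == "lsid"
    if not has_prefix or len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


datathing = parse_lsid("urn:lsid:taverna:datathing:13")
```

Here `datathing.authority` is "taverna" and `datathing.object_id` is "13"; the revision is `None` because the identifier has no sixth field.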
Demonstration topics
• Taverna: using a workflow editing environment to capture bioinformatics protocols
• Personalisation: setting context to allow later personalisation
• Provenance: retaining information on the origin of results
The myGrid Information Model: Programmes, studies & experiments