Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN). Version: 0.03 17/09/2010. Lynette Woodburn Atlas of Living Australia. TIP. Each slide in this presentation comes with accompanying Notes.
Towards a Data Model for the Australian Microbial Resources Information Network (AMRiN) • Version: 0.03 17/09/2010 • Lynette Woodburn, Atlas of Living Australia
TIP • Each slide in this presentation comes with accompanying Notes. • You can’t see them if you display this presentation in ‘Slide Show’ mode. • If you’d like to see the Notes • view the presentation in ‘Normal’ mode, and • expand the pane below the slide (the Notes pane) to see extra text. • Only then will you have a chance of understanding all the crazy diagrams.
Towards a data model for AMRiN • Requirement: a standard set of data fields for all micro-organisms . to support the sharing and integration of data through AMRiN . to pre-configure BioloMICS • Options . choose an existing set . develop something new • Recommendation . surprise!
Requirements • Options • Recommendation
AMRiN community AMRiN
Requirements • Options • Recommendation • - existing • CABRI • MCL
CABRI : Common Access to Biological Resources and Information • a European organization of partner collections who contribute data to searchable ‘catalogues’ covering • bacteria & archaea • fungi & yeasts • animal & human cell lines • plant cell lines • hybridomas • phages • plasmids • plant cell viruses • genomic libraries • http://www.cabri.org/
CABRI’s sets of data elements (elements per set) • bacteria & archaea: 26 • fungi & yeasts: 23 • animal cell lines: 29 • plant cell lines: 17 • hybridomas: 15 • phages: 33 • plasmids: 30 • plant cell viruses: 12 • genomic libraries: 7 • Example elements: Isolated_from, Doubling_time, Morphology, Lysogenicity, Original_host_plant
CABRI : Common Access to Biological Resources and Information • For each different kind of biological resource, CABRI defines nested sets of data elements: Mandatory, Recommended, Full
CABRI : bacteria & archaea • nested sets: Mandatory, Recommended, Full • Strain_number • Other_collection_numbers • Restrictions • Organism_type • Name • Infrasubspecific_names • Status • History • Conditions_for_growth • Form_of_supply • Serovar • Other_names • Isolated_from • Geographic_origin • Mutant • Genotype • Literature • Sexual_state • Pathogenicity • Enzyme_production • Metabolite_production • Applications • Catalogue_entry • Remarks • Price_code • Plasmids
CABRI : fungi & yeasts • nested sets: Mandatory, Recommended, Full • Strain_number • Other_collection_numbers • Name • Status • Organism_type • History • Restrictions • Form_of_supply • Conditions_for_growth • Misapplied_names • Race • Substrate • Geographic_origin • Literature • Applications • Mutant • Sexual_state • Price_code • Remarks • Pathogenicity • Metabolite_production • Enzyme_production • Genotype
CABRI : animal & human cell lines • nested sets: Mandatory, Recommended, Full • Accession_number • Cell_line_name • Brief_description • Description • Depositor • Bibliographic_references • Morphology • Culture_conditions • Viruses • Properties • Release_conditions • Hazard • Tumorigenicity • Karyology • Freezing_medium • Sterility • Validation_assays • Further_bibliography • Comments • Storage • Doubling_time • Mycoplasma • Fingerprint • Cytogenetics • Karyotype • Comments • Research_council_deposit • BIOMED_1 • Passage_number • Species_validation
CABRI’s sets of data elements (elements per set) • bacteria & archaea: 26 • fungi & yeasts: 23 • animal cell lines: 29 • plant cell lines: 17 • hybridomas: 15 • phages: 33 • plasmids: 30 • plant cell viruses: 12 • genomic libraries: 7 • Total: 192
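The total of 192 is simply the sum of the nine per-set counts on this slide. A minimal Python check (the variable names are illustrative only, not part of CABRI):

```python
# Elements per CABRI set, copied from the slide.
elements_per_set = {
    "bacteria & archaea": 26,
    "fungi & yeasts": 23,
    "animal cell lines": 29,
    "plant cell lines": 17,
    "hybridomas": 15,
    "phages": 33,
    "plasmids": 30,
    "plant cell viruses": 12,
    "genomic libraries": 7,
}

total = sum(elements_per_set.values())
print(total)  # 192
```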
Sharing data about one kind of biological resource is easy eg. phages
Sharing data about one kind of biological resource is easy eg. plasmids
Sharing data about multiple kinds of biological resources is hard eg. Other_culture_collection_numbers vs Other_collection_numbers
What is the prospect of deriving a common model from CABRI for describing several different kinds of biological resources? • 133 distinct data elements, distributed across 9 sets (bacteria & archaea, fungi & yeasts, animal cell lines, plant cell lines, hybridomas, phages, plasmids, plant cell viruses, genomic libraries)
CABRI as a common model ? • 92 elements are each found in only one set • only 41 elements are found in more than one set
CABRI as a common model ? • 27 data elements are found in two sets • 10 in three sets • 4 in four sets • No elements are found in more than 4 sets
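The figures quoted on these slides are mutually consistent: 92 + 27 + 10 + 4 gives the 133 distinct elements, and weighting each group by the number of sets it appears in gives the 192 set entries. The sketch below checks that arithmetic and then shows one way such a histogram could be derived from the sets themselves. The function name and the three-set excerpt are illustrative assumptions (a handful of element names taken from the slides, not the full CABRI sets).

```python
from collections import Counter

# Number of CABRI data elements found in exactly k sets (figures from the slides).
elements_by_set_count = {1: 92, 2: 27, 3: 10, 4: 4}

distinct = sum(elements_by_set_count.values())                        # 133
shared = sum(n for k, n in elements_by_set_count.items() if k > 1)    # 41
occurrences = sum(k * n for k, n in elements_by_set_count.items())    # 192


def membership_histogram(sets):
    """Map k -> number of elements that occur in exactly k of the given sets."""
    per_element = Counter(
        name for members in sets.values() for name in set(members)
    )
    return dict(Counter(per_element.values()))


# Illustrative excerpt only: a few element names from three of the nine sets.
excerpt = {
    "bacteria & archaea": {"Strain_number", "Other_collection_numbers",
                           "Name", "History", "Serovar", "Plasmids"},
    "fungi & yeasts": {"Strain_number", "Other_collection_numbers",
                       "Name", "History", "Race", "Substrate"},
    "animal cell lines": {"Accession_number", "Cell_line_name",
                          "Morphology", "Doubling_time"},
}

histogram = membership_histogram(excerpt)
```

For the excerpt, eight names occur in one set and four occur in two; running the same function over all nine full sets should reproduce the 92 / 27 / 10 / 4 split.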
[Chart: Distribution of data elements across CABRI sets. Axes: CABRI set (bacteria & archaea, fungi & yeasts, animal cell lines, hybridomas, phages, plant cell lines, plant cell viruses, plasmids, genomic libraries) vs. count of data elements]
CABRI data element ‘themes’ • handling & distribution regulations • name / classification of item • ID of item in collection • care / maintenance • characteristics • item admin • literature • origin • … • Sets compared: bacteria & archaea, fungi & yeasts, animal cell lines, plant cell lines, hybridomas, phages, plasmids, plant cell viruses, genomic libraries
CABRI : comparison of elements across sets • different names, same meaning (definition) . Morphology, Morphology_and_growth . History, History_of_deposit . Accession_number, Strain_number . Bibliographic_references, Reference_paper, Literature, Reference, Further_bibliography . Restricted_distribution, Release_conditions, Restrictions, Distribution . …
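Reconciling ‘different names, same meaning’ elements amounts to mapping each synonym to one agreed label. A minimal Python sketch, using the synonym groups from the slide; treating the first name in each group as the canonical label is an assumption made purely for illustration, and `canonical_name` is a hypothetical helper.

```python
# Synonym groups as listed on the slide. Choosing the first name in each
# group as the canonical label is an assumption of this sketch.
SYNONYM_GROUPS = [
    ["Morphology", "Morphology_and_growth"],
    ["History", "History_of_deposit"],
    ["Accession_number", "Strain_number"],
    ["Bibliographic_references", "Reference_paper", "Literature",
     "Reference", "Further_bibliography"],
    ["Restricted_distribution", "Release_conditions", "Restrictions",
     "Distribution"],
]

# Lookup table: every synonym -> its group's canonical label.
CANONICAL = {name: group[0] for group in SYNONYM_GROUPS for name in group}


def canonical_name(element):
    """Return the agreed label for an element, or the element itself if unknown."""
    return CANONICAL.get(element, element)


print(canonical_name("Literature"))  # Bibliographic_references
print(canonical_name("Serovar"))     # Serovar (no synonym group listed)
```

A lookup like this handles only the first problem on the slide; ‘same name, different meanings’ cannot be resolved by renaming and needs per-set definitions.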
CABRI : comparison of elements across sets • same name, different meanings . Brief_description . Type
CABRI : comparison of data element sets • varying levels of scope
CABRI : fitness for our purpose • 9 sets of data elements (but does not cover algae) • good for sharing information about one kind of organism • few elements common to several sets . hard to share information about more than one kind of organism • does not lend itself to the derivation of a common set . elements of ‘different names, same meaning’ . elements of ‘same name, different meanings’ . elements with meanings of varying scope • has international acceptance / presence (but no longer funded?)
Requirements • Options • Recommendation • - existing • CABRI • MCL
Microbiological Common Language MCL • a new data exchange standard for microbiological information • Research in Microbiology, 161(6), 439-445 • http://www.straininfo.net/projects/mcl • a pluggable framework, easily extended • has the same ancestor as CABRI (MINE) • underpins StrainInfo (www.straininfo.net) • “ a world-wide, virtual catalog integrating the information from BRC [Biological Resource Centres] catalogs with related information”
CABRI compared with MCL • CABRI: partitioned by kind of biological resource • MCL: partitioned by workflow step
The abstract model of Microbiological Common Language (MCL) • concepts: Strain, Deposit, Culture, Isolation, Sample, Medium, Publication, … • follows the logical flow from sampling to subsequent deposits
mcl : Sample • sampleDate • sampleCollector • sampleCollectorInstitute • sampleCulture • sampleCultureStrainNumber • sampleDescription • sampleLocationDescription • sampleLocationCountry • sampleLocationPlace • sampleHabitat • sampleHabitatEnvoTerm • sampleAlt • sampleLat • sampleLong • comments
mcl : Culture • id • history • isolationDate • isolator • isolatorInstitute • isolationMethod • oxygenRelationship • [growthTemperature] . minimalGrowthTemperature . optimalGrowthTemperature . maximalGrowthTemperature • speciesName • typeStrainOf . typeStrainOfSpecies . typeStrainOfGenus • strainNumber • otherStrainNumber [otherStrainNumbers] • cultureLastUpdateDate • catalogURL • comments • hasSample • recommendMedium • publication . nomenclaturalPublication . environmentPublication . historyPublication . taxonomicPublication
some Object Properties • Culture hasSample Sample • Culture recommendMedium Medium • Culture publication (nomenclaturalPublication, environmentPublication, historyPublication, taxonomicPublication) Publication
mcl : Medium • mediumName • mediumNumber • mediumURL • mediumDescription • comments
mcl : Publication • dcterms:bibliographicCitation • dc:title • dc:creator • prism:publicationName • prism:volume • prism:number • prism:startingPage • prism:pageRange • dcterms:issued
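To make the role of the Object Properties concrete, the sketch below models a Culture that links to a Sample, a Medium and a Publication. This is an illustration in Python, not MCL's own serialisation: the field names are taken from the slides, but the types, the cardinalities (one Sample, lists of Media and Publications), the reduced field lists and the example values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    # Subset of the mcl:Sample elements shown on the slide.
    sampleDate: Optional[str] = None
    sampleLocationCountry: Optional[str] = None
    sampleHabitat: Optional[str] = None


@dataclass
class Medium:
    # Subset of the mcl:Medium elements shown on the slide.
    mediumName: Optional[str] = None
    mediumNumber: Optional[str] = None


@dataclass
class Publication:
    title: Optional[str] = None    # dc:title
    creator: Optional[str] = None  # dc:creator


@dataclass
class Culture:
    strainNumber: str
    speciesName: Optional[str] = None
    # Object Properties: links to other concepts rather than plain values.
    # Cardinalities here are assumptions of this sketch.
    hasSample: Optional[Sample] = None
    recommendMedium: List[Medium] = field(default_factory=list)
    publication: List[Publication] = field(default_factory=list)


# Hypothetical example values, for illustration only.
culture = Culture(
    strainNumber="EXAMPLE 0001",
    speciesName="Example species",
    hasSample=Sample(sampleLocationCountry="Australia", sampleHabitat="soil"),
    recommendMedium=[Medium(mediumName="Example agar")],
)
```

Because the links are separate objects, a new concept can be added without changing the existing ones, which is the modularity the MCL slides emphasise.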
MCL : fitness for our purpose • MCL offers a broadly-applicable suite of data elements • . data elements are grouped according to workflow steps, not organism type • . applicable to algae and cyanobacteria • . the Strain concept supports the logical linking of related cultures • the model is modular and easily extensible • . model cohesion is achieved through Object Properties • . links easily with genomic standards (see StrainInfo) • born and raised in Europe (StrainInfo), but now going global • . Asian biorepositories network is considering adoption • . we’re invited to contribute to ongoing development • primarily devised (custom-built) as a data exchange standard
Requirements • Options • Recommendation
Recommendation : dip a toe into the water • MCL, custom-built for describing microbiological data, deserves consideration • Proposal • undertake a pilot, involving a small group of AMRiN participants, • to assess the suitability of MCL for AMRiN’s purpose.
AMRiN participants’ input • map local elements to MCL elements • identify local elements to be kept ‘private’ • identify other local elements to be shared; provide English definitions to enable reconciliation with other participants’ elements • Note: some MCL elements may not have a local equivalent
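A participant's input could be captured as three simple structures: a mapping to MCL, a private list, and a shared list with definitions. The Python sketch below is one possible shape, not a prescribed format; the local field names and definitions are hypothetical, and `MCL_ELEMENTS` is only a small excerpt of the element names shown on the MCL slides.

```python
# Small excerpt of MCL element names from the slides (not the full standard).
MCL_ELEMENTS = {
    "strainNumber", "speciesName", "isolationDate",
    "sampleLocationCountry", "sampleHabitat",
}

# Hypothetical local field names for one participant.
local_to_mcl = {
    "acc_no": "strainNumber",
    "species": "speciesName",
    "country": "sampleLocationCountry",
}
private_elements = {"freezer_position"}
shared_elements = {
    # local element -> English definition, to allow later reconciliation
    "toxin_profile": "Toxins the strain is known to produce.",
}

# Sanity checks: mapping targets must be MCL elements, and each local
# element belongs to exactly one of the three categories.
assert set(local_to_mcl.values()) <= MCL_ELEMENTS
assert not (set(local_to_mcl) & private_elements)
assert not (set(local_to_mcl) & set(shared_elements))
assert not (private_elements & set(shared_elements))

# MCL elements with no local equivalent (the case noted on the slide).
unmapped_mcl = MCL_ELEMENTS - set(local_to_mcl.values())
print(sorted(unmapped_mcl))  # ['isolationDate', 'sampleHabitat']
```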
Pilot assessment • Coverage? How much orange overlaps purple? • What additional common elements exist amongst the set to be shared? How much purple overlaps purple? • Other assessment criteria?
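The two overlap questions could be turned into simple measures once participants' input is collected. The sketch below is one possible reading, not an agreed set of criteria: `coverage` is the fraction of MCL elements a participant can map to, and `common_shared` lists non-MCL elements that at least two participants want to share. Both function names, the participant labels and the element names are hypothetical, and the second measure assumes shared elements have already been reconciled to common names.

```python
from collections import Counter


def coverage(mapped_mcl, mcl_elements):
    """Fraction of MCL elements for which a participant has a local equivalent."""
    mcl_elements = set(mcl_elements)
    return len(set(mapped_mcl) & mcl_elements) / len(mcl_elements)


def common_shared(shared_by_participant, minimum=2):
    """Shared (non-MCL) elements offered by at least `minimum` participants."""
    counts = Counter(
        name
        for names in shared_by_participant.values()
        for name in set(names)
    )
    return {name for name, n in counts.items() if n >= minimum}


# Hypothetical pilot data.
mcl = {"strainNumber", "speciesName", "isolationDate", "sampleHabitat"}
ratio = coverage({"strainNumber", "speciesName"}, mcl)

shared = {
    "collection_a": {"toxin_profile", "pigment"},
    "collection_b": {"toxin_profile"},
    "collection_c": {"salinity_tolerance"},
}
in_common = common_shared(shared)
```

With this data `ratio` is 0.5 and `in_common` contains only `toxin_profile`.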
Pulling the pieces together Please consider the foregoing proposal. Does it seem reasonable to you? Do you think there’s a better way?