200 likes | 362 Views
Towards Mammalian Feeding Database. Vladimir Gapeyev NESCent. 1st Working Group Meeting Analysis and Synthesis of Physiologic Data from the Mammalian Feeding Apparatus NESCent, Durham, NC 26 February 2009. Goal of this talk. Start soliciting application requirements. Plan of the talk.
E N D
Towards Mammalian Feeding Database Vladimir Gapeyev NESCent 1st Working Group Meeting Analysis and Synthesis of Physiologic Data from the Mammalian Feeding Apparatus NESCent, Durham, NC 26 February 2009
Goal of this talk Start soliciting application requirements
Plan of the talk • Web-based databases • NESCent Informatics • Mammalian Feeding • Data model • Application mock-up
Dryad data repository http://www.datadryad.org -- Ryan Scherle
Notable points about Bio databases • Data vs. informational display • Accession IDs • Correspond to “units of contribution” • Enable linking between DBs • Layers of data • What are “annotations” in one DB, are “primary data” in another
NESCent Informatics • Science support mission: • IT infrastructure • Custom prototype applications • Directly-funded projects • People
Sample prototypes from NESCent • about 1 full-time developer for 2-3 months • split among Vladimir Gapeyev Xianhua Liu Hilmar Lapp
Prototype development for a WG • What to expect: • Scope: an app facilitating WG scientific activity • Quality: a functional and working prototype • Deliverable: open-source code • Support: near-term hosting and administration • What not to expect: • industrial-strength 24x7 system • data entry or transformation en masse • scientific data curation • on-going development after WG wraps up • … But: NESCent runs dedicated projects, too
Two major areas for IT support • Managing the data • collecting, storing, searching, downloading • “out” is the same as “in”: no added value • Primary data: parameters and digital recordings • Derived data: e.g., chewing cycle annotations • Facilitating analysis • rectify and integrate before download • extract and assemble relevant data subsets • transform into stats-ready formats
Managing the data • Data model: structure of the data • Sample data records • Entity-relationship diagrams • Controlled vocabularies • User interface: manipulation of the data • Mock-ups of pages / forms • Usage sequences and scenarios
Some concrete open issues • Do Animals belong to Sessions, Studies, or Labs? • Does File belong to a Behavior or to a Session? • What are the kinds of recordings to be collected? • Are there simultaneous recordings? • What are recording file formats? On the wiki: https://www.nescent.org/wg_feeding/Data_model_sketch -- add to and comment!
Another DM aspect: file structure • Refer to data points by # or by second? • Bother about tapes info or assume digitized? • What’s besides EMG?
Mammalian feeding: UI mock-up http://feeding-dev.nescent.org/
How we can use the UI mock-up http://feeding-dev.nescent.org/ • Try to populate with data • Try to do searches • Take note of what is awkward and what is lacking • Get ideas to refine the data model • Determine usage scenarios that require a more complex interface • Correct Lab names, etc. to ones we prefer
Facilitating analysis • Rectify and integrate before download • Use one of a dozen pre-determined algorithms • Supply your own algorithm • Extract and assemble data subsets • Only get channels for anterior temporalis • Get each chew into a separate file • Do these for all apes in the database • Transform into stats-ready formats • Tab- or comma- delimited? • Special headers added? • Accompanied by a phylogenetic tree?
Informatics’ goals for this meeting • Understand high-level needs of this WG for data storage and processing • Clarify ambiguities and fill in details of the data model • Document major usage scenarios • Establish a stakeholder group for day-to-day collaboration • Determine constraints on the timeline