BAO SW Engineering Considerations
Outline • Overview • Users • Basic Usecases • Approaches
BAO phase 1 • want to build software for the BAO to make it generally available to the world • need to clarify design objectives • users and usecases • discuss alternative approaches and implications • discuss some plans
End-users • query: search BAO using text and/or SPARQL • browse: search BAO interactively using some kind of visual aid (e.g., treeview) • visualize: explore the BAO graphically (as a graph) • download: download BAO in various formats • share: provide machine-accessible interfaces for query and download
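A minimal sketch of the text-query and browse usecases, assuming the class hierarchy is held in memory as a child-to-parent map. The class names are hypothetical placeholders, not actual BAO terms, and SPARQL querying (which would go through a triple store) is not covered.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical fragment of a class hierarchy: child -> parent (None = root).
# The names are illustrative placeholders, not actual BAO terms.
SUBCLASS_OF: dict[str, Optional[str]] = {
    "detection method": None,
    "fluorescence": "detection method",
    "fluorescence polarization": "fluorescence",
    "luminescence": "detection method",
}


def text_search(term: str) -> list[str]:
    """Query: class names containing `term`, case-insensitive, sorted."""
    needle = term.lower()
    return sorted(name for name in SUBCLASS_OF if needle in name.lower())


def children(parent: Optional[str]) -> list[str]:
    """Direct subclasses of `parent`; None selects the root classes."""
    return sorted(c for c, p in SUBCLASS_OF.items() if p == parent)


def treeview(parent: Optional[str] = None, depth: int = 0) -> list[str]:
    """Browse: the hierarchy under `parent` as indented lines (depth-first)."""
    lines: list[str] = []
    for child in children(parent):
        lines.append("  " * depth + child)
        lines.extend(treeview(child, depth + 1))
    return lines


if __name__ == "__main__":
    print(text_search("fluor"))
    print("\n".join(treeview()))
```

A real browser would render the same depth-first walk as an expandable tree widget instead of indented text.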
End-users • no modification of data • various ways of exploring and downloading data • assumes pre-existence of BAO
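One way to serve the download and share usecases is to keep a single internal representation and plug in one exporter per format. The sketch below assumes a hypothetical namespace (http://example.org/bao#) and toy class names; only the rdfs:subClassOf IRI is real vocabulary.

```python
from __future__ import annotations

import json

# Hypothetical namespace; the real BAO IRI base is not assumed here.
BASE = "http://example.org/bao#"
# Standard RDFS IRI for the subclass relation.
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Toy (child, parent) pairs standing in for the ontology content.
PAIRS = [("Fluorescence", "DetectionMethod"), ("Luminescence", "DetectionMethod")]


def to_ntriples(pairs: list[tuple[str, str]]) -> str:
    """One N-Triples statement per subclass pair."""
    return "".join(f"<{BASE}{c}> <{SUBCLASS_OF}> <{BASE}{p}> .\n" for c, p in pairs)


def to_json(pairs: list[tuple[str, str]]) -> str:
    """A simple JSON listing, convenient for non-RDF clients."""
    return json.dumps([{"class": c, "subClassOf": p} for c, p in pairs], indent=2)


# Adding a download format means registering one more function here.
EXPORTERS = {"nt": to_ntriples, "json": to_json}


def export(fmt: str) -> str:
    """Serialize the same content in the requested format."""
    if fmt not in EXPORTERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return EXPORTERS[fmt](PAIRS)
```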
Admin-users • c/e/r: create and maintain the BAO • validate: run reasoners, etc. to ensure that new versions of the BAO are valid • register: add new data sources that can be used with the BAO • map: associate data (from a registered source) with the BAO • upload: add new data for use with the BAO
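Running a full OWL reasoner is out of scope for a short example; as a stand-in, the following sketch shows the kind of structural checks a validate step could run before a new version is released: every parent must be declared, and no class may reach itself through its ancestors. The function name and the {class: parent} input shape are assumptions.

```python
from __future__ import annotations

from typing import Optional


def validate(subclass_of: dict[str, Optional[str]]) -> list[str]:
    """Structural checks on a {class: parent} map; [] means no problems found.

    This is NOT an OWL reasoner; it only catches two simple defects.
    """
    problems = []
    # 1. Every referenced parent must itself be declared.
    for cls, parent in subclass_of.items():
        if parent is not None and parent not in subclass_of:
            problems.append(f"{cls}: undeclared parent {parent}")
    # 2. Walking up from any class must never revisit a class (no cycles).
    for start in subclass_of:
        seen = set()
        node = start
        while node is not None and node in subclass_of:
            if node in seen:
                problems.append(f"{start}: ancestor chain contains a cycle")
                break
            seen.add(node)
            node = subclass_of[node]
    return problems
```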
Admin-users • create, modify BAO • maintain BAO versions • associate data from various sources with BAO • this seems to me to be the tricky part
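For maintaining versions, a diff between releases gives curators a change list to review. This sketch assumes each version is a hypothetical {class: parent} map; a real release comparison would also cover labels, definitions and axioms.

```python
from __future__ import annotations

from typing import Optional


def diff_versions(old: dict[str, Optional[str]],
                  new: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """Classes added, removed, or re-parented between two versions."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    moved = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {"added": added, "removed": removed, "moved": moved}


if __name__ == "__main__":
    v1 = {"detection method": None, "fluorescence": "detection method",
          "absorbance": "detection method"}
    v2 = {"detection method": None, "optical method": "detection method",
          "fluorescence": "optical method"}
    print(diff_versions(v1, v2))
```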
End-user Access • (easy part)
Some Conclusions • End-user usecases are distinct from admin-user usecases • Design considerations for these two classes of users can be separated • End-user and admin-user components can be built independently • Need to understand admin-user roles
End-user access • web-based: browse, query, visualize (possibly) • SOAP: for machines • other apps (if we want): Cytoscape for visualization, Joseki for the query interface
Admin-user Access • (hard part)
Mapping/Populating • All data to be used with the BAO resides in other systems and has various representations • Initial objective is to be able to search PubChem assays using BAO
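Searching assays using the BAO implies that a query for a class should also return assays annotated with any of its subclasses. A minimal sketch, assuming hypothetical assay identifiers (not real PubChem AIDs), toy class names, and an acyclic hierarchy:

```python
from __future__ import annotations

from typing import Optional

# Hypothetical toy hierarchy (child -> parent); not actual BAO terms.
SUBCLASS_OF: dict[str, Optional[str]] = {
    "detection method": None,
    "fluorescence": "detection method",
    "fluorescence polarization": "fluorescence",
    "luminescence": "detection method",
}

# Hypothetical annotations: assay identifier -> BAO classes attached to it.
ANNOTATIONS: dict[str, set[str]] = {
    "assay-1": {"fluorescence polarization"},
    "assay-2": {"luminescence"},
    "assay-3": {"fluorescence"},
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if `cls` equals `ancestor` or is a transitive subclass of it.

    Assumes the hierarchy has no cycles.
    """
    node: Optional[str] = cls
    while node is not None:
        if node == ancestor:
            return True
        node = SUBCLASS_OF.get(node)
    return False


def find_assays(cls: str) -> list[str]:
    """Assays annotated with `cls` or any of its subclasses."""
    return sorted(a for a, terms in ANNOTATIONS.items()
                  if any(is_a(t, cls) for t in terms))
```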
Approaches • Alignment: BAO is an ontology for representing bioassay data • data sources are made semantically compatible with the BAO and assimilated • Annotation: BAO is an ontology for annotating bioassays • the BAO exists independently of the data in the sources and is linked to each source record by a single URI
Alignment • This approach was implied in the proposal • Create BAO and BAO vocabulary • Make a semantic model of the source data (e.g., PubChem) • Align that model with the BAO using constructs such as owl:equivalentClass and possibly code (e.g., using Vine and other tools) • Data will then be assimilated/transformed into the BAO
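Under alignment, the equivalences recorded between the source model and the BAO drive the transformation of source data. The sketch below assumes a hypothetical mapping table standing in for owl:equivalentClass assertions, and rewrites only the object of rdf:type-style triples; anything unmapped is reported rather than silently dropped.

```python
from __future__ import annotations

# Hypothetical alignment table: source-model class -> BAO class, i.e. the
# pairs that owl:equivalentClass assertions would record. Names are illustrative.
EQUIVALENT: dict[str, str] = {
    "src:FluorescenceReadout": "bao:Fluorescence",
    "src:LuminescenceReadout": "bao:Luminescence",
}

Triple = tuple[str, str, str]


def transform(records: list[Triple]) -> tuple[list[Triple], list[Triple]]:
    """Rewrite source triples into BAO terms; return (aligned, unmapped)."""
    aligned: list[Triple] = []
    unmapped: list[Triple] = []
    for s, p, o in records:
        if o in EQUIVALENT:
            aligned.append((s, p, EQUIVALENT[o]))
        else:
            unmapped.append((s, p, o))
    return aligned, unmapped
```

The unmapped list is where ongoing alignment work accumulates, which is one concrete form of the maintenance cost this approach carries.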
Annotation • Create BAO and BAO vocabulary • Partition BAO (logically) into controlled/curated and user-provided partitions • Annotate assays (i.e., their URIs) • May require tool development to speed up the annotation process • Need processes and tools to maintain the BAO vocabulary (also true, to some extent, for the alignment option)
Alignment vs Annotation • Alignment • BAO is primarily semantic model • BAO used to represent assay data • BAO content fairly flexible • transformation of data in source systems • Annotation • BAO is reference model and vocabulary • BAO semantic content is semi-static • source data not transformed
Approach 1: Alignment • Build BAO • Build source-level ontologies for mapping • Build/integrate tools to support alignment • Align source ontologies with BAO (owl:equivalentClass, etc.) • Deploy BAO • Load BAO with instances from sources
Alignment Usecase • align two semantic models • need two models • if a source has no model, one must be built • need to make source data available through the new model
Annotation Usecase • reference a recorded assay (e.g., in PubChem) • provide some required data (e.g., a description) • select some data from the pre-populated BAO (e.g., detection method) • save the new instance (user-provided + BAO-controlled) in the BAO knowledge base
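The annotation usecase can be sketched as one function that checks the controlled choices against the enumerated partition before saving. The property name, allowed values, in-memory knowledge base and assay identifiers are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical controlled partition: property -> enumerated allowed values.
CONTROLLED: dict[str, set[str]] = {
    "detection method": {"absorbance", "fluorescence", "luminescence"},
}


@dataclass
class Annotation:
    assay_uri: str              # single reference to the source record
    description: str            # user-provided partition
    controlled: dict[str, str]  # property -> value from CONTROLLED


# In-memory stand-in for the BAO knowledge base, keyed by assay URI.
KNOWLEDGE_BASE: dict[str, Annotation] = {}


def annotate(assay_uri: str, description: str,
             choices: dict[str, str]) -> Annotation:
    """Validate user input against the controlled partition, then save."""
    if not description.strip():
        raise ValueError("a description is required")
    for prop, value in choices.items():
        allowed = CONTROLLED.get(prop)
        if allowed is None:
            raise ValueError(f"unknown controlled property: {prop!r}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not an allowed {prop}")
    record = Annotation(assay_uri, description.strip(), dict(choices))
    KNOWLEDGE_BASE[assay_uri] = record
    return record
```

Nothing is written unless every controlled value validates, so the knowledge base never holds a partially checked instance.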
Approach 2: Annotation • Build BAO • Partition BAO (logically) into “source specified” and “controlled” • Enumerate controlled partition (e.g., provide values for “detection method”) • Build tools to help select values from controlled partition • Build tools to facilitate population of “specified” partition
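A helper for selecting values from the controlled partition can start as ranked autocompletion. A sketch, assuming a hypothetical enumerated list of detection methods; prefix matches rank ahead of matches elsewhere in the term.

```python
from __future__ import annotations

# Hypothetical enumerated values for one controlled property.
DETECTION_METHODS = [
    "absorbance",
    "fluorescence",
    "fluorescence polarization",
    "luminescence",
    "time-resolved fluorescence",
]


def suggest(typed: str, values: list[str], limit: int = 5) -> list[str]:
    """Prefix matches first, then other substring matches, each alphabetical."""
    t = typed.strip().lower()
    if not t:
        return []
    prefix = sorted(v for v in values if v.lower().startswith(t))
    inner = sorted(v for v in values if t in v.lower() and v not in prefix)
    return (prefix + inner)[:limit]
```

Because users can only pick from the returned list, every saved value stays inside the controlled partition.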
Various advantages • Ease of maintenance, from a curation point of view • Maintains independence of the BAO ontology from the application of BAO • Allows distribution of the enumerated BAO as a separate, useful thing
Alignment: P&C • Appears to be the proposed plan • Documents transformations • High maintenance • Somewhat complex development • BAO, by itself, is not necessarily distributable as a tool, only as an export
Annotation: P&C • easier maintenance • simpler system architecture • distributable BAO (explicitly identifies BAO as an independent deliverable) • can expand to cover the alignment option (option 1) as well • seems like what would be most useful (BAO as a tool) • only reference to source data is through a URI (single point)
Path • Draft initial BAO • Partition BAO • Enumerate controlled partition • Build application ontology, align, code • Develop tools to speed annotation (e.g., text crunch descriptions to give suggestions of controlled BAO elements) • Annotate PubChem using all of the above
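The suggestion step (text-crunching assay descriptions) could begin as dictionary lookup before any heavier entity extraction. This sketch assumes a hypothetical lexicon mapping surface phrases, including abbreviations, to controlled values; it only proposes terms for a curator to confirm.

```python
from __future__ import annotations

import re

# Hypothetical lexicon: surface phrase found in text -> controlled BAO value.
LEXICON: dict[str, str] = {
    "fluorescence polarization": "fluorescence polarization",
    "fluorescence": "fluorescence",
    "fp": "fluorescence polarization",
    "luciferase": "luminescence",
    "luminescence": "luminescence",
}


def suggest_terms(description: str) -> list[str]:
    """Controlled values whose phrases occur as whole words in the text."""
    text = description.lower()
    found = set()
    for phrase, value in LEXICON.items():
        # \b boundaries stop short phrases such as "fp" matching inside words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.add(value)
    return sorted(found)
```

Overlapping phrases both fire (a "fluorescence polarization" description also yields "fluorescence"), which is acceptable for suggestions but is one reason a curator stays in the loop.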
Ontology Development • assume approach 2 (annotation) • adopt approach 2 methodology (draft, partition, enumeration) • establish tools to support methodology
Project Deliverables • BAO end-user application • browse, query, visualize (V1) • endpoint specific functionality (V2) • structure specific functionality (V3) • BAO admin-user application • source registration, assay annotation (V1) • bulk assay annotation (V2) • endpoint upload (V2/V3) • BAO ontology (packaged and versioned) • BAO annotation tools (maybe) • entity extraction from text using full BAO • others? • BAO end-user application populated with PubChem data
Non-deliverables (but essential) • BAO maintenance/curation tools (Protégé, etc.)
Structure • Four separate but interdependent projects • end-user application • admin-user application • BAO development and curation • annotation of PubChem using all of the above
General • Need names for deliverables (e.g., baq, baa, bao, bat) • Need to identify and assemble teams for each project
General Approach • Basically same approach as in BAQ • Assemble design team • Mockup UI in Caretta, prototype • Code-level design • schema, OWL, Java • Build • Test