1. GLOBAL BIODIVERSITY
2. True bioinformatics …
3. What are GBIF’s data? These data must be digitised in order to be shared and fully utilised.
4. Biological Data Domain
5. Subdomains could contribute to each other… Model accessibility mechanism: Since GenBank came online, editors of journals that publish articles using sequence data have required authors to deposit their sequences with GenBank and obtain a registration number before a paper is accepted. This brings the raw data into the public domain, where they can be re-used.
“Member”: A species is a class of individuals that have genes. Therefore, the sequences of genes and their products are “members” of the class “species”. An ecosystem comprises populations of all the species that exist within it; therefore species are “members” of ecosystems.
Context: Members operate within the context of the class to which they belong.
Taxonomy: As used here, includes classification and nomenclature of species.
Registry: A software mechanism for caching metadata that enables rapid searching.
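As a rough illustration of the registry idea in the notes above, the Python sketch below caches which kinds of data each database holds, so that a search consults only the cached metadata and not the databases themselves. The class name, the data-type labels and the holder names "museum-a" and "survey-b" are all hypothetical; this is not a description of GBIF's actual registry.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRegistry:
    """Hypothetical in-memory cache of dataset metadata, keyed by data type."""

    _by_type: dict = field(default_factory=dict)

    def register(self, database: str, data_types: list) -> None:
        # Cache which kinds of data each database says it holds.
        for data_type in data_types:
            self._by_type.setdefault(data_type, set()).add(database)

    def find(self, data_type: str) -> list:
        # Answer from the cached metadata alone; no database is contacted.
        return sorted(self._by_type.get(data_type, set()))


registry = MetadataRegistry()
registry.register("GenBank", ["sequence"])
registry.register("museum-a", ["specimen"])                # made-up holder
registry.register("survey-b", ["ecological", "specimen"])  # made-up holder

print(registry.find("specimen"))
```

The point of the design is that lookups stay fast because they touch only the small cached index, while the full records remain with their holders.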
6. What is GBIF? GBIF is a megascience facility aimed at
making the world’s primary species occurrence data freely and universally available via the Internet, and
sharing these scientific data for society, science and a sustainable future.
7. Why was GBIF established? Both biodiversity and biodiversity data are unevenly distributed around the world:
8. GBIF’s focus is on primary data Because primary data are difficult or too time-consuming to access, they are at present seldom used in natural resource policy or management decisions.
9. Why was GBIF established? To undertake biodiversity informatics activities that must be accomplished on a worldwide basis (electronic catalogue, information architecture, coordination)
To take on tasks not being attempted by other initiatives but which would be of benefit to those initiatives (e.g. CHM, GTI)
To make biodiversity databases interoperable among themselves and with molecular, genetic, ecological and other types of databases, thus increasing the value of all of them
10. GBIF’s area of data responsibility… The return on the investments made in the other areas will be enhanced by the data and interoperability provided by GBIF.
11. GBIF contribution to interoperability Interoperability is often defined as “databases talking to each other” but that isn’t quite right. In an interoperable system, the same query can be made against databases that differ from one another in structure and content and the answer to the query retrieved in a single session.
In this example slide, the blue arrows disappear because this kind of work by a person is not often done (also to get them out of the way for the next animation).
In the GBIF query portion of the example, the first arrow and the last arrow remain in place because all that the user would experience is the interaction with the search engine. Everything else would be transparent, even though vast distances and many databases might be involved.
The GBIF query first taps ECAT to find any synonyms of the name of species X, and these are included in the query to the registry. The query then goes to the databases that have the requested kinds of data (GenBank and ecological). The retrieved data are then relayed to the search engine and the user.
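The query flow described in these notes can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not GBIF's implementation: the synonym catalogue stands in for ECAT, the registry is a plain dictionary, the two data stores and their adapter functions are toy in-memory examples, and every name and record ("Species Y", "ACC-1", "eco-db", "plot 7 survey") is made up for illustration.

```python
# Two toy stores that differ in structure, as real databases would.
GENBANK_ROWS = [{"organism": "Species Y", "accession": "ACC-1"}]  # made-up row
ECO_TABLE = {"Species X": ["plot 7 survey"]}                      # made-up record


def query_genbank(name):
    # Adapter for the list-of-rows store.
    return [row["accession"] for row in GENBANK_ROWS if row["organism"] == name]


def query_eco(name):
    # Adapter for the name-keyed store.
    return list(ECO_TABLE.get(name, []))


def expand_names(name, synonym_catalogue):
    # Step 1: look up synonyms so that every known name is searched.
    return [name] + synonym_catalogue.get(name, [])


def federated_query(name, data_types, synonym_catalogue, registry, databases):
    names = expand_names(name, synonym_catalogue)
    results = []
    for data_type in data_types:
        # Step 2: the registry says which databases hold this kind of data.
        for db_name in registry.get(data_type, []):
            # Step 3: query each database through its own adapter.
            for queried in names:
                for record in databases[db_name](queried):
                    results.append((db_name, queried, record))
    # Step 4: everything is relayed back to the caller in one answer.
    return results


synonyms = {"Species X": ["Species Y"]}
registry = {"sequence": ["GenBank"], "ecological": ["eco-db"]}
databases = {"GenBank": query_genbank, "eco-db": query_eco}

hits = federated_query("Species X", ["sequence", "ecological"],
                       synonyms, registry, databases)
print(hits)
```

The user makes one call and receives one combined answer. The sequence record filed under the synonym is found only because the name was expanded before the databases were queried.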
12. True bioinformatics...
13. Data selection and appraisal… Project driven
Economic importance
Conservation importance
Research interest
Philosophical approach of collection curators
Funding availability
Understanding of funding bodies
14. Data retention… Natural history museums (NHMs) have been retaining these data in physical form for hundreds of years
GBIF’s philosophy is to leave the digital data in the care of the holders of the physical data
However, this means that the NHMs must be supported
15. Data retention problems… Data sets generated & maintained by individuals (who maintains?)
Data sets that are not collections-based (institution maintains?)
Data sets that lack metadata (who adds, ex post facto?)
Data migration, curation, cleansing, etc.