The Paleobiology Database

The Paleobiology Database: A Hands-on Tutorial on Estimating Fossil Diversity Patterns • Wolfgang Kiessling, 25 September 2012

Program • 09:00 – 09:20 Computer-Hookup, Intro • 09:20 – 10:00 Background and Rationale • 10:00 – 10:45 Basic Features • 10:45 – 11:00 Break • 11:00 – 11:30 Advanced Features • 11:30 – 12:30 Diversity Through Time • 12:30 – 13:30 Lunch • 13:30 – 14:00 Sampling-Standardized Diversity Curves with the PBDB • 14:00 – 15:00 Data Entry Trial

Important Resources • Course Materials • http://download.naturkundemuseum-berlin.de/wolfgang.kiessling/Workshop • Database Servers • http://paleodb.org • http://paleodb.geology.wisc.edu/

Background and Rationale • The Age of Biodiversity Informatics • Scope of modern biodiversity facilities • A brief history of the PaleoDB • The scientific question it sought to answer • The evolution since then

The Age of Biodiversity Informatics • Biodiversity Informatics: An emerging discipline in the broader field of Bioinformatics aiming at information capture, storage, retrieval, and analysis of biodiversity data • The Age: Biodiversity research with increasing worldwide attention and funding especially for large-scale approaches

Biodiversity Initiatives • National biodiversity centers being established worldwide, usually highly interdisciplinary • Science driven • Discovery/outreach driven • Policy driven • International consortia • Infrastructure: GBIF, OBIS • Policy: Intergovernmental Platform on Biodiversity & Ecosystem Services (http://www.ipbes.net) • Where is Paleo?

GBIF and Allies • The Global Biodiversity Information Facility (GBIF) was founded in 2001 • Mission: facilitate free and open access to biodiversity data worldwide, via the Internet, to underpin sustainable development • Priorities: • Mobilising biodiversity data • Developing protocols and standards • Building an informatics architecture • www.gbif.org

271 × 10^6 georeferenced records available • GBIF promotes data-sharing with countries of origin.

Use of GBIF data • Predict biotic effects of climate change • Analyse and predict spread of pests and diseases of humans, crops, livestock, wildlife, etc. • Predict best places to set up new protected areas • Analyse invasive species and predict invasion pathways • Provide policy-maker-relevant data of all kinds • Be a resource for biodiversity science communities

Paleo to be Integrated at Multiple Scales • Short time scales: Natural baselines, ecological consequences of climate change → Conservation Palaeobiology • Long time scales: General principles of biodiversity regulation, response to extreme events → Analytical Palaeobiology

The Paleobiology Database: A Core Infrastructure for the Biogeosciences • Founded in 2000, funded by NSF (2000-2005, 2010-) and other sources • Driven by a scientific question • Was the rise of marine biodiversity in the last 200 myr as dramatic as suggested by compendia of stratigraphic ranges? • Collect occurrence data, apply sampling standardization and use fossil data only • http://paleodb.org

Phanerozoic Marine Animal Diversity • Exponential post-Paleozoic rise? • Data from Sepkoski (2002, Bull. Am. Pal.)

What is wrong with Sepkoski? • Data are just times of first and last appearances in the record (genera and families) • No way to standardize for sampling • Extreme effect of the Pull of the Recent

Marine Biodiversity Through Time • Old: Exponential post-Paleozoic rise. Data from Sepkoski (2002, Bull. Am. Pal.) • New: Logistic post-Triassic rise. Alroy et al. (2008, Science)

Structure of Compendia • Corals and bivalves from Sepkoski's compendium of marine animal genera (2002)

Evolution of the PaleoDB: New Horizons • Biogeographic Questions • Implementation of Scotese’s plate tectonic reconstructions • Extending taxonomic/environmental scope • Vertebrate, paleobotany, and micropaleontology research groups • Link to Neptune Database (Ocean Drilling) • Beyond Diversity • Communities over time • Environmental preferences • Geodisparity • Body-size distributions • Geological Drivers

Basic Features of the PBDB • Organization • Structure • Finding data • Drawing maps • Downloading data

Organization • Database Coordinator: John Alroy (Macquarie University) • Informal core group running mirror servers (3 persons) • Data Management Committee (10) • Data Contributors: Professional scientists (usually with PhD) (132) • Data Enterers: Contributors and students (310)

The Structure • Basic information is the occurrence of a particular taxon (species, genus or higher) in a particular collection (i.e. sample or outcrop …) • References linked to occurrences and collections • Geographic and geologic context stored with each collection • Taxa classified according to multiple opinions (synonymies, re-identifications)

Finding Data • Generate data summary tables • Menu: Analyze • Task: Marine Invertebrate Collections by Geological Period • Find collections • Menu: Full search – Fossil collection records • Task: Find all collections containing lithistid sponges (Lithistida) in Germany • Find taxa • Menu: Full search – Fossil organisms • Task: Get the full synonymy list of Brachiosaurus brancai

Drawing Maps • Draw fossil collections on a plate tectonic reconstruction of the appropriate age • Menu: Analyze • Tasks: 1. Get a map of Jurassic reefs in a Mollweide projection. 2. Identify the westernmost reef and get a list of fossils

Downloading Data • The most important step for further analyses • Virtually all data in the PaleoDB are open access • Downloads in csv format can be read by almost any program • Menu: Download • Task: Download all occurrences of Triassic sponges with coordinates/paleocoordinates, stage-level resolution and full taxonomic information
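Once a csv download is on disk, a first sanity check is to count how many genera were sampled in each time bin. The sketch below does this with Python's standard library. The column names (`collection_no`, `genus`, `stage`) and the sample rows are hypothetical placeholders, not the actual field names or contents of a PaleoDB download, so adjust them to whatever header the downloaded file contains.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for a downloaded occurrence file; the header and the
# rows are invented for illustration only.
sample_csv = """collection_no,genus,stage
1,GenusA,Carnian
1,GenusB,Carnian
2,GenusA,Carnian
3,GenusA,Norian
"""

def genera_per_bin(rows, bin_col="stage", taxon_col="genus"):
    """Count distinct genera recorded in each time bin (sampled-in-bin richness)."""
    bins = defaultdict(set)
    for row in rows:
        # Skip occurrences that lack a genus name or a time bin.
        if row[taxon_col] and row[bin_col]:
            bins[row[bin_col]].add(row[taxon_col])
    return {name: len(genera) for name, genera in bins.items()}

# For a real file, replace io.StringIO(sample_csv) with open("download.csv", newline="").
rows = list(csv.DictReader(io.StringIO(sample_csv)))
richness = genera_per_bin(rows)
```

Here `richness` maps each bin name to its number of distinct genera; duplicates of the same genus within a bin are counted once.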

Playtime + Break

Advanced Features • Ecological metrics of collections • Diversity and others • Confidence intervals of stratigraphic ranges • Within sections and global • Diversity curve generator • Raw and sampling standardized

Ecological Metrics • Get alpha diversity and ecological data from a collection • Menu: Analyze abundance data • Task: Get the metrics of a Triassic community from China (e.g. collection #31618) and look at the feeding modes

Background of Diversity Metrics • Which community is more diverse?

Measuring Alpha Diversity • Shannon-Wiener Information Index (H) • $H = -\sum_i p_i \ln(p_i)$ • $p_i$ = proportion of the $i$th species in the community • Mixed signal of richness and evenness • Evenness (J) • $J = H / H_{max}$ • $H_{max} = \ln(S)$, where $S$ is the number of species
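The two formulas above can be sketched in a few lines of Python. The abundance counts are invented for illustration; both hypothetical communities have four species and 100 specimens, so any difference in H comes from evenness alone.

```python
import math

def shannon_index(counts):
    """Shannon-Wiener H = -sum(p_i * ln(p_i)) over species with non-zero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

def evenness(counts):
    """Evenness J = H / ln(S); returned as 0.0 when fewer than two species are present."""
    richness = sum(1 for n in counts if n > 0)
    if richness < 2:
        return 0.0  # ln(1) = 0, so J is undefined; 0.0 is a convention chosen here
    return shannon_index(counts) / math.log(richness)

# Hypothetical specimen counts per species.
even_community = [25, 25, 25, 25]
uneven_community = [97, 1, 1, 1]

h_even, j_even = shannon_index(even_community), evenness(even_community)
h_uneven, j_uneven = shannon_index(uneven_community), evenness(uneven_community)
```

For the perfectly even community H reaches its maximum ln(4) and J equals 1; the community dominated by one species has the same richness but a much lower H, which is the "mixed signal" noted on the slide.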

Rarefaction • What species richness would I observe if my sample A were smaller than it is (e.g., as small as sample B)? • Mathematical solution: • Empirical solution: let the computer draw specimens at random and get diversity for a given sample size
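Both routes to rarefied richness can be sketched as follows. The formula on the original slide is not preserved in this transcript, so the analytic version below is an assumption: it uses the standard expected-richness expression for sampling without replacement, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total number of specimens and N_i the count of species i. The counts in `sample_a` are invented.

```python
import random
from math import comb

def rarefy_analytic(counts, n):
    """Expected richness in a random subsample of n specimens drawn without replacement."""
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size exceeds sample size")
    # For each species: 1 minus the probability that none of its specimens is drawn.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)

def rarefy_empirical(counts, n, trials=1000, seed=0):
    """Mean richness over repeated random draws of n specimens."""
    rng = random.Random(seed)
    # One list entry per specimen, labelled with its species index.
    specimens = [species for species, c in enumerate(counts) for _ in range(c)]
    total_richness = 0
    for _ in range(trials):
        total_richness += len(set(rng.sample(specimens, n)))
    return total_richness / trials

# Hypothetical sample A: 100 specimens in 7 species.
sample_a = [50, 30, 10, 5, 3, 1, 1]
expected_at_20 = rarefy_analytic(sample_a, 20)
simulated_at_20 = rarefy_empirical(sample_a, 20, trials=2000)
```

With enough trials the simulated mean converges on the analytic expectation, and rarefying to the full sample size returns the observed richness.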

Confidence Intervals of Stratigraphic Ranges • The first and last observations of a taxon in the fossil record must be younger and older than its time of origination and extinction, respectively • By how much? • Quantifying uncertainties within sections and globally
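One classical answer to "by how much?" assumes that fossil horizons are randomly distributed through the true range (Strauss and Sadler 1989; Marshall 1990). Under that assumption the one-sided range extension, as a fraction of the observed range, is α = (1 − C)^(−1/(H − 1)) − 1 for confidence level C and H fossil horizons. The sketch below implements only this formula; it is not necessarily the method used by the PaleoDB scripts, and the function and parameter names are chosen here for illustration.

```python
def range_extension(observed_range, horizons, confidence=0.95):
    """One-sided extension of an observed stratigraphic range.

    Assumes randomly distributed fossil horizons; observed_range may be in
    metres of section or in millions of years.
    """
    if horizons < 2:
        raise ValueError("at least two fossil horizons are needed")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = (1 - confidence) ** (-1 / (horizons - 1)) - 1
    return alpha * observed_range

# Hypothetical taxon with a 10 myr observed range.
extension_sparse = range_extension(10.0, horizons=5)
extension_dense = range_extension(10.0, horizons=50)
```

The extension shrinks quickly as the number of horizons grows: with only two horizons the 95% extension is 19 times the observed range, whereas densely sampled ranges need only a small addition.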

Draw a Stratigraphic Section • Menu: Analyze stratigraphic sections • Task: Try the Bangtoupo section in China

Using the fossil record for molecular clocks • Calibration: Estimate the branching points of two sister groups • Menu: Analyze – Calculate a first appearance • Task: Branching point between Acropora and Montipora

Diversity Through Time • Theoretical Background • Counting methods • Sampling issues • Sampling standardization • Hands on with R

Counting Diversity Through Time • Range types of taxa A to E relative to a time interval: through ranging, extinct, originating, singleton

Measuring Diversity • A: Through ranging • B: Extinct • C: Originating • D: Singleton • E: Through ranging • Boundary crossers: 3 • Range through: 5 • Range through minus singletons: 4
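The three counts on this slide can be reproduced from first and last appearances alone. In the sketch below, bins are numbered from oldest to youngest and the ranges of taxa A to E are invented so that they match the categories above. Boundary crossers are counted at the base of the focal bin, which is an assumption about how the slide's figure was drawn.

```python
def count_diversity(ranges, t):
    """Diversity counts for time bin t from (first_bin, last_bin) ranges."""
    range_through = [name for name, (first, last) in ranges.items() if first <= t <= last]
    singletons = [name for name, (first, last) in ranges.items() if first == last == t]
    # Taxa that existed before bin t and are still present in it.
    crossers = [name for name, (first, last) in ranges.items() if first < t <= last]
    return {
        "range_through": len(range_through),
        "range_through_minus_singletons": len(range_through) - len(singletons),
        "boundary_crossers": len(crossers),
    }

# Hypothetical ranges; bin 2 is the focal interval.
ranges = {
    "A": (1, 3),  # through ranging
    "B": (1, 2),  # goes extinct in bin 2
    "C": (2, 3),  # originates in bin 2
    "D": (2, 2),  # singleton, confined to bin 2
    "E": (1, 3),  # through ranging
}
counts = count_diversity(ranges, 2)
```

For bin 2 this gives 5 range-through taxa, 4 after removing the singleton, and 3 boundary crossers, the same values as on the slide.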

Measuring Diversity Through Time • Draw diversity curves: sampled in bin (SIB), range through, range through minus singletons, boundary crossers

Sampling Standardization of Time Series Data • [Figure labels: 2 or 3; Perhaps 2; Sure 2. Rarefaction (3): 2.23, 1.89, 2] • This is sufficient for sampled-in-bin (SIB) diversity, but silent on extinctions

Diversity Over Time • Omit Singletons • [Figure values, three panels per row] • S = 2, Ext = 0; S = 2, Ext = 0; S = 2, Ext = 2 • S = 1, Ext = 0; S = 1, Ext = 0; S = 1, Ext = 1 • S = 2, Ext = 0; S = 3, Ext = 1; S = 2, Ext = 2 • S = 1.67, Ext = 0; S = 1.67, Ext = 0.33; S = 1.67, Ext = 1.67

Subsampling Methods • Classical Rarefaction • Pool all occurrence data • Randomly draw data until quota is reached • Occurrences-weighted by-list subsampling (OW) • Pool occurrences by collections • Randomly draw collections until quota of occurrences is reached • Unweighted by-list subsampling (UW) • Pool collections • Randomly draw collections until quota of collections is reached • Occurrences-exponentiated weighted by-list subsampling (OexpW) • Pool occurrences by collections • Randomly draw collections until weighted quota of occurrences is reached • Shareholder Quorum Method • Sampling until a particular proportion (quorum) of the rank-abundance distribution has been sampled
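The difference between the first three methods lies in what is drawn (single occurrences or whole collections) and what the quota counts. A minimal sketch for one time bin follows, with each collection represented as a list of taxon names. The data are invented, and published implementations differ in details such as whether the collection that overshoots the quota is kept, so this should be read as an illustration of the idea, not as the PaleoDB code.

```python
import random

def classical_rarefaction(collections, quota, rng):
    """Pool all occurrences and draw `quota` of them at random."""
    pool = [taxon for coll in collections for taxon in coll]
    return len(set(rng.sample(pool, quota)))

def by_list_unweighted(collections, quota, rng):
    """UW: draw `quota` whole collections at random."""
    drawn = rng.sample(collections, quota)
    return len({taxon for coll in drawn for taxon in coll})

def by_list_occurrence_weighted(collections, quota, rng):
    """OW: draw whole collections until at least `quota` occurrences are reached."""
    order = rng.sample(collections, len(collections))  # random order without replacement
    taxa, n_occurrences = set(), 0
    for coll in order:
        if n_occurrences >= quota:
            break
        taxa.update(coll)  # the collection that crosses the quota is kept (a choice made here)
        n_occurrences += len(coll)
    return len(taxa)

def mean_subsampled_richness(method, collections, quota, trials=200, seed=0):
    """Average richness over repeated subsampling trials."""
    rng = random.Random(seed)
    return sum(method(collections, quota, rng) for _ in range(trials)) / trials

# Hypothetical collections from a single time bin (6 taxa, 10 occurrences).
bin_collections = [["a", "b", "c"], ["a", "d"], ["a"], ["b", "e"], ["a", "f"]]
uw_richness = mean_subsampled_richness(by_list_unweighted, bin_collections, 3)
ow_richness = mean_subsampled_richness(by_list_occurrence_weighted, bin_collections, 5)
cr_richness = mean_subsampled_richness(classical_rarefaction, bin_collections, 5)
```

Running the same function over every time bin with a fixed quota yields a sampling-standardized curve; the quota must not exceed the amount of data in the poorest bin that is to be included.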

Why so many? • Rarefaction assumes that differences in diversity are due to sampling • We might lose biological signal by attempting to sampling-standardize if we don’t consider evenness • If evenness of communities is different, then rarefaction will mostly reflect these differences • The best subsampling method has to consider several biases

Lunch

Hands-On with the PaleoDB • Create a subsampled diversity curve with the online scripts • Download a dataset and use the function: Generate diversity curve data

Analyze Downloaded Data with R • Open R • Run the script PBDB_analyze.R

Occurrence Data Now and Then

We Want You! • The Paleobiology Database is from the community, for the community • Data quantity and quality need to be improved to increase the rigor and scope of analyses • Many important questions are yet to be addressed • http://paleodb.org

How to Enter Data • Give it a try • testpaleodb.geology.wisc.edu • Login as Contributor: • Authorizer: User60x, T. • Enterer: User60x, T. • Password: Berlin

The Paleobiology Database (PaleoDB, www.paleodb.org) has been rapidly developing into a core infrastructure for palaeontology. Thanks to the participation of 289 contributors from 22 countries, the PaleoDB now holds taxonomic and distributional information on 217,000 taxa and more than one million fossil occurrences. With 150 official publications, the scientific output is impressive, but it could be greater if more colleagues learned how to use the database for their own research. The purpose of this course is thus to familiarize paleontologists with the structure and scope of the PaleoDB and to introduce them to the analytical tools that are available online. Examples will be provided for paleo-community analysis, confidence intervals on stratigraphic ranges, and global and regional diversity patterns. Basic statistical concepts will be explained briefly, but the focus is on practicing with the database.
