A Toolkit for Reconciling Multiple Taxonomic Perspectives: Euler/X and the Perelleschus Use Case
Nico Franz 1, Mingmin Chen 2, Shizhuo Yu 2, Shawn Bowers 3 & Bertram Ludäscher 2
1 School of Life Sciences, Arizona State University
2 Department of Computer Science, UC Davis
3 Department of Computer Science, Gonzaga University
TDWG 2013 Annual Conference, Florence, Italy
Semantics for Biodiversity – Formal Models and Ontologies
November 01, 2013
Slides @ http://taxonbytes.org/tdwg-2013-a-toolkit-for-reconciling-multiple-taxonomic-perspectives
Introduction – the Euler project & Euler/X toolkit
• The project builds on a ~25-year history of using taxonomic concepts in the TDWG community, primarily in Australia, Germany, the United Kingdom, and Japan.
• Prior extensive uses of concept articulations include Koperski et al. (2000), and the concatenation of articulations by Berendsohn, Geoffroy & Güntsch (2003).
• David Thau's (2006-2010) work on CleanTax prototyped the use of RCC-5 relations in combination with First-Order Logic reasoning over taxonomies.
• The Euler project (2011-) succeeds CleanTax, with performance optimizations, many added functions, and an increasing focus on Answer Set Programming.
Homepage: https://sites.google.com/site/eulerdi/home
Open source: https://bitbucket.org/eulerx/euler-project
Overview paper: http://taxonbytes.org/pdf/ChenEtAl2013-EulerToolkit.pdf
Review: RCC-5 articulations between two concepts C1, C2
• congruence (==)
• proper inclusion
• inverse proper inclusion
• overlap (><)
• exclusion
Use of "OR" to express uncertainty. Example: C1 == OR > C2
Source: Franz & Peet. 2009. Towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity 7: 5–20.
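To make the five relations concrete, here is a minimal sketch that treats each concept as a non-empty set of members and classifies the relation of C1 to C2. It is an illustration only, not Euler/X code. Several things are assumptions: the member names, the `|` symbol for exclusion, and the reading of `<` as "C1 is properly included in C2" (with `>` as its inverse). The `satisfies` helper shows how an "OR" articulation can be read as a set of permitted relations.

```python
def rcc5(c1, c2):
    """Classify the RCC-5 relation of c1 to c2, given as non-empty member sets."""
    c1, c2 = frozenset(c1), frozenset(c2)
    if not c1 or not c2:
        raise ValueError("concepts must have at least one member")
    if c1 == c2:
        return "=="   # congruence
    if c1 < c2:
        return "<"    # c1 properly included in c2 (assumed direction)
    if c1 > c2:
        return ">"    # c1 properly includes c2
    if c1 & c2:
        return "><"   # overlap: shared and unshared members on both sides
    return "|"        # exclusion (symbol assumed, not given on the slide)


def satisfies(articulation, c1, c2):
    """An articulation such as '== OR >' holds if the actual relation is one of its options."""
    options = {opt.strip() for opt in articulation.split("OR")}
    return rcc5(c1, c2) in options


# Hypothetical members s1..s3, used only for illustration.
print(rcc5({"s1", "s2"}, {"s2", "s3"}))
print(satisfies("== OR >", {"s1", "s2"}, {"s1"}))
```

Representing uncertainty as a set of options keeps the check simple: each expert decision that removes an option narrows the set until a single relation remains.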
Interactive taxonomy alignment: Euler/X toolkit workflow
• Challenge: asserting articulations across 2 taxonomies may lead to ambiguities, inconsistencies, and omissions, resulting in an imperfect alignment.
• Solution: Euler/X reads in 2 concept taxonomies (TCs + T1 + T2) plus a set of initial, expert-made articulations (A). The toolkit then allows for:
  • Checking for, and identification of, alignment inconsistencies.
  • Interactive inconsistency repair.
  • Generation of the set of mir – maximally informative relations (necessary and sufficient to yield a complete alignment).
  • Interactive uncertainty reduction.
  • Visualization of one or more "Possible World" merge taxonomies.
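The workflow above can be illustrated with a small brute-force sketch. It is not how Euler/X works internally (the toolkit relies on logic reasoners such as Answer Set Programming solvers), and all names in it are hypothetical. The sketch assumes sibling disjointness and coverage, so each concept is the union of its leaves. It enumerates which combinations of a T1 leaf and a T2 leaf are inhabited, and keeps the combinations that satisfy the input articulations. Zero surviving worlds means the input is inconsistent. More than one means it is ambiguous. Each surviving world carries its full set of pairwise relations, in the spirit of the mir.

```python
from itertools import product


def rcc5(a, b):
    """RCC-5 relation of set a to set b; '|' for exclusion is an assumed symbol."""
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "|"


def possible_worlds(t1, t2, articulations):
    """t1, t2 map each concept to its set of leaf names (a leaf maps to itself).
    articulations maps (concept1, concept2) to the set of permitted relations."""
    leaves1 = sorted(set().union(*t1.values()))
    leaves2 = sorted(set().union(*t2.values()))
    # A region holds members that belong to one T1 leaf (or none) and one T2 leaf (or none).
    regions = [(x, y) for x in leaves1 + [None] for y in leaves2 + [None]
               if (x, y) != (None, None)]
    worlds = set()
    for mask in product([False, True], repeat=len(regions)):
        live = [r for r, on in zip(regions, mask) if on]
        ext1 = {c: frozenset(r for r in live if r[0] in ls) for c, ls in t1.items()}
        ext2 = {c: frozenset(r for r in live if r[1] in ls) for c, ls in t2.items()}
        if not all(ext1.values()) or not all(ext2.values()):
            continue  # every concept needs at least one inhabited region
        mir = {(c1, c2): rcc5(ext1[c1], ext2[c2]) for c1 in t1 for c2 in t2}
        if all(mir[pair] in allowed for pair, allowed in articulations.items()):
            worlds.add(tuple(sorted(mir.items())))
    return worlds


# Hypothetical toy taxonomies: one parent with two children each.
T1 = {"A": {"A1", "A2"}, "A1": {"A1"}, "A2": {"A2"}}
T2 = {"B": {"B1", "B2"}, "B1": {"B1"}, "B2": {"B2"}}

unique = possible_worlds(T1, T2, {("A1", "B1"): {"=="}, ("A2", "B2"): {"=="}})
broken = possible_worlds(T1, T2, {("A1", "B1"): {"=="}, ("A2", "B2"): {"=="},
                                  ("A", "B"): {"|"}})
vague = possible_worlds(T1, T2, {("A", "B"): {"=="}})
print(len(unique), len(broken), len(vague) > 1)
```

In the first call the two child-level congruences force the parents to be congruent as well, which is the kind of inferred relation the mir makes explicit. Brute force only works for toy inputs, since the number of candidate worlds grows exponentially with the number of leaves.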
Euler/X is ready¹ for real-life use cases – Perelleschus
¹ After many iterations of testing/optimization with abstract cases, PW visualizations, and reasoner benchmarking.
Perelleschus use case – overview of 6 classifications/phylogenies
[Figure: six classifications/phylogenies dated 1936, 1954, 1986, 2001, 2006, 2013. Legend: = "carludovicae" (name), cumulative history]
Key properties of the Perelleschus concept history use case
• 6 classifications (3 taxonomic, 3 phylogenetic), 54 concepts, from 1936 to 2013.
• Complete concept history from the first concept, E. carludovicae sec. Günther (1936), to the current phylogenetic arrangement (2013) with 10 species-level concepts.
• All instances of taxonomic incongruence occur above the species level.
• Franz & Cardona-D. (2013) provide 54 concepts + Trees 1-6 + 76 articulations.
• Only 5 of 54 higher-level concept articulations are unambiguously congruent.
• Articulations take into account membership & diagnostic features.
DOI: 10.1080/14772000.2013.806371
Concept evolution – Günther (1936) to Voss (1954)
Reconciliation appears easy enough, except that E. carludovicae sec. Günther (1936; [2]) – a Costa Rican taxon/concept – was placed in Elleschus sec. Günther (1936; [1]) – a European taxon/concept with several other children which the author omitted in his 1936 treatment (issue: incomplete listing of children).
Thus "overlap" (><) is an intuitive articulation between [1] and [3]; however, Euler/X would not infer this unless we either:
• Relax the "coverage assumption" for [1] (coverage means that a parent's extension is fully defined by its children); or
• Add a child "1 Imp" (implied) to obtain the proper mir and merge.
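The effect of the coverage assumption can be sketched with plain sets. The member names below are placeholders, not the actual contents of the 1936 and 1954 concepts. In particular, the second child of [3] is a hypothetical stand-in for whatever else the 1954 concept contains. Under coverage, a parent's extension is exactly the union of its listed children. Concept [1], with only one listed child, then collapses onto that child and cannot overlap with [3]. Adding an implied child restores the intuitive overlap.

```python
def rcc5(a, b):
    """RCC-5 relation of set a to set b; '|' for exclusion is an assumed symbol."""
    a, b = frozenset(a), frozenset(b)
    if a == b:
        return "=="
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "><" if a & b else "|"


def extension(listed_children, implied=None):
    """Extension of a parent under coverage: the union of its children,
    optionally extended by one implied child for members the author left out."""
    members = set(listed_children)
    if implied is not None:
        members.add(implied)
    return frozenset(members)


# Placeholder data (assumptions for illustration only).
concept_1_children = ["carludovicae"]                  # the only child listed in 1936
concept_3 = {"carludovicae", "other_1954_member"}      # hypothetical 1954 content

with_coverage = rcc5(extension(concept_1_children), concept_3)
with_implied = rcc5(extension(concept_1_children, implied="1_Imp"), concept_3)
print(with_coverage, with_implied)
```

The sketch models only the second option (adding "1 Imp"). Relaxing coverage would instead mean no longer equating [1] with the union of its listed children.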
Concept evolution – Günther (1936) to Voss (1954)
Once "1 Imp" is added, Euler/X yields a consistent merge that is intuitive at all levels.
[Figures: Euler/X mir; Euler/X merge. Color legend: 1936 concepts; 1.1 Imp; 1954 concepts; congruent species concepts '36/'54; overlap (><)]
Concept evolution – Wibmer & O'Brien (1986) to Franz & O'Brien (2001)
[Figure: Euler/X merge. Color legend: 1986; 2001; congruent '86/'01; overlap (><)]
• Euler/X infers a consistent and plausible merge of the 1986 three-species taxonomy and the eight-species 2001 phylogeny.
• The overlap (><) articulations among 2001 higher-level concepts [14,16,20,…] and Perelleschus sec. W. & O. 1986 [7] are rooted in the inclusion/exclusion of "subcinctus" [10/13] in "Perelleschus" [7/14].
• The 2001 authors transferred "subcinctus" into Phyllotrox [12].
Concept evolution – Franz & O'Brien (2001) to Franz & Cardona-D. (2013)
On the surface and beyond, the two phylogenies share many congruent terminals and seemingly also higher-level entities. However, the 2013 treatment includes two new species/concepts [53,54] and one new clade [52] nested well within the genus-level topology.
Concept evolution – Franz & O'Brien (2001) to Franz & Cardona-D. (2013)
Initial merge results: "noisy", due in part to divergent outgroup assumptions.
Once the outgroups were "stipulated" as congruent and "sealed off" (through application of coverage) from the ingroups, the merge became more solid and simpler.
[Figure annotations: Outgroups – too much "noise"; 2013: Phyllotrogina; Unwanted overlap???; 2001: Derelomini out of position; Main 2001 higher-level trunk; 38 = 2013: Perelleschus; Main 2013 higher-level trunk; 14 = 2001: Perelleschus]
Concept evolution – Franz & O'Brien (2001) to Franz & Cardona-D. (2013)
"Clean" merge with overlapping, parallel 2001/2013 mid-level trunks that reflect the addition of a new, nested 2013 clade.
[Figure annotations: 2013 higher-level concepts; 2001 higher-level concepts; 2013/2001 congruence; Zoom in on overlap; New 2013 clade]
In progress – zooming in on overlap, "combined concept" resolution
1. Merge view – overlap: A20 >< B47, shown as A20', "AB2047", B47' [3 new labels]
2. Zoom view – 2 levels
   Level 1: A20', "AB2047", B47'
   Level 2: A23, A21B45, A22B46, B52
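One way to read the zoom view is as a three-way split of two overlapping concepts: the part unique to the first, the shared part, and the part unique to the second. The sketch below is a hypothetical illustration of that split, not the toolkit's implementation. The assignment of children to A20 and B47 is an assumption read off the labels on the slide, and the function name and label scheme are invented for this example.

```python
def split_overlap(a_children, b_children, congruent):
    """Split two overlapping parents into three child groups.

    congruent maps a child of the first parent to its congruent child of the second.
    Returns (only in first, shared, only in second); shared children get a combined label.
    """
    matched_b = set(congruent.values())
    only_a = [a for a in a_children if a not in congruent]
    shared = [a + congruent[a] for a in a_children if a in congruent]
    only_b = [b for b in b_children if b not in matched_b]
    return only_a, shared, only_b


# Child lists assumed from the slide labels (A = 2001 concepts, B = 2013 concepts).
a20_children = ["A21", "A22", "A23"]
b47_children = ["B45", "B46", "B52"]
congruences = {"A21": "B45", "A22": "B46"}

a20_prime, ab2047, b47_prime = split_overlap(a20_children, b47_children, congruences)
print(a20_prime, ab2047, b47_prime)
```

Under these assumptions the three groups correspond to the new labels A20', "AB2047", and B47', with B52 (the new 2013 clade) falling on the side unique to B47.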
Conclusions & outlook
• The Euler/X toolkit is moving towards logically sound, interactive, scalable, and visually effective solutions to the challenge of reasoning over concept and classification/phylogeny provenance in real-life use cases.
• Many agencies and projects aim towards integration of taxonomic names and concepts, including the Global Names Architecture initiative. The Euler concept approach represents a robust and powerful way to achieve this through interactive, semi-automated reasoning and visualization of merge taxonomies.
Acknowledgments
• TDWG 2013 Symposium organizers – John Deck, Mark Schildhauer, Ramona Walls
• Juliana Cardona-Duque – Universidad de Antioquia, Medellín, Colombia
• NSF Award IIS-1118088. "III: Small: A Logic-Based, Provenance-Aware System for Merging Scientific Data under Context and Classification Constraints."
https://sites.google.com/site/eulerdi/home
https://sols.asu.edu
http://taxonbytes.org