Goals of the Infrastructure Team
Patrick Leary
Core EOL Activities
• Identify sources of biological content
• Acquire content or metadata about content
• Create a central biological index
• Generate web pages aggregating content related to a particular taxon
• Provide users with content search and retrieval
Core Infrastructure Needs
• Mobilizing content
• Develop or improve standards
• Create ‘connectors’ to map different data models
• Tools for easy content provider initialization
• Software to ingest content into EOL databases
• Name-finding tools to identify relevant BHL pages
• Tap into existing content stores (Flickr, Wikimedia)
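As an illustration of the name-finding bullet, here is a minimal sketch of one way to flag pages that mention known taxa. It assumes a simple pattern for candidate binomials (capitalized genus plus lowercase epithet) that are then checked against a set of known names. The function `find_names`, the sample page text, and the tiny `known_names` set are hypothetical and are not taken from EOL or BHL software.

```python
import re

# Candidate binomial: a capitalized genus followed by a lowercase epithet.
# This is a deliberately naive pattern; real name finders handle far more cases.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{3,})\b")


def find_names(page_text, known_names):
    """Return known names mentioned in page_text, in order of first appearance."""
    found = []
    for match in BINOMIAL.finditer(page_text):
        candidate = f"{match.group(1)} {match.group(2)}"
        # Keep only candidates present in the index, without duplicates.
        if candidate in known_names and candidate not in found:
            found.append(candidate)
    return found


# Hypothetical index and page text, for illustration only.
known_names = {"Pomatomus saltatrix", "Gasterosteus saltatrix"}
page_text = (
    "The bluefish, Pomatomus saltatrix, was first described as "
    "Gasterosteus saltatrix. See also Homo sapiens."
)
names_on_page = find_names(page_text, known_names)
print(names_on_page)
```

Checking candidates against an index keeps false positives such as "The bluefish" out of the result, at the cost of missing names the index does not yet contain.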
Content flow (diagram): Content Partners (e.g. FishBase, Tree of Life, LifeDesks, …) → APIs, Excel, Export → XML Schema → Processing → EOL Databases, Content Cache → Rails Models and Controllers → APIs
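The step from a partner's XML export to records ready for a database could be sketched roughly as follows. This is only an illustration under stated assumptions: the element names (`export`, `taxon`, `identifier`, `scientificName`, `dataObject`), the sample identifier, and the `example.org` URL are all made up for the example and do not reproduce the actual EOL schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical partner export; element names and values are illustrative only.
SAMPLE_EXPORT = """\
<export>
  <taxon>
    <identifier>partner-001</identifier>
    <scientificName>Pomatomus saltatrix (Linnaeus, 1766)</scientificName>
    <dataObject>
      <type>text</type>
      <source>http://example.org/summary/1</source>
    </dataObject>
  </taxon>
</export>
"""


def parse_export(xml_text):
    """Turn an XML export into a list of plain dictionaries, one per taxon."""
    root = ET.fromstring(xml_text)
    records = []
    for taxon in root.findall("taxon"):
        records.append({
            "identifier": taxon.findtext("identifier"),
            "scientific_name": taxon.findtext("scientificName"),
            # Each taxon may carry any number of data objects.
            "data_objects": [
                {"type": obj.findtext("type"), "source": obj.findtext("source")}
                for obj in taxon.findall("dataObject")
            ],
        })
    return records


records = parse_export(SAMPLE_EXPORT)
print(records)
```

Plain dictionaries stand in here for whatever the database layer expects; in the flow on the slide, that role belongs to the processing step and the Rails models.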
Content Partner Registry
• Maintain and improve registry
• Ensure regularly scheduled harvests
• Work with Species Pages Group
Biological Index
• Evolving names-based infrastructure
• Proper handling of names maximizes the quality of the index
• Names and hierarchies are the cornerstone of this biological index
• Addressing the problem of scale
  • 12 million names
  • 9.5 million taxonomic assertions
  • 1 million published data objects
  • 18 million species references in BHL
  • 20 million verified out links
Making Order Of The Mess
• Group related names
  • Lexical Groups
    • Pomatomus saltatrix
    • Pomatomus saltatrix (Linnaeus, 1766)
    • Pomatomus saltator (Linnaeus, 1766)
    • Pomatomus saltratrix
  • Nomenclatural Groups
    • Gasterosteus saltatrix Linnaeus, 1766
    • Temnodon saltator (Linnaeus, 1766)
    • Pomatomus saltatrix (Linnaeus, 1766)
  • Common Names
    • Bluefish (Pomatomus saltatrix)
    • Skipjack (Pomatomus saltator (Linnaeus, 1766))
    • Çinekopbalığı (Gasterosteus saltatrix Linnaeus, 1766)
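Lexical grouping, as illustrated by the spelling variants on this slide, can be approximated with a short sketch. It assumes that dropping authorship (keeping only genus and epithet) and comparing the results with a generic string-similarity score is good enough for illustration. The similarity measure (`difflib.SequenceMatcher`), the 0.85 threshold, and the function names are assumptions for this example, not the method EOL uses. Nomenclatural grouping needs knowledge of which names share a type and is not attempted here.

```python
from difflib import SequenceMatcher


def canonical_form(name_string):
    """Keep genus and epithet only, lowercased; authorship is dropped.

    Simplification: assumes the first two tokens are genus and epithet.
    """
    tokens = name_string.split()
    return " ".join(tokens[:2]).lower()


def lexical_groups(name_strings, threshold=0.85):
    """Greedily group name strings whose canonical forms are nearly identical."""
    groups = []  # list of (representative canonical form, member strings)
    for name in name_strings:
        canon = canonical_form(name)
        for representative, members in groups:
            if SequenceMatcher(None, representative, canon).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((canon, [name]))
    return [members for _, members in groups]


# Name strings from the slide, with genus and epithet separated.
names = [
    "Pomatomus saltatrix",
    "Pomatomus saltatrix (Linnaeus, 1766)",
    "Pomatomus saltator (Linnaeus, 1766)",
    "Pomatomus saltratrix",
    "Gasterosteus saltatrix Linnaeus, 1766",
    "Temnodon saltator (Linnaeus, 1766)",
]
groups = lexical_groups(names)
for group in groups:
    print(group)
```

With these inputs the four Pomatomus spellings fall into one group, while the Gasterosteus and Temnodon names each stay separate. That matches the slide's point that these names are related nomenclaturally rather than lexically.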
Multiple Hierarchies
[Figure: tree diagrams with nodes labeled A and B]
Next Steps
• Reduce impediments for contributors
• Integrate more existing content stores such as YouTube or Wikipedia
• Continue to improve name and concept reconciliation
• Provide names and hierarchy editing interfaces for curators
• Improve name-finding tools for Biodiversity Heritage Library (BHL)
• Atomized descriptive data