
Having Your Cake and Eating It Too

Having Your Cake and Eating It Too. With Apache OODT and Apache Solr. Andrew F. Hart Paul M. Ramirez. About Myself…. Software Engineer NASA Jet Propulsion Laboratory “Data Management” Committer: OODT, SIS, Gora, Streams (Incubating) Mentor: Streams (Incubating). What We’ll Cover.


Presentation Transcript


  1. Having Your Cake and Eating It Too With Apache OODT and Apache Solr Andrew F. Hart Paul M. Ramirez

  2. About Myself… • Software Engineer • NASA Jet Propulsion Laboratory • “Data Management” • Committer: • OODT, SIS, Gora, Streams (Incubating) • Mentor: Streams (Incubating)

  3. What We’ll Cover • Overview of OODT & Solr Projects • Strategies for Combining OODT and Solr • Detailed Deployment/Config. Example • Where to Learn More & Participate

  4. Apache OODT • Object Oriented Data Technology • Origin in NASA mission data systems • Components for • Information integration • Data cataloging and archiving • Configurable workflow processing

  5. Apache OODT • OODT @ Apache • Incubation: 2010, Graduation: 2011 • 29 Committers • Latest Release: 0.5 (Dec. 26, 2012)

  6. Apache OODT Karoo Array Telescope (KAT-7)

  7. Apache OODT Virtual Pediatric Intensive Care Unit

  8. Apache OODT Regional Climate Model Evaluation System

  9. Apache OODT • Commonalities between systems • Lots of data • Defined processing steps / algorithms • Archives important (… search important)

  10. Apache OODT • Strengths of OODT for the above use cases • Loosely coupled components • Standard protocols, well-defined interfaces • Highly configurable • Vetted, reliable code

  11. Apache Solr • Search + Web Services • Powerful features • Flexible formats • Highly configurable

  12. Apache Solr The White House

  13. Apache Solr Netflix

  14. Apache Solr NASA Planetary Data System

  15. OODT & Solr • Why use these projects together? • Archives often need search capability • Similarities / Compatibilities • XML-based configuration • Environment (Java, Tomcat)

  16. Example Integration “Standard” Data Archive Pipeline

  17. Example Integration “Standard” Data Archive Pipeline + Search

  18. OODT Products • Typically 1-1 with Files • Each uniquely identifiable (GUID) • Support for higher-level “ProductType” • A way to define collections

  19. OODT Metadata • Annotations for products • Key:{Val|Multival} • Common across all OODT components • Two general classes: • System • User

  20. OODT Metadata • System Metadata • Added automatically by OODT Components • Used to track state • Used to encode relationships between data

  21. OODT Metadata • User Metadata • Specified as “policy” • Can be product-level, or productType-level • Used to extract & persist information from files as they are ingested (become products)

  22. OODT Metadata Metadata (Policy) Example (external)

  23. Solr Schema • XML document • Define what will be indexed (“Fields”) • Provide high-level context hints • Data type, behavior, pre-processing • Extremely flexible, extensible

  24. Solr Schema Solr Schema Example (external)

  25. Making the Connection • SolrIndexer Tool • Part of the File Manager component tools • Map OODT Metadata to Solr Fields • Create Solr documents from OODT products • Note: only talking about metadata
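
The mapping idea on this slide can be illustrated with a small sketch. It is not the SolrIndexer's actual code: the function name `to_solr_document`, the `field_map` dictionary, the sample metadata keys, and the Solr field names are all hypothetical, chosen only to show how multi-valued OODT-style metadata might be turned into a flat Solr-style document.

```python
from typing import Dict, List, Union

# OODT metadata is Key:{Val|Multival}; here every key holds a list of strings.
MetadataValue = Union[str, List[str]]


def to_solr_document(product_id: str,
                     metadata: Dict[str, List[str]],
                     field_map: Dict[str, str]) -> Dict[str, MetadataValue]:
    """Build a Solr-style document (a plain dict) from product metadata.

    field_map is a hypothetical {metadata key: Solr field name} mapping.
    """
    doc: Dict[str, MetadataValue] = {"id": product_id}
    for key, solr_field in field_map.items():
        if key not in metadata:
            continue  # mapped key is absent for this product: skip it
        values = metadata[key]
        # A single value becomes a scalar; several values stay a list
        # (which would need a multi-valued field on the Solr side).
        doc[solr_field] = values[0] if len(values) == 1 else list(values)
    return doc


# Hypothetical product metadata; "InternalState" is not mapped, so it is dropped.
metadata = {
    "ProductName": ["granule_001.nc"],
    "Instrument": ["cam-a", "cam-b"],
    "InternalState": ["RECEIVED"],
}
field_map = {"ProductName": "product_name", "Instrument": "instrument"}

doc = to_solr_document("19bcb4b8-7999-11e1-b581-8b771498975d", metadata, field_map)
print(doc)
```

Only the metadata travels to Solr in this sketch, which matches the slide's note that the product files themselves are not part of the index.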

  26. SolrIndexer Tool • Package: org.apache.oodt.cas.filemgr.tools • Available since the 0.4 release • Release 0.5+ recommended, as it added some stability improvements • Several modes of operation

  27. SolrIndexer Tool

  28. SolrIndexer Tool • Invocation example: ingest all products from the specified File Manager instance

          java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
              -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
              org.apache.oodt.cas.filemgr.tools.SolrIndexer \
              --all \
              --fmUrl http://localhost:9000 \
              --solrUrl http://localhost:8080/solr

  29. SolrIndexer Tool • Invocation example: ingest all products from the specified ProductType(s)

          java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
              -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
              org.apache.oodt.cas.filemgr.tools.SolrIndexer \
              --types urn:some:ProductType \
              --fmUrl http://localhost:9000 \
              --solrUrl http://localhost:8080/solr

  30. SolrIndexer Tool • Invocation example: ingest a single product by its unique product id ([--delete] is optional)

          java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
              -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
              org.apache.oodt.cas.filemgr.tools.SolrIndexer \
              --product 19bcb4b8-7999-11e1-b581-8b771498975d \
              [--delete] \
              --fmUrl http://localhost:9000 \
              --solrUrl http://localhost:8080/solr

  31. SolrIndexer Tool • Invocation example: force optimization of the Solr index

          java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
              -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
              org.apache.oodt.cas.filemgr.tools.SolrIndexer \
              --optimize \
              --solrUrl http://localhost:8080/solr

  32. indexer.properties • Configuration file for the SolrIndexer • Specifies the mapping between OODT product metadata and Solr fields • Additional “pre-processing” features

  33. Indexer.properties Example Indexer.properties file (external)

  34. Use Case I • Building a searchable data archive • “Long-term” / “Lights-out” archive • Products & metadata immutable • Many NASA mission data systems use this model • Want to make it easily searchable

  35. Use Case I “Standard” Data Archive Pipeline + Search

  36. Use Case II • Building an interactively editable, searchable data archive • Data and metadata mutable • Want to dynamically select product(s) to edit based on metadata

  37. Use Case II Interactively Editable Data Archive Pipeline + Search

  38. Use Case II Interactively Editable Data Archive Pipeline + Search Solr catalog out of sync!

  39. Synchronization • Two ways (at least) to solve this: • Modify the OODT Curator Services • Treat OODT Curator Services as “black box” and write “wrapper” service to invoke Curator Services AND update Solr (via scripted call to SolrIndexer, for example)
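
A "scripted call to SolrIndexer" could look roughly like the sketch below, which only assembles the command line for re-indexing one product. The flags and system properties are the ones shown in the invocation slides; the helper name `solr_indexer_command` and its parameter names are hypothetical, and the paths and URLs are the placeholder values from those slides.

```python
from typing import List


def solr_indexer_command(product_id: str,
                         fm_url: str,
                         solr_url: str,
                         indexer_config: str,
                         filemgr_lib: str,
                         delete: bool = False) -> List[str]:
    """Assemble (but do not run) a SolrIndexer invocation for one product."""
    cmd = [
        "java",
        f"-DSOLR_INDEXER_CONFIG={indexer_config}",
        f"-Djava.ext.dirs={filemgr_lib}",
        "org.apache.oodt.cas.filemgr.tools.SolrIndexer",
        "--product", product_id,
    ]
    if delete:
        cmd.append("--delete")  # optional flag from the single-product example
    cmd += ["--fmUrl", fm_url, "--solrUrl", solr_url]
    return cmd


cmd = solr_indexer_command(
    product_id="19bcb4b8-7999-11e1-b581-8b771498975d",
    fm_url="http://localhost:9000",
    solr_url="http://localhost:8080/solr",
    indexer_config="/path/to/indexer.properties",
    filemgr_lib="/path/to/cas/filemgr/lib/",
)
print(" ".join(cmd))
```

A wrapper service could pass the returned list to `subprocess.run(cmd, check=True)` after a successful metadata update. Returning a list rather than a shell string avoids quoting problems if a path or id ever contains spaces.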

  40. Modify Curator Services • Services implemented in JAX-RS • /curator/src/main/java/org/apache/oodt/cas/curation/service • [curator_url]/services/metadata/update • Options: • Utilize Solr Java API • Wrap call to OODT SolrIndexer tool

  41. Use Case II-A Modified Curator Services to Simultaneously update Solr

  42. Example Interactive event tagging

  43. Wrap Curator Services • Curator Service/API is “black box” • Develop custom service that: • Issues POST request to Curator service • Updates Solr index via, e.g.: • Utilize Solr Java API • Wrap call to OODT SolrIndexer tool
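
The control flow of such a wrapper can be sketched without committing to any particular HTTP or Solr client. Everything named here is hypothetical: `update_and_reindex`, the two injected callables, and the sample metadata. In a real service, `update_curator` would issue the POST to the Curator endpoint and `reindex` would call the Solr Java API's equivalent or the SolrIndexer tool.

```python
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, List[str]]


def update_and_reindex(product_id: str,
                       metadata: Metadata,
                       update_curator: Callable[[str, Metadata], bool],
                       reindex: Callable[[str], None]) -> bool:
    """Update metadata via the Curator first, then refresh the Solr index.

    Solr is touched only when the Curator update succeeds, so the index
    never gets ahead of the catalog.
    """
    if not update_curator(product_id, metadata):
        return False
    reindex(product_id)
    return True


# Stand-ins for the real Curator POST and Solr update, recording call order.
calls: List[Tuple[str, str]] = []


def fake_curator(product_id: str, metadata: Metadata) -> bool:
    calls.append(("curator", product_id))
    return True


def fake_reindex(product_id: str) -> None:
    calls.append(("solr", product_id))


ok = update_and_reindex("19bcb4b8-7999-11e1-b581-8b771498975d",
                        {"EventTag": ["storm"]}, fake_curator, fake_reindex)
print(ok, calls)
```

Injecting the two operations keeps the Curator service a "black box", as the slide describes, and makes the ordering easy to test in isolation.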

  44. Use Case II-B Wrapping OODT Curation Services with Custom UI & Services

  45. Example

  46. Lessons • Solr complements the OODT File Manager • RESTful interfaces (Solr + OODT Curator) allow for great flexibility in designing services and UI • “Best” approach depends on the situation

  47. Next Steps • Develop “SolrCatalog” for OODT File Manager? • Pros: Reduction in “moving parts” • Cons: Restrictive? • Implement Use Case II-A as optional mode for Curator web service layer

  48. Learning More • Solr • http://lucene.apache.org/solr • solr-user@lucene.apache.org • OODT • http://oodt.apache.org • https://cwiki.apache.org/confluence/display/OODT/Home • oodt-user@apache.org

  49. Thanks! Questions?
