
Secondary Indexing in Phoenix

Secondary Indexing in Phoenix. LA HBase User Group – September 4, 2013. Jesse Yates, HBase Committer, Software Engineer. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap.


Presentation Transcript


  1. Secondary Indexing in Phoenix. LA HBase User Group – September 4, 2013. Jesse Yates, HBase Committer, Software Engineer.

  2. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap. LA HUG – Sept 2013

  3. About me
     • Developer at Salesforce
       • System of Record, Phoenix
     • Open Source
       • Phoenix
       • HBase
       • Accumulo

  4. Phoenix
     • Open Source
       • https://github.com/forcedotcom/phoenix
     • “SQL-skin” on HBase
       • Everyone knows SQL!
     • JDBC Driver
       • Plug-and-play
     • Faster than HBase
       • in some cases

  5. Why Index?
     • HBase is only sorted on 1 “axis”
     • Great for search via a single pattern
     • Example!

  6. Example fields: name, type, subtype, date, major, minor, quantity

  7. Secondary Indexes
     • Sort on ‘orthogonal’ axis
     • Save full-table scan
     • Expected database feature
     • Hard in HBase because of ACID considerations
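To make the "orthogonal axis" idea concrete, here is a minimal sketch in plain Java that uses sorted maps as stand-ins for HBase tables. Everything in it is an assumption made for illustration: the class name, the `name` and `type` columns (borrowed from the example fields), and the NUL separator. It is not Phoenix or HBase code, and it assumes each row is written once, so it does not clean up stale index entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SecondaryIndexSketch {
    // Separator that sorts below any printable character (assumption: values never contain it).
    private static final char SEP = '\0';

    // Primary table: sorted by row key (name) only; the value is the "type" column.
    private final TreeMap<String, String> table = new TreeMap<>();
    // Index table: sorted by type + SEP + name, so rows with equal type are adjacent.
    private final TreeMap<String, String> index = new TreeMap<>();

    /** Writes a row and its index entry (assumes each name is written only once). */
    public void put(String name, String type) {
        table.put(name, type);
        index.put(type + SEP + name, name);
    }

    /** Without an index: visit every row and filter on the non-key column. */
    public List<String> findByTypeFullScan(String type) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (row.getValue().equals(type)) {
                names.add(row.getKey());
            }
        }
        return names;
    }

    /** With an index: a range scan over the keys that start with type + SEP. */
    public List<String> findByTypeIndexed(String type) {
        String start = type + SEP;
        String stop = type + (char) (SEP + 1);
        return new ArrayList<>(index.subMap(start, stop).values());
    }
}
```

Both lookups return the same names; the indexed one touches only the matching key range instead of the whole table, which is the saving the slide refers to.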

  8. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  9. http://www.wired.com/wiredenterprise/2011/10/microsoft-and-hadoop/

  10. Other (Major) Indexing Frameworks
      • HBase SEP
        • Side-Effects Processor
        • Replication-based
        • https://github.com/NGDATA/hbase-sep
      • Huawei
        • Server-local indexes
        • Buddy regions
        • https://github.com/Huawei-Hadoop/hindex

  11. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  12. Immutable Indexes
      • Immutable Rows
      • Much easier to implement
      • Client-managed
      • Bulk-loadable

  13. Bulk Loading: phoenix-hbase.blogspot.com

  14. Index Bulk Loading
      • Identity Mapper
      • Custom Phoenix Reducer
      • HFile Output Format

  15. Index Bulk Loading

          PreparedStatement statement = conn.prepareStatement(dmlStatement);
          statement.execute();
          String upsertStmt = "upsert into core.entity_history(organization_id,key_prefix,entity_history_id, created_by, created_date)\n"
              + "values(?,?,?,?,?)";
          statement = conn.prepareStatement(upsertStmt);
          … //set values
          Iterator<Pair<byte[],List<KeyValue>>> dataIterator = PhoenixRuntime.getUncommittedDataIterator(conn);

  16. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  17. The “fun” stuff…

  18. 1.5 years

  19. Mutable Indexes
      • Global Index
      • Change row state
      • Common use-case
      • “expected” implementation
      • Covered Columns

  20. Usage
      • Just SQL!
      • Baby name popularity
      • Mock demo

  21. Usage

      Selects the most popular name for a given year:

          SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;

      Selects the total occurrences of a given name across all years:

          SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name;

      Selects the total occurrences of a given name across all years, allowing an index to be used:

          SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

  22. Usage

      • Update rows due to census inaccuracy (will only work if the mutable indexing is working):

          UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse';

      • Selects the now updated data (from the index table):

          SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

      • Index table still used in scans:

          EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;

  23. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes in Phoenix; Mutable Indexing Internals; Roadmap

  24. Internals
      • Index Management
        • Build index updates
        • Ensures index is ‘cleaned up’
      • Recovery Mechanism
        • Ensures index updates are “ACID”

  25. “There is no magic” - Every programming hipster (chipster)

  26. Mutable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)

  27. Mutable Indexing: Standard Write Path (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)

  28. Mutable Indexing (diagram labels: Indexer, Region Coprocessor Host, Builder, Codec, WAL Updater, WAL, “Durable!”, Index Table)

  29. Index Management

          public interface IndexBuilder {
            public void setup(RegionCoprocessorEnvironment env);
            public Map<Mutation, String> getIndexUpdate(Put put);
            public Map<Mutation, String> getIndexUpdate(Delete delete);
          }

      • Lives within a RegionCoprocessorObserver
      • Access to the local HRegion
      • Specifies the mutations to apply to the index tables

  30. Why not write my own?
      • Managing Cleanup
        • Efficient point-in-time correctness
      • Performance tricks
        • Abstract access to HRegion
        • Minimal network hops
      • Sorting correctness
        • Phoenix typing ensures correct index sorting

  31. Example: Managing Cleanup
      • Updates can arrive out of order
      • Client-managed timestamps

  32. Example: Managing Cleanup (Index Table)

  33. Example: Managing Cleanup

  34. Example: Managing Cleanup

  35. Example: Managing Cleanup

  36. Example: Managing Cleanup

  37. Managing Cleanup
      • History “roll up”
      • Out-of-order Updates
      • Point-in-time correctness
      • Multiple Timestamps per Mutation
      • Delete vs. DeleteColumn vs. DeleteFamily
      • Surprisingly hard!
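The out-of-order case can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not Phoenix's implementation: it tracks one row with one indexed column, assumes every write carries a unique client-supplied timestamp, and represents the index as per-value timelines of put and delete markers. The class and method names are hypothetical. The point it tries to show is that a late-arriving write has to retire the previous value's index entry at its own timestamp and also retire its own entry at the timestamp of the next newer version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class OutOfOrderIndexSketch {
    // Primary row history: timestamp -> value of the indexed column.
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // Index: value -> (timestamp -> true for a put, false for a delete marker).
    private final Map<String, TreeMap<Long, Boolean>> index = new HashMap<>();

    private void mark(String value, long ts, boolean present) {
        index.computeIfAbsent(value, v -> new TreeMap<>()).put(ts, present);
    }

    /** Applies a write whose timestamp may be older than writes already seen. */
    public void write(long ts, String value) {
        Map.Entry<Long, String> prev = versions.lowerEntry(ts);
        Map.Entry<Long, String> next = versions.higherEntry(ts);
        versions.put(ts, value);
        mark(value, ts, true);
        // The older value stops being current at this write's timestamp.
        if (prev != null && !prev.getValue().equals(value)) {
            mark(prev.getValue(), ts, false);
        }
        // A newer version already exists, so this value is superseded at that timestamp.
        if (next != null && !next.getValue().equals(value)) {
            mark(value, next.getKey(), false);
        }
    }

    /** Point-in-time lookup: is there a live index entry for value as of asOf? */
    public boolean indexContains(String value, long asOf) {
        TreeMap<Long, Boolean> timeline = index.get(value);
        if (timeline == null) {
            return false;
        }
        Map.Entry<Long, Boolean> latest = timeline.floorEntry(asOf);
        return latest != null && latest.getValue();
    }
}
```

Writing timestamps 10, 30 and then 20 leaves the index consistent at every point in time, even though the middle version arrived last. The model leaves out multiple timestamps per mutation and the different delete types, which the slide lists as further complications.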

  38. Phoenix Index Builder

          public interface IndexCodec {
            public void initialize(RegionCoprocessorEnvironment env);
            public Iterable<IndexUpdate> getIndexDeletes(TableState state);
            public Iterable<IndexUpdate> getIndexUpserts(TableState state);
          }

      • Much simpler than full index management
      • Hides cleanup considerations
      • Abstracted access to local state

  39. Phoenix Index Codec

  40. Dude, where’s my data? Ensuring Correctness

  41. HBase ACID
      • Does NOT give you:
        • Cross-row consistency
        • Cross-table consistency
      • Does give you:
        • Durable data on success
        • Visibility on success without partial rows

  42. Key Observation: “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.” - Lars Hofhansl

  43. Idempotent Index Updates
      • Doesn’t need full transactions
      • Replay as many times as needed
      • Can tolerate a little lag
      • As long as we get the order right
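A rough way to see why replay is safe: if every index update carries an explicit timestamp and the index cell is keyed by row and timestamp, applying the same update again rewrites the cell with the same contents. The Java sketch below models that with a sorted map. The `IndexUpdate` class, its fields, and the `row@timestamp` key format are assumptions made up for this illustration and are not the Phoenix types of the same name.

```java
import java.util.List;
import java.util.TreeMap;

public class IdempotentReplaySketch {
    /** One index write: an index row key, an explicit timestamp, and the data row it points to. */
    public static final class IndexUpdate {
        final String indexRow;
        final long timestamp;
        final String dataRow;

        public IndexUpdate(String indexRow, long timestamp, String dataRow) {
            this.indexRow = indexRow;
            this.timestamp = timestamp;
            this.dataRow = dataRow;
        }
    }

    /** Applies updates to a toy index table keyed by (index row, timestamp). */
    public static void apply(TreeMap<String, String> indexTable, List<IndexUpdate> updates) {
        for (IndexUpdate u : updates) {
            // Same key and same value on every replay, so a repeat overwrites the cell with itself.
            indexTable.put(u.indexRow + "@" + u.timestamp, u.dataRow);
        }
    }
}
```

Applying a batch once or several times yields the same table, which is why a recovery path can replay updates without needing a transaction to know whether they were already applied.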

  44. Failure Recovery

          <property>
            <name>hbase.regionserver.wal.codec</name>
            <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
          </property>
          <property>
            <name>hbase.regionserver.hlog.reader.impl</name>
            <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
          </property>

      • Custom WALEditCodec
        • Encodes index updates
        • Supports compressed WAL
      • Custom WAL Reader
        • Replay index updates from WAL

  45. Failure Situations
      • Any time before WAL, client replay
      • Any time after WAL, HBase replay
      • All-or-nothing

  46. Failure #1: Before WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)

  47. Failure #1: Before WAL. No problem! No data is stored in the WAL, so the client just retries the entire update.

  48. Failure #2: After WAL (diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore)

  49. Failure #2: After WAL. The WAL is replayed via the usual replay mechanisms.

  50. Agenda: About; Other Indexing Frameworks; Immutable Indexes; Mutable Indexes; Roadmap
