A Developer's Guide to Coprocessors. HBaseCon 2013. John Weatherford, https://github.com/jweatherford. Who Is Telescope? Telescope is the leading provider of interactive television, audience participation, and customer engagement solutions.
A Developer's Guide to Coprocessors
HBaseCon 2013
John Weatherford
https://github.com/jweatherford
Who Is Telescope?
Telescope is the leading provider of interactive television, audience participation, and customer engagement solutions. Clients include TV networks, producers, digital platforms, studios, and sponsors seeking to reach, engage, and retain mass audiences and consumers in real time.
What Is a Coprocessor?
- Arbitrary code that can run on each server
- Extend the functionality of HBase
- Avoid bothering the core committers
Two Types of Coprocessors
Observers
- React to an event
- Run code before or after
Endpoints
- Call a function explicitly
- Execute code on all regions
[Diagram: for an observer, the client's action is wrapped by a pre-action and a post-action; for an endpoint, the client's call reaches an endpoint on each of Region 1, Region 2, and Region 3.]
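The difference between the two types can be sketched in plain Java without any HBase classes. Everything here is hypothetical and exists only for illustration: `ToyRegion`, `PutObserver`, and `callOnAllRegions` are not HBase APIs. The sketch only shows that observer hooks fire implicitly around an action, while an endpoint is called explicitly and returns one result per region.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a region; not an HBase class.
final class ToyRegion {
    // Hypothetical hook pair; not the HBase RegionObserver interface.
    interface PutObserver {
        void prePut(Map<String, String> row);   // may modify the row before it is stored
        void postPut(Map<String, String> row);  // runs after the row is stored
    }

    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void register(PutObserver observer) {
        observers.add(observer);
    }

    // Observer style: the client just calls put(); the hooks fire implicitly.
    void put(String key, Map<String, String> row) {
        for (PutObserver o : observers) o.prePut(row);
        rows.put(key, row);
        for (PutObserver o : observers) o.postPut(row);
    }

    int rowCount() {
        return rows.size();
    }

    // Endpoint style: the client explicitly invokes a function on every
    // region and receives one result per region.
    static <R> List<R> callOnAllRegions(List<ToyRegion> regions,
                                        Function<ToyRegion, R> endpoint) {
        List<R> results = new ArrayList<>();
        for (ToyRegion region : regions) results.add(endpoint.apply(region));
        return results;
    }
}
```

A client of this toy model would call `region.put(...)` without knowing which observers are registered, but would have to name the endpoint function itself, for example `ToyRegion.callOnAllRegions(regions, ToyRegion::rowCount)`.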
What Can I Do With Coprocessors?
Ideas for what can be done:
- Data aggregation
- Access control
- Real-time analytics
- Secondary indexes
- Email split alerts
- Optimized search
- Cache requests
- Reduce result sets
- Control compaction times
A Short Story
Nothing ventured, nothing gained
Getting Started With Code
    preGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
    postGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
    prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
    postPut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
    preDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)
    postDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)
Our First Observer
- Intercept and modify the action
- Consider all circumstances that will trigger the observer
- Compile your jar for the same version of Java that your HBase region servers run
- Look for output from the coprocessor
Our First Observer: Motivation
Apache Flume only writes one column per put.
JSON:
    {twitter: { name: "loljk4u", message: "<3", length: 2, registered: true }, favorite: { name: "Taylor" ...
Single-row put, before prePut():
    key: id-1332343
    family: twitter
    qualifier: json_raw
    value: "{twitter: {name: \"loljk4u\", message: \"<3\", length: 2, registered: true ...
Put, after prePut():
    key: id-1332343
    twitter:name: "loljk4u"
    twitter:message: "<3"
    twitter:length: 0x2
    twitter:registered: 0xFF
    favorite:name: "Taylor"
    favorite:song: "I knew you were trouble"
JsonColumnExpander
    // get the arguments on the coprocessor
    public void start(CoprocessorEnvironment env) throws IOException {
      Configuration c = env.getConfiguration();
      families = c.get("families", "").split(":");
    }

    public void prePut(ObserverContext<…> e, Put put, WALEdit edit, boolean waL) {
      // check for the json_raw column; puts without it pass through untouched
      if (!put.has(FAMILY, JSON_COLUMN)) { return; }
      String json = Bytes.toString(put.get(FAMILY, JSON_COLUMN).get(0).getValue());
      // `columns` and `family` come from parsing `json`; that step is not shown on the slide
      for (Entry<String, ?> column : columns.entrySet()) { // loop through the json
        String value = (String) column.getValue();
        put.add(family, Bytes.toBytes(column.getKey()), Bytes.toBytes(value));
      }
      // remove the original json from the put by overwriting it with a placeholder
      put.add(FAMILY, JSON_COLUMN, Bytes.toBytes("--removed--"));
    }
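The expansion step itself can be tried outside HBase. The following is a minimal sketch, assuming the JSON has already been parsed into a nested map of family to qualifier to value; `JsonExpansionSketch` and `expand` are hypothetical names, and the result is a plain map keyed by "family:qualifier" rather than a real `Put`. Values are rendered as strings here, which differs from the slide, where `length` and `registered` are stored as binary values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; models only the column-expansion logic, not HBase.
final class JsonExpansionSketch {
    static final String RAW_QUALIFIER = "json_raw";

    // parsed: family -> (qualifier -> value), as a JSON parser might return it.
    // Returns cells keyed "family:qualifier" with string-rendered values.
    static Map<String, String> expand(String rawFamily,
                                      Map<String, Map<String, Object>> parsed) {
        Map<String, String> cells = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Object>> family : parsed.entrySet()) {
            for (Map.Entry<String, Object> column : family.getValue().entrySet()) {
                // String.valueOf handles numbers and booleans; a (String) cast
                // would throw ClassCastException for an Integer or Boolean value.
                cells.put(family.getKey() + ":" + column.getKey(),
                          String.valueOf(column.getValue()));
            }
        }
        // Mirror the slide: overwrite the raw JSON cell with a placeholder.
        cells.put(rawFamily + ":" + RAW_QUALIFIER, "--removed--");
        return cells;
    }
}
```

Keeping this logic in a helper that has no HBase dependency makes it easy to unit test before the jar is ever loaded onto a region server.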
Loading the Coprocessor
- Push the jar to where your cluster can find it:
    $> hadoop fs -put JsonColumnExpander.jar /
- Alter the table in the HBase shell to enable the coprocessor:
    hbase> alter 'test', METHOD => 'table_att', 'coprocessor' => 'hdfs:///JsonColumnExpander.jar|telescope.hbase.JsonColumnExpander|1001|arg1=1,arg2=2'
- Verify the load by checking the master web UI.
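The value of the `coprocessor` attribute packs four fields into one string, separated by `|`: the jar path, the class name, the priority, and a comma-separated list of key=value arguments. As a rough illustration of that layout, here is a simplified parser in plain Java. `CoprocessorSpecSketch` is a hypothetical class, not part of HBase, and HBase's own parsing may accept forms that this sketch rejects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for the four fields of a table coprocessor attribute.
final class CoprocessorSpecSketch {
    final String jarPath;
    final String className;
    final int priority;
    final Map<String, String> args;

    private CoprocessorSpecSketch(String jarPath, String className,
                                  int priority, Map<String, String> args) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.args = args;
    }

    // Parses "path|class|priority|k1=v1,k2=v2"; the argument list is optional.
    static CoprocessorSpecSketch parse(String spec) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] parts = spec.split("\\|", -1);
        if (parts.length < 3 || parts.length > 4) {
            throw new IllegalArgumentException("Expected 3 or 4 fields: " + spec);
        }
        Map<String, String> args = new LinkedHashMap<>();
        if (parts.length == 4 && !parts[3].trim().isEmpty()) {
            for (String pair : parts[3].split(",")) {
                String[] kv = pair.split("=", 2);
                if (kv.length != 2) {
                    throw new IllegalArgumentException("Bad argument: " + pair);
                }
                args.put(kv[0].trim(), kv[1].trim());
            }
        }
        return new CoprocessorSpecSketch(parts[0].trim(), parts[1].trim(),
                Integer.parseInt(parts[2].trim()), args);
    }
}
```

Parsing the attribute from the slide this way yields priority 1001 and the arguments `arg1=1` and `arg2=2`. The slide's `start()` method reads such arguments through `env.getConfiguration()`.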
Running The Code
Trigger the coprocessor with a put on the table:
    Put put = new Put(Bytes.toBytes("rowkey"));
    put.add(Bytes.toBytes("goat"), Bytes.toBytes("json_raw"), json_data);
Check each server's local logs:
    http://regionnode:60030/logs/
    hbase-hbase-regionserver-node2.dev-hadoop.telescope.tv.out
Creating Your First Endpoint
- Define the available methods in a protocol
- Implement the protocol
- Extend BaseEndpointCoprocessor
- Load the endpoint on the table
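Endpoint Example
    public interface TrendsProtocol extends CoprocessorProtocol {
      HashMap<String, Long> getData() throws IOException;
    }

    // The endpoint class implements the protocol we wrote above
    public class TrendsEndpoint extends BaseEndpointCoprocessor implements TrendsProtocol {
      @Override
      public HashMap<String, Long> getData() throws IOException {
        HashMap<String, Long> trends = new HashMap<String, Long>();
        RegionCoprocessorEnvironment environment = (RegionCoprocessorEnvironment) getEnvironment();
        // the slide passes an undefined `s`; an empty Scan covers every row in the region
        InternalScanner scanner = environment.getRegion().getScanner(new Scan());
        try {
          List<KeyValue> curVals = new ArrayList<KeyValue>();
          boolean hasMore;
          do {
            curVals.clear();
            hasMore = scanner.next(curVals); // fetch the next row before iterating
            for (KeyValue pair : curVals) {
              // loop through values on the region and process
            }
          } while (hasMore);
        } finally {
          scanner.close();
        }
        return trends;
      }
    }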
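The scan loop in an endpoint depends on a small contract: each call to the scanner fills a list with the next row's values and returns whether more rows remain. The following plain-Java sketch imitates that contract so the loop shape can be run without a cluster. `RowSource`, `fromRows`, and `countValues` are hypothetical names, and the counting is only an assumed example of per-region processing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of a batch scanner loop; no HBase classes involved.
final class ScanLoopSketch {
    // Appends the next row's values to `out` and returns true if more rows
    // remain after this one.
    interface RowSource {
        boolean next(List<String> out);
    }

    // Adapts an in-memory list of rows to the RowSource contract.
    static RowSource fromRows(List<List<String>> rows) {
        Iterator<List<String>> it = rows.iterator();
        return out -> {
            if (it.hasNext()) out.addAll(it.next());
            return it.hasNext();
        };
    }

    // Counts how often each value occurs across all rows of the source.
    static Map<String, Long> countValues(RowSource source) {
        Map<String, Long> counts = new HashMap<>();
        List<String> current = new ArrayList<>();
        boolean hasMore;
        do {
            current.clear();                 // reuse the buffer between rows
            hasMore = source.next(current);  // fetch before iterating
            for (String value : current) {
                counts.merge(value, 1L, Long::sum);
            }
        } while (hasMore);
        return counts;
    }
}
```

The order inside the loop matters: clearing the buffer and then iterating it without fetching in between would process nothing, and the final row still has to be handled even though the fetch that returned it reported no more rows.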
Endpoint Returned Results
    htable = HBaseDB.getTable(connection, "hbase_demo");
    Map<byte[], HashMap<String, Long>> results = null;
    results = htable.coprocessorExec(
        TrendsProtocol.class,
        null, // start row
        null, // end row
        new Batch.Call<TrendsProtocol, HashMap<String, Long>>() {
          @Override
          public HashMap<String, Long> call(TrendsProtocol trends) throws IOException {
            return trends.getData();
          }
        }
    );
    for (Map.Entry<byte[], HashMap<String, Long>> entry : results.entrySet()) {
      // process the results from each region
    }
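Because the endpoint returns one map per region, the client usually has to combine them. A minimal sketch of that client-side merge in plain Java follows, assuming the per-region values are counts that should be summed; `RegionResultMerger` is a hypothetical helper, not something from the talk's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side helper that sums per-region counts.
final class RegionResultMerger {
    // perRegion: one map of counts per region, keyed by an opaque byte[] id.
    static Map<String, Long> merge(Map<byte[], ? extends Map<String, Long>> perRegion) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> regionCounts : perRegion.values()) {
            if (regionCounts == null) continue; // tolerate a region with no result
            for (Map.Entry<String, Long> e : regionCounts.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

Only the values are iterated, so the `byte[]` keys never need to be compared. Summing is just one choice; a different aggregate, such as a maximum, would swap out `Long::sum`.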
Addendum to Endpoints
HBase 0.96 is changing endpoints to use protobuf:
    public static abstract class RowCountService implements com.google.protobuf.Service {
      ...
      public interface Interface {
        public abstract void getRowCount(
            com.google.protobuf.RpcController controller,
            CountRequest request,
            com.google.protobuf.RpcCallback done);
        public abstract void getKeyValueCount(
            com.google.protobuf.RpcController controller,
            CountRequest request,
            com.google.protobuf.RpcCallback done);
      }
    }
Telescope's Coprocessors
- Observers collect real-time analytics data for our moderation platform and create aggregate tables for the streaming data.
- Endpoints optimize searches and transmit only the necessary data. They also perform simple reporting queries that don't need the full power of MapReduce.
Questions?
- Already using coprocessors? I would love to hear about it.
- Curious to know more about a specific part? All code samples and table definitions can be found at https://github.com/jweatherford