Cassandra DB Not Only SQL
Table of Contents • Background and history • Used Applications • What is Cassandra? – Overview • Replication & Consistency • Writing, Reading, Querying and Sorting • APIs & Installation • World Database in Cassandra • Using Hector API • Administration tools
Background • Influential Technologies: • Dynamo – Fully distributed design - infrastructure • BigTable – Sparse data model
Other NoSQL databases • MongoDB • Neo4J • HyperGra • Memcach • Tokyo Ca • Redis • CouchDB • Hypertable • Cassandra • Riak • Voldemort • HBase
Bigtable / Dynamo • Bigtable: HBase, Hypertable • Dynamo: Riak, Voldemort • Cassandra: combination of both
CAP Theorem • Consistency • Availability • Partition Tolerance
Applications • Facebook • Google Code • Apache • Digg • Twitter • Rackspace • Others…
What Is Cassandra? • O(1) node lookup • Key – Value Store • Column based data store • Highly Distributed – decentralized (no master/slave) • Elasticity • Durable, Fault-tolerant – Replication • Sparse • ACID NoSQL!
Overview – Data Model • Keyspace • Uppermost namespace • Typically one per application • Column • Basic unit of storage – Name, Value and timestamp • ColumnFamily • Associates records of a similar kind • Record-level Atomicity • Indexed • SuperColumn • Columns whose values are columns • Array of columns • SuperColumnFamily • ColumnFamily whose values are only SuperColumns
Examples • Column - City: ORANJESTAD {"id": 1, "name": "ORANJESTAD", "population": 33000, "capital": true} • SuperColumns – Country: Aruba {"id": "aa", "name": "Aruba", "fullName": "Aruba", "location": "Caribbean, island in the Caribbean Sea, north of Venezuela", "coordinates": { "latitudeType": "N", "latitude": 12.5, "longitudeType": "W", "longitude": 69.96667}, ….
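The nesting described in the data model (column family, then row key, then named columns, each carrying a value and a timestamp) can be pictured with plain Java maps. This is only an in-memory sketch for intuition, not how Cassandra stores data; `DataModelSketch`, `Column`, `put` and `get` are hypothetical names invented for the illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one keyspace: column family -> row key -> column name -> column. */
final class DataModelSketch {
    /** A column is the basic unit of storage: name, value and timestamp. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Sorted maps keep column names ordered inside each row.
    private final Map<String, Map<String, Map<String, Column>>> columnFamilies = new TreeMap<>();

    void put(String columnFamily, String rowKey, Column column) {
        columnFamilies
            .computeIfAbsent(columnFamily, cf -> new TreeMap<>())
            .computeIfAbsent(rowKey, key -> new TreeMap<>())
            .put(column.name, column);
    }

    /** Returns the column, or null when the family, row or column is absent (rows are sparse). */
    Column get(String columnFamily, String rowKey, String columnName) {
        Map<String, Map<String, Column>> rows = columnFamilies.get(columnFamily);
        if (rows == null) {
            return null;
        }
        Map<String, Column> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(columnName);
    }

    public static void main(String[] args) {
        DataModelSketch world = new DataModelSketch();
        world.put("Cities", "1", new Column("name", "ORANJESTAD", 1L));
        world.put("Cities", "1", new Column("population", "33000", 1L));
        System.out.println(world.get("Cities", "1", "name").value);
    }
}
```

A SuperColumnFamily would add one more map level between the row key and the columns.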
Replication & Consistency • The consistency level is defined relative to the replication factor (N), not the number of nodes in the system. • There are a few options for how many replicas must respond before an operation is declared successful • Query all replicas on every read • Every column has a value and a timestamp – latest timestamp wins • Read repair – read the data from one replica and compare the checksum/timestamp from the others to verify • R (number of replicas to read from) + W (number of replicas to write to) > N (replication factor)
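The R + W > N rule is plain arithmetic: if the read set and the write set together are larger than the replication factor, they must share at least one replica, so a read always touches a copy that holds the latest write. A small sketch, with the hypothetical helper names `quorum` and `readsSeeLatestWrite`:

```java
/** Checks the R + W > N overlap rule for a given replication factor. */
final class QuorumSketch {
    /** Smallest majority of N replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set of size r overlaps every write set of size w. */
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        // Quorum reads with quorum writes overlap: 2 + 2 > 3.
        System.out.println(readsSeeLatestWrite(q, q, n));
        // Reading and writing a single replica each does not: 1 + 1 <= 3.
        System.out.println(readsSeeLatestWrite(1, 1, n));
    }
}
```

With N = 3, writing to all three replicas and reading from one also satisfies the rule, which trades slower writes for faster reads.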
The Ring - Partitioning • Each NODE has a single, unique TOKEN • Each NODE claims the RANGE of tokens between its neighbor's token and its own • Partitioning – Map from Key Space to Token – Can be random or Order Preserving • Snitching – Map from Nodes to Physical Location
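The ring lookup can be sketched with a sorted map from token to node. The sketch assumes the convention that a node owns the tokens up to and including its own token, and that a token past the last node wraps around to the first one. `TokenRingSketch`, `addNode` and `nodeForToken` are hypothetical names, and the small integer tokens are purely illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: each node holds one token and owns the range ending at that token. */
final class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Finds the first node whose token is >= the given token, wrapping around the ring. */
    String nodeForToken(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(token);
        if (owner == null) {
            owner = ring.firstEntry(); // past the highest token: wrap to the start
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(10, "A");
        ring.addNode(50, "B");
        ring.addNode(90, "C");
        System.out.println(ring.nodeForToken(42));
        System.out.println(ring.nodeForToken(95));
    }
}
```

A random partitioner would derive the token from a hash of the row key, while an order-preserving one would derive it from the key itself; the lookup step is the same in both cases.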
Writing • No locks • Append-only writes, with no read before write • Atomicity guarantee for a key (in a ColumnFamily) • Always writable! • SSTables – key/data – SSTable files for each column family • Fast
Reading • Wait for R responses • Wait for N – R responses in the background and perform read repair • Read multiple SSTables • Slower than writes (but still fast)
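The reconciliation step behind a read can be sketched as follows: among the replica responses the column with the latest timestamp wins, and any replica that returned an older copy is a candidate for read repair. This is a simplified illustration, not Cassandra's implementation; `ReadRepairSketch`, `Column`, `newest` and `staleReplicas` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy timestamp reconciliation across replica responses for one column. */
final class ReadRepairSketch {
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Latest timestamp wins. */
    static Column newest(List<Column> responses) {
        Column winner = responses.get(0);
        for (Column candidate : responses) {
            if (candidate.timestamp > winner.timestamp) {
                winner = candidate;
            }
        }
        return winner;
    }

    /** Indexes of replicas holding an older copy, which read repair would overwrite. */
    static List<Integer> staleReplicas(List<Column> responses) {
        Column winner = newest(responses);
        List<Integer> stale = new ArrayList<>();
        for (int i = 0; i < responses.size(); i++) {
            if (responses.get(i).timestamp < winner.timestamp) {
                stale.add(i);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<Column> responses = Arrays.asList(
            new Column("33000", 100L), new Column("34000", 200L), new Column("33000", 100L));
        System.out.println(newest(responses).value);
        System.out.println(staleReplicas(responses));
    }
}
```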
Compare with MySQL (RDBMS) • Compare a 50GB Database: • MySQL • ~300ms write • ~350ms read • Cassandra • ~0.12ms write • ~15ms read
Queries • Single column • Slice • Set of names / range of names • Simple slice -> columns • Super slice -> supercolumns • Key range
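Because the columns of a row are kept sorted by name, both kinds of slice reduce to simple operations on a sorted map. The sketch below models a row as a `NavigableMap` and treats the range bounds as inclusive, which is an assumption of this illustration. `SliceSketch`, `sliceByRange` and `sliceByNames` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy slice queries over one row whose columns are sorted by name. */
final class SliceSketch {
    /** Range of names: every column between start and finish (both inclusive here). */
    static Map<String, String> sliceByRange(NavigableMap<String, String> row, String start, String finish) {
        return row.subMap(start, true, finish, true);
    }

    /** Set of names: only the requested columns that actually exist in the row. */
    static Map<String, String> sliceByNames(Map<String, String> row, Collection<String> names) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : names) {
            if (row.containsKey(name)) {
                result.put(name, row.get(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> city = new TreeMap<>();
        city.put("capital", "true");
        city.put("id", "1");
        city.put("name", "ORANJESTAD");
        city.put("population", "33000");
        System.out.println(sliceByRange(city, "id", "name").keySet());
        System.out.println(sliceByNames(city, Arrays.asList("population", "area")).keySet());
    }
}
```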
Sorting • Sorting is set on writing • Sorting is set by the type of the Column/Supercolumn keys • Sorting/keys Types • Bytes • UTF8 • Ascii • LexicalUUID • TimeUUID
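Fixing the sort order at write time behaves much like choosing the comparator of a sorted map when it is created: every later read sees that order and cannot ask for another. The sketch compares a lexical order with a numeric one; the numeric comparator is only a stand-in chosen for contrast and is not one of the types listed on the slide. `ComparatorSketch` and `columnOrder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy illustration: the comparator chosen up front decides the stored column order. */
final class ComparatorSketch {
    /** Inserts the column names into a map sorted by the given comparator and returns the resulting order. */
    static List<String> columnOrder(Comparator<String> comparator, String... names) {
        TreeMap<String, String> row = new TreeMap<>(comparator);
        for (String name : names) {
            row.put(name, "");
        }
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        Comparator<String> lexical = Comparator.naturalOrder();
        Comparator<String> numeric = Comparator.comparingLong(Long::parseLong);
        // Lexical ordering places "10" before "2"; numeric ordering does not.
        System.out.println(columnOrder(lexical, "9", "10", "2"));
        System.out.println(columnOrder(numeric, "9", "10", "2"));
    }
}
```

This is why the choice of key type matters at design time: names that look numeric still sort character by character under a text type.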
Drawbacks • No joins (for speed) • Not able to sort at query time • No real SQL support (although some APIs support a very small subset of it)
APIs Client libraries exist for many languages, including C++, Java, Python, PHP, Ruby, Erlang, Haskell, C#, JavaScript and more… • Thrift interface – driver-level interface, hard to use. • Hector – a Java Cassandra client – simple column-based client that does what Cassandra is intended to do. • Kundera – a Java client with JPA support – tries to translate JPA classes and attributes to Cassandra; good on inserts, still hard and problematic with queries.
Cassandra Installation • Install the prerequisite – basically the latest Java SE release • Extract the Cassandra zip files to the path of your choice • Run bin/cassandra.bat -f • The Cassandra node is up and running
World Database in Cassandra • World - Keyspace • Countries – SuperColumn Family • CountryDetails – SuperColumn • Border – SuperColumns • Coordinates – SuperColumn • GDP – SuperColumn • Language – SuperColumns • Cities – Column Family
Using Hector API - definitions • Creating a Cassandra cluster • Adding a keyspace • Adding a column
    // Connect to (or reuse) a named cluster handle
    Cluster cluster = HFactory.getOrCreateCluster("WorldCluster", "localhost:9160");

    // Define a standard column family inside the World keyspace
    BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
    columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
    columnFamilyDefinition.setName(CITY_CF); // ColumnFamily name
    columnFamilyDefinition.addColumnDefinition(columnDefinition); // columnDefinition is not shown on this slide
Using Hector API - definitions • Adding a SuperColumn • Adding all definitions to the cluster
    // Define a super column family in the same keyspace
    BasicColumnFamilyDefinition superCfDefinition = new BasicColumnFamilyDefinition();
    superCfDefinition.setKeyspaceName(WORLD_KEYSPACE);
    superCfDefinition.setName(COUNTRY_SUPER);
    superCfDefinition.setColumnType(ColumnType.SUPER);

    // Wrap both definitions and register the keyspace with a replication factor of 1
    ColumnFamilyDefinition cfDefStandard = new ThriftCfDef(columnFamilyDefinition);
    ColumnFamilyDefinition cfDefSuper = new ThriftCfDef(superCfDefinition);
    KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
        WORLD_KEYSPACE, "org.apache.cassandra.locator.SimpleStrategy", 1,
        Arrays.asList(cfDefStandard, cfDefSuper));
    cluster.addKeyspace(keyspaceDefinition);
Using Hector API - inserting • Creating a Column Template • Adding a Row into a Column Family
    // Template bound to one column family, with String row keys and column names
    ColumnFamilyTemplate<String, String> template =
        new ThriftColumnFamilyTemplate<String, String>(
            keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer);

    // Stage a column for the row "a key", then send the update
    ColumnFamilyUpdater<String, String> updater = template.createUpdater("a key");
    updater.setString("key", "value");
    try {
        template.update(updater);
    } catch (HectorException e) {
        // do something ...
    }
Using Hector API - inserting • Creating a Super Column Template • Adding a Row into a SuperColumnFamily
    // Template bound to one super column family
    SuperCfTemplate<String, String, String> template =
        new ThriftSuperCfTemplate<String, String, String>(
            keyspaceOperator, columnFamilyName,
            stringSerializer, stringSerializer, stringSerializer);

    // Stage a super column with one sub-column for the row "a key"
    SuperCfUpdater<String, String, String> updater = template.createUpdater("a key");
    HSuperColumn<String, String, ByteBuffer> superColumn = updater.addSuperColumn("sc name");
    superColumn.setString("column name", value);
    superColumn.update();
    try {
        template.update(updater);
    } catch (HectorException e) {
        // do something ...
    }
Using Hector API - reading • Reading all rows and their columns from a Column Family (using CQL) • Reading all columns from a Row in a SuperColumn Family
    // CQL query over the City column family
    CqlQuery<String, String, String> cqlQuery = new CqlQuery<String, String, String>(
        factory.getKeyspaceOperator(), stringSerializer, stringSerializer, stringSerializer);
    cqlQuery.setQuery("select * from City");
    QueryResult<CqlRows<String, String, String>> result = cqlQuery.execute();

    // All super columns of one row in a super column family
    SuperCfTemplate<String, String, String> superColumn =
        HectorFactory.getFactory().getSuperColumnFamilyTemplate("SuperColumnFamily");
    SuperCfResult<String, String, String> superRes = superColumn.querySuperColumns("key");
    Collection<String> columnNames = superRes.getSuperColumns();
Using Hector API - reading • Reading a SuperColumn from a Row in a SuperColumnFamily • Every query has options to fetch only part of the rows, by setting a start value and an end value (the rows are sorted on insert), and only part of the columns, by setting the column names explicitly
    // Query a single super column of one row
    SuperColumnQuery<String, String, String, String> query =
        HFactory.createSuperColumnQuery(keyspaceOperator,
            stringSerializer, stringSerializer, stringSerializer, stringSerializer);
    query.setColumnFamily("SuperColumnFamily");
    query.setKey("key");
    query.setSuperName("SuperColumnName");
    QueryResult<HSuperColumn<String, String, String>> result = query.execute();

    // Iterate over the sub-columns of the returned super column
    for (HColumn<String, String> col : result.get().getColumns()) {
        String name = col.getName();
        String value = col.getValue();
    }
Administration tools • cassandra – node activator • nodetool – bootstrapping and monitoring • cassandra-cli – application console • sstable2json – export • json2sstable – import