
Marcelo Olivas @mfolivas




Presentation Transcript


  1. Marcelo Olivas • @mfolivas

  2. Start With “WHY”

  3. Scale Out...seriously tho • Linear scaling • Real multi-datacenter support • “Fix it Monday” fault tolerance

  4. Data Model

  5. Column • A key/value pair plus a timestamp • Example: NAME = MARCELO, timestamp 1298000486821

  6. Row • One ROW KEY pointing to a set of COLUMNS • Example columns (with timestamps): MOBILE (1298000486821), FNAME (1298000486833), LNAME (1298000486835), EMAIL (1298000486839)

  7. Row • A row is a set of orderable columns
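
The column and row slides can be sketched in a few lines of Python. This is only an illustration of the data model, not how Cassandra stores data: the `Column` type, the row key, and every value below are made-up placeholders, and sorting by column name is just one possible ordering. The column names and timestamps are taken from the slides.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # write time, as on the slides (e.g. 1298000486821)


def ordered_columns(row):
    """Return the row's columns sorted by column name."""
    return [row[name] for name in sorted(row)]


# A row: one row key mapping to a set of columns, keyed by column name.
# The row key and all values are hypothetical placeholders.
row_key = "mfolivas"
row = {
    "MOBILE": Column("MOBILE", "555-0100", 1298000486821),
    "FNAME": Column("FNAME", "Marcelo", 1298000486833),
    "LNAME": Column("LNAME", "Olivas", 1298000486835),
    "EMAIL": Column("EMAIL", "marcelo@example.com", 1298000486839),
}

names = [column.name for column in ordered_columns(row)]
print(names)
```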

  8. Column Family • [Diagram: four row KEYs, each followed by its own set of COLUMNs]

  9. Static Column Family • GOOG: Price: 589.55, Name: Google • APPL: Price: 401.76, Name: Apple • NFLX: Price: 78.73, Name: Netflix, Exchange: NYSE

  10. Dynamic Column • GOOG: 10/25/11=583.16, 10/24/11=596.42, 10/23/11=590.49 • APPL: 10/25/11=397.77, 10/24/11=405.77, 10/23/11=392.87 • NFLX: 10/25/11=77.37, 10/24/11=118.14, 10/23/11=117.23 • Prematerialized queries • Store it how you read it
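
"Store it how you read it" can be illustrated with the price data from the slide: each ticker is one wide row whose column names are dates, so a date-range query becomes a slice of sorted column names. The sketch below is a plain-Python model under that assumption; `slice_columns` is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# One wide row per ticker; column name = trading day, column value = price.
prices = {
    "GOOG": {date(2011, 10, 25): 583.16, date(2011, 10, 24): 596.42, date(2011, 10, 23): 590.49},
    "APPL": {date(2011, 10, 25): 397.77, date(2011, 10, 24): 405.77, date(2011, 10, 23): 392.87},
    "NFLX": {date(2011, 10, 25): 77.37, date(2011, 10, 24): 118.14, date(2011, 10, 23): 117.23},
}


def slice_columns(row, start, end):
    """Return (name, value) pairs whose names fall in [start, end], in order."""
    names = sorted(row)
    lo = bisect_left(names, start)
    hi = bisect_right(names, end)
    return [(name, row[name]) for name in names[lo:hi]]


last_two_days = slice_columns(prices["GOOG"], date(2011, 10, 24), date(2011, 10, 25))
print(last_two_days)
```

The query needs no join and no scan of other rows: the answer is already laid out contiguously inside the row it is read from.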

  11. Column Family • A column family is a set of orderable rows

  12. Keyspace COLUMN FAMILY COLUMN FAMILY COLUMN FAMILY

  13. Cluster KEYSPACE KEYSPACE

  14. Data Modeling • Column Families are not tables • Column Families can be wide • Column Families can be narrow • The database does not join • You only get one index*

  15. Secondary Index • Index any column value • Performant for low-cardinality columns • Opposite of most relational indexes • Well-supported

  16. Patterns • Entity • Column Family as Index • Materialized View • Time Series • Event Sourcing

  17. Entities • Single Primary Key • Fairly Consistent Column Names • Fairly Narrow Rows • Feels Like Relational Database • Schemaless

  18. Column Family as Index • Rows are Indexed by Row Key • Secondary Indexes Want Low Cardinality • We Have to Build Our Own • Query-First Data Modeling • Row Key Becomes Index Value • Column Values are Entity Row Keys
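
A hand-built index of this kind can be sketched with two dictionaries standing in for two column families. Everything here is hypothetical (the `users` and `users_by_city` names, the `city` column, the sample data); the point is only that the index row key is the indexed value and its columns hold entity row keys. A real implementation would also have to remove stale index entries when an entity's indexed value changes, which this sketch ignores.

```python
# Entity column family: row key -> columns.
users = {}
# Index column family: row key = indexed value (a city),
# columns = row keys of the matching entities.
users_by_city = {}


def save_user(user_id, columns):
    """Write the entity and its index entry together (query-first modeling)."""
    users[user_id] = dict(columns)
    users_by_city.setdefault(columns["city"], {})[user_id] = user_id


def find_by_city(city):
    """Read the index row, then fetch each entity by its row key."""
    index_row = users_by_city.get(city, {})
    return [users[entity_key] for entity_key in sorted(index_row.values())]


save_user("u1", {"name": "Ana", "city": "Miami"})
save_user("u2", {"name": "Ben", "city": "Denver"})
save_user("u3", {"name": "Cy", "city": "Miami"})

miami_names = [user["name"] for user in find_by_city("Miami")]
print(miami_names)
```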

  19. Materialized View • Based on Entity Data • A Separate Column Family • Entity Data Organized by Index Key

  20. Time Series • Row Key is Time Identifier • Column Values are Events • Column Values are Measurements • Rows Can be Very Wide

  21. Event Sourcing • A Martin Fowler idea • Persist state change events • Do not persist present state • All writes are immutable • Play back the tape to rehydrate the app
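
The "play back the tape" idea can be shown with a toy account balance. The event names and the `rehydrate` helper are invented for illustration; the sketch only assumes that events are appended and never modified, and that current state is derived by folding over them.

```python
def apply_event(balance, event):
    """Apply one immutable state-change event to the current balance."""
    kind, amount = event
    if kind == "deposited":
        return balance + amount
    if kind == "withdrawn":
        return balance - amount
    raise ValueError(f"unknown event kind: {kind}")


def rehydrate(events):
    """Rebuild present state by replaying every event from the start."""
    balance = 0
    for event in events:
        balance = apply_event(balance, event)
    return balance


# The event log is append-only; present state is never stored.
event_log = [("deposited", 100), ("withdrawn", 30), ("deposited", 5)]
current_balance = rehydrate(event_log)
print(current_balance)
```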

  22. CQL Missing old friends

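  23. CQL over JDBC

          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.sql.PreparedStatement;

          // Load the Cassandra JDBC driver and connect to the keyspace.
          Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
          Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9170/Keyspace1");

          // Bind values to the placeholders of a prepared CQL statement.
          String query = "UPDATE Test SET a=?, b=? WHERE KEY=?";
          PreparedStatement statement = con.prepareStatement(query);
          statement.setLong(1, 100);
          statement.setLong(2, 1000);
          statement.setString(3, "key0");
          statement.executeUpdate();

          // Release the statement and the connection.
          statement.close();
          con.close();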

  24. Create Keyspace

          CREATE KEYSPACE Payables;

          CREATE KEYSPACE MasterClass
            WITH strategy_class = 'SimpleStrategy'
            AND strategy_options:replication_factor = 3;

  25. Create Column Family

          CREATE COLUMNFAMILY Contacts (KEY uuid PRIMARY KEY);

          CREATE COLUMNFAMILY Monkeys (KEY text PRIMARY KEY, id_tag bigint, emotion text)
            WITH comment='Simian Emotional States'
            AND read_repair_chance = 0.5;

  26. Write

          UPDATE Contacts
            SET name='Tim Berglund'
            WHERE KEY = B70DE1D0-9908-4AE3-BE34-5573E5B09F14;

          UPDATE Monkeys USING CONSISTENCY EACH_QUORUM
            SET emotion = 'Angry', name = 'Baby Boss'
            WHERE KEY = 'OREILLY:8827';

  27. Read

          SELECT * FROM Contacts
            WHERE KEY = B70DE1D0-9908-4AE3-BE34-5573E5B09F14;

          SELECT FIRST 1000 FROM Temperatures
            WHERE KEY = 7627748986;

          SELECT emotion, id_tag FROM Monkeys USING CONSISTENCY ONE
            WHERE KEY = 'OREILLY:8827';

  28. Scaling • How a dynamo-style distributed hash table works

  29. Hash Ring

  30. [Ring diagram: eight nodes at tokens 0000, 2000, 4000, 6000, 8000, A000, C000, E000]

  31. [Ring diagram: keys hashed onto the ring • 3D97: “ANGRY” • 9C4F: “MONKEY”]

  32. [Ring diagram: lookup of 3D97 returns “ANGRY”]

  33. [Ring diagram: lookup of 9C4F returns “MONKEY”]
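
The ring lookups on these slides can be sketched with a sorted token list and a binary search. The node tokens and key tokens come from the slides; the ownership rule is an assumption the flattened diagrams do not show explicitly: a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around past the highest token. The `owner` function is a hypothetical name.

```python
from bisect import bisect_left

# Node tokens from the ring slides, in ring order.
NODE_TOKENS = [0x0000, 0x2000, 0x4000, 0x6000, 0x8000, 0xA000, 0xC000, 0xE000]


def owner(key_token):
    """Return the token of the node that owns key_token.

    Assumed rule: the first node at or after the key on the ring owns it;
    keys beyond the last token wrap around to the first node.
    """
    index = bisect_left(NODE_TOKENS, key_token)
    return NODE_TOKENS[index % len(NODE_TOKENS)]


angry_owner = owner(0x3D97)   # key for "ANGRY"
monkey_owner = owner(0x9C4F)  # key for "MONKEY"
print(f"{angry_owner:04X} {monkey_owner:04X}")
```

Because every node can run the same computation, any node can work out where a key lives without consulting a central directory.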

  34. Replication

  35. Replication • Replication factor (N): default is 1, but 3 is conventional • Replica placement strategy (there are more than these two): Simple, Network topology

  36. Simple Strategy • [Ring diagram, N=3: key 3D97: “ANGRY” is stored on three nodes of the ring]

  37. Network Topology • [Diagram: the ring nodes 0000 through E000 are split across two datacenters, DC1 and DC2; four copies of 9C4F: “MONKEY” are placed across them]
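
A minimal sketch of simple-strategy placement, assuming the first replica is the node that owns the key and the remaining replicas are the next nodes walking the ring in token order. The tokens are from the slides; `replicas_for` is a hypothetical helper, and rack or datacenter awareness is deliberately left out.

```python
from bisect import bisect_left

NODE_TOKENS = [0x0000, 0x2000, 0x4000, 0x6000, 0x8000, 0xA000, 0xC000, 0xE000]


def replicas_for(key_token, replication_factor):
    """Return the tokens of the nodes holding a copy of the key.

    Assumption: start at the owning node, then take the following
    nodes in ring order until N distinct nodes are chosen.
    """
    ring_size = len(NODE_TOKENS)
    start = bisect_left(NODE_TOKENS, key_token) % ring_size
    count = min(replication_factor, ring_size)
    return [NODE_TOKENS[(start + step) % ring_size] for step in range(count)]


angry_replicas = replicas_for(0x3D97, 3)  # N=3, as on the slide
print([f"{token:04X}" for token in angry_replicas])
```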

  38. Client Connection • [Ring diagram: which node does the CLIENT connect to?]

  39. Client Connection • [Ring diagram: the CLIENT asks for key 3D97]

  40. Write Replica • [Ring diagram: the CLIENT writes key 14C7]

  41. Hinted “hand off” • [Ring diagram: node 2000 is down (X) when the CLIENT writes key 14C7] • Coordinator stores a “hint”
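
Hinted handoff can be sketched as follows, under simplifying assumptions: the coordinator keeps hints in memory, a hint is just the missed write plus its target, and replay happens when someone calls `replay_hints`. The `Node` class, both function names, the choice of coordinator (A000), and the replica set for key 14C7 are illustrative, not taken from the slides.

```python
class Node:
    def __init__(self, token):
        self.token = token
        self.up = True
        self.data = {}   # key -> value stored on this node
        self.hints = []  # (target node, key, value) writes held for down nodes


def coordinate_write(coordinator, replicas, key, value):
    """Write to every live replica; keep a hint for each replica that is down."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
        else:
            coordinator.hints.append((replica, key, value))
    return acks


def replay_hints(coordinator):
    """Deliver stored hints to targets that are back up; keep the rest."""
    pending = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            pending.append((target, key, value))
    coordinator.hints = pending


coordinator = Node(0xA000)
replicas = [Node(0x2000), Node(0x4000), Node(0x6000)]
replicas[0].up = False  # node 2000 is down, as on the slide

acks = coordinate_write(coordinator, replicas, 0x14C7, "value")
hints_after_write = len(coordinator.hints)

replicas[0].up = True   # node 2000 comes back
replay_hints(coordinator)
```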

  42. Write Consistency • ANY: at least one node, hinted handoffs allowed • ONE: at least one replica node, no hints allowed • QUORUM: written to floor(N/2) + 1 replicas • ALL: written to all N replicas
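
The write levels reduce to a small amount of arithmetic. In the sketch below a quorum is taken as floor(N/2) + 1, which equals (N+1)/2 whenever N is odd (for example 2 of 3). The function names are hypothetical, and only the four levels listed on the slide are modeled.

```python
def required_acks(level, n):
    """Number of acknowledgements needed for a write at this level."""
    if level in ("ANY", "ONE"):
        return 1
    if level == "QUORUM":
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unknown consistency level: {level}")


def write_succeeds(level, n, replica_acks, hints_stored):
    """ANY also counts stored hints; every other level counts replicas only."""
    if level == "ANY":
        return replica_acks + hints_stored >= 1
    return replica_acks >= required_acks(level, n)


print(required_acks("QUORUM", 3), required_acks("QUORUM", 4))
```

The difference between ANY and ONE shows up when every replica is down: `write_succeeds("ANY", 3, 0, 1)` is true because the coordinator stored a hint, while the same call with `"ONE"` is false.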

  43. Reading Data • [Ring diagram, N=3: a read of 9C4F finds all three replicas holding 9C4F: “MONKEY”, written TODAY]

  44. Reading Data • [Ring diagram, N=3: the replicas are inconsistent; two hold 9C4F: “MONKEY”, written TODAY, and one holds 9C4F: “LEMUR”, written YESTERDAY]

  45. Read Consistency • ONE: get one response from the closest replica • QUORUM: get floor(N/2) + 1 responses (at least two when N=3), then return the value with the most recent timestamp • ALL: wait until all N replicas respond; fail if they don’t all answer
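
The MONKEY/LEMUR example can be replayed in a short sketch. It assumes responses arrive as `(value, timestamp)` pairs ordered closest replica first, and that a quorum is floor(N/2) + 1; the `read` function, the `NotEnoughReplicas` exception, and the integer timestamps standing in for YESTERDAY and TODAY are all hypothetical.

```python
class NotEnoughReplicas(Exception):
    """Raised when fewer replicas answered than the level requires."""


def read(responses, level, n):
    """Return the newest value among the responses the level waits for.

    responses: list of (value, timestamp), closest replica first.
    """
    needed = {"ONE": 1, "QUORUM": n // 2 + 1, "ALL": n}[level]
    if len(responses) < needed:
        raise NotEnoughReplicas(f"{level} needs {needed}, got {len(responses)}")
    consulted = responses[:needed]
    value, _timestamp = max(consulted, key=lambda response: response[1])
    return value


YESTERDAY, TODAY = 1, 2
# The closest replica happens to hold the stale value.
responses = [("LEMUR", YESTERDAY), ("MONKEY", TODAY), ("MONKEY", TODAY)]

print(read(responses, "ONE", 3), read(responses, "QUORUM", 3))
```

With these inputs a read at ONE returns the stale "LEMUR", while QUORUM and ALL see at least one up-to-date replica and return "MONKEY".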

  46. Storage

  47. Writes • Commit Log -> Memtable -> SSTable
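
A toy model of this write path, with heavy simplifications: the commit log is a list instead of an append-only file, the memtable flushes after a fixed number of keys, and an SSTable is an immutable sorted tuple. The `Store` class and `memtable_limit` are invented for the sketch; clearing the log on flush stands in for discarding log segments whose data is safely on disk.

```python
class Store:
    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # durable, append-only record of recent writes
        self.memtable = {}    # in-memory view of recent writes
        self.sstables = []    # immutable, sorted snapshots flushed from memory

    def write(self, key, value):
        # 1. Append to the commit log first so the write survives a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the in-memory memtable.
        self.memtable[key] = value
        # 3. Flush to a new SSTable once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable as a sorted, immutable SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log


store = Store(memtable_limit=2)
store.write("b", 2)
store.write("a", 1)  # second key triggers a flush
store.write("c", 3)  # stays in the memtable and the commit log
```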

  48. Partitioning

  49. Reads

  50. Reads
