Cassandra in Depth • Guo Peng (郭鹏)
Topics • What is Cassandra • Cassandra's data model • Cassandra's write path • Cassandra's data storage files • Cassandra's read path
What is Cassandra • Bigtable • Dynamo
Proven • Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data in over 150 machines.
Fault Tolerant • Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
Decentralized • Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
You're in Control • Choose between synchronous or asynchronous replication for each update. Highly available asynchronous operations are optimized with features like Hinted Handoff and Read Repair.
Rich Data Model • Allows efficient use for many applications beyond simple key/value.
Elastic • Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
Durable • Cassandra is suitable for applications that can't afford to lose data, even when an entire data center goes down.
SSTable file structure • Filter file • Index file • Data file
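Filter • The Filter file is used to quickly check whether a given key might exist in this SSTable, so that SSTables that cannot contain the key are skipped • Implemented as a Bloom filter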
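The Bloom filter idea can be sketched in a few lines. This is only an illustration of the technique, not Cassandra's actual implementation: the class name `BloomFilter`, the bit-array size, the number of hashes, and the use of salted SHA-256 are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustration values, not Cassandra defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key is definitely absent; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical row keys written into one SSTable.
sstable_filter = BloomFilter()
for row_key in ("user:1", "user:2", "user:3"):
    sstable_filter.add(row_key)
```

When `might_contain` returns False, a reader can skip the SSTable without touching its Index or Data file. A True result still has to be confirmed against the Index file, because a Bloom filter can return false positives.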
Index • The Index file gives the exact position in the Data file of the column values for a given key
Data • The Data file is where the actual data is stored. Besides the data being queried, it also stores index information for some of the columns belonging to each key.
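How the Index file and the Data file work together can be sketched as follows. This is a simplified model under stated assumptions, not the real on-disk format: the functions `write_sstable` and `read_row` are hypothetical, rows are serialized as JSON, the "Data file" is an in-memory byte string, and the "Index file" is a dictionary mapping each key to an (offset, length) pair. The per-key column index mentioned above is not modeled.

```python
import io
import json


def write_sstable(rows):
    """Serialize rows sorted by key; return (data_bytes, index)."""
    data = io.BytesIO()
    index = {}  # key -> (offset, length) of that row inside the data bytes
    for key in sorted(rows):
        record = json.dumps({"key": key, "columns": rows[key]}).encode("utf-8")
        index[key] = (data.tell(), len(record))
        data.write(record)
    return data.getvalue(), index


def read_row(data_bytes, index, key):
    """Look the key up in the index, then read only that slice of the data."""
    entry = index.get(key)
    if entry is None:
        return None  # key is not in this SSTable
    offset, length = entry
    record = data_bytes[offset:offset + length]
    return json.loads(record.decode("utf-8"))["columns"]


# Hypothetical rows; keys and column names are made up for the example.
data_bytes, index = write_sstable({
    "user:2": {"name": "bob"},
    "user:1": {"name": "alice", "email": "alice@example.com"},
})
```

The point of the separation is that a read touches the small index first and then fetches only the bytes for one row, rather than scanning the whole Data file.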
Q&A • Thank you, everyone