Exploring Collaboration Between Blockchain and Distributed Databases Bo Wang Nov. 27, 2018
Blockchain • Chained list of blocks • Block: a hash pointer to previous block, timestamp, transaction data • Store the head of the list: a hash‐pointer that points to the latest block
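The hash-pointer structure on this slide can be sketched in a few lines. This is a minimal illustration, not any particular blockchain's format: the JSON encoding, the SHA-256 choice, and the names `block_hash`, `append_block`, and `verify` are all assumptions made for the example.

```python
import hashlib
import json
import time
from typing import List, Optional


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block (assumed encoding)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: List[dict], transactions: List[str],
                 timestamp: Optional[float] = None) -> str:
    """Append a block and return the new head: a hash pointer to the latest block."""
    prev_hash = block_hash(chain[-1]) if chain else None
    block = {
        "prev_hash": prev_hash,  # hash pointer to the previous block
        "timestamp": time.time() if timestamp is None else timestamp,
        "transactions": transactions,
    }
    chain.append(block)
    return block_hash(block)


def verify(chain: List[dict], head: str) -> bool:
    """Walk back from the stored head and check every hash pointer."""
    expected = head
    for block in reversed(chain):
        if block_hash(block) != expected:
            return False
        expected = block["prev_hash"]
    return expected is None  # the first block has no predecessor
```

Because only the head needs to be stored safely, changing any earlier block changes its hash and breaks the pointer held by its successor, which `verify` detects.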
Blockchain • Managed by a peer-to-peer network • Consensus algorithm: Proof-of-Work (PoW): nodes compete to find a nonce that solves a puzzle: H(nonce || prev_hash || tx || tx || ... || tx) < target
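The puzzle H(nonce || prev_hash || tx || ... || tx) < target can be illustrated with a brute-force search. This is a simplified sketch: the string concatenation with "||", the single SHA-256, and the names `pow_hash` and `mine` are assumptions for the example and do not reproduce Bitcoin's actual block-header format.

```python
import hashlib
from typing import List, Optional


def pow_hash(nonce: int, prev_hash: str, transactions: List[str]) -> int:
    """Hash nonce || prev_hash || tx || ... || tx and read the digest as an integer."""
    payload = "||".join([str(nonce), prev_hash, *transactions]).encode("utf-8")
    return int.from_bytes(hashlib.sha256(payload).digest(), "big")


def mine(prev_hash: str, transactions: List[str], target: int,
         max_tries: int = 1_000_000) -> Optional[int]:
    """Try nonces in order until the hash falls below the target."""
    for nonce in range(max_tries):
        if pow_hash(nonce, prev_hash, transactions) < target:
            return nonce
    return None  # no solution found within max_tries


# A target of 2**244 accepts roughly one hash in 4,096 (the top 12 bits must be zero).
EASY_TARGET = 2 ** 244
```

Lowering the target makes the puzzle harder: each halving of the target doubles the expected number of hashes a node must try.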
Features of Blockchain • Special kind of distributed DB: provides data storage • Decentralization: no central authority • Immutability / Tamper-resistance
Drawbacks of Blockchain • Low throughput: Bitcoin averages 1 tps (transactions per second), maximum 7 tps; Visa handles 2,000 tps typically, 10,000 tps at peak • Long latency: Bitcoin takes 10 minutes per block, 1 hour per transaction; financial applications need 30 to 100 ms • Low capacity: Bitcoin held 180 GB in 2018; Big Data means petabytes (1 PB = 1,000,000 GB) • No query support
Features of Distributed Databases • High throughput: Cassandra: 174,000 writes/second on 50 nodes (2011); 1 million writes/second on a few dozen nodes (2014) • Large capacity: each node stores a subset of the data, so capacity increases linearly with the number of nodes • Short latency: Cassandra: 10 ms read/write, tested by UoT (2012); latency does not worsen as the number of nodes increases • Rich query support
Drawbacks of Distributed Databases • Centralized: depend on trusted third parties • Vulnerable to cyber attacks • Tampering with data, e.g., alteration or deletion, can go undetected • Data integrity can hardly be restored once lost
Collaboration: Blockchain-based DB • Blockchain: low performance, high security • Distributed databases: high performance, low security • Blockchain-based DBs: high performance, high security
Technology Choices in Blockchain • Consensus algorithm: PoW • It takes the network about 10 minutes on average to find a nonce that validates a block. • Increasing computing power won’t improve performance, because the puzzle difficulty is adjusted to keep the block interval constant. • Full replication • Each node stores a copy of all the data. • This copy is typically kept on a single hard drive.
Technology Choices in Distributed DB • Partial replication • Each node keeps some of the data. • Each piece of data is replicated on several nodes. • Paxos consensus algorithm • Fault-tolerant: reaches consensus even when some nodes are unresponsive. • Long lineage: handles high throughput, low latency, high capacity, efficient network utilization, and data of any shape well.
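Partial replication can be pictured as a placement function that maps each key to a few nodes rather than to all of them. The sketch below is a toy scheme under stated assumptions: the name `replicas_for`, the hash-modulo placement, and the default replication factor of 3 are illustrative and do not describe the partitioner of Cassandra, RethinkDB, or any other real database.

```python
import hashlib
from typing import List


def replicas_for(key: str, nodes: List[str], replication_factor: int = 3) -> List[str]:
    """Pick the nodes that hold `key`: a hashed start position plus its successors."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replication_factor, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]
```

With a fixed replication factor, adding nodes spreads the keys more thinly, which is why capacity grows with the number of nodes instead of being capped by a single machine as in full replication.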
Principles of Integration • Increase performance: keep more distributed DB features • Increase security: add more blockchain features
Blockchain-based DB: BigchainDB • Built on top of a distributed DB, e.g., RethinkDB • Inherits high performance from the distributed DB; nodes can be added to increase throughput and capacity • Adds blockchain features: decentralized control, immutability, and the ability to create and transfer assets
BigchainDB Architecture • Presents its API to clients as a single DB. • Each node has two distributed DBs: S and C. • Each DB runs its own internal consensus algorithm, e.g., Paxos. • S and C are connected by the BigchainDB consensus algorithm.
BigchainDB Architecture • S (backlog): unordered set of transactions; each new transaction is validated and assigned to a signing node • Node k: a signing node; S_k is the set of transactions assigned to node k; node k creates a block from S_k and puts the block into C • C (chain): ordered list of blocks
BigchainDB Architecture • Voting Mechanism • Each signing node votes whether a block is valid or invalid. • Check validity of every transaction in the block. • Quorum is a majority of votes.
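The majority-quorum rule can be sketched as a small tally function. The function name `block_status`, the three status strings, and the rule that a majority of "invalid" votes also decides the block are assumptions made for illustration; the slides state only that the quorum is a majority of votes.

```python
from typing import Dict, List


def block_status(votes: Dict[str, bool], voters: List[str]) -> str:
    """Decide a block from the votes cast so far by its signing nodes."""
    majority = len(voters) // 2 + 1
    valid = sum(1 for node in voters if votes.get(node) is True)
    invalid = sum(1 for node in voters if votes.get(node) is False)
    if valid >= majority:
        return "valid"
    if invalid >= majority:
        return "invalid"
    return "undecided"  # not enough votes yet in either direction
```

For five signing nodes the majority is three, so a block can be decided before the last two votes arrive.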
Behavioral Description • Left: The backlog S starts empty and the chain C starts with only a genesis block. • Right: Clients have inserted transactions into backlog S, and the transactions have been assigned to nodes 1, 3, and 2.
Behavioral Description • Left: Node 1 has moved its assigned transactions from backlog S to chain C. • Right: Node 3 has processed its assigned transactions too.
Behavioral Description • Transactions from an invalid block (on right, shaded) get re-inserted into backlog S for re-consideration.
Behavioral Description: Multiple Machines • More than one client may talk to a given node.
Behavioral Description: Multiple Machines • There are multiple nodes. • Each node has a view into S and C. • Typically a client connects to just one node.
BigchainDB Consensus Algorithm (BCA) • BCA is a state machine running on each signing node. • mainLoop()
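One iteration of such a loop can be sketched as a single-process simulation of the backlog S and chain C described in the preceding slides. Everything here is a hypothetical illustration rather than BigchainDB's code: the function names, the round-robin assignment, the dictionary layout of blocks, and the choice to re-insert only transactions that pass validation are assumptions.

```python
from typing import Callable, List


def assign_transactions(S: List[dict], nodes: List[str]) -> None:
    """Assign each unassigned transaction in backlog S to a signing node (round-robin)."""
    for i, entry in enumerate(S):
        if entry["assignee"] is None:
            entry["assignee"] = nodes[i % len(nodes)]


def add_block(S: List[dict], C: List[dict], k: str) -> None:
    """Move node k's assigned transactions from S into a new block at the end of C."""
    assigned = [e["tx"] for e in S if e["assignee"] == k]
    if not assigned:
        return
    S[:] = [e for e in S if e["assignee"] != k]
    C.append({"creator": k, "txs": assigned, "votes": {}})


def vote_on_blocks(S: List[dict], C: List[dict], k: str, nodes: List[str],
                   is_valid: Callable[[str], bool]) -> None:
    """Node k votes on every undecided block it has not voted on yet."""
    majority = len(nodes) // 2 + 1
    for block in C:
        if "status" in block or k in block["votes"]:
            continue
        block["votes"][k] = all(is_valid(tx) for tx in block["txs"])
        yes = sum(block["votes"].values())
        no = len(block["votes"]) - yes
        if yes >= majority:
            block["status"] = "valid"
        elif no >= majority:
            block["status"] = "invalid"
            # Assumption: only transactions that pass validation return to S.
            S.extend({"tx": tx, "assignee": None}
                     for tx in block["txs"] if is_valid(tx))


def main_loop_step(S: List[dict], C: List[dict], k: str, nodes: List[str],
                   is_valid: Callable[[str], bool]) -> None:
    """One pass of the loop on signing node k: assign, build a block, vote."""
    assign_transactions(S, nodes)
    add_block(S, C, k)
    vote_on_blocks(S, C, k, nodes, is_valid)
```

In a real deployment each signing node would run this loop continuously against its own view of S and C; here the nodes are stepped one after another over shared lists to keep the example deterministic.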
Blockchain of BigchainDB • Each block is written before a quorum of nodes votes on it. • Chainification happens at voting time. • Every block has an id equal to the hash of its transactions, timestamp, voters list and public key of its creator-node. • A block does not include the hash (id) of the previous block when it first gets written. • Votes get appended to the block over time, and each vote has a “previous block” attribute equal to the hash of the previous block.
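The separation between a block's id and its chain link can be sketched as follows. The field names, the JSON encoding, the use of SHA-256, and the helpers `make_block` and `append_vote` are assumptions for illustration and are not BigchainDB's actual schema.

```python
import hashlib
import json
from typing import List


def sha256_hex(obj: dict) -> str:
    """Hash a canonical JSON encoding of a dictionary (assumed encoding)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()


def make_block(transactions: List[str], timestamp: float,
               voters: List[str], creator_pubkey: str) -> dict:
    """Create a block whose id covers only its own content, not the previous block."""
    body = {
        "transactions": transactions,
        "timestamp": timestamp,
        "voters": voters,
        "node_pubkey": creator_pubkey,
    }
    return {"id": sha256_hex(body), "block": body, "votes": []}


def append_vote(block: dict, voter: str, previous_block_id: str, is_valid: bool) -> None:
    """Chainification happens here: the vote, not the block, names the previous block."""
    block["votes"].append({
        "voter": voter,
        "previous_block": previous_block_id,
        "is_block_valid": is_valid,
    })
```

Since the id is computed before any vote exists, a block can be written immediately and linked into the chain later, when votes arrive.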
Experimental Results • Throughput increased in proportion to the number of nodes. • Write throughput exceeded 1 million writes per second with 32 nodes.
Experimental Results • Linear scaling in write performance with the number of nodes.
Summary of BigchainDB • Main purpose: increase performance; it targets 1 million writes per second, sub-second latency, and petabyte capacity. • Uses a lightweight consensus algorithm: voting instead of PoW to validate blocks. • Sacrifices some security guarantees. • If a majority of nodes are malicious, it can no longer ensure data integrity. • Data redundancy: each node runs two DBs. • Inherits the limitations of distributed DBs.
Blockchain-based DB to Ensure Data Integrity • Modern database systems use logging mechanisms to track data changes, e.g., the Redo Log of Oracle. • If log files are forged, recognizing an attack or a failure is difficult. • Typically, Remote Data Auditing mitigations are employed, but they come with high costs and rely on trusted third parties. • Blockchain’s tamper resistance can provide strong data integrity guarantees in trustless networks.
2-Layer Blockchain Architecture (2LBC) • First layer: a permissioned blockchain • Uses a lightweight consensus protocol that assures low latency and high throughput. • Aims at quickly and reliably storing evidence of every operation. • Provides weak data integrity guarantees. • Second layer: a public permissionless blockchain • A PoW-based blockchain that stores evidence of the database operations logged by the first layer.
2-Layer Blockchain Architecture (2LBC) • “Mining Rotation” consensus algorithm • Divides time into rounds; in each round, a miner is elected as leader. • The leader receives new operations, signs them with its private key, and broadcasts them to the other miners. • Blockchain Anchoring technique • Handles the interaction between the first and second layers. • Periodically, the hash of the first-layer blockchain is sent to the second-layer blockchain via the Anchoring Manager.
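Leader rotation and anchoring can be sketched together. This is a hypothetical illustration: the slides do not say how the leader is elected, so the round-robin rule in `leader_for_round` is an assumption, as are the class layout, the anchoring interval, and the in-memory list that stands in for the second-layer PoW blockchain.

```python
import hashlib
from typing import List, Tuple


def leader_for_round(miners: List[str], round_number: int) -> str:
    """Elect the leader of a round; round-robin order is an assumption."""
    return miners[round_number % len(miners)]


def chain_digest(blocks: List[str]) -> str:
    """Hash the first-layer chain by hashing each block and folding the digests."""
    digest = hashlib.sha256()
    for block in blocks:
        digest.update(hashlib.sha256(block.encode("utf-8")).digest())
    return digest.hexdigest()


class AnchoringManager:
    """Periodically records the first-layer hash on a stand-in second layer."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.anchors: List[Tuple[int, str]] = []  # (chain length, digest)

    def on_new_block(self, first_layer: List[str]) -> None:
        """Anchor the chain every `interval` blocks."""
        if len(first_layer) % self.interval == 0:
            self.anchors.append((len(first_layer), chain_digest(first_layer)))

    def audit(self, first_layer: List[str]) -> bool:
        """Check the first layer against every anchor recorded so far."""
        return all(chain_digest(first_layer[:length]) == digest
                   for length, digest in self.anchors)
```

Only operations that were already covered by an anchor are protected by the second layer; blocks written since the last anchor rely on the first layer's weaker guarantees until the next anchoring period.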
Strengths of 2LBC Architecture • The second-layer PoW-based blockchain ensures data integrity. • Changing the evidence of an operation in the first layer requires compromising all the replicas. • If the hash of the evidence has been stored in the second layer, the attacker’s effort is close to infinite. • The first-layer lightweight consensus ensures performance. • The second-layer blockchain runs in the background. • From a client’s point of view, an operation on the database is completed as soon as it is processed by the first-layer blockchain.
Weaknesses of 2LBC Architecture • Availability • The first-layer blockchain is built on a total consensus mechanism. • Availability can be critically affected by compromising only a single miner. • Scalability • Overall system performance does not scale when new nodes are added. • The total consensus algorithm used performs worse with additional nodes.
Comparing BigchainDB and the 2LBC Architecture • Both use lightweight consensus algorithms instead of PoW to ensure high throughput and low latency. • BigchainDB: voting, quorum is a majority. • 2LBC: Mining Rotation, all miners sign to validate a block. • BigchainDB stores all transactions in the blockchain. • 2LBC stores only database operations in the blockchain. • BigchainDB: good scalability, partial replication in each DB. • 2LBC: poor scalability, full replication in the blockchain DB. • 2LBC has an extra layer of PoW blockchain to ensure security.
Alternative Collaborations • Divide a whole business system into data-intensive modules and non-data-intensive modules. • Data-intensive modules can be built on traditional databases. • Non-data-intensive modules can be built on blockchain. • Divide a whole business system into a trust-related part and a non-trust-related part. • The trust-related part should be reduced in data volume, e.g., by using hash values, to fit on a blockchain. • The non-trust-related part can be built on traditional databases.
Conclusions • Blockchain’s decentralization and immutability features ensure data integrity, but its PoW consensus algorithm and full replication limit performance. • Distributed DBs have high performance but rely on trusted third parties. Data integrity can hardly be restored once lost. • Blockchain-based DBs combine features of blockchain and distributed DBs to achieve high performance and high security.