Google File System Simulator
Pratima Kolan, Vinod Ramachandran

Google File System
• The master manages metadata
• Data transfer happens directly between the client and the chunk servers
• Files are broken into 64 MB chunks
• Chunks are replicated across three machines for safety
Event-Based Simulation
[Diagram: components (Component 1, 2, 3) place events (Event 1, 2, 3) into a priority queue; the simulator gets the next high-priority event from the queue, simulates it, and produces the output of the simulated event]
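A minimal sketch of this event loop in Python (the Simulator class and its method names are illustrative, not the actual simulator's interfaces): components schedule events into a priority queue, and the simulator repeatedly dequeues the earliest event and simulates it.

    import heapq
    import itertools

    class Simulator:
        def __init__(self):
            self.queue = []               # pending events, ordered by event time
            self.ids = itertools.count()  # tie-breaker for simultaneous events

        def schedule(self, time, handler, *args):
            """Called by a component to place an event in the priority queue."""
            heapq.heappush(self.queue, (time, next(self.ids), handler, args))

        def run(self):
            """Get the next high-priority event from the queue and simulate it."""
            while self.queue:
                time, _, handler, args = heapq.heappop(self.queue)
                handler(time, *args)      # a handler may schedule further events

For example, sim.schedule(0.0, handler) followed by sim.run() drives the whole simulation from that first event.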
Simplified GFS Architecture
[Diagram: the client and the master server connect through a switch with infinite bandwidth to five network disks (Network Disk 1-5); queues at the switch represent the network queues]
Data Flow
1. The client queries the master server for the Chunk ID it wants to read.
2. The master server returns the set of disk IDs that contain the chunk.
3. The client requests the chunk from one of those disks.
4. The disk transfers the data to the client.
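A hedged sketch of this read path (read_chunk, master.lookup, and disk.transfer are hypothetical stand-ins for the simulator's components):

    def read_chunk(client, master, chunk_id):
        # Steps 1-2: query the master, which returns the disks
        # holding replicas of the requested chunk.
        replica_disks = master.lookup(chunk_id)
        # Step 3: request the chunk from one replica.
        disk = replica_disks[0]
        # Step 4: the disk transfers the data directly to the client;
        # the master is never on the data path.
        return disk.transfer(chunk_id, dest=client)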
Experiment Setup
• We have a client whose bandwidth can be varied from 0 to 1000 Mbps
• We have 5 disks, each with a per-disk bandwidth of 40 Mbps
• We have 3 chunk replicas per chunk of data as a baseline
• Each client request is for 1 chunk of data from a disk
Simplified GFS Architecture (Annotated)
[Diagram: the same architecture, annotated with the client bandwidth varied from 0 to 1000 Mbps and a per-disk bandwidth of 40 Mbps; the five disks hold chunk ID ranges 0-1000, 0-1000, 0-2000, 1001-2000, and 1001-2000]
Experiment 1
• Disk requests served without load balancing
• In this case we pick the first chunk server from the list of available chunk servers that contain the disk block.
• Disk requests served with load balancing
• In this case we apply a greedy algorithm and balance the load of incoming requests across the 5 disks, as sketched below.
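The two policies differ only in how a replica is chosen per request. A sketch, assuming each simulated disk exposes a pending_requests counter (an illustrative attribute):

    def pick_disk_first(replica_disks):
        # Without load balancing: take the first chunk server in the list.
        return replica_disks[0]

    def pick_disk_greedy(replica_disks):
        # With load balancing: greedily send the request to the replica
        # with the fewest queued requests.
        return min(replica_disks, key=lambda d: d.pending_requests)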
Expectation
• In the non-load-balancing case we expect the effective request/data rate to peak at the bandwidth of 2 disks (80 Mbps)
• In the load-balancing case we expect the effective request/data rate to peak at the bandwidth of 5 disks (200 Mbps)
Load Balancing Graph
This graph plots the data rate at the client vs. the client bandwidth.
Experiment 2
• Disk requests served with no dynamic replication
• In this case we have a fixed number of replicas (3 in our case), and the server does not create more replicas based on read-request statistics.
• Disk requests served with dynamic replication
• In this case the server replicates certain chunks based on the frequency of chunk requests.
• We define a replication factor, which is a fraction < 1
• Number of replicas for a chunk = (replication factor) × (number of requests for the chunk)
• We cap the maximum number of replicas at the number of disks, as sketched below.
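The replica-count rule could be implemented as in the following sketch; the cap of 5 disks comes from the setup, while treating the 3-replica baseline as a floor is our assumption:

    NUM_DISKS = 5          # cap: at most one replica per disk
    BASELINE_REPLICAS = 3  # baseline replication from the experiment setup

    def target_replicas(num_requests, replication_factor):
        # No. of replicas for a chunk = replication factor * no. of requests.
        replicas = int(replication_factor * num_requests)
        # Assumed floor at the baseline; capped at the number of disks.
        return max(BASELINE_REPLICAS, min(replicas, NUM_DISKS))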
Expectation
• Our requests are all aimed at the chunks placed on disk 0, disk 1, and disk 2.
• In the non-replication case we expect the effective data rate at the client to be limited by the bandwidth of 3 disks (120 Mbps)
• In the replication case we expect the effective data rate at the client to be limited by the bandwidth of 5 disks (200 Mbps)
Replication Graph
This graph plots the data rate at the client vs. the client bandwidth.
Experiment 3
• Disk requests served with no rebalancing
• In this case we do not implement any rebalancing of read requests based on the frequency of chunk requests
• Disk requests served with rebalancing
• In this case we rebalance read requests by picking the request with the highest frequency and transferring it to a disk with a lower load, as sketched below
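A sketch of one rebalancing step, assuming per-chunk request-frequency counters and per-disk load counters (all of these names are illustrative):

    def rebalance_step(chunk_freq, disk_load, replicas_of):
        # Pick the chunk whose requests arrive most frequently ...
        hot_chunk = max(chunk_freq, key=chunk_freq.get)
        # ... and redirect its requests to the least-loaded disk
        # that holds a replica of it.
        target = min(replicas_of[hot_chunk], key=lambda d: disk_load[d])
        return hot_chunk, target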
Conclusion and Future Work
• GFS is a simple file system for large, data-intensive applications
• We studied the behavior of certain read workloads on this file system
• In the future we would like to come up with optimizations that could further fine-tune GFS