Big Data - Storage Kalapriya Kannan IBM Research Labs July, 2013
What are we going to understand • Why distributed computing • Is it something new? • HDFS - basics
What are the things we do with data • Store • Perform some basic operations (add, subtract, …) typically called Transformations • Build functions to analyze the data – trends, prediction, forecasting, etc.
Step 1: Parallelize this program? What is required? Function() { Read a,b from db; Compute c = a+b; Compute d = a*b; Compute c+d; } (Figure: the inputs a,b are split into partitions a0,b0 and a1,b1 so that c and d can be computed per partition.) • A parallel program alone is not sufficient • The data also has to be parallelized – for parallel access (see the sketch below)
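A minimal Java sketch (not from the slides) of the same idea: the inputs are partitioned so that each (a, b) pair is processed independently and in parallel. The arrays are hypothetical stand-ins for the values read from the database.

```java
import java.util.stream.IntStream;

public class ParallelSum {
    public static void main(String[] args) {
        // Hypothetical inputs standing in for the a,b values read from the DB.
        double[] a = {1, 2, 3, 4};
        double[] b = {5, 6, 7, 8};

        // Each pair (a[i], b[i]) is an independent partition, so
        // c = a+b and d = a*b can be computed on each partition in parallel.
        double total = IntStream.range(0, a.length)
                .parallel()
                .mapToDouble(i -> (a[i] + b[i]) + (a[i] * b[i]))  // c + d per partition
                .sum();

        System.out.println("sum of c + d over partitions = " + total);
    }
}
```

This only parallelizes the computation within one machine; the slide's point is that the data itself must also be split and placed so that different machines can read their partitions in parallel.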
What is common to all? • Manage files • Bookkeeping • Where is it located? What is its name? What is the block size? • Is there a lock on it? • What kind of access control is there? • Who can read/write it? • Is there fragmentation within the file? • When was it last modified? – all of this is the job of a file system (see the sketch below)
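As a rough illustration (not any real file system's structure), the sketch below gathers the bookkeeping questions above into a single per-file record; all field names are hypothetical.

```java
import java.time.Instant;
import java.util.Set;

// Illustrative only: the kind of per-file bookkeeping a file system keeps.
class FileMetadata {
    String name;          // what is its name?
    String location;      // where is it located?
    long blockSize;       // what is the block size?
    boolean locked;       // is there a lock on it?
    Set<String> readers;  // who can read it?
    Set<String> writers;  // who can write it?
    boolean fragmented;   // is there fragmentation within the file?
    Instant lastModified; // when was it last modified?
}
```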
Distributed file system? Definitions: • A Distributed File System (DFS) is simply the classical model of a file system (as discussed before) distributed across multiple machines. The purpose is to promote sharing of dispersed files. • This is an area of active research interest today. • The resources on a particular machine are local to it; resources on other machines are remote. • A file system provides a service for clients. The server interface is the normal set of file operations: create, read, etc., on files. Classic example: NFS
The popular Hadoop • HDFS • Why is it so popular? • Simplicity • Easy to add new machines • Cheap storage (no new investments)
HDFS is good for… • Precisely what we require • Parallelizes the data so that accesses to it can be parallelized • Appears as a single disk • Runs on top of a native file system, e.g., ext3, ext4, XFS • Based on the Google File System (GFS) • Fault tolerant • Can handle disk crashes, machine crashes, etc. • "Cheap" commodity server hardware • A 3-terabyte file server costs nearly Rs. 15,00,000; the cheapest commodity machine costs around Rs. 20,000 • Storing large files • Terabytes, petabytes; millions rather than billions of files • Streaming data • Write once, read multiple times • Optimized for streaming reads rather than random reads
HDFS is not good for… • Low-latency reads • High throughput rather than low latency for small chunks of data • HBase addresses this issue • Better for millions of large files than for billions of small files • Multiple writers • Single writer per file • Writes only at the end of the file; no support for writes at arbitrary offsets
HDFS world • Three terms: Namenode, Datanode and Secondary Namenode • Namenode • Bookkeeping – runs on 1 (up to a limited number of) nodes • Datanode – runs on all nodes • Secondary Namenode – 1 node, provisioned like the Namenode
HDFS Organization • The file system cluster is managed by three types of processes • Namenode • Manages the file system's namespace/metadata/file blocks • Runs on 1 machine (up to several machines) • Datanode • Stores and retrieves data blocks • Reports to the Namenode • Runs on many machines • Secondary Namenode • Performs housekeeping work • Requires hardware similar to the Namenode's machine • Not used for high availability – it is not a backup for the Namenode
HDFS Features • Files are split into blocks (the single unit of storage) • Managed by the Namenode, stored by the Datanodes • Transparent to the user • Replicated across machines at load time • The same block is stored on multiple machines • Good for fault tolerance and access • Default replication is 3 (see the configuration sketch below)
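As a sketch of how the replication factor is set, the snippet below uses the standard Hadoop Configuration/FileSystem API. The dfs.replication property is the real HDFS setting (normally placed in hdfs-site.xml); the file path here is a hypothetical example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");  // default replication factor for new files

        FileSystem fs = FileSystem.get(conf);
        // Replication can also be changed per file after creation.
        fs.setReplication(new Path("/data/input.txt"), (short) 2);
    }
}
```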
HDFS Blocks • Block sizes are traditionally either 64 MB or 128 MB • Default is 64 MB • The motivation is to minimize the cost of seeks relative to the transfer rate • 'Time to transfer' > 'time to seek' • For example, let's say • Seek time = 10 ms • Transfer rate = 100 MB/s • To keep the seek time at 1% of the transfer time • The block size needs to be 100 MB (checked in the sketch below)
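The arithmetic can be verified in a few lines; a minimal sketch:

```java
public class BlockSizeEstimate {
    public static void main(String[] args) {
        double seekTimeSec = 0.010;       // seek time: 10 ms
        double transferRateMBps = 100.0;  // transfer rate: 100 MB/s
        double seekFraction = 0.01;       // target: seek time = 1% of transfer time

        // seekTime = seekFraction * (blockSize / transferRate)
        // => blockSize = seekTime * transferRate / seekFraction
        double blockSizeMB = seekTimeSec * transferRateMBps / seekFraction;
        System.out.println("Required block size = " + blockSizeMB + " MB");  // 100.0
    }
}
```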
Block Replication • The Namenode determines replica placement • Replica placement is rack-aware • Balances reliability against performance • Attempts to reduce bandwidth • Attempts to improve reliability by putting replicas on multiple racks • Default replication is 3 • 1st replica on the local disk • 2nd replica on the local rack but a different machine • 3rd replica on a different rack • This policy may change/improve in the future (a sketch of the rule follows below)
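A minimal sketch of the placement rule as described above. This is illustrative logic, not Hadoop's actual BlockPlacementPolicy implementation; the Node type and selection logic are hypothetical.

```java
import java.util.List;

class ReplicaPlacement {
    record Node(String host, String rack) {}

    // Returns three replica targets following the rule on this slide:
    // local machine, same rack (different machine), then a different rack.
    List<Node> place(Node writer, List<Node> cluster) {
        Node first = writer;  // 1st replica: the writer's own machine
        Node second = cluster.stream()  // 2nd: same rack, different machine
                .filter(n -> n.rack().equals(writer.rack()) && !n.equals(writer))
                .findAny().orElseThrow();
        Node third = cluster.stream()   // 3rd: any machine on a different rack
                .filter(n -> !n.rack().equals(writer.rack()))
                .findAny().orElseThrow();
        return List.of(first, second, third);
    }
}
```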
Client, Name node and Data nodes • The Namenode does NOT directly write or read data • This is one of the reasons for HDFS's scalability • The client interacts with the Namenode to update the Namenode's HDFS namespace and to retrieve block locations for writing and reading • Clients interact with Datanodes to read/write data (see the client sketch below)
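A short sketch of a client session using the standard Hadoop FileSystem API: the create/open calls involve the Namenode only for metadata, while the bytes themselves stream between the client and the Datanodes. The path is a hypothetical example.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/hello.txt");  // hypothetical path

        // Write: the Namenode allocates blocks; data goes straight to Datanodes.
        try (FSDataOutputStream out = fs.create(file)) {
            out.writeBytes("hello, HDFS\n");
        }

        // Read: the Namenode returns block locations; data comes from Datanodes.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file)))) {
            System.out.println(in.readLine());
        }
    }
}
```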
Recap • Two sides to parallelizing a program • Computation should be parallelized • Data access should be parallelized
(Figure: performance of some traditional paradigms – computing on a single machine hits an I/O bottleneck; distributing the data through the network adds a network bottleneck on top of the I/O bottleneck; "perhaps good performance" is the target.)
(Figure: how HDFS works – a file is split into blocks 1–5, the blocks are moved to machines in the cluster, and each block is replicated across machines; copies of the code run on each machine.)
Name node considerations… • For fast access the Namenode keeps all the metadata in memory • The bigger the cluster, the more RAM is required • Best for millions of large files rather than billions of small files • Will work well for clusters of 100s of machines • Changing the block size affects how much space a cluster can host • Going from 64 MB to 128 MB halves the number of blocks and significantly increases how much space the Namenode can support • Example (verified in the sketch below): • Let's say we are storing 200 terabytes = 209,715,200 MB • With a 64 MB block size that equates to 3,276,800 blocks • With a 128 MB block size that equates to 1,638,400 blocks
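The block-count example can be checked directly:

```java
public class NamenodeBlockCount {
    public static void main(String[] args) {
        long totalMB = 200L * 1024 * 1024;  // 200 TB = 209,715,200 MB

        System.out.println("64 MB blocks:  " + totalMB / 64);   // 3,276,800
        System.out.println("128 MB blocks: " + totalMB / 128);  // 1,638,400
    }
}
```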
Name node's fault tolerance • The Namenode daemon process must be running at all times • If the process crashes, the cluster is down • The Namenode is a single point of failure • Host it on a machine with reliable hardware (e.g., one that can sustain a disk failure) • Usually not an issue