The Storage. B. Ramamurthy. © B. Ramamurthy
Topics for discussion • On-chip memory • On-board memory • System memory • Off-system/online storage/secondary memory • File system abstraction • Offline/tertiary memory • RAID: Redundant Array of Inexpensive Disks • NAS: Network-Attached Storage • SAN: Storage Area Networks • DB and DBMS: databases and database management systems • Distributed file system • Google file system • Hadoop file system
Data and Computation Continuum • Compute intensive, e.g., computing the digits of π • Data intensive, e.g., analyzing web logs
More dimensions • Other variables: communication bandwidth, ? • [Figure: applications plotted by compute scale (MFLOPS, GFLOPS, TFLOPS, PFLOPS) against data scale (K, M, G, T, P): Payroll, Weblog Mining, Business Analytics, Digital Signal Processing, Realtime Systems, Massively Multiplayer Online Game (MMOG)]
Solution Processing Granularity (data size from small to large) • Pipelined: instruction level • Concurrent: thread level • Service: object level • Indexed: file level • Mega: block level • Virtual: system level
On-chip memory • Registers • Cache • Buffers (instruction pipeline) • Characteristics: volatile
On-board memory • Cache: instruction cache, data cache • Translation lookaside buffers (TLB) • Characteristics: content addressable, set-associative organization
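A set-associative organization can be illustrated with a toy model. The sketch below assumes a byte-addressed memory and splits each address into a tag, a set index and a byte offset; a lookup compares tags only within one set, and a full set evicts its least recently used line. The class name, the sizes and the LRU policy are illustrative choices, not a description of any particular processor.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with LRU replacement (illustrative)."""

    def __init__(self, num_sets: int, ways: int, line_size: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set: keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def split(self, address: int) -> tuple:
        """Split an address into (tag, set index); the byte offset is dropped."""
        block = address // self.line_size  # which cache line holds this byte
        return block // self.num_sets, block % self.num_sets

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, fill the line and return False."""
        tag, index = self.split(address)
        lines = self.sets[index]
        # Only the tags of one set are compared; hardware does this in parallel.
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = None
        return False


cache = SetAssociativeCache(num_sets=4, ways=2, line_size=64)
results = [cache.access(a) for a in (0, 8, 256, 512, 0)]
print(results)  # [False, True, False, False, False]
```

With 4 sets, 2 ways and 64-byte lines, addresses 0, 256 and 512 all map to set 0, so the third distinct line evicts the least recently used one even though the other sets are empty. Adding ways reduces this kind of conflict miss at the cost of more tag comparisons per lookup.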
System memory
Off-system storage (earlier lectures covered these)
Database and Database Management System • Data source • Transactional • Database server • Relational database or similar foundation • Tables, rows, result sets, SQL • ODBC: Open Database Connectivity • A very successful business model: Oracle, DB2, MySQL, and others • Persistence models: EJB, DAO, ADO (I am not going to expand the abbreviations.)
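Tables, rows, result sets and SQL, together with the DAO persistence model, can be sketched with Python's built-in sqlite3 module. SQLite is an embedded database standing in here for a database server, and neither ODBC, EJB nor ADO is shown; the orders table, the Order record and the OrderDAO class are hypothetical names chosen for the example.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Order:
    order_id: int
    customer: str
    amount: float


class OrderDAO:
    """Hypothetical data access object: hides the SQL behind plain methods."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS orders ("
            " order_id INTEGER PRIMARY KEY,"
            " customer TEXT NOT NULL,"
            " amount REAL NOT NULL)"
        )

    def add(self, customer: str, amount: float) -> int:
        # 'with conn' makes the insert a transaction: commit on success,
        # rollback if an exception escapes the block.
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO orders (customer, amount) VALUES (?, ?)",
                (customer, amount),
            )
        return cur.lastrowid

    def find(self, order_id: int) -> Optional[Order]:
        row = self.conn.execute(
            "SELECT order_id, customer, amount FROM orders WHERE order_id = ?",
            (order_id,),
        ).fetchone()
        return Order(*row) if row else None

    def by_customer(self, customer: str) -> List[Order]:
        # The cursor is the result set; each row is turned into an Order.
        rows = self.conn.execute(
            "SELECT order_id, customer, amount FROM orders "
            "WHERE customer = ? ORDER BY order_id",
            (customer,),
        )
        return [Order(*r) for r in rows]


conn = sqlite3.connect(":memory:")
dao = OrderDAO(conn)
first = dao.add("alice", 19.99)
dao.add("bob", 5.0)
dao.add("alice", 42.0)
print(dao.find(first))  # Order(order_id=1, customer='alice', amount=19.99)
print([o.amount for o in dao.by_customer("alice")])  # [19.99, 42.0]
```

The DAO keeps all SQL in one class, so the rest of an application handles Order objects rather than rows. In this sketch, moving to a server-based DBMS would in principle mean changing the connection and possibly the SQL dialect, not the callers.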
Distributed file system (DFS) • A dedicated server manages the files for a compute environment • For example, nickelback.cse.buffalo.edu is your file server, which is why we did not want you to run your user applications on this machine. • A DFS addresses various transparencies: location transparency, sharing, performance, etc. • Examples: NFS, NFS+, AFS (Andrew File System)… (you will study these in the Distributed Systems course)
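Location transparency can be illustrated with a client-side mount table: programs open paths in one namespace, and the table, not the program, decides which server holds each path. The sketch below is a hypothetical in-memory model with made-up server names (local, fileserver-a, fileserver-b); it does not implement NFS or AFS, only the name-to-location mapping that such systems hide from applications.

```python
from typing import Dict, Tuple


class MountTable:
    """Hypothetical client-side mount table: logical path -> (server, path)."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}  # mount point -> server name

    def mount(self, mount_point: str, server: str) -> None:
        self._mounts[mount_point.rstrip("/") or "/"] = server

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (server, path below the mount point)."""
        best = None
        for point in self._mounts:
            # Match whole path components, so /homework is not under /home.
            covers = point == "/" or path == point or path.startswith(point + "/")
            if covers and (best is None or len(point) > len(best)):
                best = point  # the longest matching mount point wins
        if best is None:
            raise FileNotFoundError(path)
        relative = path if best == "/" else (path[len(best):] or "/")
        return self._mounts[best], relative


table = MountTable()
table.mount("/", "local")
table.mount("/home", "fileserver-a")
table.mount("/projects", "fileserver-b")
print(table.resolve("/home/alice/notes.txt"))  # ('fileserver-a', '/alice/notes.txt')
print(table.resolve("/homework/a1.txt"))       # ('local', '/homework/a1.txt')
```

Applications only ever see paths such as /home/alice/notes.txt; moving /projects to another server would change one mount entry while every application path stays the same.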
On to the Google File System • The Internet introduced a new challenge in the form of web logs and web crawler data: large scale, “petascale” • But observe that this type of data has a characteristic that sets it apart from your transactional data, such as the “order” data on amazon.com: it is “write once”; so is HIPAA-protected healthcare and patient information • Google exploited this characteristic in its Google File System: S. Ghemawat
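The “write once” property can be made concrete with a toy append-only file. The sketch below is a hypothetical in-memory model, not the Google File System's interface: records can be appended and read, but any attempt to overwrite one is rejected.

```python
from typing import Iterator, List


class WriteOnceError(Exception):
    """Raised when code tries to modify a record that was already written."""


class AppendOnlyFile:
    """Hypothetical write-once, read-many file held in memory."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Add a record at the end and return its index."""
        self._records.append(bytes(record))
        return len(self._records) - 1

    def read(self, index: int) -> bytes:
        return self._records[index]

    def scan(self) -> Iterator[bytes]:
        # Bulk sequential reads are the typical pattern for logs and crawl data.
        return iter(self._records)

    def overwrite(self, index: int, record: bytes) -> None:
        # In-place updates, routine for transactional order data, are refused.
        raise WriteOnceError(f"record {index} has already been written")


log = AppendOnlyFile()
log.append(b"GET /index.html 200")
log.append(b"GET /missing.html 404")
try:
    log.overwrite(0, b"GET /index.html 500")
except WriteOnceError as err:
    print("rejected:", err)  # rejected: record 0 has already been written
print(list(log.scan()))
```

Because a written record never changes, copies of it can be cached or replicated without ever having to be invalidated, which is simpler to manage than keeping frequently updated order data consistent.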
Hadoop Distributed File System (HDFS) • HDFS is a reverse-engineered version of GFS: this is my first opinion on HDFS • HDFS is a distributed file system for large-scale data • Data throughput is more important than latency • Designed for batch computing rather than interactive, time-shared computing
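Why throughput matters more than latency for batch jobs can be shown with back-of-the-envelope arithmetic. The sketch assumes 1 TB of data and disks that each stream 100 MB/s sequentially; both figures are illustrative assumptions, and the model ignores seeks, network transfer and uneven data placement.

```python
def scan_time_seconds(data_bytes: int, disks: int, mb_per_s_per_disk: int) -> float:
    """Seconds to read data_bytes once when it is spread evenly over `disks`
    disks that each stream sequentially at mb_per_s_per_disk (1 MB = 10**6 B).
    Back-of-the-envelope only: ignores seeks, network and data skew."""
    bytes_per_second = disks * mb_per_s_per_disk * 10**6  # aggregate bandwidth
    return data_bytes / bytes_per_second


TB = 10**12
# Assumed, illustrative figures: 1 TB of web logs, 100 MB/s per disk.
one_disk = scan_time_seconds(1 * TB, disks=1, mb_per_s_per_disk=100)
many_disks = scan_time_seconds(1 * TB, disks=100, mb_per_s_per_disk=100)
print(f"1 disk:    {one_disk / 3600:.1f} hours")  # 1 disk:    2.8 hours
print(f"100 disks: {many_disks:.0f} seconds")     # 100 disks: 100 seconds
```

Under these assumptions one disk needs about 2.8 hours for a single pass, while 100 disks reading their shares in parallel finish in about 100 seconds. The latency of any individual read barely changes either number; aggregate bandwidth does.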
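MapReduce • [Figure: word-count dataflow. The input, Cat, Bat, Dog and other words (size: terabytes), is divided into four splits, each processed by a map task; the map outputs pass through combine steps to three reduce tasks, which write the output files part0, part1 and part2.]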
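The word-count dataflow in the diagram can be sketched on a single machine in plain Python. This is not Hadoop's MapReduce API: the function names are hypothetical, and the defaults of four splits and three reducers simply mirror the diagram's four map tasks and three output files part0 to part2.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def split_input(words: List[str], num_splits: int) -> List[List[str]]:
    """Cut the input into at most num_splits contiguous splits (one per map task)."""
    size = max(1, -(-len(words) // num_splits))  # ceiling division, at least 1
    return [words[i:i + size] for i in range(0, len(words), size)]


def map_task(split: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """map: emit (word, 1) for every word in the split."""
    for word in split:
        yield word, 1


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """combine: pre-sum one map task's output to shrink the shuffle."""
    local: Counter = Counter()
    for word, n in pairs:
        local[word] += n
    return dict(local)


def partition(word: str, num_reducers: int) -> int:
    """Pick the reducer for a word with a hash that is stable across processes."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def word_count(words: List[str], num_splits: int = 4,
               num_reducers: int = 3) -> Dict[str, Dict[str, int]]:
    # shuffle: each reducer collects the partial counts for the words it owns.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for split in split_input(words, num_splits):
        for word, n in combine(map_task(split)).items():
            shuffled[partition(word, num_reducers)][word].append(n)
    # reduce: sum the partial counts; reducer r writes output file "part<r>".
    return {
        f"part{r}": {word: sum(counts) for word, counts in sorted(bucket.items())}
        for r, bucket in enumerate(shuffled)
    }


text = "Cat Bat Dog Cat Dog Cat Eel Bat".split()
for name, counts in word_count(text).items():
    print(name, counts)
```

Partitioning uses zlib.crc32 instead of Python's built-in hash because string hashing is randomized per process, so the same word could be routed to different reducers on different runs or machines. The combine step does not affect correctness; it only reduces the volume of data shuffled to the reducers.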