SANtopia Design Features
Computer and Software Research Laboratory
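Background
• Explosive growth of data driven by the spread of the Internet
• Growing demand for large-capacity storage
• Need for scalable storage media
[Figure: Forecast of storage capacity demand, in petabytes by year. Source: IDC]
Data Storage Systems Workshop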
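Background
• Problems of the conventional server-centric environment
  - Performance bottleneck
  - Limited scalability (storage devices, computing power)
[Figure: Conventional server-centric storage environment. Clients reach a web server, an application server, and a DB server over the Internet]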
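Background
• SAN-based storage
  - A new storage concept: many storage devices are attached to a dedicated high-speed network (Fibre Channel) to provide large-capacity shared storage
[Figure: Clients reach the web, application, and DB servers over the Internet; the servers connect through FC switches on the Storage Area Network to disks, RAID arrays, and a tape drive]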
Background
• Growing demand for SAN
  - Average annual growth rate: 85%
  - Close to the growth rate of storage demand (87%)
[Figure: SAN (Storage Area Network) market forecast, in millions of dollars by year. Source: IDC]
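Background
• SAN is a new storage hardware technology for supporting large-capacity storage
• SAN hardware technology
  - Support for large-capacity storage media
  - Support for storage scalability
• Additional requirements
  - Data sharing
  - Removal of performance bottlenecks
  - Recovery after failures
  - Integrated management
• To raise the value of a SAN further, system software that supports SAN Virtualization must be provided
  - Support for a large-capacity shared file system
  - Support for hardware-independent logical storage
  - Support for centralized system management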
Background
• SAN Virtualization market forecast
  - Grows faster than SAN hardware (over 100%)
  - Market size is about 10% of the SAN hardware market
[Figure: SAN Virtualization market forecast, in millions of dollars by year. Source: IDC]
What Is SANtopia?
• Software that provides SAN Virtualization
• High Availability
  - Fast recovery
  - Online backup
  - Snapshot
• High Performance
  - Fast-access directory structure
  - Load balancing
  - Global buffer sharing
• High Scalability
  - Dynamic inode: no preallocated inode table
  - Dynamic reconfiguration: online resizing
[Figure: SANtopia (Shared File System, Logical Volume Driver, System Management) layered on the SAN infrastructure]
Features of SANtopia
• 64-bit files and file system
• Global file sharing
  - Provides a global buffer
• Open SAN file system
  - Storage cluster file system
• Centralized lock manager with load balancing
  - Does not use device locks
• Integration of the buffer manager and the lock manager
• Software RAID (0, 1, 0+1, 5, concatenation)
• Comprises three parts
  - Logical Volume Manager
  - Global Shared File System
  - Lock and Buffer Manager
SANtopia Architecture
[Figure: SANtopia architecture. Labels, top to bottom: VNODE Interface, System Call Interface; System Management (Performance Monitor, Online Backup, Scalability Management, Recovery Management); File Manager (BitMap Management, Transaction Management, File Operation Management, Inode Management, Log Management); Global Lock & Buffer Manager; Logical Volume Manager (I/O Management, Mapping Management, Configuration Management); SCSI over SAN, IP over SAN; Disks. The grouping of labels under each component is inferred from the flattened slide text]
SANtopia Logical Volume Manager
Features of LVM
• Volume create/remove
• Online volume resize
• Dynamic reconfiguration
• Software RAID (0, 1, 0+1, 5, concatenation)
[Figure: Four volumes laid out over disk1 to disk8. Volume 1: striping (RAID 0); Volume 2: concatenation; Volume 3: striped parity (RAID 5); Volume 4: striped mirroring (RAID 0+1)]
A Disk Layout
• Label
• Private partition (physical partition)
  - Logical partition information
  - Disk identifier
  - Information about the logical volume
• Public partition (physical partition)
  - Logical partitions
[Figure: Disk layout with the label, the private partition, and a public partition divided into logical partitions; also labeled: Allocation Bitmap, Mapping Info.]
Volume Resize
• Extend/shrink unit = logical partition
• When a volume is striped
  - Add column: data relocation is needed
  - Add row
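Why adding a column to a striped volume forces data relocation can be seen from the address arithmetic. The following is a minimal sketch, assuming a plain RAID 0 layout with a fixed stripe unit; the function names and the stripe-unit parameter are illustrative and not taken from SANtopia.

```python
def stripe_map(logical_block, num_disks, stripe_unit):
    """Map a logical block to (disk index, block offset on that disk) for RAID 0."""
    stripe_no, within = divmod(logical_block, stripe_unit)
    disk = stripe_no % num_disks   # column that holds this stripe unit
    row = stripe_no // num_disks   # how many full rows precede it
    return disk, row * stripe_unit + within


def blocks_to_relocate(total_blocks, old_disks, new_disks, stripe_unit):
    """Count blocks whose physical location changes when the column count changes."""
    return sum(
        1
        for b in range(total_blocks)
        if stripe_map(b, old_disks, stripe_unit) != stripe_map(b, new_disks, stripe_unit)
    )
```

Adding a row only appends new stripe units at the end of each disk, so existing blocks keep their addresses; changing the number of columns changes the modulus, so most existing blocks move.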
Free Space Manager
• Physical allocation bitmap
  - Divided into fixed-size units
  - Each unit is protected by its own lock
  - The entire bitmap is duplicated
• Effects
  - Increases parallelism
  - Improves scalability
  - Avoids a bottleneck
  - Reduces metadata search time
[Figure: Logical partitions and the physical allocation bitmap]
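The idea of splitting the allocation bitmap into independently locked units can be sketched as follows. This is an illustration only: the class name, the one-byte-per-bit storage, and the `start_unit` hint are assumptions, and the duplication of the bitmap is not modeled.

```python
import threading


class PartitionedBitmap:
    """Allocation bitmap split into fixed-size units, one lock per unit."""

    def __init__(self, total_bits, unit_bits):
        self.unit_bits = unit_bits
        self.bits = bytearray(total_bits)  # one byte per bit, for clarity
        n_units = (total_bits + unit_bits - 1) // unit_bits
        self.locks = [threading.Lock() for _ in range(n_units)]

    def allocate(self, start_unit=0):
        """Return the index of a newly allocated bit, or None if all are in use."""
        n = len(self.locks)
        for k in range(n):
            u = (start_unit + k) % n
            lo = u * self.unit_bits
            hi = min(lo + self.unit_bits, len(self.bits))
            with self.locks[u]:  # only this unit is locked during the search
                for i in range(lo, hi):
                    if not self.bits[i]:
                        self.bits[i] = 1
                        return i
        return None

    def free(self, i):
        with self.locks[i // self.unit_bits]:
            self.bits[i] = 0
```

Hosts that start their search in different units rarely contend for the same lock, which is where the parallelism and the shorter search come from.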
Mapping Manager
• Virtualization of physical storage
  - Provides flexibility
  - Enables data movement between logical partitions
  - Enables snapshots
• Each piece of mapping information
  - Is managed by one host
  - Is chained-declustered for safety
• Same effects as the Free Space Manager
• Flexible fail-over
[Figure: Four logical partitions, with their mapping information managed by Host A, Host B, Host C, and Host D]
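The slide names chained declustering without giving the placement rule. The sketch below assumes the textbook scheme, in which the primary copy of fragment i lives on host i and its backup on the next host in the chain; the function names and the `failed` parameter are hypothetical.

```python
def replicas(fragment, n_hosts):
    """Return (primary host, backup host) for a mapping fragment."""
    primary = fragment % n_hosts
    return primary, (primary + 1) % n_hosts  # backup sits on the next host in the chain


def owner(fragment, n_hosts, failed=frozenset()):
    """Pick the first live replica holder; the backup takes over if the primary is down."""
    for host in replicas(fragment, n_hosts):
        if host not in failed:
            return host
    raise RuntimeError("both copies of the mapping fragment are unavailable")
```

With this placement a single host failure never loses mapping information, because each fragment's two copies are on adjacent hosts.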
I/O Manager
• Load balancing of I/O
• Read policy
  - Round-robin policy: when the plexes have the same capability
  - Preferred-plex policy: when the plexes have different capabilities
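The two read policies can be sketched as interchangeable objects that choose which mirror copy (plex) serves the next read. The class names and the `pick` method are assumptions made for illustration.

```python
import itertools


class RoundRobinPolicy:
    """Rotate reads across plexes of equal capability."""

    def __init__(self, plexes):
        self._cycle = itertools.cycle(list(plexes))

    def pick(self):
        return next(self._cycle)


class PreferredPlexPolicy:
    """Always read from one designated plex, e.g. the fastest device."""

    def __init__(self, plexes, preferred):
        if preferred not in plexes:
            raise ValueError("preferred plex is not part of the volume")
        self.preferred = preferred

    def pick(self):
        return self.preferred
```

For example, `RoundRobinPolicy(["plex0", "plex1"])` alternates between the two copies, while `PreferredPlexPolicy(["slow", "fast"], "fast")` always reads from the faster one.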
SANtopia File Manager
Features of SANtopia File Mgr
• Extent-based 64-bit file system
  - 64-bit addresses
• Support for large files
  - Dynamic inode allocation
  - Multi-level inode
• Support for large directories
  - Extendible-hash-based directory management
• Fast recovery
  - Metadata journaling
• Inode stuffing
SANtopia File System Layout
• Extent-based allocation
• Super block: SANtopia file system information
• Allocation blocks
  - No preallocated area for inodes, directory entries, or data blocks
  - Extent-based allocation (4 KB to 64 KB)
• Extent bitmap
  - Located at the end of the address space (file system size)
  - Must distinguish the object type of each extent
  - Uses 2 bits per extent: 00 = not used, 01 = inode, 10 = directory entry, 11 = data block
[Figure: Address space from 0 to 2^64-1: Boot Block | Super Block | Allocation Blocks (inode, directory, data block) | Extent Bitmap]
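The 2-bit encoding of the extent bitmap can be sketched as a packed array holding four extents per byte. The four codes come from the slide; the class name, method names, and the bit order within a byte are assumptions.

```python
NOT_USED, INODE, DIR_ENTRY, DATA_BLOCK = 0b00, 0b01, 0b10, 0b11


class ExtentBitmap:
    """Two bits per extent, four extents packed into each byte."""

    def __init__(self, n_extents):
        self.n = n_extents
        self.raw = bytearray((n_extents + 3) // 4)

    def _check(self, i):
        if not 0 <= i < self.n:
            raise IndexError("extent index out of range")

    def get(self, i):
        self._check(i)
        return (self.raw[i // 4] >> ((i % 4) * 2)) & 0b11

    def set(self, i, kind):
        self._check(i)
        shift = (i % 4) * 2
        cleared = self.raw[i // 4] & ~(0b11 << shift) & 0xFF  # zero the old 2-bit field
        self.raw[i // 4] = cleared | (kind << shift)
```

Because the type is stored with the allocation state, a scan of the bitmap alone can tell inode extents from directory and data extents without reading the extents themselves.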
inode
• Dynamically allocated inodes
  - No limit on the number of inodes
  - No preallocated inode area
  - Cf. ext2 file system: 1 inode per 4 KB
• Each inode occupies 1 extent
  - Causes fragmentation
  - Stuffed inode for space efficiency
• 64-bit inode number
  - Uses a unique ID within SANtopia
[Figure: An inode extent holding the inode number (inode information), file or directory information, and either data block pointers or stuffed data]
inode structure
• Dynamic multi-level inode allocation
[Figure: A dinode extent holding the dinode information, followed by single indirect blocks and double indirect blocks]
Directory (Extendible Hash)
[Figure: A directory node (extent) holds the directory information and a root hash of depth 2 with entries 00, 01, 10, 11; an indirect hash of depth 4 has entries 0000 through 1111]
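For readers unfamiliar with extendible hashing, the following is a compact single-level sketch of the classic technique: a table of 2^depth slots that doubles only when a full bucket cannot be split any other way. It is not SANtopia's on-disk format; the two-level root/indirect arrangement in the figure, the bucket capacity, and all names here are outside what the slide specifies.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth   # number of hash bits this bucket distinguishes
        self.items = {}


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 1
        self.table = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return hash(key) & ((1 << self.global_depth) - 1)  # low-order bits

    def get(self, key):
        return self.table[self._index(key)].items.get(key)

    def put(self, key, value):
        while True:
            b = self.table[self._index(key)]
            if key in b.items or len(b.items) < self.capacity:
                b.items[key] = value
                return
            self._split(b)  # bucket full: split it and retry

    def _split(self, b):
        if b.depth == self.global_depth:
            self.table = self.table + self.table  # double the table
            self.global_depth += 1
        b.depth += 1
        sibling = Bucket(b.depth)
        bit = 1 << (b.depth - 1)
        for i, slot in enumerate(self.table):
            if slot is b and i & bit:
                self.table[i] = sibling  # half of b's slots now point to the sibling
        old, b.items = b.items, {}
        for k, v in old.items():
            self.table[self._index(k)].items[k] = v
```

A lookup costs one table probe regardless of directory size, which is the property that makes the scheme attractive for large directories.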
Recovery
• Uses journaling
  - Writes the in-core log buffer to the log disk when metadata is updated
  - The log disk is a circular buffer
• Metadata-modifying operations (transactions)
  - create, remove, unlink, link, allocation, truncate, rename, …
[Figure: File operations (transactions) and the System Manager (system recovery) drive the Recovery Manager, Transaction Manager, Metadata Manager, and Log Manager, which maintain the metadata and the log]
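The circular log and its replay during recovery can be sketched in a few lines. This is a simplified model, assuming each log record is a (key, new value) pair for one metadata item; the class, its methods, and the record format are hypothetical.

```python
class CircularLog:
    """Fixed-size log that reuses slots once records are checkpointed."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0    # position of the oldest live record
        self.count = 0   # number of live records

    def append(self, record):
        if self.count == len(self.slots):
            raise RuntimeError("log full: checkpoint before logging more")
        self.slots[(self.head + self.count) % len(self.slots)] = record
        self.count += 1

    def checkpoint(self, n):
        """Discard the n oldest records once their updates are safely on disk."""
        n = min(n, self.count)
        self.head = (self.head + n) % len(self.slots)
        self.count -= n

    def records(self):
        size = len(self.slots)
        return [self.slots[(self.head + i) % size] for i in range(self.count)]


def replay(log, metadata):
    """Recovery: re-apply every live record, oldest first."""
    for key, value in log.records():
        metadata[key] = value
    return metadata
```

Recovery time depends on the number of live log records rather than on the size of the file system, which is why journaling gives fast recovery.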
SANtopia Buffer and Lock Manager
Features of Buffer Manager (I)
• Supports global file sharing
  - Reduces disk I/O
  - Shares each buffer
• Split, distributed buffer manager
  - Global Buffer Managers (GBMs) are distributed (partitioned) across several nodes
  - Manages a global buffer list and a local buffer list
  - Trade-off: communication overhead vs. space overhead
• Manages the logical global buffer
  - Weak correctness of the global buffer list
  - Safe, but not necessarily up to date
Features of Buffer Mgr (II)
• Integration of buffer and lock messages
  - Overlapped with the global lock manager
  - Piggybacks the buffer lists on lock messages
  - Reduces the number of messages
• Adopts a write-invalidation scheme
  - For the sake of simplicity
• Supports a buffer-forwarding scheme
  - Improves performance by reducing disk I/O
Structure of Buffer Manager
• Local Buffer Manager (LBM) and Global Buffer Manager (GBM)
• The GBM responsible for a file is chosen by hashing its inode
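Choosing the responsible GBM by inode hash can be illustrated with a one-line mapping. The slide does not say which hash function is used; the modulo below is an assumption chosen for simplicity, and the function name is hypothetical.

```python
def gbm_for_inode(inode_no, hosts):
    """Pick the host whose Global Buffer Manager is responsible for this inode."""
    if not hosts:
        raise ValueError("no hosts available")
    return hosts[inode_no % len(hosts)]  # assumed hash: inode number modulo host count
```

Every host computes the same answer locally, so no lookup message is needed to find the GBM for a file.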
Operations between GBM and LBM
[Figure: Operations between the GBM and the LBMs. Labels: buffer list information, GBM server failure]
Features of Lock Manager
• Lock modes
  - Shared lock and exclusive lock
• Lock object
  - 64-bit inode: file lock
• Distributed (partitioned) across several nodes
  - Host-based locking
  - Overlapped with the global buffer manager
  - Global Lock Manager (GLM) vs. Local Lock Manager (LLM)
• Delayed lock release
• Callback scheme for lock release
  - Callback is issued by the lock server
  - No new lock acquisitions are admitted after a callback message is received
• Recovery on host failure
  - I/O fencing
  - Rebuild the lock table: take back the locks held by the failed host
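The interaction of host-based shared/exclusive locks with callbacks can be sketched as below. This is a toy model under stated assumptions: the class and method names are invented, a conflicting request is simply refused together with the list of hosts to call back, and I/O fencing is not modeled.

```python
SHARED, EXCLUSIVE = "S", "X"


class GlobalLockManager:
    def __init__(self):
        self.holders = {}  # inode number -> {host: mode}

    def request(self, inode, host, mode):
        """Return (granted, hosts_to_call_back)."""
        held = self.holders.setdefault(inode, {})
        # Two locks conflict when they belong to different hosts and either is exclusive.
        conflicts = [h for h, m in held.items()
                     if h != host and EXCLUSIVE in (m, mode)]
        if conflicts:
            return False, conflicts  # the lock server would send callbacks to these hosts
        held[host] = mode
        return True, []

    def release(self, inode, host):
        self.holders.get(inode, {}).pop(host, None)

    def host_failed(self, host):
        """Rebuild the lock table by taking back every lock the failed host held."""
        for held in self.holders.values():
            held.pop(host, None)
```

Because locks are granted per host rather than per process, a host can keep a lock after its last local user finishes (delayed release) and give it up only when a callback arrives.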
Integration of Lock Mgr and Buffer Mgr
Operational Design (I)
[Figure: On each host, the local lock manager (local lock 1, local lock 2, …) sits beside the local buffer manager, whose buffers are invalidated at unlock and can be forwarded to other hosts. On the server side, the global lock manager (global lock 1, …) sits beside the global buffer manager, which tracks the buffers cached at host 1, host 2, …]
• Messages from the local side to the global side
  - lock(lock_id, mode, local_buffer_list)
  - unlock(lock_id, local_buffer_list)
• Messages from the global side to the local side
  - lock_grant(lock_id, mode, host_related_global_buffer_list)
  - call_back(lock_id, hrgbl)
Operational Design (II)
• Global Lock Manager
  - Upon receiving a lock request: update the global buffer list using the local_buffer_list
  - Upon receiving an unlock request: grant the lock before processing the unlock request, then update the global buffer list using the local_buffer_list
  - Upon granting a lock: piggyback the part of the global buffer list that concerns the host
  - Upon sending a callback: piggyback the part of the global buffer list that concerns the host
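The piggybacking on the Global Lock Manager side can be sketched as follows. The message fields follow the names on the slides; everything else is an assumption. In particular, the slides do not define which entries "concern the host", so this sketch takes it to mean the blocks currently cached on other hosts, and it omits lock conflict handling entirely.

```python
class GlobalBufferList:
    def __init__(self):
        self.cached_at = {}  # block number -> set of hosts believed to cache it

    def update(self, host, local_buffer_list):
        """Replace what is recorded for this host with its latest report."""
        for hosts in self.cached_at.values():
            hosts.discard(host)
        for block in local_buffer_list:
            self.cached_at.setdefault(block, set()).add(host)

    def related_to(self, host):
        """Assumed meaning of the host-related part: blocks cached on other hosts."""
        return {block: sorted(hosts - {host})
                for block, hosts in self.cached_at.items()
                if hosts - {host}}


def on_lock_request(gbl, host, lock_id, mode, local_buffer_list):
    """Handle lock(lock_id, mode, local_buffer_list) and build the lock_grant reply."""
    gbl.update(host, local_buffer_list)
    return {
        "type": "lock_grant",
        "lock_id": lock_id,
        "mode": mode,
        "host_related_global_buffer_list": gbl.related_to(host),
    }
```

The buffer information travels inside messages the lock protocol already sends, so keeping the global buffer list current costs no extra round trips.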
Operational Design (III)
• Local Lock Manager
  - Upon sending a lock request: piggyback the local buffer list
  - Upon sending an unlock request: invalidate the buffers related to the lock, and piggyback the local buffer list
  - Upon receiving a lock grant: save the piggybacked global buffer list
  - Upon receiving a callback: prevent the lock counter from being increased, and unlock as soon as possible
Operational Design (IV)
• Local Buffer Manager
  - Upon receiving a forward request: send the requested buffer without a validity check
  - It still checks whether the requested block is cached
  - If the buffer has already been flushed, send an acknowledgment signal instead
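The forward-request handling on the Local Buffer Manager can be sketched as a single function. The reply tuples and the dictionary cache are assumptions made for illustration; the slide only states that a cached buffer is sent and that an acknowledgment is sent when the buffer is gone.

```python
def handle_forward_request(cache, block):
    """Answer a buffer-forwarding request from another host.

    cache maps block numbers to buffer contents held by this host.
    """
    if block in cache:
        # No validity check: the lock protocol is trusted to keep the copy usable.
        return ("buffer", cache[block])
    # The buffer was already flushed or evicted; the requester must read from disk.
    return ("ack", None)
```

For example, `handle_forward_request({5: b"data"}, 5)` yields the cached contents, while a request for a block that is no longer cached yields only the acknowledgment.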