VERITAS Volume Manager Technical Talk • S201/Best Practices • Mike Root, Sr. Volume Manager Engineer • Sean Derrington, Sr. Group Manager, Product Management • VERITAS Software
VxVM • Host-based storage virtualization • Robust • Flexible • High-performing • Solid foundation for advanced functionality • Snapshots • Replication • Clusterable • How does VxVM do it?
VxVM Internals Overview • VxVM objects • What they are • How they affect IO • Private region • Structure • Usage • New in Version 4.0 • Configuration backup and restore • Logging • FlashSnap • Volume sets
VxVM Objects • [Diagram: VxVM object hierarchy showing Replication Volume Group, Replication Link, Volume Set, Volume, SRL (Storage Replicator Log), DCO (Data Change Object), Plex, Logonly Plex, Snapshot, Subcache, Subdisk, Logsubdisk, DCO Volume, Disk Media, Cache, Disk Access Info, Cache Volume, Dynamic Multipathing and Storage Pool]
IO Path Basics • Users direct IO requests at a volume or volume set • vxconfigd creates device nodes in /dev/vx/[r]dsk/ • The device number identifies the initial target object for each user request • VxVM IO is object based • vxconfigd loads the objects into the kernel • The volume is found in a hash keyed by device number • Each object performs, or requests, the actions required at its level
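The object-based IO path can be modelled in a few lines. This is a hypothetical simplification, not VxVM kernel code: the classes `Subdisk`, `Plex` and `Volume`, the dictionary `volumes_by_devno` (standing in for the kernel's device-number hash) and `submit_write` are all illustrative names. A mirrored volume fans each write out to its plexes, and each plex hands the request to its subdisk, which translates it to a disk address.

```python
from dataclasses import dataclass, field


@dataclass
class Subdisk:
    name: str
    disk: str
    offset: int = 0  # where the subdisk starts on its disk, in blocks
    issued: list = field(default_factory=list)  # (disk, disk_block, length)

    def write(self, block: int, length: int) -> None:
        # Bottom of the object stack: translate to a request on the disk.
        self.issued.append((self.disk, self.offset + block, length))


@dataclass
class Plex:
    name: str
    subdisk: Subdisk

    def write(self, block: int, length: int) -> None:
        # A plex built from a single subdisk just passes the request down.
        self.subdisk.write(block, length)


@dataclass
class Volume:
    name: str
    plexes: list

    def write(self, block: int, length: int) -> None:
        # A mirrored volume sends every write to each of its plexes.
        for plex in self.plexes:
            plex.write(block, length)


# Hypothetical stand-in for the kernel hash that maps a device number to a volume.
volumes_by_devno: dict = {}


def submit_write(devno: int, block: int, length: int) -> None:
    volume = volumes_by_devno[devno]  # initial target object for the request
    volume.write(block, length)       # each object handles its own level
```

In this model, striping, logs and multipath selection would simply be further object types in the same chain, each doing only the work of its own level.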
Object IO Path Example • [Diagram: an application write to my_vol, shown against the log my_vol[log], plexes my_vol-01 to my_vol-04, subdisks d1-01 to d4-01, disks d1 to d6, and Dynamic Multipathing]
Object Level IO Performance • vxstat displays per-object IO statistics
# vxstat my_vol my_vol-01 my_vol-02 c2t0d0s2-01
                       OPERATIONS       BLOCKS         AVG TIME(ms)
TYP NAME               READ    WRITE    READ    WRITE  READ   WRITE
vol my_vol              373      500    3977     1000  11.1     4.4
pl  my_vol-01           244      500     905     1000  10.5     3.8
pl  my_vol-02           129      500    3072     1000  13.3     4.0
sd  c2t0d0s2-01         129      500    3072     1000  13.3     4.0
• Identify “hotspots” at the object level • Resolve them by relocating subdisks from busy disks
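Spotting hotspots in vxstat output is easy to script. The sketch below is a hypothetical helper (`parse_vxstat`, `slowest_reader` and `ObjStat` are illustrative names) that parses the column layout shown on this slide, TYP and NAME followed by read/write operations, blocks and average times, and ranks objects of one type by average read time. Real vxstat output may differ between releases, so treat the column handling as an assumption.

```python
from typing import NamedTuple

# Output copied from the slide (header lines included; the parser skips them).
VXSTAT_SAMPLE = """\
                       OPERATIONS       BLOCKS         AVG TIME(ms)
TYP NAME               READ    WRITE    READ    WRITE  READ   WRITE
vol my_vol              373      500    3977     1000  11.1     4.4
pl  my_vol-01           244      500     905     1000  10.5     3.8
pl  my_vol-02           129      500    3072     1000  13.3     4.0
sd  c2t0d0s2-01         129      500    3072     1000  13.3     4.0
"""


class ObjStat(NamedTuple):
    typ: str
    name: str
    read_ops: int
    write_ops: int
    read_blocks: int
    write_blocks: int
    avg_read_ms: float
    avg_write_ms: float


def parse_vxstat(text: str) -> list:
    stats = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with an object type.
        if len(fields) != 8 or fields[0] not in ("vol", "pl", "sd"):
            continue
        typ, name, *nums = fields
        r_ops, w_ops, r_blk, w_blk = (int(v) for v in nums[:4])
        r_ms, w_ms = (float(v) for v in nums[4:])
        stats.append(ObjStat(typ, name, r_ops, w_ops, r_blk, w_blk, r_ms, w_ms))
    return stats


def slowest_reader(stats: list, typ: str) -> ObjStat:
    """Return the object of the given type with the highest average read time."""
    return max((s for s in stats if s.typ == typ), key=lambda s: s.avg_read_ms)
```

On the slide's numbers, my_vol-02 has the higher read time and serves most of the blocks read, and its subdisk c2t0d0s2-01 shows identical figures, which suggests that subdisk's disk as a candidate for relocation.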
Cluster Volume Manager IO Path • Normal IO • Each node’s VxVM instance makes IO requests directly to disks • No inter-node messages required • Certain operations require messages between nodes • Administrative IO (e.g., volume relayout, …) • Error recovery • FlashSnap bitmap updates • VVR writes to the SRL
Objects and IO Error Recovery • Issue • Plexes must be kept consistent when IO errors occur • vxio • Marks the disk with the FAILING flag (prevents allocation of new objects on it) • Sends reads to a different plex • Changes the plex kstate to DETACH and updates the klog • vxconfigd • Detaches the plex containing the failing device (a detached plex remains associated with its volume but does not participate in IO)
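The division of labour between vxio and vxconfigd can be sketched as a toy model. Everything here is hypothetical (the classes, the state strings and the module-level `klog` list are illustrative stand-ins): on a read error the kernel side flags the disk FAILING, sets the plex kstate to DETACH, appends a klog record and retries the read on another plex; the daemon side later consumes the klog and detaches the plex, which stays associated with the volume but no longer serves IO.

```python
class DiskIOError(Exception):
    pass


class Disk:
    def __init__(self, name, broken=False):
        self.name = name
        self.broken = broken
        self.flags = set()

    def read(self, block):
        if self.broken:
            raise DiskIOError(self.name)
        return f"{self.name}:{block}"  # stand-in for the data read


class Plex:
    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.kstate = "ENABLED"
        self.detached = False


klog = []  # kernel log records, consumed later by the daemon


def vxio_read(plexes, block):
    """Kernel side: try each usable plex until one returns the data."""
    for plex in plexes:
        if plex.kstate != "ENABLED":
            continue
        try:
            return plex.disk.read(block)
        except DiskIOError:
            plex.disk.flags.add("FAILING")  # no new objects allocated here
            plex.kstate = "DETACH"
            klog.append(("DETACH", plex.name))
    raise DiskIOError("no plex could satisfy the read")


def vxconfigd_process_klog(plexes):
    """Daemon side: make the kernel's detach decision persistent."""
    by_name = {p.name: p for p in plexes}
    while klog:
        action, name = klog.pop(0)
        if action == "DETACH":
            by_name[name].detached = True  # still in the volume, but no IO
```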
Volume Recovery After a System Crash • Issue: • VxVM ensures the volumes are not left in indeterminate states (e.g., inconsistent mirrors) • Goal: Start volumes ASAP without loss of consistency • Mechanism: read-writeback mode • Every block read by a user is rewritten to all plexes • Make all plex contents consistent in background • Eliminates the possibility of reading different data from different plexes during recovery • Used: • When starting a mirrored volume • When a CVM node fails • With DRL, only “in flight” regions need be made consistent
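Read-writeback can be illustrated with a toy mirrored volume. In this hypothetical sketch (class and method names are illustrative, and recovery is tracked per region), the first user read of a region that still needs recovery rewrites the returned data to every plex, and a background pass applies the same step to the remaining regions. Whichever plex is read becomes the volume's contents for that region; either copy is acceptable because the write was still in flight at the crash. With DRL, the recovery set holds only the dirty regions instead of the whole volume.

```python
class MirroredVolume:
    """Toy mirrored volume: each plex is a list of per-region contents."""

    def __init__(self, plexes, needs_recovery):
        self.plexes = plexes
        # Regions whose plexes may disagree after the crash.
        self.needs_recovery = set(needs_recovery)

    def read(self, region):
        data = self.plexes[0][region]  # this model always reads the first plex
        if region in self.needs_recovery:
            # Read-writeback: write what was returned to every plex, so no
            # later read can see different data from a different plex.
            for plex in self.plexes:
                plex[region] = data
            self.needs_recovery.discard(region)
        return data

    def background_resync(self):
        # The recovery task uses the same mechanism for regions not yet read.
        for region in sorted(self.needs_recovery):
            self.read(region)
```

For example, `MirroredVolume([a, b], needs_recovery={1})` models a DRL-assisted recovery where only region 1 was in flight, while `needs_recovery=range(n)` models recovering every region without DRL.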
Private Region Overview • A disk must be initialized before VxVM can use it • Initialization identifies the disk to VxVM with a “unique” diskid • A disk initialized for VxVM use has • Private region: VxVM metadata • Public region: application data storage • Object information is stored in the private region • Configuration information is saved on selected disks • The number of disks holding a copy is based on the total number of disks in the diskgroup • Enabled regions are spread across controllers and enclosures • Fewer disks holding copies means faster configuration changes
VxVM Disk Types/Formats • New in 4.x • Starting with 4.0, always use the cdsdisk format (“auto:cdsdisk”)
New in 4.x Cross-Platform Data Sharing (CDS) Disk Format • Allows diskgroups to be moved between different platforms • Allows disks to be recognized by all supported UNIX platforms • Private regions aligned on 8KB boundaries • Can be imported on any supported UNIX platform • Regardless of the platform’s byte order (endianness) • Convert older diskgroup formats • vxcdsconvert
vxconfigd View of the Private Region
# vxdisk list c1t98d0s2
Device:    c1t98d0s2
devicetag: c1t98d0
type:      auto
hostid:    anthrax
disk:      name= id=1100903318.96.anthrax
group:     name=tcrundg id=1100903416.107.anthrax
info:      format=sliced,privoffset=1,pubslice=4,privslice=3
flags:     online ready private autoconfig autoimport
pubpaths:  block=/dev/vx/dmp/c1t98d0s4 char=/dev/vx/rdmp/c1t98d0s4
privpaths: block=/dev/vx/dmp/c1t98d0s3 char=/dev/vx/rdmp/c1t98d0s3
version:   2.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=4 offset=0 len=17674902 disk_offset=7182
private:   slice=3 offset=1 len=3334 disk_offset=3591
update:    time=1104952824 seqno=0.23
ssb:       actual_seqno=0.0
headers:   0 248
configs:   count=1 len=2431
logs:      count=1 len=368
Defined regions:
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-002448[002200]: copy=01 offset=000231 enabled
 log      priv 002449-002816[000368]: copy=01 offset=000000 enabled
Multipathing information:
numpaths:   1
c1t98d0s2       state=enabled
[Slide callout: Private Region Header]
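The Defined regions lines are internally consistent, which makes them a handy sanity check: each `start-end[length]` range is inclusive, and the config pieces add up to the `configs: len=` value (231 + 2200 = 2431), just as the log piece matches `logs: len=368`. The hypothetical helpers below (`parse_regions`, `total_length`) verify this; the regular expression is written against this one sample, and the contiguity check assumes a single copy per region kind, as here (copy=01).

```python
import re

# Lines copied from the vxdisk list output above.
DEFINED_REGIONS = """\
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-002448[002200]: copy=01 offset=000231 enabled
 log      priv 002449-002816[000368]: copy=01 offset=000000 enabled
"""

REGION_RE = re.compile(
    r"^\s*(?P<kind>config|log)\s+priv\s+"
    r"(?P<start>\d+)-(?P<end>\d+)\[(?P<length>\d+)\]:\s+"
    r"copy=(?P<copy>\d+)\s+offset=(?P<offset>\d+)\s+(?P<state>\w+)"
)


def parse_regions(text):
    regions = []
    for line in text.splitlines():
        m = REGION_RE.match(line)
        if m:
            region = m.groupdict()
            for key in ("start", "end", "length", "copy", "offset"):
                region[key] = int(region[key])
            regions.append(region)
    return regions


def total_length(regions, kind):
    """Check each range against its length, then sum one kind's pieces."""
    total = 0
    for r in regions:
        # Ranges are inclusive: 000017-000247 covers 231 blocks.
        if r["end"] - r["start"] + 1 != r["length"]:
            raise ValueError(f"inconsistent region {r}")
        if r["kind"] == kind:
            # Each piece should start where the previous pieces left off.
            if r["offset"] != total:
                raise ValueError(f"gap before region {r}")
            total += r["length"]
    return total
```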
Private region internals • Config • vxconfigd stores the persistent object information • Layout/size of volumes • Associations between objects • Diskgroup version • Klog • The kernel logs changes • vxconfigd reads the klog to discover what changed
Managing the VxVM Private Region Directly: vxprivutil • For advanced users and customer support • Located in /etc/vx/diag.d • Functions • Set private region header attributes (e.g., hostid, dgname, …) • View a diskgroup before importing it: vxprivutil dumpconfig /dev/rdsk/c3t2d0s2 | vxprint -D - • View klog contents: vxprivutil dumplog /dev/rdsk/c3t2d0s2
New in 4.x rootdg is no longer required • (This impromptu slide was added after VISION) • Customers really like that rootdg is optional now • Reserved diskgroup names • bootdg, defaultdg, nodg • rootvol can be in any diskgroup • The -g option is now required for most commands • Use vxdctl defaultdg <dgname> to avoid typing -g <dgname>
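The interplay between `-g` and `vxdctl defaultdg` can be expressed as a small resolution function. This is a hypothetical model built only from the rules on this slide: an explicit `-g` wins, otherwise the default set with `vxdctl defaultdg` is used, otherwise the command needs `-g`. The helper names are illustrative, and VxVM's real lookup may consult other sources as well.

```python
from typing import Optional

RESERVED_DG_NAMES = {"bootdg", "defaultdg", "nodg"}  # reserved names from the slide


def check_new_dg_name(name: str) -> None:
    """Reject a proposed disk group name that VxVM 4.x reserves."""
    if name in RESERVED_DG_NAMES:
        raise ValueError(f"{name!r} is a reserved disk group name")


def resolve_dg(option_g: Optional[str], default_dg: Optional[str]) -> str:
    # An explicit -g always wins.
    if option_g:
        return option_g
    # Otherwise fall back to the default set with 'vxdctl defaultdg <dgname>'.
    if default_dg:
        return default_dg
    raise LookupError("no disk group: use -g <dgname> or set a default disk group")
```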
New in 4.x Configuration Backup and Restore • Automatically saves a current copy of the private region • Each configuration change saves a backup in /etc/vx/cbr/bk • A binary copy of the config region • Disk information • vxprint -m output • Commands to restore the private region • Administrative commands to restore the diskgroup configuration • Restore only to the same hardware • Matching hardware id (e.g., serial number, WWN, lunid…) • Operation in a cluster • Shared diskgroup backups are done on the master • Private diskgroup backups are done where the diskgroup is imported
Configuration Backup and Restore Commands • vxconfigbackupd automatically backs up the configuration after every change • A manual backup can be done • vxconfigbackup <diskgroup name> • Prepare for restore • vxconfigrestore with either -n or -p • -n: don’t update the private region header • -p: the private region header may be corrected to match the backup • Configuration in memory • Volumes in read-only mode • View the configuration with vxprint before committing it • Commit the restoration with vxconfigrestore -c • Private region completely updated • Discard the changes with vxconfigrestore -d
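The two-phase restore can be pictured as a small state machine. The class, method and state names below are hypothetical; only the flow comes from the slide: prepare with `-n` or `-p`, inspect the in-memory, read-only configuration, then commit with `-c` or discard with `-d`.

```python
class ConfigRestore:
    """Toy model of the vxconfigrestore prepare/commit/discard flow."""

    def __init__(self, dg: str):
        self.dg = dg
        self.state = "IDLE"  # IDLE -> PREPARED -> COMMITTED (or back to IDLE)
        self.header_may_change = False

    def prepare(self, option: str) -> None:
        if self.state != "IDLE":
            raise RuntimeError("a restore is already in progress")
        if option not in ("-n", "-p"):
            raise ValueError("prepare with -n or -p")
        # -n leaves the private region header alone; -p may correct it.
        self.header_may_change = option == "-p"
        # While PREPARED the restored configuration lives in memory only and
        # volumes are read-only, so it can be inspected with vxprint.
        self.state = "PREPARED"

    def commit(self) -> None:  # corresponds to vxconfigrestore -c
        self._require_prepared()
        self.state = "COMMITTED"  # private region completely updated

    def discard(self) -> None:  # corresponds to vxconfigrestore -d
        self._require_prepared()
        self.state = "IDLE"

    def _require_prepared(self) -> None:
        if self.state != "PREPARED":
            raise RuntimeError("prepare with -n or -p first")
```

Modelling the steps as explicit states makes the key safety property visible: nothing reaches the private region until an explicit commit.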
New in 4.x Command and Transaction Logging • Command log • History of VxVM commands run on the system • Transaction log • History of operations performed by vxconfigd • Useful for auditing • Actions taken by administrators • Actions taken by client processes • Logs kept in /etc/vx/log • Along with the existing GUI log • The size and number of logs can be set by the administrator
New in 4.x Command Logging • vxcmdlog
# vxcmdlog -l
Values for Control Variables:
Command Logging is currently ON
Maximum number of log files = 5
Maximum size for a log file = 1048576 bytes
• Log format - /etc/vx/log/cmdlog
# 32155, 1535, Thu Apr 21 13:15:23 2005
/usr/sbin/vxprint
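Because each cmdlog entry is a `# id, pid, date` comment line followed by the command that was run, the file is simple to parse for auditing. This sketch assumes exactly that two-line layout; reading the first two numbers as client id and process id is an inference from the matching `Clid` and `PID` values in the transaction log sample, and `parse_cmdlog` is a hypothetical name.

```python
from datetime import datetime

# Sample entry from the slide, assumed to span two lines.
CMDLOG_SAMPLE = """\
# 32155, 1535, Thu Apr 21 13:15:23 2005
/usr/sbin/vxprint
"""


def parse_cmdlog(text):
    entries = []
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header: client id, process id, timestamp.
            clid, pid, stamp = (f.strip() for f in line[1:].split(",", 2))
            header = {
                "clid": int(clid),
                "pid": int(pid),
                "time": datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"),
            }
        elif header is not None:
            # The line after a header is the command that was run.
            entries.append({**header, "command": line})
            header = None
    return entries
```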
New in 4.x Transaction Logging • vxtranslog
# vxtranslog -l
Values for Control Variables:
Transaction Logging is currently ON
Query Logging is currently ON
Maximum number of log files = 1
Maximum size for a log file = 1048576 bytes
• Log format - /etc/vx/log/translog
Thu Apr 21 13:15:23 2005
Clid = 32155, PID = 1535, Part = 0, Status = 0, Abort Reason = 0
DG_GETCFG_ALL 0x5de49f
DISCONNECT <no request data>
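The Clid and PID in this translog record match the numbers in the command log sample (32155 and 1535), which suggests joining the two logs to see which vxconfigd operations a command triggered. The sketch parses only the comma-separated `Key = value` field line; treating every value as an integer and the helper names `parse_translog_fields` and `commands_for` are assumptions based on this one sample.

```python
TRANSLOG_FIELDS = "Clid = 32155, PID = 1535, Part = 0, Status = 0, Abort Reason = 0"


def parse_translog_fields(line):
    """Turn 'Key = value, Key = value, ...' into a dict of ints."""
    fields = {}
    for part in line.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = int(value.strip())
    return fields


def commands_for(fields, cmd_entries):
    """Return commands from cmdlog entries with the same client id and PID."""
    return [
        e["command"]
        for e in cmd_entries
        if e["clid"] == fields["Clid"] and e["pid"] == fields["PID"]
    ]
```

Here `cmd_entries` is assumed to be a list of dicts with `clid`, `pid` and `command` keys, such as a parsed command log would produce.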
DRL and FlashSnap • Both are bitmaps • DRL minimizes recovery time after a system crash • Tracks writes that are active on the volume • Only active writes need to be recovered • FlashSnap minimizes data copying after • Disk/array/cable failure • User error
Dirty Region Log (DRL) • Bitmap of regions where writes may be in progress • Written before the data is written • Cleared lazily • Only used if the volume has at least two active mirrors • Implementation • Log subdisk • May be combined with FlashSnap bitmaps in V4.0 • Limited number of concurrent dirty bits • Bounds recovery time • Per-volume limit (fixed at 256) • Per-system limit (tunable)
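The DRL rules on this slide (set the dirty bit before the data write, clear it lazily, cap the number of concurrently dirty bits) fit in a short model. The region size, the class and method names, and the choice to clear bits only when `flush_clean()` is called are assumptions for illustration; only the fixed per-volume limit of 256 comes from the slide, and the per-system limit is not modelled.

```python
class DirtyRegionLog:
    MAX_DIRTY_PER_VOLUME = 256  # fixed per-volume limit quoted on the slide

    def __init__(self, region_blocks=1024):  # region size is an assumption
        self.region_blocks = region_blocks
        self.dirty = set()    # bits set in the on-disk log
        self.in_flight = {}   # region -> number of active writes
        self.log_writes = 0   # how many times the log itself was written

    def _regions(self, block, length):
        first = block // self.region_blocks
        last = (block + length - 1) // self.region_blocks
        return range(first, last + 1)

    def start_write(self, block, length):
        """Return False if the write must wait for dirty bits to be cleared."""
        regions = self._regions(block, length)
        new = [r for r in regions if r not in self.dirty]
        if len(self.dirty) + len(new) > self.MAX_DIRTY_PER_VOLUME:
            return False
        if new:
            self.dirty.update(new)
            self.log_writes += 1  # log is written before the data write starts
        for r in regions:
            self.in_flight[r] = self.in_flight.get(r, 0) + 1
        return True

    def end_write(self, block, length):
        for r in self._regions(block, length):
            self.in_flight[r] -= 1
            if self.in_flight[r] == 0:
                del self.in_flight[r]
        # The dirty bit stays set: clearing is lazy.

    def flush_clean(self):
        """Lazily clear bits for regions that no longer have active writes."""
        idle = self.dirty - set(self.in_flight)
        if idle:
            self.dirty -= idle
            self.log_writes += 1

    def recovery_regions(self):
        """Regions that would need resynchronizing after a crash."""
        return sorted(self.dirty)
```

Lazy clearing trades a slightly larger recovery set for fewer log writes: a region written repeatedly keeps its bit set instead of toggling it on every write.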
FlashSnap • Collection of bitmaps • One bitmap for each snapshot • One bitmap for all detached plexes • Bitmaps enabled after events • Snapshots (point-in-time copies) of user data • Plex detach (FastResync or FMR) • Bitmaps on both original and snapshot volumes • Refresh from A to B • Restore from B to A
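Bitmaps are kept on both volumes because, after the snapshot is taken, either side may change; a refresh (A to B) or restore (B to A) must therefore copy every region marked in either bitmap. The sketch below is a hypothetical model of that idea using Python sets as bitmaps; `TrackedVolume`, `take_snapshot` and `resync` are illustrative names, not the on-disk DCO format or a VxVM interface.

```python
class TrackedVolume:
    def __init__(self, regions):
        self.data = list(regions)
        self.changed = set()  # bitmap: regions written since the snapshot

    def write(self, region, value):
        self.data[region] = value
        self.changed.add(region)


def take_snapshot(original):
    snap = TrackedVolume(original.data)  # point-in-time copy
    original.changed.clear()             # both bitmaps start empty
    return snap


def resync(source, target):
    """Copy only regions changed on either side; return how many were copied."""
    to_copy = source.changed | target.changed
    for region in to_copy:
        target.data[region] = source.data[region]
    source.changed.clear()
    target.changed.clear()
    return len(to_copy)
```

Refresh is `resync(a, b)` and restore is `resync(b, a)`; in both cases the amount copied is proportional to the regions changed, not to the volume size.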
New in 4.x FlashSnap in 4.0 • Instant full-size snapshots • Snapshot created before copying any data • Bitmap identifies the copied data • Perform copy-on-write for uncopied regions • Space optimized snapshots • Full copy of data never needed • Changed data stored in smaller cache volume • Snapshot hierarchy • Restore B from C (B and C are snapshots of A) • DRL bitmap is part of the FlashSnap bitmap set • One write will update both bitmaps
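Instant snapshots rest on copy-on-write driven by a record of copied regions. In this hypothetical sketch the snapshot is usable immediately; before the original overwrites a region the snapshot has not yet copied, the old contents are preserved once. The keys of the `cache` dict play the role of the bitmap, and the dict itself stands in for the snapshot's storage: a full-size snapshot would have room for every region, while a space-optimized one keeps only the preserved regions in a cache volume. The names are illustrative, not VxVM interfaces.

```python
class InstantSnapshot:
    def __init__(self, origin):
        self.origin = origin  # list of region contents of the original volume
        self.cache = {}       # region -> preserved point-in-time data

    def read(self, region):
        # Preserved regions come from the snapshot's own store; all other
        # regions are unchanged, so they can still be read from the origin.
        if region in self.cache:
            return self.cache[region]
        return self.origin[region]

    def before_origin_write(self, region):
        # Copy-on-write: preserve the point-in-time data once, before it changes.
        if region not in self.cache:
            self.cache[region] = self.origin[region]


def write_origin(origin, snapshots, region, value):
    for snap in snapshots:
        snap.before_origin_write(region)
    origin[region] = value
```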
DRL vs FlashSnap (IO path) • [Diagram: a write to my_vol, which has a snapshot SNAP-my_vol, shown against the DRL bitmap, the FlashSnap bitmap and plexes my_vol-01, my_vol-02 and my_vol-03]
Capacity Planning for Snapshots • Issue • What percentage of a volume’s blocks is written during the snapshot’s life? • vxtrace can be used to record all VxVM IO
# vxtrace -l -g my_dg my_vol
546 2462972 START write vol my_vol dg mydg op 0 block 33219 len 48
547 2462972 START write vol my_vol dg mydg op 0 block 201829 len 36
548 2462972 START write vol my_vol dg mydg op 0 block 117097 len 29
546 2462973 END write vol my_vol dg mydg op 0 block 33219 len 48 time 1
549 2462973 START write vol my_vol dg mydg op 0 block 148447 len 8
548 2462974 END write vol my_vol dg mydg op 0 block 117097 len 29 time 2
547 2462975 END write vol my_vol dg mydg op 0 block 201829 len 36 time 3
549 2462975 END write vol my_vol dg mydg op 0 block 148447 len 8 time 2
• Analyze the vxtrace output to determine the % of the volume written during the vxtrace run time
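The analysis step can be scripted. This sketch assumes the `vxtrace -l` record layout shown on this slide, counts only START records so each write is counted once, and tracks distinct blocks in a set so overlapping writes are not double-counted. The volume length in blocks must come from elsewhere because vxtrace does not print it; `blocks_written` and `percent_written` are hypothetical names.

```python
import re

# Trace lines from the slide (END records are ignored by the parser).
TRACE_SAMPLE = """\
546 2462972 START write vol my_vol dg mydg op 0 block 33219 len 48
547 2462972 START write vol my_vol dg mydg op 0 block 201829 len 36
548 2462972 START write vol my_vol dg mydg op 0 block 117097 len 29
546 2462973 END write vol my_vol dg mydg op 0 block 33219 len 48 time 1
549 2462973 START write vol my_vol dg mydg op 0 block 148447 len 8
548 2462974 END write vol my_vol dg mydg op 0 block 117097 len 29 time 2
547 2462975 END write vol my_vol dg mydg op 0 block 201829 len 36 time 3
549 2462975 END write vol my_vol dg mydg op 0 block 148447 len 8 time 2
"""

WRITE_START = re.compile(r"\bSTART write vol (\S+) .*?\bblock (\d+) len (\d+)")


def blocks_written(trace_text, volume):
    """Distinct blocks of `volume` written by START records in the trace."""
    written = set()
    for line in trace_text.splitlines():
        m = WRITE_START.search(line)
        if m and m.group(1) == volume:
            start, length = int(m.group(2)), int(m.group(3))
            written.update(range(start, start + length))
    return written


def percent_written(trace_text, volume, volume_blocks):
    # volume_blocks must come from elsewhere (e.g. vxprint); vxtrace lacks it.
    return 100.0 * len(blocks_written(trace_text, volume)) / volume_blocks
```

The four writes in the sample touch 48 + 36 + 29 + 8 = 121 distinct blocks. For a very large volume, a bitmap at snapshot-region granularity would use far less memory than a set of block numbers and would match how snapshot space is actually consumed.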
New in 4.x Volume Sets • Group of different volume types (mirrored, striped…) • VxFS uses private ioctls to do IO to the volume set • VxVM commands manage individual volumes • Enabling technology for the VxFS Multi-Volume File System • Separate file system metadata from user data • Allocate files on the “right” type of storage • Relocate files based on changing conditions
New in 4.x Volume Sets • Up to 256 volumes of any type • All volumes made from disks in a single disk group • Individual volumes don’t appear in “/dev/vx/…” • Example
# vxvset -g homedg make HomedirSet MirrorVol
# vxvset -g homedg addvol HomedirSet RAID5Vol1
# vxvset -g homedg list HomedirSet
VOLUME     INDEX  LENGTH    STATE   CONTEXT
MirrorVol  0      10240     ACTIVE  -
RAID5Vol1  1      10240000  ACTIVE  -
• File systems are created on volume sets
# mkfs -F vxfs /dev/vx/rdsk/homedg/HomedirSet
# mount -F vxfs /dev/vx/dsk/homedg/HomedirSet /home
• FlashSnap can snapshot all the volumes in the volume set to make a consistent copy
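The volume set rules on this slide (at most 256 member volumes, all from one disk group, each member identified by an index as in the `vxvset list` output) can be captured in a small hypothetical model. The class, the duplicate check and the index assignment are illustrative assumptions and do not describe how vxvset is implemented.

```python
from dataclasses import dataclass, field

MAX_VOLUMES = 256  # limit quoted on the slide


@dataclass
class VolumeSet:
    name: str
    diskgroup: str
    members: dict = field(default_factory=dict)  # index -> volume name

    def addvol(self, volume: str, volume_dg: str) -> int:
        if volume_dg != self.diskgroup:
            raise ValueError(f"{volume} is not in disk group {self.diskgroup}")
        if volume in self.members.values():
            raise ValueError(f"{volume} is already in {self.name}")
        if len(self.members) >= MAX_VOLUMES:
            raise ValueError(f"{self.name} already holds {MAX_VOLUMES} volumes")
        index = len(self.members)  # add-only model: indices are 0, 1, 2, ...
        self.members[index] = volume
        return index
```

Replaying the example above, adding MirrorVol and then RAID5Vol1 to HomedirSet in homedg yields indices 0 and 1, matching the INDEX column.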
Conclusion • VxVM is object based • Robust • Flexible • Visibility into performance at all levels • Private region • Diskgroups are self-describing • Transportable between UNIX platforms • Accessible to third-party developers • New features in 4.0 • Configuration backup and restore • Command and transaction logging • Instant snapshots • Space optimized snapshots • Volume sets
Next Steps • New Features in Storage Foundation 4.0 and 4.1 (S120R) • Thursday, April 28, 2005 (10:15 am - 11:15 am) • VERITAS Storage Foundation Roadmap and Future Directions (S143) • Wednesday, April 27, 2005 (8:00 am - 9:00 am) • Strategies for Implementing Tiered Storage (S192) • Wednesday, April 27, 2005 (8:00 am - 9:00 am) • Tuning Dynamic Multipathing for Maximum Performance and Availability (S132R) • Thursday, April 28, 2005 (10:30 am - 11:30 am)
Questions & Answers • Mike Root • mroot@veritas.com