Rsync on HP-UX
Brief Presentation by a Unix/Linux Apprentice with 26 Years of Experience
Dusan Baljevic, Sydney, Australia
Why This Document?

As a classical electronics/telecommunications engineer, I believe in a proper planning process: measure three times before cutting.
http://www.circlingcycle.com.au/dusan.html

It is based on my 26 years of practical experience with Unix/Linux.

My way of Operations Acceptance Testing and other tools:
http://www.circlingcycle.com.au/Unix-sources/

Rsync on HP-UX Webinar
Rsync History

• rsync is an open source utility that provides fast incremental file transfer.
• It is freely available under the GNU General Public License.
• It has been available for more than 10 years.
• It is available for Unix, Linux, and Windows.
• Official site: http://rsync.samba.org/
Rsync Features

• File based.
• Incremental writes.
• File delta.
• Full restore.
• Can use RSH, SSH or direct sockets as the transport.
• Transmission security via SSH.
• Internal pipelining reduces latency for multiple files.
• File security via Encrypted File System (EvFS).
• Cannot handle open files (skips transferring them).
• Cannot handle raw volumes.
• Does not detect file renames and conflicts.
• Scheduling is done through O/S tools (cron, at, batch).
Rsync Alternatives and Add-ons

There are over 90 projects that deal with rsync in some way:
http://unix.freshmeat.net/search/?Go.x=1&Go.y=1&q=rsync&section=projects

Other tools for file/directory synchronization:
http://en.wikipedia.org/wiki/Comparison_of_file_synchronization_software
Rsync Algorithm

The rsync utility uses an algorithm (invented by Australian computer programmer Andrew Tridgell) for efficiently transmitting a structure (such as a file) across a communications link when the receiving computer already has a different version of the same structure.

The recipient splits its copy of the file into fixed-size non-overlapping chunks, of size S, and computes two checksums for each chunk: the MD4 hash (MD5 in newer rsync versions), and a weaker "rolling checksum". It sends these checksums to the sender.

The sender then compares its rolling checksums with the set sent by the recipient to determine if any matches exist. If they do, it verifies the match by computing the MD4/MD5 checksum for the matching block and comparing it with the MD4/MD5 checksum sent by the recipient.

The sender then sends the recipient those parts of its file that did not match any of the recipient's blocks, along with assembly instructions on how to merge these blocks into the recipient's version to create a file identical to the sender's copy.

If the sender's and recipient's versions of the file have many sections in common, the utility needs to transfer relatively little data to synchronize the files.
Current Rsync Checksum Weaknesses – Part 1

As mentioned on the previous slide, the recipient splits its copy of the file into fixed-size non-overlapping chunks and computes two checksums for each chunk: the MD4 hash, and a weaker "rolling checksum" (version 30 of the protocol, released with rsync version 3.0.0, now uses MD5 hashes rather than MD4). It sends these checksums to the sender.

The sender computes the rolling checksum for every chunk of size S in its own version of the file, even overlapping chunks. This can be calculated efficiently because of a special property of the rolling checksum: if the rolling checksum of bytes n through n + S − 1 is R, the rolling checksum of bytes n + 1 through n + S can be computed from R, byte n, and byte n + S without having to examine the intervening bytes. Thus, if one had already calculated the rolling checksum of bytes 1–25, one could calculate the rolling checksum of bytes 2–26 solely from the previous checksum, and from bytes 1 and 26.

The rolling checksum used in rsync is based on Mark Adler's adler-32 checksum, which is used in zlib, and is itself based on Fletcher's checksum.
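The rolling property described above can be sketched in a few lines of shell. This is only a simplified illustration, assuming the two-part weak checksum described in the rsync technical report (both sums taken modulo 65536). The function names weak_sum and roll_sum are made up for this example, and the code is not taken from the rsync source.

```shell
#!/bin/sh
# Weak checksum of a window of byte values passed as arguments.
# a = sum of the bytes; b = sum of the running values of a.
weak_sum() {
    a=0
    b=0
    for byte in "$@"; do
        a=$(( (a + byte) % 65536 ))
        b=$(( (b + a) % 65536 ))
    done
    echo "$a $b"
}

# Slide the window forward by one byte without re-reading the window.
# $1 = old a, $2 = old b, $3 = byte leaving, $4 = byte entering, $5 = S
roll_sum() {
    new_a=$(( (($1 - $3 + $4) % 65536 + 65536) % 65536 ))
    new_b=$(( (($2 - $5 * $3 + new_a) % 65536 + 65536) % 65536 ))
    echo "$new_a $new_b"
}

weak_sum 1 2 3        # checksum of the window holding bytes 1 2 3
roll_sum 6 10 1 4 3   # slide to 2 3 4 using only the old sums and two bytes
weak_sum 2 3 4        # full recomputation, for comparison
```

The second and third calls print the same pair of numbers, which is the point of the rolling property: sliding the window costs a fixed amount of work regardless of S.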
Current Rsync Checksum Weaknesses – Part 2

The sender then compares its rolling checksums with the set sent by the recipient to determine if any matches exist. If they do, it verifies the match by computing the hash for the matching block and comparing it with the hash for that block sent by the recipient.

The sender then sends the recipient those parts of its file that did not match the recipient's blocks, along with information on where to merge these blocks into the recipient's version. This makes the copies identical.

However, there is a small probability that differences between chunks in the sender and recipient are not detected, and thus remain uncorrected. This requires a simultaneous hash collision in MD5 and the rolling checksum. It is possible to generate MD5 collisions, and the rolling checksum is not cryptographically strong, but the chance of this occurring by accident is nevertheless extremely remote.

With 128 bits from MD5 plus 32 bits from the rolling checksum, and assuming maximum entropy in these bits, the probability of a hash collision with this combined checksum is 2^(−(128+32)) = 2^(−160). The actual probability is a few times higher, since good checksums approach maximum output entropy but very rarely achieve it.
Current Rsync Checksum Weaknesses – Part 3

If the sender's and recipient's versions of the file have many sections in common, the utility needs to transfer relatively little data to synchronize the files.

The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity. However, it has been shown that MD5 is not collision resistant; as such, MD5 is not suitable for applications like SSL certificates or digital signatures that rely on this property. An MD5 hash is typically expressed as a 32-digit hexadecimal number.
Current Rsync Checksum Weaknesses – Part 4

MD5 was designed by Ron Rivest in 1991 to replace an earlier hash function, MD4. In 1996, a flaw was found in the design of MD5. While it was not a clearly fatal weakness, cryptographers began recommending the use of other algorithms, such as SHA-1 (which has since also been found to be vulnerable).

In 2004, more serious flaws were discovered, making further use of the algorithm for security purposes questionable; specifically, a group of researchers described how to create a pair of files that share the same MD5 checksum. Further advances were made in breaking MD5 in 2005, 2006, and 2007. In an attack on MD5 published in December 2008, a group of researchers used this technique to fake SSL certificate validity.

US-CERT says MD5 "should be considered cryptographically broken and unsuitable for further use". Most U.S. government applications now require the SHA-2 family of hash functions.
Rsync – Memory Utilization

At one time, we used rsync version 2.x at a large private hospital group in Australia (53 hospitals). It had one major issue when we used it extensively: huge directory trees took a lot of RAM.

However, rsync version 3.x does not keep the whole tree in memory if it is big; it uses an incremental-recursion algorithm.
Rsync on HP-UX

• For HP-UX 11i v2 and 11i v3, it is delivered on the HP-UX Internet Express media, or can be obtained as a free download at:
  http://www.software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPUXIEXP1131
• For all currently supported HP-UX releases (11i v1 to 11i v3), there is also The Porting and Archiving Centre for HP-UX:
  http://hpux.connect.org.uk/hppd/hpux/Networking/Admin/rsync-3.0.8/
  When installing depots from this site, make sure to also install ALL run-time dependencies (next slide).
• Compile from sources.
Rsync at The Porting and Archiving Centre for HP-UX

Description: Rsync uses an algorithm which provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files, optionally with compression, across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.
Author: Andrew Tridgell, Paul Mackerras tridge@samba.org
Home URL: http://rsync.samba.org/
License: GNU General Public License v3
Installation Tree: /usr/local
Languages used: C
Build-time dependencies: gettext libiconv make popt
Run-time dependencies: gettext libiconv popt
Documentation: Installation, README, Man Page
Rsync Typical Scenario – Part 1

Day 1: the first backup. There is no data at the remote location, so a complete file transfer is required.

[Diagram: local file and remote file, 4GB]

Data transferred:
  Rsync        ~2GB (2:1 compression)
  FTP/rcp/scp  ~4GB
Rsync Typical Scenario – Part 2

Day 2: 0.5GB is added at the start of the file and the rest of the file is left intact (red blocks of data in the diagram).

[Diagram: local file and remote file, 4.5GB]

Data transferred:
  Rsync        ~0.25GB (2:1 compression)
  FTP/rcp/scp  ~4.5GB
Rsync Typical Scenario – Part 3

Day 3: the green (0.4GB) and yellow (0.2GB) blocks of data are moved around (0.6GB in total; the file contents are the same, the blocks have just moved).

[Diagram: local file and remote file, 4.5GB]

Data transferred:
  Rsync        ~0 (a small overhead)
  FTP/rcp/scp  ~4.5GB
Rsync Command-line Options

# /usr/local/bin/rsync -v
Rsync from Internet Express

It is important to pass the full path of the remote rsync command (--rsync-path) if it is not in the PATH:

srvA# /opt/iexpress/rsync/bin/rsync -az -H -v --stats \
        --rsync-path=/opt/iexpress/rsync/bin/rsync \
        testfile.gz userB@srvB:/somedir
Rsync from Porting and Archiving for HP-UX

srvA# /usr/local/bin/rsync -a -r -v -t -z --stats \
        --progress --rsh=/usr/bin/ssh \
        --rsync-path=/usr/local/bin/rsync /labs srvB:
Rsync Example when Previous Run Interrupted or Failed

The "--partial" flag keeps partially transferred files on the target. This option is useful if the data transfer gets interrupted by some error or malfunction. For example, if the rsync command terminated before all the data was transferred, we could launch the same command again ("-P" is shorthand for "--partial --progress"):

srvA# rsync -P myproject.gz userB@srvB:/somedir

rsync would use checksums to test the validity of the part of the file that was already transferred and start the actual data transfer only for the missing parts of the myproject.gz file.
Rsync and SSH Key Exchange

userA@srvA# ssh-keygen -t rsa

Do not enter a passphrase, or use ssh-agent if a passphrase is important.

userA@srvA# ssh userB@srvB mkdir -p .ssh
userA@srvA# cat .ssh/id_rsa.pub | ssh userB@srvB \
        'cat >> .ssh/authorized_keys'
userA@srvA# ssh userB@srvB chmod 0700 .ssh/
userA@srvA# ssh userB@srvB chmod 0600 .ssh/authorized_keys
userA@srvA# rsync -a -r -v -t -z --stats --progress \
        -e ssh /sourcedir/ userB@srvB:/targetdir/

The remote command is quoted so that the ">>" redirection happens on srvB rather than on the local host.
Rsync – SSH Chains and Different Port

rsync files between hosts that cannot talk directly (use an ssh chain):

# rsync -av --rsh="ssh -TA userA@srvA ssh -TA -l userB" \
        /mydir/ srvB:/somedir/

rsync on a different port, forcing the IP protocol version:

# rsync --progress --partial --rsh="ssh -p 8322" \
        --bwlimit=300 --ipv4 remuser@remsrv:~/myfile.tgz .
Rsync Example with Full Copy

srvA# /usr/local/bin/rsync -a -r -v -t -z --stats \
        --progress --rsh=/usr/bin/ssh \
        --rsync-path=/usr/local/bin/rsync /labs srvB:
...
Number of files: 118
Number of files transferred: 109
Total file size: 483583369 bytes
Total transferred file size: 483583369 bytes
Literal data: 483583369 bytes
Matched data: 0 bytes
File list size: 2002
File list generation time: 0.016 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 393465005
Total bytes received: 2119
Rsync Example with Sparse Files – Part 1

srvA# prealloc sparsefile1 1024000
srvA# /usr/local/bin/rsync -a -A -r -v -t -z \
        --stats --progress \
        --rsync-path=/usr/local/bin/rsync /src srvB:

srvB# ll sparsefile1
2000 -rw-r----- 1 root sys 1024000 Aug 18 14:50 sparsefile3
Rsync Example with Sparse Files – Part 2

srvA# prealloc sparsefile2 1024000
srvA# /usr/local/bin/rsync -a -A -S -r -v -t -z \
        --stats --progress \
        --rsync-path=/usr/local/bin/rsync /src srvB:
sending incremental file list
src/sparsefile2
     1024000 100%   24.88MB/s    0:00:00 (xfer#1, to-check=1/8)
Total transferred file size: 1024000 bytes
Literal data: 1024000 bytes
...
Total bytes sent: 1329
Total bytes received: 35
sent 1329 bytes  received 35 bytes  2728.00 bytes/sec
total size is 28825484  speedup is 21133.05
Rsync Example with Sparse Files – Part 3

srvA# ll sparsefile*
2000 -rw-r----- 1 root sys 1024000 Aug 18 14:45 sparsefile1
2000 -rw-r----- 1 root sys 1024000 Aug 18 14:50 sparsefile2

srvB# ll sparsefile*
2000 -rw-r----- 1 root sys 1024000 Aug 18 14:45 sparsefile1
   0 -rw-r----- 1 root sys 1024000 Aug 18 14:50 sparsefile2
Rsync Example with ACLs – Part 1

srvA# getacl vxdump
# file: vxdump
# owner: root
# group: sys
user::r-x
group::r-x
class:r-x
other:---

srvA# setacl -m u:dusan:rx vxdump
srvA# pwget -n dusan
dusan:*:111:1::/home/dusan:/usr/bin/sh

(User "dusan" has UID 111.)

srvA# getacl vxdump
# file: vxdump
# owner: root
# group: sys
user::r-x
user:dusan:r-x
group::r-x
class:r-x
other:---
Rsync Example with ACLs – Part 2

srvA# /usr/local/bin/rsync -a -r -v -t -z --stats \
        --progress --rsync-path=/usr/local/bin/rsync \
        /src srvB:

Check the file on the target srvB (the ACL is missing):

srvB# getacl /src/vxdump
# file: /src/vxdump
# owner: root
# group: sys
user::r-x
group::r-x
class:r-x
other:---
Rsync Example with ACLs – Part 3

srvA# /usr/local/bin/rsync -a -A -r -v -t -z --stats \
        --progress --rsync-path=/usr/local/bin/rsync \
        /src srvB:

Check the file on the target srvB. Note that the ACL entry is shown for user "111" because "dusan" is not in the password database on srvB, but the ACL is copied!

srvB# getacl /src/vxdump
# file: /src/vxdump
# owner: root
# group: sys
user::r-x
user:111:r-x
group::r-x
class:r-x
other:---
Rsync with Dry Run

When doing anything other than trivial copies, or when using features of rsync that you have never used before, add the "-n" or "--dry-run" switch to make it a dry run.

# rsync -avhn /mydir /otherdir/
# rsync -nbrvvhn --del /mydir /otherdir/
# rsync -rn --size-only --exclude='*.iso' \
        /mydir/ /otherdir/

The exclude pattern is quoted so that the local shell does not expand it before rsync sees it.
Rsync Anonymous Server from Porting and Archiving for HP-UX – Part 1

Install the depot.

Add into /etc/group:
nobody::-2:

Add into /etc/services:
rsync 873/tcp

Add into /etc/inetd.conf:
rsync stream tcp nowait root /usr/local/bin/rsync rsyncd --daemon
...

Then make inetd reread its configuration (inetd -c).
Rsync Anonymous Server from Porting and Archiving for HP-UX – Part 2

Create /etc/rsyncd.conf:

uid = nobody
gid = nobody
use chroot = yes
read only = yes
max connections = 5
log file = /var/adm/syslog/rsyncd.log

[ftp]
    path = /src
    comment = HP-UX Source export area

If Bastille is in use, check its configuration or disable it.
Rsync Anonymous Server from Porting and Archiving for HP-UX – Part 3

To check what is available on the rsync server:

# /usr/local/bin/rsync -avlH rsync://myhost
src             HP-UX Source export area
Mirror By Rsync

If a remote server runs an anonymous rsync server, mirroring is achieved in the following manner:

# /usr/local/bin/rsync -avvlH --rsync-path=/usr/local/bin/rsync \
        myhost:/src/ /src
opening connection using: ssh myhost /usr/local/bin/rsync --server --sender -vvlHogDtpre.isf . /src/
Password:
receiving incremental file list
created directory /src
delta-transmission enabled
rsync-3.0.8-ia64-11.31.depot
sparsefile2
sparsefile3
total: matches=0 hash_hits=0 false_alarms=0 data=29849484
sent 166 bytes  received 29853794 bytes  5427992.73 bytes/sec
total size is 29849484  speedup is 1.00
Rsync and LVM Snapshots – Part 1

There is no official way to copy data from a snapshot back to the original volume. Here is how I normally accomplish this task, although it depends on what kind of files I am trying to copy, any special file attributes, and the amount of data.

If the amount of data is not massive, and there are not many files in the file system, use rsync to re-synchronize data between a snapshot and the original. Be very careful about the command line, because if you use the wrong order of the volumes, the copy of the data goes in the wrong direction!

# rsync -aHv /snap/ /orig

Rsync is known to be VERY slow for big file transfers and when there are a lot of files.
Rsync and LVM Snapshots – Part 2

• Since the file systems are VxFS, vxdump(1M) and vxrestore(1M) are very convenient, and much faster:

# fsck /snap
# vxdump 0f - /snap | (cd /orig && vxrestore xf -)

• Use native HP-UX tools, like tar(1), pax(1), fbackup(1M), and cpio(1). Here are some simple examples:

# cd /snap && tar cpf - * | (cd /orig && tar xvpf -)
# cd /snap && fbackup -i . -f - | ( cd /orig && frecover -Xsrf - )
# cd /snap && pax -w . | ( cd /orig && pax -r -pe )
Rsync and Expanded Remote File List

Pass the output of a remote command to rsync as the list of files. The following line runs the find command on the remote machine in the /remdir directory and copies all "*.conf" files it finds to the /somedir directory on the local machine.

# rsync -avR remsrv:'`find /remdir -name "*.conf"`' \
        /somedir/
Rsync and Relative Directory

If you want the target directory structure to be relative (chroot in a way), add the "-R" flag. The directory /mydir/data would then appear as /BACKUP/mydir/data/, because the synchronized path name starts from / on the source machine:

# rsync -Ravx --timeout=30 --delete-excluded \
        user@remsrv:/mydir/data/ /BACKUP/
Rsync and "Snapshots"

Use the "--link-dest" option to create space-efficient snapshot-based backups. The result appears to hold multiple complete copies of the backed-up data (one for each backup run), but files that do not change between runs are hard linked instead of being copied again, thus saving space.

The major disadvantage of this technique is that if a file is corrupted due to a disk error, it is just as corrupt in all snapshots that link to that file. Having offline backups would protect against this possibility.
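A minimal sketch of such a snapshot run is shown here. The directory layout (one directory per run plus a "latest" symlink), the function name make_snapshot and the RSYNC variable are assumptions made for this illustration, not something the presentation prescribes.

```shell
#!/bin/sh
# Hypothetical layout: <root>/<stamp>/ for each run, <root>/latest -> newest run.
RSYNC=${RSYNC:-rsync}

# $1 = source directory, $2 = backup root, $3 = name of the new snapshot
make_snapshot() {
    src=$1
    root=$2
    stamp=$3
    if [ -d "$root/latest" ]; then
        # Unchanged files become hard links into the previous snapshot.
        "$RSYNC" -a --link-dest="$root/latest" "$src" "$root/$stamp/"
    else
        # First run: nothing to link against, so make a plain full copy.
        "$RSYNC" -a "$src" "$root/$stamp/"
    fi || return 1
    # Point "latest" at the snapshot that was just completed.
    rm -f "$root/latest" && ln -s "$stamp" "$root/latest"
}

# Example call (paths are placeholders):
# make_snapshot /mydir/ /BACKUP "$(date +%Y-%m-%d)"
```

The "latest" link is only moved after rsync succeeds, so an interrupted run does not become the base for the next snapshot.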
Rsync - File Size and Bandwidth Limits

If you need to update a site over a slow link, run two passes of rsync. Transfer the small files first:

# rsync -a --max-size=150K /mydir/ remsrv:/somedir/

Then do the same for the large files and limit bandwidth to 100 KBytes per second:

# rsync -a --min-size=150K --bwlimit=100 \
        /mydir/ remsrv:/somedir/
Rsync – Exclude Files

An example of how to exclude files and directories from an rsync transfer:

# rsync -azhve ssh --stats --progress \
        --exclude-from '/somedir/EXCLUDE.txt' \
        --delete-excluded /mydir remsrv:/somedir/
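The presentation does not show the contents of the exclude file, so the following is only an illustrative sketch of what such a file might contain. The patterns and the EXCLUDE_FILE variable are assumptions chosen for this example.

```shell
#!/bin/sh
# Hypothetical location; the command above reads /somedir/EXCLUDE.txt.
EXCLUDE_FILE=${EXCLUDE_FILE:-./EXCLUDE.txt}

# One pattern per line; lines starting with '#' are treated as comments.
cat > "$EXCLUDE_FILE" <<'EOF'
# ISO images anywhere in the tree
*.iso
# core dumps
core
# any directory named tmp (the trailing slash restricts the match to directories)
tmp/
EOF
```

Combining a new exclude list with "-n" (dry run) first is a cheap way to confirm that it matches what was intended, especially together with "--delete-excluded".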
Rsync – Avoid Checksum For Large Files Before Transfer

It is a common error to use the "-c" option when huge files are transferred. rsync then has to read and checksum the entire file, and reading it takes a long time unless the file is stored on SSDs or some other very fast storage. Instead, try:

# rsync -vhz --partial --inplace …

"-c" means that rsync checksums the entire file BEFORE doing any transfers, rather than using the timestamp and size to see if it has changed, which means reading the whole file twice.
Rsync and "Clean Shell"

• The "is your shell clean" message and the "protocol mismatch" message are usually caused by some program or command in the shell profiles (.cshrc, .profile, .kshrc, .bashrc or equivalent) that writes output every time a remote-shell program (such as ssh or rsh) is used. Data written in this way corrupts the rsync data stream. rsync detects this at startup and produces those error messages.

However, if you are using rsync-daemon syntax (host::path or rsync://) without a remote-shell program (no --rsh or -e option), there is no remote-shell program involved, and the problem is probably caused by an error on the daemon side (so check the daemon logs).

To test it:

# ssh user@myhost.domain.dom /bin/true

The above command should not output anything at all (except an ssh password prompt, if applicable). If there is any output on stdout, disable whatever is generating that output so that rsync does not get "garbage" when trying to talk to the remote rsync.
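The manual test above can be wrapped in a small helper for use in scripts. This is a sketch only; the function name shell_is_clean is made up for this example, and it simply counts the bytes that the given command writes to stdout.

```shell
#!/bin/sh
# Succeeds only if the given command writes nothing at all to stdout.
# Counting bytes (rather than capturing text) also catches output that
# consists only of newlines.
shell_is_clean() {
    bytes=$(( $("$@" 2>/dev/null </dev/null | wc -c) ))
    [ "$bytes" -eq 0 ]
}

# Example call, using the host name from the test above:
# shell_is_clean ssh user@myhost.domain.dom /bin/true \
#     || echo "remote shell start-up files produce output"
```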
Rsync Common Problems

Several common causes for a remote rsync process going away:

• The destination disk is full (free disk space must be at least the size of the largest file that needs to be updated for the transfer to succeed).
• An idle connection caused a router or remote-shell server to close the connection.
• A network error caused the connection to be dropped.
• The remote rsync executable was not found.
• The remote-shell setup is not working right or is not "clean" (it is sending spurious text to rsync).

If the problem might be an idle connection getting closed, use the "--timeout" option (newer rsync versions send keep-alive messages during periods of no activity).
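For transfers that are prone to dropped connections, one possible approach is to rerun the same command a limited number of times, combining "--timeout" with "--partial" so that each attempt can build on the previous one. The retry helper and the RETRY_DELAY variable below are hypothetical and are not part of rsync.

```shell
#!/bin/sh
# Run a command until it succeeds, at most $1 times.
# RETRY_DELAY (seconds between attempts) defaults to 30.
retry() {
    attempts=$1
    shift
    n=1
    while ! "$@"; do
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example call (paths and host are placeholders):
# retry 5 rsync -a --partial --timeout=60 /mydir/ remsrv:/somedir/
```

A retry loop does not help with the other causes listed above, such as a full destination disk or a missing remote executable, so the number of attempts is kept small.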
Rsync Performance Analysis and Tuning

There is no silver bullet! Test everything and rsync will certainly bring benefits.
Thank You! Dusan Baljevic Sydney, Australia
Appendix Dusan Baljevic Sydney, Australia
Rsync Command-line Options – Part 1

Usage: rsync [OPTION]... SRC [SRC]... DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
  or   rsync [OPTION]... SRC [SRC]... [USER@]HOST::DEST
  or   rsync [OPTION]... SRC [SRC]... rsync://[USER@]HOST[:PORT]/DEST
  or   rsync [OPTION]... [USER@]HOST:SRC [DEST]
  or   rsync [OPTION]... [USER@]HOST::SRC [DEST]
  or   rsync [OPTION]... rsync://[USER@]HOST[:PORT]/SRC [DEST]
The ':' usages connect via remote shell, while '::' & 'rsync://' usages connect
to an rsync daemon, and require SRC or DEST to start with a module name.
Rsync Command-line Options – Part 2

Options
 -v, --verbose               increase verbosity
 -q, --quiet                 suppress non-error messages
     --no-motd               suppress daemon-mode MOTD (see manpage caveat)
 -c, --checksum              skip based on checksum, not mod-time & size
 -a, --archive               archive mode; equals -rlptgoD (no -H,-A,-X)
     --no-OPTION             turn off an implied OPTION (e.g. --no-D)
 -r, --recursive             recurse into directories
 -R, --relative              use relative path names
     --no-implied-dirs       don't send implied dirs with --relative
 -b, --backup                make backups (see --suffix & --backup-dir)
     --backup-dir=DIR        make backups into hierarchy based in DIR
     --suffix=SUFFIX         set backup suffix (default ~ w/o --backup-dir)
 -u, --update                skip files that are newer on the receiver
     --inplace               update destination files in-place (SEE MAN PAGE)
     --append                append data onto shorter files
     --append-verify         like --append, but with old data in file checksum
Rsync Command-line Options – Part 3

Options
 -d, --dirs                  transfer directories without recursing
 -l, --links                 copy symlinks as symlinks
 -L, --copy-links            transform symlink into referent file/dir
     --copy-unsafe-links     only "unsafe" symlinks are transformed
     --safe-links            ignore symlinks that point outside the source tree
 -k, --copy-dirlinks         transform symlink to a dir into referent dir
 -K, --keep-dirlinks         treat symlinked dir on receiver as dir
 -H, --hard-links            preserve hard links
 -p, --perms                 preserve permissions
 -E, --executability         preserve the file's executability
     --chmod=CHMOD           affect file and/or directory permissions
 -A, --acls                  preserve ACLs (implies --perms)
 -o, --owner                 preserve owner (super-user only)
 -g, --group                 preserve group
     --devices               preserve device files (super-user only)
     --specials              preserve special files