.\" As its sole author, I explicitly place this file in the public domain.
.\" It may be used by anyone in any way for any purpose, though I would
.\" appreciate credit where it's due.
.\" der Mouse, mouse@rodents.montreal.qc.ca, 2007-12-26
.Dd December 26, 2007
.Dt COMPARE 1
.Os NetBSD 1.4T
.Sh NAME
.Nm compare
.Nd directory tree comparison
.Sh SYNOPSIS
.Nm
.Op Ar flags
.Ar directoryspec ...
.Sh DESCRIPTION
.Nm
compares directory trees, possibly on remote machines, optionally
considering one of them as a master copy and updating the rest to match
it.
There are many flags that control operation in various ways.
.Pp
The argument list consists of flags (see below) and directory specs.
A directory spec can be a simple local pathname, a remote directory
spec, or a remote network connection spec (which is assumed to connect
to a
.Nm
instance which was given
.Fl R ) .
.Pp
The usual syntax for remote network connections is
.br
\&\ \&
.br
.Sm off
.D1 Oo
.Op Ar user No \&@
.Ar machine
.Op \&! Ar program
.Ns \&:
.Oc Ar directory
.Sm on
.br
\&\ \&
.br
where
.Ar machine
is the name of the remote machine the directory is on;
.Ar user
and
.Ar program
are optional arguments specifying the user name on that machine and the
program name needed to start the remote
.Nm
executable, respectively.
(By default, the user name is whatever the remote-access program uses
(see the
.Fl rsh
option) and the program is
.Sy argv[0]
of the local
.Nm
run.
The machine name is not optional for remote directory specs.)
The user name must not contain
.Ql \&@
and the program name must not contain
.Ql \&: ;
machine names may be placed in
.Ql \&[\ \&] ,
in which case they may contain anything but a
.Ql \&] ,
or not, in which case they may not contain
.Ql \&!
or
.Ql \&:
and must not begin with
.Ql \&[ .
The machine name must not contain an
.Ql \&@
unless a user name is given.
A local directory name may contain any characters, but if it contains
.Ql \&: ,
it must have a
.Ql \&/
before it; a remote directory name has no restrictions.
.Pp
A remote may also be described by merely specifying how to contact it
over the network.
This is done with a directory spec beginning with a
.Ql \&: ,
which is followed by a keyword, another
.Ql \&: ,
and more data depending on the keyword.
The possible forms this can take are:
.Bl -tag -width indent
.Sm off
.It Xo
.Li \&:connect:
.Ar host
.Li \&:
.Ar port
.Li \&:
.Ar path
.Xc
.Sm on
The master is to connect to the given
.Ar port
on the given
.Ar host
and assume that whatever it reaches there is the appropriate remote
.Nm
run, with
.Ar path
being the directory to use relative to the remote's starting directory.
.Sm off
.It Xo
.Li \&:accept:
.Ar port
.Li \&:
.Ar path
.Xc
.Sm on
The master is to listen at the given
.Ar port
on the local machine, expecting a connection, which is assumed to be
the appropriate remote
.Nm
run, with
.Ar path
being the directory to use relative to the remote's starting directory.
.El
.Pp
Recognized flags are:
.Bl -tag -width indent
.It Fl debug
.It Fl D
Enables various debugging messages.
They are likely to be quite voluminous and are not documented here,
especially since they are of little use to anyone not working with the
code.
See the source.
.It Fl follow
.It Fl h
Follow symlinks when possible.
Normally, symlinks are not followed; instead, the link-to strings are
compared or copied.
With this option, links are followed whenever they point to something.
(Symlinks that point to nonexistent paths may still be recognized as
symlinks; the difference is that
.Nm
uses
.Xr stat 2
instead of
.Xr lstat 2 . )
.It Fl sgid-dirs
.It Fl g
Ignore the setgid bits on directories, and, when creating or modifying
directories, don't meddle with the setgid bits.
(This is intended for systems which overload this bit to indicate other
things, such as some Sun OSes, which use it to indicate BSD vs SysV
group-owner semantics.)
.It Fl no-owners
.It Fl o
Do not consider ownership values: do not compare them, do not print
them when (other) differences are found, and do not attempt to copy
them.
This is suitable when running
.Nm
as a user who cannot
.Xr chown 2
files arbitrarily (which usually means, a non-root user).
.It Fl R
Remote.
The networking features of
.Nm
are implemented by running a
.Nm
process on remote machines via
.Xr ssh 1
or
.Xr rsh 1 ;
the remotes are given this option to indicate that they are remotes.
When this is used,
.Nm
speaks a private protocol on its standard input and output, assuming it
is communicating with a master
.Nm
run elsewhere.
(The protocol is not documented; see the source code.)
The only time this option should be given manually is when a remote
.Nm
is being started in a way that doesn't get its command line directly
from the corresponding master
.Nm
process.
It is used with
.Fl accept ,
for example, or when starting a
.Nm
remote from
.Xr inetd 8 .
.Pp
Certain of the other options (such as
.Fl update
and
.Fl md5 )
are implemented entirely in the master and never need to be given to
remote runs; two others
.Pf ( Fl mtimes
and
.Fl encode )
affect the protocol and must be the same on the master run and the
remote runs (both given or both not given).
Additionally, if
.Fl gzip
is given on the master, it must be given on the remotes, but if the
master does not use it, it does not matter whether the remotes do.
Certain other options may usefully differ between the master and the
remotes, notably
.Fl follow ,
.Fl no-owners ,
and
.Fl sgid-dirs .
.It Fl mtimes
.It Fl t
Use modification times.
Normally, modification times (see the
.Sq mtime
field in
.Xr stat 2 )
are considered irrelevant and ignored.
With this option, they are compared and/or copied.
.It Fl update
.It Fl u
Consider the first-named directory as a master copy and update all the
others to make them like the master.
.It Fl encode
.It Fl x
The protocol spoken between a master
.Nm
process and its remotes is a binary protocol.
While the network connections should be transparent, they aren't
always; with this flag, all communication between the master and the
remotes is encoded using a simple binary-to-text filter.
This filter doubles the byte count, but does mean that all traffic is
plain ASCII text, which has been known to work around certain
data-dependent network bugs.
.It Fl trace
.It Fl X
Print a log of all communication between the master and the remotes.
If
.Fl encode
is used too, trace output includes both encoded and unencoded data.
.It Fl readonly
.It Fl ro
When used on a remote run, this rejects any attempt by the master to
modify anything.
This is suitable for use when running from
.Xr inetd 8
with
.Fl R
or the like (you probably want
.Fl limit
as well).
.It Fl check
Never prints anything; rather, exits with status 0 (success) if all
directories match, status 1 (failure) if not.
If a mismatch is detected,
.Nm
does not proceed to compare anything further.
.It Fl ignore-sockets
Ignores
.Dv AF_LOCAL
sockets found in the directory trees being compared.
The effect is almost as if the socket didn't exist; the only difference
is that if some other difference is found for a given name, sockets
having that name are printed as such, rather than being described as
nonexistent.
.It Fl no-sparse
Normally, when
.Nm
creates files, it skips (512-byte) blocks of 0x00 bytes with
.Xr lseek 2 ,
thereby creating sparse files.
This option suppresses this check, always writing data even when it's
blocks of 0x00 bytes.
.It Fl links
This option says that
.Nm
should compare (or, with
.Fl update ,
update) not only file types, contents, etc, but also hard link
structure.
Updating does not necessarily make link counts equal, since some links
may be outside the directory trees
.Nm
is working with, but it does mean that, after updating, if two files
are (or are not) hard-linked together in the master directory, they
will be (or not be) hard-linked together in the updated directories.
When not updating, basically, a difference is considered to exist
whenever something would need to be done if
.Nm
were updating.
.It Fl ignore-nonexistent
.It Fl inx
Specifies that the next directory argument is to be flagged for
slightly different treatment.
When the only reason to print a difference report is that an entry is
nonexistent in one or more directories thus flagged, no difference is
considered to exist.
(Do not use this option with
.Fl update ;
the result is
.Dq whatever the implementation happens to give you ,
not anything well-defined.)
.It Fl md5
Normally,
.Nm
compares plain files by reading their contents and comparing them.
When used over the network, this can be very slow (as compared to a
local comparison).
This option says to compare files' MD5 checksums instead (see
.Xr libmd5 3 ) .
.It Fl no-delete
Indicates that an update operation (see
.Fl update )
should ignore files that exist on the remote but not the master, rather
than deleting them as it normally would.
.It Fl gzip Ar arg
When copying plain file contents across the network, this option says
to gzip the data at the sending end and gunzip it at the reading end.
This is intended for cases where the network is slow enough compared to
CPUs that saving bytes sent is worth burning CPU cycles for.
The argument is passed to gzip; it is expected to be something like
.Fl -best
or
.Fl -fast .
.It Fl rsync
Indicates that updates of plain files should use an algorithm like
.Xr rsync 1 Ap s .
The algorithm and protocol are not identical to those used by
.Xr rsync 1
(since
.Xr rsync 1 Ap s
are not documented), but compatibility is not relevant since
.Xr rsync 1
is not actually used on either end.
Note that this flag requires enough free disk space on the destination
filesystem to store the updated file.
.It Fl prune Ar path
Anything called
.Ar path
(which is taken as a path relative to the directories being compared)
is to be ignored if found; neither it nor anything under it is to be
compared, updated, or otherwise touched.
The only case in which
.Nm
will manipulate such a thing at all is that if a directory is copied
wholesale because it is found to be nonexistent during an update
operation, the directory contents being copied are not checked to see
if they've been pruned.
.Pp
This option may be given multiple times to prune multiple things.
.It Fl prunewild Ar pattern
This is just like
.Fl prune
except that the argument is a globbing pattern rather than a simple
name.
Note that, while shell globbing syntax is used, slashes are not special
(for example,
.Li a*b
matches
.Li aa/bb ) .
.It Fl prunebase Ar pattern
This is just like
.Fl prunewild
(not
.Fl prune )
except that the argument is matched against just the last component of
the path name being considered (see
.Xr basename 1
and
.Xr basename 3 ) .
.It Fl accept Ar port
Causes
.Nm
to do a TCP listen on the given local
.Ar port
for a connection from the master, which is expected to use a
.Li \&:connect:
spec.
This option also requires that
.Fl R
be given.
.It Fl connect Ar host Ar port
Causes
.Nm
to do a TCP connect to the given
.Ar host
and
.Ar port ,
where it expects to find the master listening with a
.Li \&:accept:
spec.
This option also requires that
.Fl R
be given.
.It Fl prunex
This reverses the sense of the test used to detect whether something is
pruned.
Rather than pruning things named with
.Fl prune
(and
.Fl prunewild
etc) flags, everything is pruned
.Em except
those things.
.It Fl bg
This makes
.Nm
background itself as soon as any initial setup (such as listening for
connections for
.Fl accept )
has been completed.
It works only in conjunction with
.Fl R .
.It Fl limit
Normally, a remote
.Nm
will accept any path from the master as the path-to-compare.
Relative paths are interpreted relative to
.Nm Ap s
working directory; absolute paths are interpreted normally, as absolute
paths.
Anything the
.Nm
process can access can thus be compared.
When this flag is specified on a remote run (i.e., with
.Fl R ) ,
the path from the master must not be absolute (must not begin with a
.Sq Pa / )
and must not contain any
.Sq Pa \&..
components.
If the path from the master violates these restrictions,
.Nm
simply prints a complaint and exits.
This is suitable for use (in conjunction with
.Fl ro )
for exporting a directory tree for general use by running
.Nm
via
.Xr inetd 8
or equivalent, since it prevents malicious remote users from accessing
anything outside the directory
.Nm
is started in.
(Make sure the working directory is the root of the tree you want to
make available.)
.It Fl rsh Ar cmd
When running remote
.Nm
processes using the
.Li machine:directory
style of spec, it runs the remote process by forking and execing
.Xr ssh 1 .
This option specifies some other program to run in place of
.Xr ssh 1 .
The program specified is expected to understand an argument list
consisting of the machine name, followed by
.Sq \&-l Ar username
if the optional
.Sm off
.So Ar user No \&@
.Sc
.Sm on
was given, then the remote program name and arguments.
.Pp
This option affects only directory names given after it; it is thus
possible to use different remote-shell programs for different remote
directory specs.
.It Fl dir Ar arg
This is just like specifying
.Ar arg
as a directory name, except that it is unambiguous if
.Ar arg
begins, or might begin, with a
.Sq \&- .
There is no difference, once specified, between a directory spec given
this way and one given without any option; in particular, all the same
syntaxes are available.
.It Fl tick Ar N
Causes
.Nm
to produce a brief summary of its progress through the directory
hierarchy (the path of the name being processed, relative to the
directory being compared) every
.Ar N
seconds.
The timer does not start running until all remotes have been started
successfully.
.It Fl mask Ar M
All mode bits (see the
.Sy st_mode
field in
.Xr stat 2 )
are ANDed with
.Ar M
(which is always taken as an octal number) before comparison.
This does not affect the printed mode bits if a difference is detected
(for this or any other reason), nor does it modify the mode bits that
get used when creating or modifying things during an update run; all it
affects is when two
.Sy st_mode
fields are considered equal.
Only the 07777 bits are used.
.It Fl modemod Ar A Ar B
When
.Fl update
is in effect, this causes all mode bits (see the
.Sy st_mode
field in
.Xr stat 2 )
from the master directory to be ANDed with
.Ar A
and then ORed with
.Ar B
(both of which are always taken as octal numbers).
This occurs before anything else is done with them; thus, unlike
.Fl mask ,
it affects not only difference detection, but also the mode bits
printed when a difference is detected or installed on remotes when
updating.
.Pp
When
.Fl update
is not in effect, this causes all mode bits from any directory to be
modified as described.
Again, this happens very early, before comparison and printing.
.Pp
Only the 07777 mode bits are affected in any case, regardless of the
values specified on the command line.
.It Fl map Ar dir Ar from Ar to
This option applies to the next directory spec, but no others.
It causes
.Nm
to convert
.Sq Ar from
to
.Sq Ar to
when reading directory
.Ar dir .
The thing named
.Sq Ar from
must not be a directory (this is for ease of implementation and could
be fixed if it became worth it); if it is, what happens is undefined.
If an entity named
.Sq Ar to
already exists in this directory (and is not itself remapped by another
.Fl map
option), an error occurs.
Mappings do not cascade; if a mapping triggers, the resulting name is
never mapped again, even if it matches another
.Fl map
option.
.It Fl force Ar arg
Normally, a remote
.Nm
will accept any path from the master as the path-to-compare.
With this option, the path from the master is completely ignored and
.Ar arg
is used instead.
(This is needed only in unusual circumstances.)
.El
.Sh SIGNALS
.Nm
recognizes two signals:
.Bl -bullet
.It
.Dv SIGALRM
provokes an immediate report of the kind produced for
.Fl tick
(indeed,
.Fl tick
is implemented by simply requesting periodic
.Dv SIGALRM
signals with
.Xr setitimer 2 ) ,
even if
.Fl tick
was not actually given.
.It
.Dv SIGINFO
(such as is typically generated from the tty when
.Sq \&^T
is typed) produces a more verbose report, with one line per level of
the directory hierarchy in the path leading to the name being
processed, showing an indication of where that component falls in the
list of names in its directory.
If six reports are generated without any measurable progress having
been made, an additional line is generated saying which remote is being
slow and thus preventing progress from being made.
.El
.Sh BUGS
MD5 (see the
.Fl md5
option) is known to have weaknesses.
While they are not significant for most uses, it is possible to craft
files that checksum the same under MD5 but are different; such
differences will not be noticed with
.Fl md5 .
.Pp
There is no way to use checksums other than MD5 (such as the
SHA-\fIn\fP family or block-cipher-based algorithms).
.Pp
There is no way to use anything other than gzip for compressing file
contents on the wire.
.Sh AUTHOR
der Mouse,
.Aq mouse@rodents.montreal.qc.ca .
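.Sh EXAMPLES
The path restriction described for
.Fl limit
and the mode-bit arithmetic described for
.Fl mask
and
.Fl modemod
can be sketched in a few lines of Python.
This is an illustration of the rules as worded in this page, not code
taken from the
.Nm
source.
The function names
.Li limit_allows ,
.Li modes_equal ,
and
.Li apply_modemod
are hypothetical, and leaving bits outside 07777 untouched in
.Li apply_modemod
is an assumption about what
.Dq only the 07777 mode bits are affected
means.
.Bd -literal
```python
# Illustrative sketch only; not taken from the compare source.
PERM_BITS = 0o7777  # the only mode bits -mask and -modemod look at


def limit_allows(path):
    """Return True if a master-supplied path passes the -limit rules."""
    if path.startswith("/"):
        return False  # absolute paths are rejected
    # Reject any ".." component; other names, such as "..x", are fine.
    return ".." not in path.split("/")


def modes_equal(mode_a, mode_b, mask):
    """Compare two st_mode values as described for -mask M."""
    keep = mask & PERM_BITS
    return (mode_a & keep) == (mode_b & keep)


def apply_modemod(mode, and_bits, or_bits):
    """Apply -modemod A B: AND with A, then OR with B, within 07777."""
    changed = ((mode & and_bits) | or_bits) & PERM_BITS
    # Assumption: bits outside 07777 (such as the file type) are kept.
    return (mode & ~PERM_BITS) | changed
```
.Ed
Under these assumptions, a mask of 07757 makes modes 0644 and 0664
compare equal, since the only bit in which they differ (group write) is
masked off before comparison.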