This directory used to hold my hot-backup-over-network software.  It
held a very old version, though, and I am no longer updating it.
This README still exists for two purposes: (1) to preserve the text
below, which is documentation I don't think I have anywhere else, and
(2) to tell people trying to use this directory where to look for the
current version.  [Date of this writing is 2022-05-28.  The previous
update was in 2010.]

There are two major pieces to the code: livebackup, which is the
userland code, and diskwatch, which is the kernel code.

livebackup I now distribute principally as a git repo
(git://git.rodents-montreal.org/livebackup is the thing to git clone),
with an unpacked view of it available in my FTP area (also available
over HTTP), ftp.rodents-montreal.org:/mouse/git-unpacked/livebackup/.

diskwatch I also distribute as part of git repos.  I have current
versions for my NetBSD systems derived from 1.4T, 4.0.1, and 5.2.  I
export git repos holding my NetBSD /usr/src trees (and /usr/xsrc for
4.0.1 and 5.2, though those aren't relevant to livebackup).  If $OS is
1.4T, 4.0.1, or 5.2, then the thing to git clone is
git://git.rodents-montreal.org/Mouse/netbsd-fork/$OS/src, and an
unpacked view of it is available on ftp.rodents-montreal.org in
/mouse/git-unpacked/Mouse/netbsd-fork/$OS/src/.  For people just
wanting the diskwatch code, look in
.../$OS/src/HEAD-link/tree/sys/dev/pseudo/ - you want the diskwatch*
files from there.  There are small hooks in the various disk drivers,
too; for those, at the moment, all I have to suggest is cloning the
relevant git repo and using git log to look for commits that touch the
disk driver(s) you care about.

The rest of this file (after the line of equal signs below) is the
original README.  Some of it is out of date (for example, where it
talks about stuff in various directories here) but much of it is not
(such as where it talks about backup files and vnconfig).

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

================

Your attention is drawn to, in particular, the CHANGELOG file in
1.4T/user/livebackup/livebackup-*/ - the * represents the date I last
worked on the code, and will change as I put up new versions.

1.4T/user/ is further broken down into the various pieces involved.
The build system I use uses some additional stuff, which is in
1.4T/aux/; if you don't want to set up a build system like mine, you
can just dump the .c and .h files in the same directory and use cc -I.
when compiling.

1.4T/kaux/ contains the patches to add timer sockets.  While these are
used by the livebackup code, they are not strictly part of it, so I've
kept their patches separate.

Since I don't have a 2.0 system really set up properly for my build
system, I've just dumped everything into one directory there.  The aux/
directory holds bits that don't really have anything to do with the
livebackup software (at this writing, only one file).

The kernel/ and kaux/ subdirectories are patch trees, with directory
structure parallel to /usr/src.  (For example, kaux/sys/sys/socket.h is
a patch for /usr/src/sys/sys/socket.h.)

The port to 2.0 is a bit rough - the code really wants to use timer
sockets for timing, and rather than redesign it, I have it fork another
process with a pipe between them which acts enough like a timer socket
to make the code work.  I suspect someone who knows kqueue could use it
to implement something much like timer sockets.
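As an illustration of the fork-and-pipe idea described above (this is
not the code from the 2.0 port, and the one-byte-per-tick convention is
an assumption made for the sketch), a helper process can feed a pipe
whose read end is then usable in the same select() loop as any other
descriptor:

```python
import os
import select
import time


def start_pipe_timer(interval, ticks):
    """Fork a child that writes one byte to a pipe every `interval` seconds.

    Returns (read_fd, child_pid).  The read end can be handed to select()
    alongside network sockets, which is the property a timer socket gives.
    The child stops after `ticks` ticks so that the sketch terminates.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the write end is needed here.
        os.close(read_fd)
        try:
            for _ in range(ticks):
                time.sleep(interval)
                os.write(write_fd, b"t")
        finally:
            os._exit(0)
    # Parent: only the read end is needed here.
    os.close(write_fd)
    return read_fd, pid


def wait_for_ticks(read_fd, timeout):
    """Wait until the timer descriptor is readable.

    Returns the number of pending ticks consumed, or 0 on timeout or once
    the child has exited and the pipe has reached end-of-file.
    """
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        return 0
    return len(os.read(read_fd, 64))
```

The caller is responsible for reaping the child with os.waitpid() and
closing read_fd when the timer is no longer wanted.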

The 2.0 port now includes the server side.  It works for me, in that it
starts without crashing and exhibits basic functionality; if you have
reason to think it's busted somehow, please let me know!

Note that after applying the kernel patches, you will need to create
entries in /dev.  For 2.0, assuming you used the patch to
sys/conf/majors as distributed, these should look like

crw-------  1 root     165,   0 May  4 18:31 diskwatch0ctl
crw-------  1 root     165,   1 Feb 10 22:00 diskwatch0data
crw-------  1 root     165,   2 Feb 10 22:00 diskwatch0dbg
crw-------  1 root     165,   4 May  5 07:20 diskwatch1ctl
crw-------  1 root     165,   5 Feb 10 22:00 diskwatch1data
crw-------  1 root     165,   6 Feb 10 22:00 diskwatch1dbg

(for two diskwatch devices; add 4 more to the minor numbers for each
additional device).  For 1.4T, it will depend on which architecture
you're using - check your architecture's conf.c (eg,
sys/arch/alpha/alpha/conf.c for an Alpha, sys/arch/i386/i386/conf.c for
i386, etc) for the diskwatch major number.  (The minor number scheme is
the same for 1.4T and 2.0.)
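
The minor numbering in the listing above can be written out as a small
sketch.  The function names are made up for illustration; the offsets
(ctl 0, data 1, dbg 2, four minors per unit) are read off the listing,
and the default major of 165 applies only to 2.0 with the distributed
sys/conf/majors patch:

```python
MINORS_PER_UNIT = 4  # "add 4 more to the minor numbers for each additional device"
SUBDEVICES = (("ctl", 0), ("data", 1), ("dbg", 2))  # offsets taken from the listing


def diskwatch_nodes(units, major=165):
    """Return (name, major, minor) for every /dev entry of `units` devices."""
    nodes = []
    for unit in range(units):
        for suffix, offset in SUBDEVICES:
            minor = unit * MINORS_PER_UNIT + offset
            nodes.append((f"diskwatch{unit}{suffix}", major, minor))
    return nodes


def mknod_commands(units, major=165):
    """Format the entries as mknod command lines for character devices."""
    return [
        f"mknod /dev/{name} c {maj} {minor}"
        for name, maj, minor in diskwatch_nodes(units, major)
    ]


if __name__ == "__main__":
    # Print the commands rather than running them; review before use.
    for line in mknod_commands(2):
        print(line)
```

For 1.4T, pass the major number found in your architecture's conf.c
instead of relying on the default.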

It's entirely possible I've missed something; if you think you've found
something I've missed, please let me know.  Because of time
constraints, I have to get this stuff up for FTP before testing it
nearly as thoroughly as I'd like, so if you think something is missing
it is reasonably likely that you're right; don't waste a lot of time
looking for your mistake on the theory that I couldn't possibly have
got it wrong. :-)

After corresponding with someone trying to use this, I find there are
some things that need remark.

- The code uses labeled control structures.  These take the form of
   strings in angle brackets <"like this"> on various flow control
   constructs.  When building native, these are recognized by the
   compiler (I've added them to the gcc I use); I have a program which
   is designed to run between the preprocessor and the compiler proper
   which converts them to semantically equivalent gotos and labels, for
   the benefit of those who don't want to do likewise to their
   compilers.  Look in ../mouseware/ (that is, a sibling directory to
   the one this README is in) and read the README and PACKAGES files;
   the piece you want is called lcs-cvt.

- There is a bug in the diskwatch code: if a user process sets a
   diskwatch unit watching a partition, and then (the same or another)
   user process tries to set the same diskwatch unit watching a
   different partition, the code will get confused: it will be watching
   both partitions but will have lost track of the first one.  This is
   usually only a minor issue, since (unless you're crazy enough to let
   non-root users access the diskwatch devices) you have to be root to
   exploit it, but it does need to be fixed.

- There is no way to watch a single partition multiple times.  This is
   actually a shortcoming of the kernel diskwatch code, not of lb/lbd,
   and I know of nothing even approximating a workaround in the strict
   sense.  But if you just want to keep multiple simultaneous backups
   of a partition, you can put the backup copy in its own partition on
   the server and then use lb on the server to back that partition up
   somewhere else.

- When a client starts a rescan, it always scans from the beginning of
   the disk to the end.  If it never completes a rescan, it will never
   have a complete backup.  It would probably be better to start at a
   random place, so that after a suitably large number of partial
   rescans it probably will have hit all the disk.  (Of course, this is
   at best a stopgap measure; you really should let it finish a rescan
   to have a good copy.)

- For use in environments where network bandwidth is relatively
   precious, it should be possible to have a rescan send over checksums
   of a whole chunk of blocks, breaking it down into individual blocks
   only if the checksums over the chunk differ.  Pro: less network
   bandwidth used (important for roaming users who have bandwidth
   caps).  Con: checksums some blocks twice (slow); if a chunk is too
   large, most chunks will differ and it will be a net lose - but if a
   chunk is too small, too many identical checksums will be sent which
   could be collapsed into larger chunks.  Finding the sweet spot here
   is somewhat nontrivial.  (Thought: try to find it automatically?)
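
The chunked-checksum idea in the last item can be sketched as follows.
This is a toy model, not lb/lbd code: both copies are held in memory as
lists of blocks, SHA-256 is an arbitrary choice of checksum, and the
"compared" count stands in for the number of checksums that would cross
the network.

```python
import hashlib


def checksum(blocks):
    """Checksum a run of blocks as one unit (hash choice is an assumption)."""
    h = hashlib.sha256()
    for block in blocks:
        h.update(block)
    return h.digest()


def differing_blocks(local, remote, chunk_blocks):
    """Find differing block indices using chunk checksums first.

    Returns (indices, compared), where `compared` counts how many
    checksum pairs were compared, chunk-level and block-level together.
    """
    if len(local) != len(remote):
        raise ValueError("both copies must have the same number of blocks")
    differing = []
    compared = 0
    for start in range(0, len(local), chunk_blocks):
        end = min(start + chunk_blocks, len(local))
        compared += 1
        if checksum(local[start:end]) == checksum(remote[start:end]):
            continue  # whole chunk matches; no per-block work needed
        # Chunk differs: break it down.  These blocks get checksummed a
        # second time, which is the "slow" cost mentioned above.
        for i in range(start, end):
            compared += 1
            if checksum([local[i]]) != checksum([remote[i]]):
                differing.append(i)
    return differing, compared
```

With 16 blocks of which one differs, a chunk size of 4 needs 8
comparisons (4 chunks plus 4 blocks), while chunk sizes of 1 and 16 both
need 17, which shows the two ends of the tradeoff.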

Someone asked me if you can do a "loopback" backup, a backup where the
partition being backed up and the backup copy are both on the same
machine.  Yes, you can do this (though of course there's little point
unless they're on different spindles).  If you want to be extra-sure
that the data won't actually escape onto the network, you can use
127.0.0.1 in both lb's "address to contact lbd at" field and lbd's
"address to listen for connections at" field.

You need to build a kernel with "pseudo-device diskwatch 2" (the number
needs to be at least the number of partitions you're siccing lb on
simultaneously) and boot that kernel before lb will start.  If you get
whines like
	lb: can't open diskwatch control device /dev/diskwatch0ctl: Device not configured
then this is probably what's wrong.  (It could also be that you have
the device major/minor numbers mismatched between your /dev entries and
your kernel, so check those if you think your kernel is supposed to
have diskwatch in it.)
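
A quick way to check the /dev side of this is to compare what a node
actually is with what you expect it to be.  The sketch below is only an
illustration with made-up function names; the expected major and minor
must come from your own kernel configuration (165 and 0 for
diskwatch0ctl apply only to 2.0 with the distributed majors patch), and
it cannot tell you whether the running kernel has diskwatch in it.

```python
import os
import stat


def describe_node(st, want_major, want_minor):
    """Compare a stat result for a /dev entry with the expected numbers."""
    if not stat.S_ISCHR(st.st_mode):
        return "not a character device"
    have = (os.major(st.st_rdev), os.minor(st.st_rdev))
    if have != (want_major, want_minor):
        return f"is {have[0]},{have[1]}, expected {want_major},{want_minor}"
    return "ok"


def check_diskwatch_dev(path, want_major, want_minor):
    """Stat `path` and report how it differs from the expected device."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    return describe_node(st, want_major, want_minor)


if __name__ == "__main__":
    # Expected numbers here are an assumption; adjust for your system.
    print(check_diskwatch_dev("/dev/diskwatch0ctl", 165, 0))
```

If this reports "ok" and lb still complains, the kernel configuration
is the more likely culprit.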

At first sight it may look as though you can use lb on RAW_PART
(partition c on most ports, partition d on i386 and maybe a few others)
to back up an entire drive.  While this will initially appear to work,
it will not work right, because the mechanism that sidetracks copies of
writes works only when the partition being written to is the partition
being watched.  Watching one partition will not notice writes occurring
to another partition, even if the sectors actually written belong to
both partitions.  (RAW_PART is the commonest case of such overlap, but
this is actually true more generally.)  Thus, using RAW_PART will
update the backup partition whenever it does a rescan, but unless
that's the partition that's actually being used, you won't get *live*
backups.
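
The behaviour described above can be captured in a toy model.  This is
not the diskwatch code; the class names and the sector numbers are
invented for illustration.  The point it shows is that the capture
decision depends on which partition the write went through, not on
which disk sectors it touched:

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    start: int  # first sector of the partition on the disk
    size: int   # length in sectors


@dataclass
class Watcher:
    watched: str  # name of the partition being watched
    captured: list = field(default_factory=list)

    def on_write(self, part, rel_sector):
        """Record the absolute sector only for writes via the watched partition."""
        if part.name == self.watched:
            self.captured.append(part.start + rel_sector)


if __name__ == "__main__":
    raw = Partition("c", 0, 1000)   # stands in for RAW_PART
    a = Partition("a", 100, 200)    # lies entirely inside raw
    w = Watcher("c")
    w.on_write(a, 5)      # touches absolute sector 105, but is not seen
    w.on_write(raw, 105)  # same sector, written via the watched partition
    print(w.captured)
```

Only the second write is captured, even though both touch the same
sector.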

It is reasonable to take a backup image as written by lbd and use
vnconfig to attach a vnd to it to peek inside.  However, there are two
caveats.  The first is the gotcha mentioned in one of the "to be fixed"
items above: the file may not be quite as large as it should be.  The
other is that stretches of 0x00s on the disk tend to turn into holes in
the backup image, since lbd won't write them during its initial scan if
it's creating the file - and vnd does not get along well with holes in
its backing file.  There are at least three ways to deal with this.
    - Fix vnd.  This is the best long-term fix, but is probably beyond
       most users of my software.
    - Pre-create the backup file, at least as big as the partition
       being backed up, before first having lbd update it - but use a
       tool that writes the entire file (such as dd if=/dev/zero) -
       tools that merely set the file size without writing the whole
       file won't help with this problem.
    - Fill in the holes with something like
		dd if=thefile of=thefile conv=notrunc bs=1048576
       before attaching the vnd to the file.
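
For those who would rather script the last two options, here is a
minimal sketch of both.  The function names are made up, and whether
writing zeros really allocates blocks depends on the filesystem holding
the file (one that compresses or deduplicates may still not allocate
them), so treat this as equivalent in intent to the dd commands above
rather than as a guarantee:

```python
import os

BLOCK = 1048576  # same block size as the dd example above


def precreate_zeroed(path, size):
    """Create `path` holding `size` bytes of explicitly written zeros."""
    zeros = bytes(BLOCK)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(remaining, BLOCK)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def fill_holes(path):
    """Read every block of `path` and write it back in place.

    Like dd with conv=notrunc, this leaves size and contents unchanged
    but turns holes into written blocks.
    """
    with open(path, "r+b") as f:
        offset = 0
        while True:
            f.seek(offset)
            data = f.read(BLOCK)
            if not data:
                break
            f.seek(offset)
            f.write(data)
            offset += len(data)
```

Do not run fill_holes on a file while lbd is updating it, since a block
read here could be stale by the time it is written back.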

The name livebackup is perhaps unfortunate.  While preparing to talk
about this software at BSDCan 2005, I discovered there is a commercial
product called LiveBackup, from storactive.com; the name is apparently
too obvious a name for the functionality (while their code is
file-level rather than filesystem-level and for Windows rather than
NetBSD, it sounds as though it's otherwise philosophically similar,
backing up changes in real time or near-real-time).  My code has
nothing to do with their product except for the coincidence of the
internal name.  (Directories have to have _some_ name.)

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@rodents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B