lb - live backup

This documentation file is in the public domain.

lb is designed to do, basically, over-the-net disk mirroring (usually
to files, but potentially to disks).

There are two modes of operation: client and server.

As a client, it receives notifications from the kernel of writes and
mirrors them over the net to the server.  If it loses the connection to
the server, it retries persistently to reestablish it until it
succeeds.

As a server, it listens for the client to connect, then handles the
server side of the mirroring.  If it loses the connection to the
client, it goes back to waiting for the client to connect.

You will need one client/server pair for each filesystem to be
mirrored, since there is no other demultiplexing layer.

Since lb is intended to be useful even over long-haul networks, it
encrypts all communication, leveraging a shared secret in a manner
vaguely akin to (but quite different in some respects from)
Diffie-Hellman, to generate a key used for encryption in each
direction.

The network protocol between the client and server is fairly simple.
At startup, the client and server exchange data from which shared
encryption keys are derived.  After this, all communication is
encrypted.  A verifier is exchanged, so that each end can verify that
the shared keys are in fact shared.  Then, a simple packetized protocol
is run in each direction.

In detail:

On startup, each end sends 16 bytes of random data.  If the data sent
by the server is S, the data sent by the client is C, and the shared
key is K, then each end computes thus:

    Let M[0] = SHA1( K || S || 0x73 0x2d 0x3e 0x63 || C || K )
    Let M[i] = SHA1( K || S || 0x73 0x2d || M[i-1] || 0x3e 0x63 || C || K )
        for i in 1..31

Compute a 237-byte string A by overlapping the M[] values thus and
adding overlapped bytes modulo 256 (ie, without carries between
bytes):

    M[0] = x x x x x x x x x x x x x x x x x x x x
    M[1] =               x x x x x x x x x x x x x x x x x x x x
    M[2] =                             x x x x x x x x x x x x x x...
    M[3] =                                           x x x x x x x...
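As an illustration, the M[] chain and the overlap-add that yields A
might be sketched in Python as follows.  This is not taken from lb's
source.  The function name derive_a is made up here, and the 7-byte
offset between successive M[] values is an assumption: it is the
uniform stride at which 32 twenty-byte values span exactly 237 bytes
(20 + 31*7 = 237).  S, C and the constant bytes are used exactly as
written above (0x73 0x2d 0x3e 0x63 is ASCII "s->c").

```python
import hashlib

DIGEST_LEN = 20   # SHA-1 output size in bytes
ROUNDS = 32       # M[0] .. M[31]
STRIDE = 7        # ASSUMPTION: offset between successive M[] values
A_LEN = DIGEST_LEN + (ROUNDS - 1) * STRIDE   # 237


def derive_a(k: bytes, s: bytes, c: bytes) -> bytes:
    """Hypothetical helper: build the 237-byte string A.

    k is the shared key K; s and c are the 16 random bytes sent by the
    server and the client respectively.
    """
    # M[0] = SHA1( K || S || 0x73 0x2d 0x3e 0x63 || C || K )
    m = [hashlib.sha1(k + s + b"\x73\x2d" + b"\x3e\x63" + c + k).digest()]
    # M[i] = SHA1( K || S || 0x73 0x2d || M[i-1] || 0x3e 0x63 || C || K )
    for i in range(1, ROUNDS):
        m.append(
            hashlib.sha1(
                k + s + b"\x73\x2d" + m[i - 1] + b"\x3e\x63" + c + k
            ).digest()
        )

    # Lay each M[i] down at offset STRIDE*i and add overlapping bytes
    # modulo 256, with no carry from one byte into the next.
    a = bytearray(A_LEN)
    for i, digest in enumerate(m):
        base = STRIDE * i
        for j, byte in enumerate(digest):
            a[base + j] = (a[base + j] + byte) & 0xFF
    return bytes(a)
```

Under the assumed stride, only the first 7 and the last 6 bytes of A
come from a single M[] value; every other byte is the sum of two or
three of them.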

The 256-byte string A || SHA1(A)[0..18] is used as an arcfour key;
after keying, the first 65536 bytes of the key stream are discarded.
The result is used to encrypt the server->client data stream.  The same
computation is repeated, with S and C interchanged and the 0x73 and
0x63 in the M[] computation interchanged, for client->server
encryption.

After this is done, each end generates another 16 bytes of random data,
which it writes to the peer, encrypted with the first 16 bytes of the
encryption stream for that direction.  The peer decrypts them and
echoes them back encrypted suitably for the other direction - thus,
each side should get its random token back unchanged, after all
encryption and decryption has been done.  If this is not so, the end
that discovers the problem summarily drops the connection.

All multi-byte numbers below are sent big-endian (network byte order).

Assuming the crypto exchange passes, each end drops into a simple
packetizing protocol.  A packet consists of a one-byte type, followed
by additional data whose quantity and interpretation depend on the
type.  The types are (see lb.h for numeric values):

LB_DATA (client->server only)
    This represents a block of data to be written to the backup.  It
    is followed by 516 bytes: first 4 bytes of block number, then 512
    bytes of block data.

LB_RQSUMS (client->server only)
    This is a request to send back checksums of some blocks.  It is
    followed by 10 bytes: first 4 bytes of starting block number, then
    4 bytes of block count, then 1 byte of blocking factor, then 1
    byte of checksum type.  The server responds with a stream of
    LB_SUMS packets.  If the block count is N and the blocking factor
    is F, then ceil(N/F) LB_SUMS packets will be generated.  All but
    the last will contain F checksums; the last will contain
    N-(F*(ceil(N/F)-1)) checksums.  (This is modified if the request
    is aborted with LB_STOPSUM.)  Checksum types are given below.

LB_STOPSUM (client->server only)
    This aborts an LB_RQSUMS the server is still responding to.  It
    carries no additional data.  It always provokes an LB_ABORTED; if
    the server is still generating LB_SUMS packets when it processes
    the LB_STOPSUM, it stops doing so as soon as it sees the
    LB_STOPSUM packet.  It is an error for the client to send another
    LB_RQSUMS before a previous one has finished, without doing an
    LB_STOPSUM first.  It is not an error to send an LB_STOPSUM when
    no LB_RQSUMS is in progress, so the race between the server
    finishing an LB_RQSUMS and the client aborting it is a non-issue.

LB_SIZE (client->server only)
    This passes the size from the client to the server.  It is
    normally sent when the client finishes a rescan.  It carries 4
    bytes of data, which gives the partition size in blocks.

LB_SUMS (server->client only)
    This carries checksums.  It is followed by a variable number of
    checksums, as described under LB_RQSUMS.

LB_ABORTED (server->client only)
    This indicates that an LB_STOPSUM has been processed.  It carries
    no additional data.  If an LB_RQSUMS was in progress, no further
    LB_SUMS will be generated for it after the LB_ABORTED.

LB_PING (either direction)
LB_PONG (either direction)
    These form an are-you-alive test.  Neither type carries any
    additional data.  LB_PING always provokes an LB_PONG response,
    without affecting anything else that may be in progress; LB_PONG
    is never sent except in response to LB_PING.

Checksum types:

CKT_SHA1
    Each checksum is 20 bytes, containing the SHA-1 hash of the block.

CKT_SUM_SHA1
    Each checksum is 21 bytes, the first containing the mod-256 sum of
    all bytes in the block, the rest containing the SHA-1 hash of the
    block.
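To make the packet layouts and checksum formats above concrete, here
is a rough Python sketch.  All function names are invented for
illustration and none of this is lb's own code.  The numeric packet
type and checksum type values live in lb.h and are not reproduced
here, so the helpers build only the bytes that follow the type byte
and take the checksum type as a caller-supplied number.  Rejecting a
blocking factor of zero is a choice made for this sketch; the text
above does not say what lb does in that case.

```python
import hashlib
import struct

BLOCK_SIZE = 512  # bytes of block data carried by LB_DATA


def data_body(block_number: int, data: bytes) -> bytes:
    """The 516 bytes after an LB_DATA type byte: block number, then data."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("LB_DATA carries exactly 512 bytes of block data")
    return struct.pack(">I", block_number) + data  # big-endian block number


def rqsums_body(start: int, count: int, factor: int, cktype: int) -> bytes:
    """The 10 bytes after an LB_RQSUMS type byte.

    cktype is the numeric checksum type from lb.h (not defined here).
    """
    # 4-byte start, 4-byte count, 1-byte blocking factor, 1-byte type
    return struct.pack(">IIBB", start, count, factor, cktype)


def sums_packet_sizes(n: int, f: int) -> list:
    """Checksums per LB_SUMS packet for block count n, blocking factor f."""
    if f < 1:
        raise ValueError("blocking factor must be at least 1 (sketch's choice)")
    if n == 0:
        return []                      # ceil(0/F) = 0 packets
    packets = -(-n // f)               # ceil(n / f) in integer arithmetic
    return [f] * (packets - 1) + [n - f * (packets - 1)]


def ckt_sha1(block: bytes) -> bytes:
    """CKT_SHA1: the 20-byte SHA-1 hash of the block."""
    return hashlib.sha1(block).digest()


def ckt_sum_sha1(block: bytes) -> bytes:
    """CKT_SUM_SHA1: one mod-256 sum byte, then the 20-byte SHA-1 hash."""
    return bytes([sum(block) % 256]) + hashlib.sha1(block).digest()
```

For instance, sums_packet_sizes(10, 4) gives [4, 4, 2]: ceil(10/4) = 3
packets, the last holding 10-(4*(3-1)) = 2 checksums.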