Each cloud has a cloud-wide master key, K, which each cloud member must have a copy of. Each cloud member has a cloud ID, which is a small integer 0..255, and one or more cloud IPs. These are statically configured. Loosely speaking, there are three kinds of cloud members: those with globally-routed IPs (type 1), those with NATted IPs (type 2), and those with private IPs (type 3). Type 1 hosts can expect to establish a complete graph of connections among themselves. Type 2 hosts can expect to connect to any type 1 host, but not conversely. Type 3 hosts typically can reach a type 1 or 2 host on a private IP, but nothing more. When a new host comes up, it connects to any available host; type 1 or 2 hosts then learn what other hosts they might be able to connect to, and all types learn how to route packets. A type 1 or 2 host tries to maintain connections to all (other) type 1 hosts. A type 3 host just maintains its connection to its type 1 or 2 upstream(s). See the PUBLIC message, below, for how a type 1 or 2 host learns about type 1 hosts to connect to. Actually, there are two properties hosts can have which generate the possible types. (They are, strictly, orthogonal, though one combination is unlikely to occur, hence only three, not four, types above.) The first property is whether hosts pay attention to PUBLIC packets and try to connect to the endpoints listed in them; the other is whether hosts listen for incoming connections. Type 1 does both. Type 2 do only the former. Type 3 do neither. A host can be configured to do the latter but not the former, which we might call type 4, but it's not clear this would actually be useful (one possible use might be dealing with multiple layers of NAT/firewalling). A rudimentary routing protocol, somewhat akin to RIP, runs over the cloud links; this is how topology changes get noticed by other hosts. Every host periodically announces, to every neighbour it has, the reachability of every cloud member it believes it can reach. Convergence is accelerated by a triggered-update mechanism: whenever the hop count to a member changes, an announcement is scheduled for much sooner than it otherwise would be. See the REACH message, below, for details. Each cloud member also announces its cloud IP(s) periodically. Each member maintains a cloud ID <-> cloud IP mapping table, which is sent when a new connection comes up and every once in a while as well. See the IPMAP message, below, for details. B(x) is a one-byte string whose byte contains the low eight bits of x. HASH(s) is the SHA256 hash operation. EECB(p,k) is ECB Rijndael encryption of p with key k. p and k must each be 32 bytes long. DECB(p,k) is ECB Rijndael decryption of p with key k. p and k must each be 32 bytes long. CRC32(s) is CRC-32 with polynomial edb88320 and initial value 31bb29b0. on connection: - listener sends 16 random bytes, Rl. - Connecter computes Rc = 16 random bytes t = CRC32(K||Rl||K||Rc||K) b = one-byte XOR of all bytes of t xs = B(b) || B(b+1) || ... || B(b+31) s = Rl || t || Rc hk = HASH(K) for (i=0;i<64;i++) s = EECB(EECB(0,HASH(K||B(i)||s||B(i)||K)) XOR xs, hk) and sends Rc || b || s - Listener checks this. On failure, listener silently throws away all state, leaving the connecter with a half-open connection. On success, listener sleeps for a short time (a few seconds) before continuing. - Each end computes Kl2c = HASH( "->" || K || Rl || "->" || Rc || K || "->" ) Kc2l = HASH( "<-" || K || Rl || "<-" || Rc || K || "<-" ) - Kl2c and Kc2l are master keys in the listener-to-connecter and connecter-to-listener directions. In each direction, this key is called Kd; the other direction's Kd is Kd'. - Immediately after the above, each direction rekeys. To rekey, an end computes thus: - Generate 32 random bytes, Rk. - Generate random padding P1, of length L, 0<=L<65536. (See below for notes on the choice of L.) - Generate a random string P2, 32 octets long. - Let N be the minimum number of bits required to represent L (mathematically, floor(lg(L))+1, except for a special case for L=0); N will be a number from 0 through 15. - Replace the high four bits of the first octet of P2 with N; replace the last N bits of P2 with L. (With P2 considered as a 256-bit big-endian number, L goes in the low bits and N goes in the high 4 bits.) - Let the second and third octets of P2 be L1 and L2, respectively. (From the 256-bit-number perspective, L1 is the second-highest 8 bits and L2 the third-highest.) - Send EECB(Rk,Kd) || EECB(P2,Kd) || P1, in the clear. Because P1 exists for traffic-analysis resistance, this should be done as a single send, with L chosen so the resulting send is not prima facie distinguishable from recent traffic. - Let SER be the number of rekeys performed in this direction so far, not including this one, represented in big endian using an integer number of octets, but as few octets as possible. (0 uses zero octets; 1..255 use one octet, 256..65535 use two octets, etc.) - Compute t = Kd || Rk || Kd' k = "" for (i=0;i<8;i++) t = EECB(HASH(t||Rk||B(i)||SER||t),Kd) k = k || t The resulting k is used as an arcfour key for data that end sends, with the first 64KB of the arcfour keystream discarded. The receiver must of course mimic the computation using the decoded Rk and the Kd and Kd' values the other end used, to obtain the key to decrypt the data stream. - Let D be 0x3fff0000 + (L1 * 256) + L2. Then the next rekey is performed at the earliest packet boundary (see below) that is at least D octets later. That is, the number of octets sent between this rekey's P1 and the next rekey's EECB(Rk,Kd) will be the smallest possible number >=D which places the rekey at a packet boundary. Senders and receivers must keep track of bytes sent and received to know where rekeys occur in the data stream. The rekeying algorithm also requires an awareness of packet boundaries. Data layered atop the above low-level protocol consists of a stream of packets. A packet consists of length, type, and contents; the contents may be up to 65535 bytes long. Specifically, a packet contains Type, one octet: 86 IGNORE ea IP b5 REACH 5e PUBLIC 64 IPMAP 10 IAM 79 (reserved for future use) c1 (reserved for future use) 23 (reserved for future use) 0d (reserved for future use) 9b (reserved for future use) (These were chosen randomly under the constraint that each pair of codes have a Hamming distance >=4.) Length L, two octets big-endian. L bytes of contents. Packet types' semantics: IGNORE Contains anything. Ignored by its receiver. IP Contains Hopcount (one octet) Destination cloud member ID (one octet) Payload (everything from ID to end-of-packet) This packet is routed to the cloud member whose ID it bears; that member then treats everything after the ID as a received IP packet. If the receiver believes the destination member is unreachable, it should silently drop the packet. The hopcount is initialized to the infinity value for REACH packets (qv); it is decremented at each hop. If the hopcount is zero and the receiver is not the target host, it should silently drop the packet. REACH Contains a stream of zero or more of Cloud member ID (one octet) Distance (one octet) A host sends a REACH at least once per minute for each pair, where ID is a cloud member ID the sender believes it can reach and ngh is the neighbour to which the REACH is sent. (The actual time is randomized slightly, to avoid traffic spikes, but the max is one minute.) Distance figures are hop counts, with infinity being 15. Split horizon with poisoned reverse is used. PUBLIC Contains a stream of zero or more of IP address/port (variable, see below) Timestamp (time_t, four bytes big-endian) Each host sends out a PUBLIC at least every five minutes; one is also sent gratuitously to a new connection. A type 1 host always includes its own connection endpoints with the timestamp set to the current time. All hosts include IPs they've heard from other hosts, but without changing the timestamps. An IP whose timestamp is more than 15 minutes old is dropped from PUBLIC announcements. IPMAP Contains a stream of zero or more of Cloud member ID (one octet) IP (variable, see below) This declares that the cloud member has the IP address. If a cloud member has more than one IP, this is represented by the ID appearing more than once, with different IPs. Upon receiving an IPMAP with a given ID, all previous IPs for that ID should be replaced. (This implies that an ID's list of IPs should never be split across multiple IPMAP packets.) When a new connection comes up, each end announces its entire mapping table to the other; changes are flooded, and IPMAPs are sent spontaneously every once in a while (the exact interval is of little import, since the spontaneous messages should never affect anything). IAM Contains one byte, that being the cloud member ID of the sender. This is sent by each peer at the beginning of a new connection, to let each end know who the other is, and never at any other time. This is useful because there are circumstances which lead to having two connections between each pair of type 1 hosts; when two hosts realize they have two connections between them, the one with the lower ID value chooses one and closes it. (Until this happens, the other peer may use whichever one of them it pleases.) Periodic gratuitous packets (eg, REACH) may be suppressed if the underlying connection they would be sent to isn't moving bits. An IP address is represented as a protocol ID byte followed by zero or more bytes depending on the protocol ID. Currently defined protocol ID bytes, with descriptions of their address formats: 0 IPv4 Address: four bytes in network byte order. 1 IPv6 Address: sixteen bytes in network byte order. An IP address/port is the same thing, but with a port number added: 0 IPv4 Address: four bytes in network byte order. Port: two bytes in network byte order. 1 IPv6 Address: sixteen bytes in network byte order. Port: two bytes in network byte order.