Each cloud has a cloud-wide master key, K, which each cloud member must have a copy of. Each cloud member has a cloud ID, which is a small integer 0..255, and one or more cloud IPs. These are statically configured. Loosely speaking, there are three kinds of cloud members: those with globally-routed IPs (type 1), those with NATted IPs (type 2), and those with private IPs (type 3). Type 1 hosts can expect to establish a complete graph of connections among themselves. Type 2 hosts can expect to connect to any type 1 host, but not conversely. Type 3 hosts typically can reach a type 1 or 2 host on a private IP, but nothing more. When a new host comes up, it connects to any available host; type 1 or 2 hosts then learn what other hosts they might be able to connect to, and all types learn how to route packets. A type 1 or 2 host tries to maintain connections to all (other) type 1 hosts. A type 3 host just maintains its connection to its type 1 or 2 upstream(s). See the PUBLIC message, below, for how a type 1 or 2 host learns about type 1 hosts to connect to. Actually, there are two properties hosts can have which generate the possible types. (They are, strictly, orthogonal, though one combination is unlikely to occur, hence only three, not four, types above.) The first property is whether hosts pay attention to PUBLIC packets and try to connect to the endpoints listed in them; the other is whether hosts listen for incoming connections. Type 1 does both. Type 2 does only the former. Type 3 does neither. A host can be configured to do the latter but not the former, which we might call type 4, but it's not clear this would actually be useful (one possible use might be dealing with multiple layers of NAT/firewalling). A rudimentary routing protocol, somewhat akin to RIP, runs over the cloud links; this is how topology changes get noticed by other hosts. Every host periodically announces, to every neighbour it has, the reachability of every cloud member it believes it can reach. Convergence is accelerated by a triggered-update mechanism: whenever the hop count to a member changes, an announcement is scheduled for much sooner than it otherwise would be. See the REACH message, below, for details. Cloud members can also advertise routes to networks that are behind them. Other cloud members will normally install such routes whenever the advertising member is reachable; see the ROUTE message, below. Each cloud member also announces its cloud IP(s) periodically. Each member maintains a cloud ID <-> cloud IP mapping table, which is sent when a new connection comes up and every once in a while as well. See the IPMAP message, below, for details. B(x) is a one-byte string whose byte contains the low eight bits of x. HASH(s) is the SHA256 hash operation. EECB(p,k) is ECB Rijndael encryption of p with key k. p and k must each be 32 bytes long. DECB(p,k) is ECB Rijndael decryption of p with key k. p and k must each be 32 bytes long. CRC32(s) is CRC-32 with polynomial edb88320 and initial value 31bb29b0. on connection: - listener sends 16 random bytes, Rl. - Connecter computes Rc = 16 random bytes t = CRC32(K||Rl||K||Rc||K) b = one-byte XOR of all bytes of t xs = B(b) || B(b+1) || ... || B(b+31) s = Rl || t || Rc hk = HASH(K) for (i=0;i<64;i++) s = EECB(EECB(0,HASH(K||B(i)||s||B(i)||K)) XOR xs, hk) and sends Rc || b || s - Listener checks this. On failure, listener silently throws away all state, leaving the connecter with a half-open connection. On success, listener sleeps for a short time (a few seconds) before continuing. - Each end computes Kl2c = HASH( "->" || K || Rl || "->" || Rc || K || "->" ) Kc2l = HASH( "<-" || K || Rl || "<-" || Rc || K || "<-" ) - Kl2c and Kc2l are master keys in the listener-to-connecter and connecter-to-listener directions. In each direction, this key is called Kd; the other direction's Kd is Kd'. - Immediately after the above, each direction rekeys. To rekey, an end computes thus: - Generate 32 random bytes, Rk. - Generate random padding P1, of length L, 0<=L<65536. (See below for notes on the choice of L.) - Generate a random string P2, 32 octets long. - Let N be the minimum number of bits required to represent L (mathematically, floor(lg(L))+1, except for a special case for L=0); N will be a number from 0 through 15. - Replace the high four bits of the first octet of P2 with N; replace the last N bits of P2 with L. (With P2 considered as a 256-bit big-endian number, L goes in the low bits and N goes in the high 4 bits.) - Let the second and third octets of P2 be L1 and L2, respectively. (From the 256-bit-number perspective, L1 is the second-highest 8 bits and L2 the third-highest.) - Send EECB(Rk,Kd) || EECB(P2,Kd) || P1, in the clear. Because P1 exists for traffic-analysis resistance, this should be done as a single send, with L chosen so the resulting send is not prima facie distinguishable from recent traffic. - Let SER be the number of rekeys performed in this direction so far, not including this one, represented in big endian using an integer number of octets, but as few octets as possible. (0 uses zero octets; 1..255 use one octet, 256..65535 use two octets, etc.) - Compute t = Kd || Rk || Kd' k = "" for (i=0;i<8;i++) t = EECB(HASH(t||Rk||B(i)||SER||t),Kd) k = k || t The resulting k is used as an arcfour key for data that end sends, with the first 64KB of the arcfour keystream discarded. The receiver must of course mimic the computation using the decoded Rk and the Kd and Kd' values the other end used, to obtain the key to decrypt the data stream. - Let D be 0x3fff0000 + (L1 * 256) + L2. Then the next rekey is performed at the earliest packet boundary (see below) that is at least D octets later. That is, the number of octets sent between this rekey's P1 and the next rekey's EECB(Rk,Kd) will be the smallest possible number >=D which places the rekey at a packet boundary. Senders and receivers must keep track of bytes sent and received to know where rekeys occur in the data stream. The rekeying algorithm also requires an awareness of packet boundaries. Data layered atop the above low-level protocol consists of a stream of packets. A packet consists of length, type, and contents; the contents may be up to 65535 bytes long. Specifically, a packet contains Type, one octet: 13 IGNORE 79 IP 34 REACH ce PUBLIC 2a IPMAP a1 IAM 67 DUP 98 ROUTE d5 TELL f2 OPTNEG 0d (reserved for future use) 40 (reserved for future use) bf (reserved for future use) (These were chosen randomly under the constraint that each pair of codes have a Hamming distance >=4.) Length L, two octets big-endian. L bytes of contents. Packet types' semantics: IGNORE Contains anything. Ignored by its receiver. (The intent is that this can be used to make traffic-analysis attacks against the protocol more difficult.) IP Contains Hopcount (one octet) Destination cloud member ID (one octet) AF, the packet's address family (one octet) Payload (everything from AF to end-of-packet) This packet is routed to the cloud member whose ID it bears; that member then treats everything after the AF as a received IP packet. If the receiver believes the destination member is unreachable, it should silently drop the packet. The hopcount is initialized to the infinity value for REACH packets (qv); it is decremented at each hop. If the hopcount is zero and the receiver is not the target host, it should silently drop the packet. The address family is one of the values used to specify address families when encoding IP addresses (see below). REACH Contains a stream of zero or more of Cloud member ID (one octet) Distance (one octet) A host sends a REACH at least once per minute for each pair, where ID is a cloud member ID the sender believes it can reach and ngh is the neighbour to which the REACH is sent. Distance figures are hop counts, with infinity being 15. Split horizon with poisoned reverse is used. PUBLIC Contains a stream of zero or more of IP address/port (variable, see below) Timestamp (time_t, see below) Each host sends out a PUBLIC at least every five minutes; one is also sent gratuitously to a new connection. A type 1 host always includes its own connection endpoints with the timestamp set to the current time. All hosts include IPs they've heard from other hosts, but without changing the timestamps. An IP whose timestamp is more than 15 minutes old is dropped from PUBLIC announcements. IPMAP Contains a stream of zero or more of Cloud member ID (one octet) IP (variable, see below) Timestamp (time_t, see below) This declares that the cloud member has the IP address. If a cloud member has more than one IP, this is represented by the ID appearing more than once, with different IPs. Upon receiving an IPMAP with a given ID, all previous IPs for that ID should be replaced. (This implies that an ID's list of IPs should never be split across multiple IPMAP packets.) When a new connection comes up, each end announces its entire mapping table to the other; changes are flooded, and IPMAPs are sent spontaneously every day or so. When a host generates an IPMAP, it uses the current time for the timestamp for its own data; for others' data, it uses the timestamp it most recently heard. This allows stale data to be aged out eventually; if a mapping has not been refreshed in a week or so, it is dropped. There is no way to indicate "this ID has no IPs"; a cloud member with no IPs is useful only in circumstances well outside the use cases this is designed for. IAM Contains one byte, that being the cloud member ID of the sender. This is sent by each peer at the beginning of a new connection, to let each end know who the other is, and never at any other time. DUP Contains one byte of data: DUP_DUPLICATE 0x97 (ef, 76, 89, da, 02, 3b, b0, 68, 51, 1c, 25, c4 RFU) There are various circumstances in which it is perfectly normal to have two connections between the same peers. To reduce waste, when this is noticed, the peer with the lower ID chooses one of the two connections to close. It then sends a DUP whose data byte is DUP_DUPLICATE on that connection. When the peer receives this, it closes the connection it's received over. A peer must never send anything on a connection after sending a DUP on it, but it must still be prepared to receive packets on it. ROUTE Contains a stream of zero or more of Cloud member ID (one octet) Mask width (one octet) If mask width is 255, nothing If mask width is not 255, IP (variable, see below) Timestamp (time_t, see below) This declares that the named cloud member is advertising a route to the given IP network with the given mask width. Upon receiving a ROUTE with a given ID, all previous routes for that ID should be replaced. (This implies that an ID's list of routes should never be split across multiple ROUTE packets.) The case where the mask width is 255 represents "this ID is advertising no routes"; it is an error to combine this with any other entries for the same ID. When a new connection comes up, each end announces its entire route table; changes are flooded, and ROUTEs are sent spontaneously every day or so. ROUTEs that have not been refreshed in a week or so are dropped. TELL Used to send commands to other cloud members and receive responses back from them. Contains Destination ID (one octet) Source ID (one octet) Hopcount (one octet) Opcode (one octet) 0x00 COMMAND 0x01 RESPONSE other RFU For COMMAND and RESPONSE: Cookie length, CL (one octet) Cookie data (CL octets) Data (variable, see below) If ID A wants to tell ID B to do something A generates a TELL packet with destination = B, source = A, opcode = command, and the command for B to execute in the data. When B receives it, it executes the command, packages the resulting output (which may be zero-length) up into another TELL packet, and sends it. As compared to A's TELL packet, B's has the source and destination swapped and the opcode set to response. B's cookie length and cookie data must be copied directly from A's packet; these are opaque to everyone but A. There is no provision for command output overflowing the max packet size; B has to take some reasonable action in that case. If a node receives a TELL packet with someone else's ID as the destination, it normally forwards it to what it thinks the best direction is to reach that cloud member, but it will drop it if the hopcount is zero or if it thinks the target cloud member is unreachable. When generating a new TELL packet, the hopcount is initialized to the infinity value for REACH packets (qv). OPTNEG Used for connection-specific optional-feature negotiation. Contains Opcode (one octet) 0x00 OFFER 0x01 ACCEPT 0x02 REJECT other RFU For OFFER, ACCEPT, and REJECT Option name length, PL (one octet) Option name (PL octets) The end wishing to initiate negotiation of an optional feature sends an OFFER packet naming the feature. The other end then sends an ACCEPT to turn that feature on or a REJECT to leave it off. If a feature requires further negotiation, that must be handled in a feature-specific way after turning it on. In general, features are enabled independently for the two directions of a connection - negotiating a feature on in one direction does not, in general, have anything to do with it being on or not in the other direction of the same connection. (While a particular feature's specification may say otherwise, that is specific to the feature in question.) There is currently only one feature implemented; see below about timestamps. There are various protocol fields described as `timestamp' or `time_t', which are timestamps measured in seconds since the UNIX epoch. By default, these are four-byte fields. But, if the OPTNEG mechanism has been used to negotiate an option `stamp64' on, then timestamps sent by the offering end, the end that sent the OPTNEG OFFER stamp64, are 8 bytes, not 4. (It is the offering end's responsibility to ensure it does not send any timestamp-bearing packets after offering stamp64 but before seeing the accept/reject for it.) Periodic gratuitous packets (eg, REACH) may be delayed if the underlying connection they would be sent to isn't moving bits. An IP address is represented as a protocol ID byte followed by zero or more bytes depending on the protocol ID. Currently defined protocol ID bytes, with descriptions of their address formats: 0 IPv4 Address: four bytes in network byte order. 1 IPv6 Address: sixteen bytes in network byte order. An IP address/port is the same thing, but with a port number added: 0 IPv4 Address: four bytes in network byte order. Port: two bytes in network byte order. 1 IPv6 Address: sixteen bytes in network byte order. Port: two bytes in network byte order.