[Copyright status: this file is in the public domain. In jurisdictions where a copyright holder cannot deliberately release a work into the public domain, this should be interpreted as the closest available approximation, probably something like "you have a license to use this file in any way for any purpose".]

Factual information in this file is accurate to the best of my knowledge at time of writing. While I am a professional code author, and I have written an independent ssh implementation, as a cryptographer I am strictly a dilettante, and "the best of my knowledge" and any opinions should be interpreted with that in mind. This file is still a work-in-progress.

Terrapin is an attack against ssh's binary packet protocol. It is a prefix truncation attack; that is, it permits deleting some initial subset of the supposedly-protected data. It posits an attacker with full control over the octet stream between the peers, able to inspect, delay, delete, modify, and insert octets at will. This is the strongest type of attacker normally considered as a crypto opponent, but it also is the kind of attacker ssh is intended to be able to protect against.

To understand Terrapin, we need a little background. ssh can in principle run over any bidirectional, flow-controlled, 8-bit-clean octet stream connection; the commonest underlying layer is TCP. Most of ssh is built on top of a layer called the binary packet protocol, here abbreviated to BPP.

The BPP starts with some cleartext message exchanges, during which algorithm negotiation occurs; a shared secret is generated using Diffie-Hellman (or something philosophically similar; the BPP does not require that Diffie-Hellman be used, only that whatever is used be able to perform similar functions, allowing two parties to establish a shared value which is secret from anyone snooping the exchange - for ease of language, I'll write as though it always is Diffie-Hellman).
Each side then computes bulk encryption keys based on the shared secret, the data that went into the algorithm negotiation, and the Diffie-Hellman inputs. This provides protection against an attacker meddling with (for example) the algorithm lists, because any such meddling will mean that the data one peer uses will differ from the data the other peer uses, causing them to derive different encryption keys, leading to connection failure. This initial exchange is called `key exchange', abbreviated `kex'. As soon as kex finishes, the new keys are brought into use, with the rest of the protocol encrypted using the negotiated algorithms.

The data stream in each direction is broken into variable-sized messages. Each message has a sequence number, starting from zero; these are implicit (that is, they do not appear on the wire anywhere). However, once encryption starts, ssh provides integrity protection by applying a MAC, a message authentication check/code, to each message; this MAC includes not only the explicit message contents but also the sequence number.

Terrapin operates by inserting an IGNORE message into one data stream (for ease of language, I'll write as if it's always the server->client one; that one is the higher-value target) during the cleartext phase, then dropping the first message sent by the server after encryption starts. (It has to be the first message, since the MACs include the sequence number; if the first message is not dropped, the receiver's sequence number for it is off by one, so its MAC will fail with overwhelming probability.) While the Terrapin paper mentions the possibility of injecting more than one IGNORE and dropping more than one initial message, it does not describe attempting that, probably because it would not be useful against the implementations they were working with.
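
The sequence-number bookkeeping described above can be illustrated with a toy model. This is a sketch under loud assumptions, not real ssh: there is no encryption at all, HMAC-SHA-256 stands in for whatever MAC was negotiated, the MAC key is simply handed to both sides, and the names Direction and simulate are mine. The only things taken from the real protocol are the message numbers and the rule that the implicit sequence number counts every message, cleartext or not, and is fed into the MAC.

```python
import hashlib
import hmac
import struct

# Message numbers as assigned by the ssh protocol documents.
MSG_IGNORE = 2
MSG_SERVICE_ACCEPT = 6
MSG_EXT_INFO = 7
MSG_KEXINIT = 20
MSG_NEWKEYS = 21
MSG_KEX_REPLY = 31


class Direction:
    """One end of one direction of a toy BPP (assumption: MAC only, no cipher)."""

    def __init__(self, mac_key):
        self.mac_key = mac_key
        self.seq = 0            # implicit; never appears on the wire
        self.protected = False  # becomes True once NEWKEYS has gone by

    def _tag(self, payload):
        data = struct.pack(">I", self.seq) + payload
        return hmac.new(self.mac_key, data, hashlib.sha256).digest()

    def _advance(self, payload):
        # Every message counts, including the cleartext ones.
        self.seq = (self.seq + 1) & 0xFFFFFFFF
        if payload[0] == MSG_NEWKEYS:
            self.protected = True

    def send(self, payload):
        tag = self._tag(payload) if self.protected else b""
        self._advance(payload)
        return payload, tag

    def receive(self, packet):
        payload, tag = packet
        if self.protected and not hmac.compare_digest(tag, self._tag(payload)):
            raise ValueError("MAC failure at sequence number %d" % self.seq)
        self._advance(payload)
        return payload[0]


def simulate(inject, drop):
    """Run the server->client stream, optionally with the attacker's two steps."""
    key = b"toy key standing in for the kex output"
    server, client = Direction(key), Direction(key)
    sent = [bytes([number]) + body for number, body in [
        (MSG_KEXINIT, b"algorithm lists"),
        (MSG_KEX_REPLY, b"server kex data"),
        (MSG_NEWKEYS, b""),
        (MSG_EXT_INFO, b"extensions"),
        (MSG_SERVICE_ACCEPT, b"ssh-userauth"),
    ]]
    wire = [server.send(p) for p in sent]
    if inject:
        # Cleartext phase: nothing stops the attacker adding an IGNORE.
        wire.insert(1, (bytes([MSG_IGNORE]), b""))
    if drop:
        # Delete the first message sent after NEWKEYS.
        newkeys_at = next(i for i, (p, _) in enumerate(wire) if p[0] == MSG_NEWKEYS)
        del wire[newkeys_at + 1]
    return [client.receive(packet) for packet in wire]
```

In this model, simulate(True, True) completes without complaint and the client never sees message 7, while injecting without dropping, or dropping without injecting, raises the MAC failure. That is the whole attack in miniature: the injected IGNORE and the deleted message cancel out in the receiver's sequence count.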
From a theoretical point of view, this breaks the BPP's intent to provide integrity protection, since the supposedly-protected data stream seen by one peer differs from that seen by the other, without the BPP's checks raising any alarm. For it to be exploitable in practice, though, dropping the first packet (or, more generally, the first N packets for some N) after kex has to be useful. Simply sabotaging the connection is not an interesting thing to do; an attacker with the capabilities posited can do that much more easily already. The major practical use of this is that some implementations send an ExtInfo packet as the first packet after kex to indicate that certain extensions are supported, and some of the affected extensions are security-relevant, making this a downgrade attack.

The Terrapin paper points out two protocol changes which should be made to address these weaknesses: the BPP should reset its sequence number when encryption begins, and the data included in key generation should include the full conversation before encryption begins, not just selected values from it.

The best fix, of course, would be a protocol redesign, integrating the above suggestions. The major cost of doing that is that it would introduce a compatibility flag day, so there is a desire to look for mitigations which are compatible with the installed base. There are several. The paper points out that both of the above suggestions can be done opportunistically by including magic strings in the offered algorithm lists, thereby signaling support. This is an abuse of the mechanism, but it may be about the best that can be done given the current design.
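
The magic-string signaling can be sketched roughly as follows. The marker names here are hypothetical placeholders of my own, not the strings any real implementation uses, and the negotiation rule is simplified to "the first algorithm on the client's list that the server also lists". The point is only that a marker rides along in an algorithm list, is never itself eligible for selection, and turns the new behaviour on only when both peers offered it.

```python
# Hypothetical marker names, invented for this sketch.
CLIENT_MARKER = "strict-bpp-c@example.org"
SERVER_MARKER = "strict-bpp-s@example.org"
MARKERS = {CLIENT_MARKER, SERVER_MARKER}


def offer(algorithms, marker):
    """Return the algorithm list to send, with the support marker appended."""
    return list(algorithms) + [marker]


def negotiate(client_list, server_list):
    """Pick an algorithm and report whether both peers signaled support.

    Simplified rule (an assumption of this sketch): first entry on the
    client's list that also appears on the server's list. Markers are
    skipped, since they are signals and not real algorithms.
    """
    both_support = CLIENT_MARKER in client_list and SERVER_MARKER in server_list
    server_real = [a for a in server_list if a not in MARKERS]
    for algorithm in client_list:
        if algorithm not in MARKERS and algorithm in server_real:
            return algorithm, both_support
    raise ValueError("no common algorithm")
```

A peer that has never heard of the markers just sees unknown algorithm names it will not pick, which is what makes this deployable without a flag day. An attacker who strips a marker in transit changes the algorithm lists, and, as described earlier, meddling with those lists causes the two peers to derive different keys.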
OpenSSH (for better or worse, the dominant implementation in today's network) has implemented a variant: it includes the sequence number reset and, while it does not actually use the full conversation in kex, it does summarily drop the connection if the unencrypted exchange includes anything not strictly necessary (including all the plausible replacement packets Terrapin and its modifications can inject).

Deleting the first message from the encrypted stream requires knowing how many octets it occupies. In most current implementations, this is fixed, or has only a few possible values, because that message is fixed or largely fixed. But one mitigation is to make that message an IGNORE, with a randomly-chosen length, forcing the attacker to guess its size. (To avoid revealing it via side-channels such as TCP segment sizes, various techniques can be used; for example, the sender could send a randomly-chosen number of randomly-sized IGNOREs which collectively add up to about 2K of data, writing the entire batch as a single TCP send.) This adds only a handful of bits of difficulty (my rough estimate: six to eight bits), but reducing the attack success chance by a factor of 64 or more is worth doing. Furthermore, even if the attacker does manage to delete one or more IGNOREs successfully, it won't affect anything in practice.

Also, some crypto modes are more vulnerable than others. GCM modes, for example, are immune, because they use their own internal sequence number, which amounts to doing a sequence number reset as far as the crypto is concerned. But other modes are exceptionally vulnerable; for example, ChaCha20-Poly1305 is typically implemented in a way that gives Terrapin a 100% chance of success. (As the Terrapin paper notes, this actually is not a flaw in ChaCha20-Poly1305 per se, but rather in how it is typically integrated into the ssh crypto framework.) Thus, avoiding vulnerable crypto can serve as a mitigation.
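
The random-IGNORE mitigation mentioned above might look something like the following. This is a sketch of one way to choose the sizes, not a recommendation of specific parameters: the 2048-octet target and the cap of eight messages are numbers I picked for illustration, the function names are mine, and only the message payloads are built here. Wrapping them in BPP packets (length, padding, encryption, MAC) and writing the batch in one send is left to the surrounding implementation.

```python
import secrets
import struct

MSG_IGNORE = 2  # ssh message number for IGNORE


def ignore_payload(data_len):
    """Build an IGNORE payload: message number, then a length-prefixed string."""
    return bytes([MSG_IGNORE]) + struct.pack(">I", data_len) + secrets.token_bytes(data_len)


def random_ignore_batch(target_total=2048, max_count=8):
    """Return between 1 and max_count IGNORE payloads.

    The data lengths are random but always sum to target_total, so the
    attacker has to guess where the first message ends. (Both parameter
    defaults are illustrative assumptions.)
    """
    count = 1 + secrets.randbelow(max_count)
    # Choose count-1 random cut points in [0, target_total] and use the
    # gaps between them as the individual data lengths.
    cuts = sorted(secrets.randbelow(target_total + 1) for _ in range(count - 1))
    bounds = [0] + cuts + [target_total]
    return [ignore_payload(hi - lo) for lo, hi in zip(bounds, bounds[1:])]
```

Note that in this sketch the per-message overhead means the total size on the wire still varies slightly with the number of messages; an implementation that cares could pad the batch to a constant total.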
Even simply rejecting connections that contain IGNOREs in the pre-crypto data stream can be effective. There is little to no reason other than attacks to send them.
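
A check of this kind is small. The following is a minimal sketch, assuming the receive path hands each incoming message payload to a per-connection object before processing it; the class name and the choice of exception are mine. It covers only the initial key exchange, since later re-keying happens under encryption.

```python
MSG_IGNORE = 2    # ssh message number for IGNORE
MSG_NEWKEYS = 21  # ssh message number for NEWKEYS


class PreCryptoFilter:
    """Reject IGNORE messages that arrive before the first NEWKEYS."""

    def __init__(self):
        self.keys_in_use = False

    def check(self, payload):
        """Return payload unchanged, or raise if it is a cleartext IGNORE."""
        if not payload:
            raise ValueError("empty message payload")
        number = payload[0]
        if not self.keys_in_use:
            if number == MSG_IGNORE:
                raise ConnectionError("IGNORE received before encryption started")
            if number == MSG_NEWKEYS:
                # Everything after this point is protected by the MAC.
                self.keys_in_use = True
        return payload
```

Once NEWKEYS has been seen, IGNOREs pass through as usual, so legitimate uses of IGNORE inside the encrypted stream are unaffected.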