This document is intended to be a spec detailed enough that any two implementations written to it will necessarily interoperate. It is very incomplete at present.

There are users and there are data blobs. Both are grouped into classes. Data blob names, user names, and class names may not contain spaces, tabs, or newlines.

Access is granted on a per-class-pair basis: for any user class and data class, access can be granted or not. If user class U is granted access to data class D, then any user in class U can read the contents of any data blob in class D, regardless of what other classes the user and data blob may or may not belong to.

Every user has a public/private key pair, generated randomly when the user is first added to the system. The user's passphrase is used to encrypt the private portion; the user name, the public portion, and the encrypted private portion are stored in the user's record in the user database file.

Every user class has a public/private key pair, generated randomly when the user class is created. The public portion is stored in the clear. The private portion is stored encrypted with a private-key encryption key, which is also generated randomly. Many copies of that private-key encryption key are stored, each encrypted with the public key of one user who belongs to the class.

Every data class has a public/private key pair, generated randomly when the data class is created. The public portion is stored in the clear. The private portion is stored encrypted with a private-key encryption key, which is also generated randomly. Many copies of that private-key encryption key are stored, each encrypted with the public key of one user class that has access to the data class.

Every data blob is stored encrypted with a key randomly chosen when the data blob is created. Many copies of that key are stored, each encrypted with the public key of one data class to which the blob belongs.

There is also a master public/private key pair, generated at installation time.
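The class-pair access rule reduces to a simple existence check. The Python sketch below models only class membership, not the cryptography. The names `can_read`, `user_classes`, `blob_classes`, and `grants` are hypothetical illustrations, not part of any defined interface. In the stored data, the same rule is realized by the key chain. A user's private key decrypts a user-class private-key encryption key. The user-class private key then decrypts a data-class private-key encryption key. Finally, the data-class private key decrypts the blob's data encryption key.

```python
from typing import Dict, Set, Tuple

# Hypothetical in-memory model of memberships and class-pair grants.
UserClasses = Dict[str, Set[str]]   # user name -> user classes it belongs to
BlobClasses = Dict[str, Set[str]]   # blob name -> data classes it belongs to
Grants = Set[Tuple[str, str]]       # (user class, data class) pairs with access


def can_read(user: str, blob: str, user_classes: UserClasses,
             blob_classes: BlobClasses, grants: Grants) -> bool:
    """True if any class of the user is granted access to any class of the blob.

    Other memberships never revoke access: a single matching pair suffices.
    """
    return any(
        (uc, dc) in grants
        for uc in user_classes.get(user, set())
        for dc in blob_classes.get(blob, set())
    )


if __name__ == "__main__":
    users = {"alice": {"staff"}, "bob": {"guests"}}
    blobs = {"payroll": {"finance", "internal"}}
    grants = {("staff", "internal")}
    print(can_read("alice", "payroll", users, blobs, grants))  # True
    print(can_read("bob", "payroll", users, blobs, grants))    # False
```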
Everything that is encrypted with a variety of public keys is also encrypted with the master public key. (This is intended to allow recovery of, for example, a data class to which no user classes have access.)

Some things are stored encoded in base 94. This consists of treating the data block to be stored as a large number (MSB first) and converting it to base 94. The "digits" are the ASCII characters ! (33) through ~ (126), with 0 corresponding to ! and 93 to ~, in the obvious way. The length of the data is stored as a byte count in ASCII decimal, and a ! separates this count from the data. This ! can be thought of as a leading "zero" on the data, or as a separator that is not part of the data. Viewed the second way, leading !s are never stored, even if the value is all zero bits. Such a value is stored as a length, a !, and nothing further.

Large numbers (as used in public-key crypto) are converted to byte strings by representing them in bytes, big-endian, and prefixing the resulting byte string with four bytes giving (big-endian) the length of the large number in bits. (Numbers with more than 0xffffffff bits cannot be represented.)

RSA is the public-key cryptosystem used. The private portion consists of the decryption exponent d, as a large number. The public portion consists of the modulus n and the encryption exponent e, serialized as the concatenation of two large numbers (n first). The size of n is determined by a compile-time constant at the time the key is generated. Once a key is generated and stored, binaries compiled with other values of that constant can still use it. Generated n values always have their high 32 bits set.

When a chunk of data is encrypted with a symmetric key, Rijndael is used, with 256-bit keys and 256-bit blocks. (Below, Recb(data,key) is ECB-mode Rijndael; Rctr(data,key,iv) is CTR-mode Rijndael.) All symmetric keys (private-key encryption keys and data encryption keys) are 256 bits long. When such a key is encrypted with RSA, it is first padded with random bits to the size of the modulus.
(If this produces a value greater than or equal to the modulus, the padding is regenerated.)

To encrypt a chunk of data P with symmetric key K, generate 256 random bits B, then compute B'=Recb(B,K). The encrypted chunk is then B'||Rctr(P||Recb(0,SHA256(P)),B,SHA256(K||B||K)). That is, P followed by Recb(0,SHA256(P)) is encrypted in CTR mode under key B, with SHA256(K||B||K) as the IV.

The user database file is called "passwd". Each record has three fields: the user name, the public key portion, and the encrypted private key portion. The second and third fields are stored encoded in base 94. The key with which the private portion is encrypted is the SHA256 of the user's passphrase.

User classes are listed in a file called "uclass". Each line of this file consists of four or more fields, separated by single spaces. Except as noted ("plaintext"), all are stored in base 94:
- The plaintext class name.
- The class's public key.
- The class's private key, encrypted.
- The private-key encryption key, encrypted with the master public key.
- Two fields per user belonging to the class, containing the plaintext user name and the class's private-key encryption key encrypted with the user's public key.

Data classes are listed in a file called "dclass". Each line of this file consists of four or more fields, separated by single spaces. Except as noted, all are stored in base 94:
- The plaintext class name.
- The class's public key.
- The class's private key, encrypted.
- The private-key encryption key, encrypted with the master public key.
- Two fields per user class having access to this data class, containing the plaintext user class name and the data class's private-key encryption key encrypted with the user class's public key.

Data blobs are listed in a file called "data". Each line of this file consists of three or more fields:
- The plaintext data blob name.
- The plaintext name of a file containing the encrypted data blob contents.
- The data encryption key, encrypted with the master public key.
- Two fields per data class this data blob belongs to, containing the plaintext data class name and the data blob's data encryption key encrypted with the data class's public key.

Data blob contents files contain the blob's contents, encrypted as described above but *not* converted to base 94. Their names begin with "blob". The implementation may choose the rest of the name however it likes, provided that (a) the resulting file name contains no whitespace and (b) care is taken to avoid collisions.

Files are locked as needed with flock(2), using LOCK_SH when reading and LOCK_EX when writing (or potentially writing). Since multiple files are involved, a well-defined reference locking order is needed to avoid deadlock. Implementations are not required to take their own locks in a fixed order. However, each must act so that it does not deadlock when run concurrently with itself and/or with an implementation that locks as follows:
- All write locks are taken before any read locks.
- If multiple write locks are taken, they are taken in this order: passwd, dclass, uclass, data, blob*. (There is no defined order between different blob* files, but as no command write-locks more than one blob file, this does not matter.)
- If multiple read locks are taken, they are taken in the same order: passwd, dclass, uclass, data, blob*.

This does not mean that locks must be taken in that order; indeed, doing so would border on impossible in some cases. An implementation that uses another order must take care to avoid deadlock, for example by using LOCK_NB with appropriate unlock-and-retry strategies.

The master file is not locked, since it should "never" change.
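The base-94 encoding described earlier can be sketched in Python as follows. The function names `b94_encode` and `b94_decode` are hypothetical. The decoder relies on the stored byte count to restore any leading zero bytes, which the digit string never carries.

```python
def b94_encode(data: bytes) -> str:
    """Encode bytes as '<decimal byte count>!<base-94 digits, no leading zeros>'."""
    value = int.from_bytes(data, "big")  # treat the block as a number, MSB first
    digits = []
    while value:
        value, digit = divmod(value, 94)
        digits.append(chr(33 + digit))   # digit 0 -> '!', digit 93 -> '~'
    return f"{len(data)}!" + "".join(reversed(digits))


def b94_decode(text: str) -> bytes:
    """Inverse of b94_encode; the byte count restores leading zero bytes."""
    count, sep, digits = text.partition("!")  # the first '!' ends the count
    if not sep or not (count.isascii() and count.isdigit()):
        raise ValueError("missing byte count or '!' separator")
    value = 0
    for ch in digits:
        if not 33 <= ord(ch) <= 126:
            raise ValueError(f"invalid base-94 digit {ch!r}")
        value = value * 94 + (ord(ch) - 33)
    # to_bytes raises OverflowError if the digits do not fit in `count` bytes.
    return value.to_bytes(int(count), "big")
```

For example, two zero bytes encode as `2!`, with nothing after the separator. The single byte 0x5E (94) encodes as `1!"!`, that is, the digits 1 and 0.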
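The large-number serialization and the RSA public-key layout (n first, then e) can be sketched as follows. All function names are hypothetical. The spec does not say how many value bytes follow the 4-byte bit-length prefix, so the sketch assumes the minimal count, ceil(bits/8). Under that assumption zero serializes as just the four-byte prefix.

```python
from typing import Tuple


def bignum_to_bytes(x: int) -> bytes:
    """4-byte big-endian bit length, then the value as big-endian bytes."""
    if x < 0:
        raise ValueError("negative numbers are not representable")
    bits = x.bit_length()
    if bits > 0xFFFFFFFF:
        raise ValueError("number has more than 0xffffffff bits")
    # ASSUMPTION: the minimal number of value bytes, ceil(bits / 8), is used.
    return bits.to_bytes(4, "big") + x.to_bytes((bits + 7) // 8, "big")


def bignum_from_bytes(buf: bytes) -> Tuple[int, bytes]:
    """Parse one large number; return it with the unconsumed remainder."""
    if len(buf) < 4:
        raise ValueError("truncated bit-length prefix")
    bits = int.from_bytes(buf[:4], "big")
    end = 4 + (bits + 7) // 8
    if len(buf) < end:
        raise ValueError("truncated large number")
    return int.from_bytes(buf[4:end], "big"), buf[end:]


def public_key_to_bytes(n: int, e: int) -> bytes:
    return bignum_to_bytes(n) + bignum_to_bytes(e)  # modulus first, then exponent


def public_key_from_bytes(buf: bytes) -> Tuple[int, int]:
    n, rest = bignum_from_bytes(buf)
    e, rest = bignum_from_bytes(rest)
    if rest:
        raise ValueError("trailing bytes after public key")
    return n, e
```

For instance, 65537 (0x010001, 17 bits) serializes as `00 00 00 11 01 00 01`.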
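The chunk-encryption formula B'||Rctr(P||Recb(0,SHA256(P)),B,SHA256(K||B||K)) can be sketched as follows. The Python standard library provides no block cipher, and AES fixes the block size at 128 bits. The sketch therefore takes Recb (and, for decryption, its inverse) as caller-supplied functions, which stand in for a real Rijndael-256 implementation. Three details are assumptions not fixed by the spec: the CTR counter starts at the IV as a 256-bit big-endian integer and increments by 1 per block; "0" means an all-zero 256-bit block; and the decryptor verifies the trailing Recb(0,SHA256(P)) block as an integrity check. `toy_cipher` is an insecure XOR placeholder for exercising the code only. It is not Rijndael.

```python
import hashlib
import os
from typing import Callable

BLOCK = 32  # 256-bit blocks and keys

# Stand-in type for Recb: (block, key) -> block. Supply a real Rijndael-256.
BlockCipher = Callable[[bytes, bytes], bytes]


def rctr(data: bytes, key: bytes, iv: bytes, recb: BlockCipher) -> bytes:
    """CTR mode. ASSUMPTION: counter = iv as a 256-bit big-endian int, +1 per block."""
    counter = int.from_bytes(iv, "big")
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        keystream = recb(counter.to_bytes(BLOCK, "big"), key)
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], keystream))
        counter = (counter + 1) % (1 << 256)
    return bytes(out)


def check_block(p: bytes, recb: BlockCipher) -> bytes:
    """Recb(0, SHA256(P)); ASSUMPTION: '0' is an all-zero 256-bit block."""
    return recb(bytes(BLOCK), hashlib.sha256(p).digest())


def encrypt_chunk(p: bytes, k: bytes, recb: BlockCipher) -> bytes:
    b = os.urandom(BLOCK)                    # 256 random bits B
    b_prime = recb(b, k)                     # B' = Recb(B, K)
    iv = hashlib.sha256(k + b + k).digest()  # SHA256(K||B||K)
    return b_prime + rctr(p + check_block(p, recb), b, iv, recb)


def decrypt_chunk(c: bytes, k: bytes, recb: BlockCipher,
                  recb_inv: BlockCipher) -> bytes:
    if len(c) < 2 * BLOCK:
        raise ValueError("ciphertext too short")
    b = recb_inv(c[:BLOCK], k)               # recover B from B'
    iv = hashlib.sha256(k + b + k).digest()
    plain = rctr(c[BLOCK:], b, iv, recb)     # CTR decryption = encryption
    p, check = plain[:-BLOCK], plain[-BLOCK:]
    if check != check_block(p, recb):
        raise ValueError("check block mismatch (wrong key or corrupted data)")
    return p


def toy_cipher(block: bytes, key: bytes) -> bytes:
    """INSECURE placeholder for testing only: XOR with SHA256(key); self-inverse."""
    pad = hashlib.sha256(key).digest()
    return bytes(a ^ b for a, b in zip(block, pad))
```

A chunk always grows by exactly two blocks: B' in front and the encrypted check block at the end.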
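The uclass, dclass, and data lines share one shape: a fixed number of leading fields (four for the class files, three for data), then name/key pairs. A hypothetical parser might split them as follows. The spec states the single-space separator only for uclass and dclass, so applying it to the data file is an assumption. Fields are returned still encoded, with no base-94 decoding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Record:
    fixed: List[str]               # leading fields, still base-94 where applicable
    pairs: List[Tuple[str, str]]   # (plaintext name, encrypted key) pairs


def parse_line(line: str, n_fixed: int) -> Record:
    """Split a uclass/dclass line (n_fixed=4) or a data line (n_fixed=3).

    ASSUMPTION: data-file fields are also separated by single spaces.
    """
    fields = line.rstrip("\n").split(" ")
    if len(fields) < n_fixed or (len(fields) - n_fixed) % 2:
        raise ValueError(f"expected {n_fixed} fields plus name/key pairs")
    if any(f == "" for f in fields):
        raise ValueError("empty field (fields are separated by single spaces)")
    rest = fields[n_fixed:]
    return Record(fields[:n_fixed], list(zip(rest[0::2], rest[1::2])))
```

Because base-94 digits and names never contain spaces, splitting on single spaces is unambiguous.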
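The reference locking discipline (write locks before read locks, each group in the order passwd, dclass, uclass, data, blob*) can be sketched with Python's Unix-only `fcntl.flock`. The helpers `lock_plan`, `acquire`, and `release` are hypothetical. Two details are choices the spec leaves open: blob files are sorted by name only for determinism, and a file that is both read and written receives just the exclusive lock. Names outside the list (such as the master file, which is never locked) are rejected.

```python
import fcntl
import os
from typing import Iterable, List, Tuple

ORDER = ["passwd", "dclass", "uclass", "data"]  # reference order; blob* comes last


def rank(name: str) -> Tuple[int, str]:
    if name in ORDER:
        return (ORDER.index(name), "")
    if name.startswith("blob"):
        return (len(ORDER), name)  # no order is defined among blob files
    raise ValueError(f"not a lockable file: {name!r}")


def lock_plan(writes: Iterable[str], reads: Iterable[str]) -> List[Tuple[str, int]]:
    """All write locks first, then read locks, each group in reference order."""
    w = set(writes)
    r = set(reads) - w  # a file both read and written needs only LOCK_EX
    return ([(n, fcntl.LOCK_EX) for n in sorted(w, key=rank)] +
            [(n, fcntl.LOCK_SH) for n in sorted(r, key=rank)])


def release(fds: List[int]) -> None:
    for fd in reversed(fds):
        os.close(fd)  # closing the only descriptor drops its flock(2) lock


def acquire(directory: str, plan: List[Tuple[str, int]]) -> List[int]:
    """Take the planned locks, blocking; on any failure release those taken."""
    fds: List[int] = []
    try:
        for name, op in plan:
            fd = os.open(os.path.join(directory, name), os.O_RDONLY)
            fds.append(fd)
            fcntl.flock(fd, op)
    except BaseException:
        release(fds)
        raise
    return fds
```

An implementation that cannot plan all of its locks up front would need the LOCK_NB unlock-and-retry approach instead.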