A stream cipher expands a secret key into a long stream of random bytes. The standard security goal for a stream cipher is easy to explain: the attacker can't distinguish the output bytes from independent uniform random bytes. Saying that a random byte is "uniform" means that each of the 2^{8} possible bytes appears with probability 1/2^{8}; "independent" means that the probability of a sequence of bytes is the product of the probabilities of the individual bytes.
There are many ciphers whose secret keys are too short, making it impossible for them to achieve this security goal. But for the moment let's focus on the AES-256-CTR stream cipher, using a uniform random 256-bit key k (probability 1/2^{256} of each particular 256-bit string). That's a long enough key: the attacker has a practically nonexistent chance of ever guessing k, even with a future quantum computer running Grover's algorithm. Saying "can't distinguish" doesn't mean zero chance; it means negligible chance for any reasonable cost.
The central question in the AES literature is whether there's any feasible attack with a noticeable probability of distinguishing the 128-bit output blocks AES_{k}(0), AES_{k}(1), AES_{k}(2), ... from a uniform random sequence of distinct blocks. The research details inspire confidence in the security of AES: any successful attack would be a huge breakthrough in cryptanalysis.
Wait a minute: "distinct blocks"? "Distinct" wasn't part of the security goal! Inspecting b independent uniform random 128-bit blocks will find a collision with probability close to b(b−1)/2^{129}; any collision shows immediately that these are not AES output blocks. As b grows, this becomes an increasingly severe failure of AES-256-CTR to reach the standard security goal.
But let's assume for the moment that b is small enough that this isn't a problem. Then it's safe to rely on AES-256-CTR.
The main topic of this blog post is designing a high-security random-number generator (RNG). Sounds like this is solved by AES-256-CTR, right? Presumably, by hashing enough data from non-malicious entropy sources, we can produce a 256-bit key that's completely unpredictable for the attacker, i.e., indistinguishable from a uniform random key. Then we run AES-256-CTR using this key to generate all the randomness that applications need.
But suppose an attacker steals your computer and looks at what's stored in memory. Can the attacker figure out random numbers that were previously generated? Yes: the AES-256-CTR key k was never erased, so the attacker can compute the whole historical sequence of random outputs: AES_{k}(0), AES_{k}(1), AES_{k}(2), ...
This is actively dangerous if you're relying on the "forward secrecy" of short-term-public-key systems. You use the RNG to generate, e.g., a one-minute ECC key or a single-use New Hope key; you receive data encrypted to that key; you erase the secret key and the plaintext (after reading the plaintext); you then expect that you're safe against an attacker who recorded the ciphertext, even if the attacker subsequently steals your computer. But this expectation is sabotaged by the RNG. In academic terminology, this is a failure of "forward security" of the RNG; in NIST terminology, it is a failure of "backtracking resistance".
Fortunately, there's an easy fix:
This runs at practically the full speed of AES-CTR. An application that asks for short packets of randomness has forward security immediately after each packet, without having to pay for any extra AES computations.
This RNG construction certainly isn't new but I don't recall ever hearing a good name for it. I'm going to call it a fast-key-erasure RNG, recognizing two aspects of the RNG design: first, the RNG erases k the moment that it uses k; second, keys generated as output from the RNG are immediately erased from the RNG, so the RNG is suitable for applications that erase their keys promptly.
The rest of this blog post discusses alternatives, implementation, and security analysis.
NIST has stated that its standard "DRBGs" do "an extra step at the end of each request for random bytes" for backtracking resistance (forward security). More broadly, NIST has claimed that any crypto library's "get random bytes" call is "going to do this extra cryptographic work to ensure backtracking resistance".
As far as I can tell, this isn't a political claim regarding the future popularity of NIST's current RNGs; instead it's a technical claim regarding the work that an RNG must do for forward security. But this technical claim is wrong, as illustrated by fast-key-erasure RNGs.
Here's what's really weird about this. NIST describes its RNGs as "random bit generators". NIST emphasizes forward security as an important feature. But NIST's RNGs incur massive costs if the user actually wants forward secrecy after every random bit. Was there never an effort to optimize the RNGs to provide both advertised features simultaneously?
NIST has an "AES-CTR-DRBG" RNG based on AES-CTR, but this RNG doesn't erase the key the instant that the key is used. The RNG starts generating blocks of AES output for the caller. The RNG continues generating enough blocks for all the randomness that the caller wants. Finally, once the caller is satisfied, the RNG generates further AES output to overwrite its key. If the caller merely wanted, say, 1 byte of random data, then there's
That's 64 bytes of AES output, if I'm correctly deciphering the NIST standard, for 1 byte of useful random data.
It's entirely possible that submissions to NIST's "Post-Quantum Cryptography Project" will be seriously slowed down by NIST's AES-CTR-DRBG. In reaction to this possibility, NIST seems to be recommending that submissions use an extended AES-CTR-DRBG interface that handles a request for b_{1} bytes, a subsequent request for b_{2} bytes, a subsequent request for b_{3} bytes, etc. by implicitly generating a single AES-CTR-DRBG output with more than b_{1}+b_{2}+b_{3}+... bytes and returning segments of this output. This is compatible with answering each request immediately: the extended interface returns the first b_{1} bytes of output without knowing b_{2} etc., and without knowing the total number of bytes that will be required for this AES-CTR-DRBG output.
From a security perspective, this is just like using AES-CTR. Unless the caller goes to extra effort to end the AES-CTR-DRBG output, the AES-CTR key is retained after each call to the extended interface, so the "backtracking resistance" that NIST advertises for AES-CTR-DRBG is destroyed.
The extended RNG interface is dangerous because the easiest way to use it fails to erase keys. NIST's DRBGs are dangerous because their poor speed encourages this sort of interface.
A better approach would be to use a proper call to AES-CTR-DRBG to fill up, say, a 736-byte RNG output buffer; when I say "proper call" I mean that AES-CTR-DRBG would then overwrite its old key. Random bytes would then be provided and immediately erased from the buffer, and AES-CTR-DRBG would be called again after 736 bytes. This is almost as small and almost as fast as a fast-key-erasure RNG, but it has a significantly more complicated specification and no advantages.
My previous blog post reported news regarding the SUPERCOP benchmarking package. Part of the news was that SUPERCOP uses a fast-key-erasure RNG, but I didn't explain what this was or how it's implemented. Internally, the RNG is modularized as follows:
I generally recommend against AES for production use, for two reasons. First, AES provides a significantly worse security/speed tradeoff than state-of-the-art ciphers. This doesn't matter for most applications, but sometimes it does matter. RC4 would have been eliminated many years earlier if AES had been faster than RC4 on common CPUs rather than slower. Anecdotal evidence suggests that almost everyone who deploys AES-128 today instead of AES-256 is doing it either because "AES-256 is 40% slower" or because they're copying the choice from someone else; but AES-128, like other 128-bit block ciphers, is at risk from batch attacks and quantum attacks.
Second, natural AES software implementations are vulnerable to cache-timing attacks. Maybe the RNG keys are hard to break because each key is used only a few times, but maybe not; why take the risk? If the CPU has Intel's AES-NI or similar AES hardware, then this problem disappears (and crypto_rng_aes256 becomes slightly faster than crypto_rng_salsa20 and crypto_rng_chacha20), but it's still dangerous to specify AES in systems that will be implemented on multiple platforms; it's safer to specify a cipher that doesn't have the problem in the first place.
Age is worth something, and AES (1998) is older than Salsa20 (2005). On the other hand, Salsa20 has a bigger block size, a larger security margin against all known attacks, extensive review during and after the eSTREAM project, and huge implementation advantages. I'm not so sure that it's right to choose ChaCha20 (2008) over Salsa20: the slight advantages that motivated the ChaCha20 design do seem to be holding up after further analysis, but they aren't as compelling as the advantages of Salsa20 (and ChaCha20) over AES.
Many cryptographic libraries have RNGs that aim for the same security goals but that are more complicated and harder to audit (and slower). I recommend that these libraries switch to a fast-key-erasure RNG.
SUPERCOP doesn't make any guarantees of having been audited, and in particular fastrandombytes will need to be audited before deployment. Some libraries go to extra effort for thread-safety, fork-safety, etc., which means adding and auditing extra code. What I'm recommending here isn't particular software but particular mathematical functions: random numbers should be generated by a fast-key-erasure RNG as described above.
These library RNGs are typically seeded from the RNG in the operating-system kernel, the same way that SUPERCOP's fastrandombytes uses kernelrandombytes as described above. A simpler approach, taken in NaCl, is for the cryptographic library to simply use the kernel RNG without maintaining another RNG layer. Obviously this approach uses less code and is easier to audit; I see no security justification for OpenBSD saying "getentropy() is not intended for regular code" and limiting the getentropy() output to 256 bytes. The syscall might be a speed problem in some post-quantum systems, but if this turns into a real-world problem then I see several ways to deal with it without the current mess of non-kernel RNG code.
Anyway, it's certainly important for the kernel RNG to be secure. Many kernels have RNGs that are again more complicated and harder to audit (and slower) than a fast-key-erasure RNG. I recommend that these kernels switch to a fast-key-erasure RNG.
Kernels in virtual machines face a clone-safety issue analogous to the userspace fork-safety issue. There are complicated solutions where cloning triggers an RNG reinitialization, but the simplest solution is for the kernel to simply call a hypervisor RNG. A hypervisor interface isn't complete without an RNG.
The kernel (or hypervisor) is responsible for properly seeding its RNG. The central problem here is to generate an initial seed during OS installation. Certainly this problem needs to be solved so that the OS can securely generate (e.g.) ssh keys at installation time. Once this central problem is solved, it's relatively straightforward to seed each subsequent kernel boot: on each boot, the kernel immediately uses the RNG to generate a new random seed for the next boot, and immediately makes sure that the new seed has been safely written to long-term storage, overwriting the old seed. The seed in long-term storage also needs to be changed whenever the filesystem is backed up.
Beware that reliably overwriting data on disk isn't as easy as it sounds, and reliably overwriting data on flash storage is really difficult. Below I'll come back to one consequence of this.
How is a seed generated during OS installation? As mentioned above, this seed is obtained as a hash of data from various entropy sources. Precise CPU cycle counts for DRAM access seem to have considerable entropy on some devices but seem completely predictable on others; auditing this requires studying how different pieces of hardware are clocked. Precise cycle counts for keyboard input, spinning disks, etc. seem very hard for the attacker to predict, but many small computers don't have keyboards and disks. Installing these computers typically means flashing them from a master computer, and this master computer should use its RNG to generate a seed for the small computer. Similarly, whenever a computer creates an installation USB stick, it should use its RNG to generate a seed for the stick; and, whenever the stick is booted, it should update its seed, just like any other kernel boot. Installation from a read-only device should demand keyboard input.
Typically a kernel continues using cycle counts of various events to inject new entropy into its RNG after boot. This is often advertised as providing "backward security" (or "prediction resistance" in NIST terminology), but I'm skeptical that "backward security" has any real-world value. If an attacker has broken security so thoroughly as to be able to see the current RNG state stored securely inside the kernel, then why is it noticeably harder for the attacker to also see any future RNG state of interest? Don't we normally envision the attacker's resources as constantly expanding? People hope that each software security update reduces the attacker's resources, so it makes sense to inject new entropy at that point, but this isn't the same as constantly injecting new entropy.
Aiming for "backward security" has created all sorts of complications whose security isn't clear. One can't simply hash a new cycle count into the RNG state: an attacker who knows the previous RNG state and watches the RNG output can efficiently guess every possibility for the cycle count, and then knows the new RNG state. One has to instead accumulate many cycle counts into a separate hash before injecting that hash into the RNG. At this point the auditor asks what "many" means. The answer includes a mess of questionable "entropy estimation" mechanisms. An alternative, used in FreeBSD, is the relatively pleasant Fortuna from Ferguson and Schneier. (As a side note, I highly recommend reading the Ferguson–Schneier–Kohno "Cryptography engineering" book, which has more detailed coverage of some of the issues I'm covering in this blog post, and also covers many other important issues in cryptography.)
For comparison, the argument for "forward security" is much more convincing: erasing keys has real value against an attacker who was already recording your network traffic and then decides to steal your computer. I think the strongest argument for continuing to inject new entropy is that, as mentioned above, it's hard to reliably overwrite old seeds stored on disk or (especially) flash. A failure to erase the RNG seed can compromise every key-erasure mechanism elsewhere in the system.
Anyway, whether or not you want "backward security", you can and should use a fast-key-erasure RNG. All of the seeding/reseeding options I've mentioned are compatible with using a deterministic, well-tested fast-key-erasure RNG module. The module can simply start with key 0, and provide an inject(e,elen) function that replaces the fast-key-erasure key k with SHA-256(k,SHA-256(e,elen)) and clears the RNG output buffer. It's up to the caller to accumulate enough entropy for these inject calls, and in the "backward security" scenario to accumulate enough entropy in each inject call.
Assume that a fast-key-erasure RNG is securely seeded. What's the fastest attack we can come up with against the resulting random numbers? Where are the risks of better attacks?
Concretely, suppose a user reveals 736 gigabytes of RNG output, involving a chain of 2^{30} AES keys. Suppose 2^{30} users all do the same thing starting from independent uniform random seeds, and the attacker sees all 736 exabytes of data. Can the attacker tell that these are in fact RNG output instead of independent uniform random bytes?
Attack 1. The attacker records a tiny fraction of this data, namely the first bit from each 736-byte output block. This is only a fraction of an exabyte, a few million dollars in hard drives.
The attacker guesses a seed, and follows this seed for a chain of 2^{80} AES keys, recording the first bit from each 736-byte output block. There's an obstacle here, namely that doing so many computations serially will be infeasible for the foreseeable future, but let's ignore this problem for a moment and simply focus on how much information is visible from 2^{80} AES computations.
If a user's seed u matches any of the attacker's 2^{80} AES keys, then the user's subsequent outputs will match the subsequent outputs computed by the attacker. Checking, say, the next 256 bits (assuming the key isn't within 256 bits of the end) will see a match that won't happen by chance.
If u doesn't match any of the attacker's keys, it's still entirely possible that the user's next key v = (AES_{u}(0),AES_{u}(1)) will match one of the attacker's keys, and then the next 256 bits from the user will match the next 256 bits computed by the attacker. And so on for subsequent user keys.
The attacker can recognize all of these matches by looking up each substring of 256 consecutive bits in a database of substrings obtained from the users. There's another obstacle here, namely that this imposes massive communication costs, but let's ignore this problem too.
The bottom line is that each of the 2^{30} users has 2^{30} ways to bump into each of the 2^{80} attacker keys, for a total success probability approximately 2^{140}/2^{256}.
This analysis isn't exact. For example, the attacker could break two keys at once, and doesn't get double credit for this; also, almost all keys (all keys after the initial seeds) are obtained as pairs (B_{0},B_{1}) where B_{0} and B_{1} are distinct, so denominator 2^{256}−2^{128} would make more sense than 2^{256}. There could be bigger issues that I've missed. I haven't experimentally verified scaled-down versions of this attack; I would say that there's some risk that the attack is actually much less effective (failing for some reason I didn't think of), but very little risk that the attack is more effective.
Attack 2. This is better than Attack 1 because it drastically reduces the communication costs.
Instead of recording the first bit from each output block, the attacker records the first distinguished output block, meaning the first output block that begins with the three bytes (0,0,0). This is just 2^{36} blocks. Even better, the attacker stores just the last 256 bits of each distinguished block, just a few terabytes of data.
For each of the 2^{80} computed keys, if the corresponding block is distinguished, the attacker looks for the block in the user database. This is just 2^{56} lookups.
If a user seed u matches one of the computed keys, then the first distinguished user block will match the next distinguished computed block. Similarly, if a subsequent user key matches one of the computed keys, then the next distinguished user block will match the next distinguished computed block.
This attack is a few percent less effective than Attack 1, because some user keys aren't followed by any distinguished blocks, but overall the success probability should again be close to 2^{140}/2^{256}.
Attack 3. This is better than Attack 2 because it allows essentially any amount of parallelism. Instead of running one seed through a chain of 2^{80} keys, the attacker tries 2^{56} seeds, running each seed until the first distinguished block.
This is an example of the parallel rho method from 1997 van Oorschot–Wiener, although the application here is slightly different from the usual applications. The parallelism and low communication costs make Attack 3 feasible for serious attackers, and experimental verification by academics can easily go beyond 2^{50} keys.
Again the success probability is approximately 2^{140}/2^{256}. This is an extremely small chance of success, although it is 2^{60} times bigger than one might expect from an attacker trying 2^{80} AES-256 keys.
Unless I'm missing something big, the same attack would be practically guaranteed to succeed for AES-128. Remember, kids: Don't use 128-bit cipher keys!
More attacks. Are there better attacks than what I've described? There are at least four different aspects to this question:
Let's focus for a moment on the second question stated above, namely how much the attacker can benefit from knowing that the 128-bit AES output blocks are distinct.
There's a standard theorem called the "PRP-PRF switch" saying that the attacker's benefit is at most b(b−1)/2^{129}, where b is the number of AES blocks generated. Formally, you're supposed to analyze the success probability of an attack against an ideal secret-key function F that produces independent uniform random outputs for all inputs; then the PRP-PRF switch says that the probability increases by at most b(b−1)/2^{129} if you instead plug in an ideal secret-key permutation P, which is just like F except for magically guaranteeing that the outputs are distinct; and, finally, we don't know how to distinguish AES_{k} from P without seeing k.
But wait. The users together are generating 48*2^{60} AES blocks. If b = 48*2^{60} then b(b−1)/2^{129} is larger than 1, so the PRP-PRF switch doesn't say anything.
Is it possible to address this with the slightly-beyond-birthday-bound version of the PRP-PRF switch that I proved in 2005? Or can we use the fact that there are actually many different AES keys, with no guarantee of distinctness across keys? What exactly has been proven about the PRP-PRF switch in this multiple-key scenario? The fast-key-erasure RNG has a much tighter limit on the lifetime of each key than NIST's AES-CTR-DRBG does; does this produce quantitatively better security bounds?
These are good topics for provable-security papers, and for real-world attacks when people screw up the details badly enough. But let me point out a better approach.
Salsa20 has a 512-bit block. ChaCha20 has a 512-bit block. The explicit security goal of Salsa20 and ChaCha20 is for the outputs to be indistinguishable from uniform. There's no funky distinctness qualifier to worry about.
Internally, there's a permutation that always produces distinct 512-bit outputs from distinct 512-bit inputs. Checking for this distinctness is simply one of many failed attack strategies, with success probability so low as to barely be worth mentioning, whereas for AES-256-CTR the distinctness of 128-bit output blocks is the most powerful attack we know.
Of course, if we use ciphers with small block sizes, then it's easy to motivate papers proving something about the damage caused by this weakness. Maybe 128-bit blocks are too big for most cryptographers to really appreciate the danger, but NSA is currently trying to push standardization and deployment of Simon-64-128 and Speck-64-128, two ciphers with tiny 64-bit blocks and 128-bit keys. I was one of about 40 people sitting in a meeting where the speaker, NSA's Louis Wingers (one of the Simon and Speck authors), falsely claimed that counter mode is safe for 64-bit blocks, since counter mode doesn't have block collisions. NSA's continuing promotion of these dangerous ciphers includes perfect sentences to quote in the introductions of "provable security" papers studying small block sizes.
Part of the "provable security" culture is to then praise the resulting systems for having proofs. But avoiding the weakness in the first place is simpler and more robust. The system with more proofs—the system using a cipher with small blocks—is more fragile and harder to audit. The additional proofs are advertised as a sign of safety but are actually a sign of danger.
Proofs sometimes play a useful role in cryptography, and the rest of my blog post will look at some proofs in detail. But there are some important caveats here regarding "provable security":
The canonical starting point to learn more about these problems is the "Another look at provable security" series of papers by Koblitz and Menezes. Menezes's invited talk at Eurocrypt 2012 is a great introduction.
I expect the first three problems to eventually be fixed through computer verification, increased attention to "tightness", and increased attention to the accuracy of security models. But these are huge problems today.
I don't expect the fourth problem to go away (and I'm not sure about the fifth). There's far too much pressure for people to write papers aiming at the fundamental goal of "provable security", namely to prove that complete systems are as secure as primitives. It's straightforward to reach this goal by choosing sufficiently weak primitives, whereas it's difficult, perhaps impossible, to reach this goal in any other way.
With these caveats in mind, I'll now focus on the third question stated above: assuming the cipher outputs are indistinguishable from uniform, is there some better way to exploit the structure of the fast-key-erasure RNG?
Using some cipher output to generate a new key for the cipher (and not using that cipher output in any other way) is an ancient and very frequently used idea. The security intuition is straightforward: if the attacker can't distinguish the cipher output from uniform, then the attacker can't tell the difference between the actual situation and a situation where the new cipher key is generated independently at random.
One would think that there would have been a paper many years ago formalizing this intuition as an easy-to-use theorem and giving a simple, convincing proof. The security of the fast-key-erasure RNG would visibly be a special case of the theorem, and this would be the end of the story.
In fact, the literature on this topic is surprisingly large and surprisingly messy. The same rekeying idea appears under at least three names with separate proofs, as illustrated by the following papers:
The proofs are surprisingly long, given how simple the intuition is. Typically the proofs are buried in appendices; often they're only sketched. There are more papers with more proofs: e.g., my XSalsa20 paper includes a new theorem with a quantitative improvement. People trying to check proofs will obviously be overwhelmed, and it's not surprising that some errors have slipped through:
This isn't a complete survey of the literature, but adding more information will simply make auditors more worried. For example, a 2006 paper by Campagna sounds at first like it's proving security bounds for AES-CTR-DRBG, but a closer look shows that the paper is only studying AES-CTR and isn't actually proving anything about rekeying.
Do we really believe that all the errors have been eliminated at this point from theorems proving the security of rekeying? One correct application of one correct theorem should be enough, but why is the auditor supposed to believe any particular theorem?
Here are four specific issues that bother me about proofs in this area.
Monolithic handling of multiple levels of rekeying. An initial key is used to produce outputs, some of which are used as derived keys for a followup protocol. It seems intuitively clear that any attack has to find non-randomness in the outputs, or find a weakness in the followup protocol.
For example, the initial key k for the fast-key-erasure RNG is used to produce outputs (AES_{k}(0),...,AES_{k}(47)), and then (AES_{k}(0),AES_{k}(1)) are used as a derived key for a followup protocol, namely the same RNG applied recursively. It seems intuitively clear that any attack has to find non-randomness in (AES_{k}(0),...,AES_{k}(47)), or find a weakness in the followup use of (AES_{k}(0),AES_{k}(1)). But the proofs in the literature usually don't work this way: they consider the entire chain or tree of derived keys at once.
The main reason I wrote a new proof in my XSalsa20 paper was to follow the intuition more closely, first proving a theorem about one level of derived keys and then deducing a multi-level theorem by induction. This proof is also considerably shorter than the Bellare–Canetti–Krawczyk "cascade" proof, and I think this reflects a real simplification. But newer papers don't seem to have adopted this strategy.
Oversimplified cost metrics. The "fast" algorithms constructed in many "security proofs"—including the proofs in this area, including the proof in my XSalsa20 paper—are serial algorithms that build giant arrays of random numbers, queries, etc. This can end up dominating the cost of the attack. Maybe these costs can be reduced, as in the improvements from Attack 1 to Attack 2 and Attack 3, but maybe not.
There was a short "Notes on low-memory attacks" subsection in my XSalsa20 paper pointing out this issue. Regarding two arrays U and V of random numbers, I wrote "A standard way to eliminate the space for U and V is to replace random-number generation by pseudorandom-number generation." Apparently this is called "the random oracle technique" in a Crypto 2017 paper by Auerbach–Cash–Fersch–Kiltz, which highlights (and claims to introduce) the topic of "memory tightness" in reductions. I also had some ad-hoc suggestions for eliminating the space for an "array of query prefixes" used in my proof (and in previous proofs).
Non-constructive definitions. The traditional type of security definition considers the chance that a cost-limited algorithm A breaks a cryptographic system X. The problem with this type of definition is that it allows unrealistic attacks A that take a huge amount of time to find: i.e., attacks that allow a huge amount of precomputation. This is what led to the mistake mentioned above in the 2006 Bellare paper.
To exclude such attacks, Lange and I proposed instead considering the chance that a small cost-limited algorithm P prints a cost-limited algorithm A that breaks X. (See Appendix B.4 of "Non-uniform cracks in the concrete".) Any reduction theorem then has to be stated as a theorem about P, not merely a theorem about A. This is compatible with most of the proofs I've mentioned but excludes the proof in the 2006 Bellare paper.
Working with the wrong cipher security metric. Serious attack analysis always has to consider attacks against multiple targets: often there are multiple-key attacks more effective than attacking one key at a time.
From this perspective, it's weird to see theorems that make hypotheses about the security of one cipher key rather than hypotheses about the security of many independent cipher keys. It's similarly weird to see theorems drawing conclusions about the security of one RNG/cascade/NMAC/HMAC/... key instead of the security of many independent keys.
The Bellare–Canetti–Krawczyk "cascade" security proof actually does make a hypothesis about the security of many independent cipher keys, but it then draws a conclusion about the security of just one cascade key. In the talk accompanying my paper, I briefly mentioned that a multi-key hypothesis allowed a tight multi-key conclusion. But I didn't write down proof details at the time, and I didn't realize that focusing on multi-key security allowed a considerably simpler proof.
Let's define G(k) as the string (AES_{k}(0),AES_{k}(1)), and F(k) as the string (AES_{k}(2),AES_{k}(3),...,AES_{k}(47)). Let's focus on the first 1472 bytes of RNG output, F(k) and F(G(k)). Actually, let's generalize a bit to F(k) and H(G(k)), allowing (but not requiring) a different function H to be used after the first 736 bytes.
Say there's an attack A that's given the strings F(k_{1}),H(G(k_{1})); F(k_{2}),H(G(k_{2})); ... F(k_{U}),H(G(k_{U})) where k_{1},k_{2},...,k_{U} are independent uniform random keys. Can A distinguish these strings from uniform?
The A-distance from these strings to uniform is at most the sum of
The first distance is the same as the B-distance from (H(s_{1}),H(s_{2}),...,H(s_{U})) to uniform, where B(x_{1},x_{2},...,x_{U}) is defined as follows: randomly generate r_{1},r_{2},...,r_{U} and then run A(r_{1},x_{1},r_{2},x_{2},...,r_{U},x_{U}). This is the success probability of a U-key attack against H, with almost the same cost as A.
The second distance is the same as the C-distance from (F(k_{1}),G(k_{1}),F(k_{2}),G(k_{2}),...,F(k_{U}),G(k_{U})) to uniform, where C(x_{1},y_{1},x_{2},y_{2},...,x_{U},y_{U}) is defined as A(x_{1},H(y_{1}),x_{2},H(y_{2}),...,x_{U},H(y_{U})). This is the success probability of a U-key attack against the pair (F,G), again with almost the same cost as A.
Regarding constructivity, it's reasonable to assume that there's a small algorithm for H; then a small algorithm that quickly prints A is easily converted into a small algorithm that quickly prints B and C.
To summarize, the success chance of a U-key attack against F-and-then-H-keyed-by-G is at most the success chance of a U-key attack against (F,G) plus the success chance of a U-key attack against H. This makes perfect sense, since U keys for F-and-then-H-keyed-by-G involve exactly U keys for (F,G) and exactly U keys for H.
I said at the beginning that I was considering only 1472 bytes of RNG output; but the generalization to H actually allows any number of bytes of RNG output. Take, for example, H to be F-and-then-H_{2}-keyed-by-G (or more generally F_{2}-and-then-H_{2}-keyed-by-G_{2}) so that the RNG outputs F(k) and F(G(k)) and H_{2}(G(G(k))). The success chance of an attack against this RNG is (by the theorem) at most the sum of success chances of U-key attacks against (F,G) and H; this is (by the theorem again) at most the sum of success chances of U-key attacks against (F,G), (F,G), and H_{2}. Repeat for any desired maximum number of blocks. The generalization to H also handles the forward-security scenario: simply define H(k) as (F(k),G(k)).
As a concrete example, the success probability of an attack against 2^{30} users of the fast-key-erasure RNG, each generating a chain of 2^{30} keys, is at most the sum of 2^{30} chances of similarly efficient 2^{30}-key attacks distinguishing (AES_{k}(0),...,AES_{k}(47)) from uniform. The best 2^{30}-key attack we know against (AES_{k}(0),...,AES_{k}(47)) is to
Unless we can come up with a better attack against AES, we can't beat success probability approximately 1/2^{57} for an attack against the fast-key-erasure RNG. Replacing AES-256 with Salsa20 or ChaCha20 improves 1/2^{57} to 2^{140}/2^{256}.
Of course this should be stated as a formal theorem with clear definitions, and the proof should go through careful review: remember what I said about errors in proofs. But this feels like an easy textbook exercise. How can there be hundreds of pages of papers on this topic?
There's one noticeable way that the RNG situation is simpler than the more general rekeying situation analyzed in, e.g., the "cascade" paper and the NMAC/HMAC papers. But the same proof technique turns out to work in the more general situation too.
Let's say F is a function mapping a 256-bit key k and a "block" x to a 256-bit output F(k,x). For example, the RNG defines F(k,x) as (AES_{k}(x),AES_{k}(x+1)), and the set of blocks is {0,2,4,6,8,...,46}. Exactly one of these blocks, namely block 0, is used by the RNG to produce a derived key.
For generalized rekeying, the set of blocks can be much larger, including any number of blocks used to produce derived keys. This is where the RNG situation is simpler.
Let's write X for the set of blocks used to produce derived keys. For each x in X, the cipher output F(k,x) is used as a derived key for another function H with inputs in a set Y, and the outputs of H are given to the attacker upon request. For each block x that isn't in X, the cipher output F(k,x) is simply given to the attacker upon request. Formally, define T(k,x,y) = H(F(k,x),y) for x in X and y in Y; and define T(k,x) = F(k,x) for any block x outside X.
U users can have many more than U derived keys, since X can have many elements. It's convenient here to switch to a more general notation for describing multi-key attacks: instead of having a number U of users, let's have a set U of strings that label users. There are three targets for multi-key attacks:
Intuitively, there are exactly two ways to attack T: find a pattern in the F outputs, or find a pattern in the H outputs. The proof will follow exactly this intuition.
Here's the proof.
As above, choose an independent uniform random 256-bit key K(u) for each u in U, and choose an independent uniform random 256-bit key J(u,x) for each u in U and each x in X. Also define M as follows: choose an independent uniform random 256-bit string M(u,x) for each u in U and each block x outside X; and define M(u,x,y) = H(J(u,x),y) for each u in U, x in X, and y in Y.
Define B(V) as A(W), where W is defined as follows: W(u,x,y) = V(u,x,y) for each u in U, x in X, and y in Y; W(u,x) = M(u,x) for each u in U and each block x outside X. Then the B-distance from H_{J} to uniform is exactly the A-distance from M to uniform. Indeed:
Define C(V) as A(W), where W is defined as follows: W(u,x,y) = H(V(u,x),y) where each u in U, x in X, and y in Y; W(u,x) = V(u,x) for each u in U and each block x outside X. Then the C-distance from F_{K} to uniform is exactly the A-distance from T_{K} to M. Indeed:
[2017.07.26 update: Corrected "B" typos in these two bullet items, which should have said "C". Remember what I said about errors in proofs?]
The A-distance from T_{K} to uniform is at most the A-distance from M to uniform plus the A-distance from T_{K} to M; i.e., the B-distance from H_{J} to uniform plus the C-distance from F_{K} to uniform. These are the success probabilities of multi-key attacks against H and F respectively, each attack having almost the same cost as A.
To avoid the cost of building and accessing an array of M values created on demand inside B, replace M with output from a high-security cipher (assuming one exists); this has negligible impact on the success probability of the attack. As for constructivity, again assume that there's a small algorithm for H; then a small algorithm that quickly prints A is easily converted into a small algorithm that quickly prints B and C.
That's it.