cr.yp.to: 2015.11.20: Break a dozen secret keys, get a million more for free

Newer (Access-K): 2016.03.15: Thomas Jefferson and Apple versus the FBI: Can the government censor how-to books? What if some of the readers are criminals? What if the books can be understood by a computer? An introduction to freedom of speech for software publishers. #censorship #firstamendment #instructions #software #encryption

Older (Access-J): 2015.03.14: The death of optimizing compilers: Abstract of my tutorial at ETAPS 2015. #etaps #compilers #cpuevolution #hotspots #optimization #domainspecific #returnofthejedi

Table of contents (Access-I for index page)

2025.04.23: McEliece standardization: Looking at what's happening, and analyzing rationales. #nist #iso #deployment #performance #security

2025.01.18: As expensive as a plane flight: Looking at some claims that quantum computers won't work. #quantum #energy #variables #errors #rsa #secrecy

2024.10.28: The sins of the 90s: Questioning a puzzling claim about mass surveillance. #attackers #governments #corporations #surveillance #cryptowars

2024.08.03: Clang vs. Clang: You're making Clang angry. You wouldn't like Clang when it's angry. #compilers #optimization #bugs #timing #security #codescans

2024.06.12: Bibliography keys: It's as easy as [1], [2], [3]. #bibliographies #citations #bibtex #votemanipulation #paperwriting

2024.01.02: Double encryption: Analyzing the NSA/GCHQ arguments against hybrids. #nsa #quantification #risks #complexity #costs

2023.11.25: Another way to botch the security analysis of Kyber-512: Responding to a recent blog post. #nist #uncertainty #errorbars #quantification

2023.10.23: Reducing "gate" counts for Kyber-512: Two algorithm analyses, from first principles, contradicting NIST's calculation. #xor #popcount #gates #memory #clumping

2023.10.03: The inability to count correctly: Debunking NIST's calculation of the Kyber-512 security level. #nist #addition #multiplication #ntru #kyber #fiasco

2023.06.09: Turbo Boost: How to perpetuate security problems. #overclocking #performancehype #power #timing #hertzbleed #riskmanagement #environment

2022.08.05: NSA, NIST, and post-quantum cryptography: Announcing my second lawsuit against the U.S. government. #nsa #nist #des #dsa #dualec #sigintenablingproject #nistpqc #foia

2022.01.29: Plagiarism as a patent amplifier: Understanding the delayed rollout of post-quantum cryptography. #pqcrypto #patents #ntru #lpr #ding #peikert #newhope

2020.12.06: Optimizing for the wrong metric, part 1: Microsoft Word: Review of "An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development" by Knauff and Nejasmic. #latex #word #efficiency #metrics

2019.10.24: Why EdDSA held up better than ECDSA against Minerva: Cryptosystem designers successfully predicting, and protecting against, implementation failures. #ecdsa #eddsa #hnp #lwe #bleichenbacher #bkw

2019.04.30: An introduction to vectorization: Understanding one of the most important changes in the high-speed-software ecosystem. #vectorization #sse #avx #avx512 #antivectors

2017.11.05: Reconstructing ROCA: A case study of how quickly an attack can be developed from a limited disclosure. #infineon #roca #rsa

2017.10.17: Quantum algorithms to find collisions: Analysis of several algorithms for the collision problem, and for the related multi-target preimage problem. #collision #preimage #pqcrypto

2017.07.23: Fast-key-erasure random-number generators: An effort to clean up several messes simultaneously. #rng #forwardsecrecy #urandom #cascade #hmac #rekeying #proofs

2017.07.19: Benchmarking post-quantum cryptography: News regarding the SUPERCOP benchmarking system, and more recommendations to NIST. #benchmarking #supercop #nist #pqcrypto

2016.10.30: Some challenges in post-quantum standardization: My comments to NIST on the first draft of their call for submissions. #standardization #nist #pqcrypto

2016.06.07: The death of due process: A few notes on technology-fueled normalization of lynch mobs targeting both the accuser and the accused. #ethics #crime #punishment

2016.05.16: Security fraud in Europe's "Quantum Manifesto": How quantum cryptographers are stealing a quarter of a billion Euros from the European Commission. #qkd #quantumcrypto #quantummanifesto

2016.03.15: Thomas Jefferson and Apple versus the FBI: Can the government censor how-to books? What if some of the readers are criminals? What if the books can be understood by a computer? An introduction to freedom of speech for software publishers. #censorship #firstamendment #instructions #software #encryption

2015.11.20: Break a dozen secret keys, get a million more for free: Batch attacks are often much more cost-effective than single-target attacks. #batching #economics #keysizes #aes #ecc #rsa #dh #logjam

2015.03.14: The death of optimizing compilers: Abstract of my tutorial at ETAPS 2015. #etaps #compilers #cpuevolution #hotspots #optimization #domainspecific #returnofthejedi

2015.02.18: Follow-You Printing: How Equitrac's marketing department misrepresents and interferes with your work. #equitrac #followyouprinting #dilbert #officespaceprinter

2014.06.02: The Saber cluster: How we built a cluster capable of computing 3000000000000000000000 multiplications per year for just 50000 EUR. #nvidia #linux #howto

2014.05.17: Some small suggestions for the Intel instruction set: Low-cost changes to CPU architecture would make cryptography much safer and much faster. #constanttimecommitment #vmul53 #vcarry #pipelinedocumentation

2014.04.11: NIST's cryptographic standardization process: The first step towards improvement is to admit previous failures. #standardization #nist #des #dsa #dualec #nsa

2014.03.23: How to design an elliptic-curve signature system: There are many choices of elliptic-curve signature systems. The standard choice, ECDSA, is reasonable if you don't care about simplicity, speed, and security. #signatures #ecc #elgamal #schnorr #ecdsa #eddsa #ed25519

2014.02.13: A subfield-logarithm attack against ideal lattices: Computational algebraic number theory tackles lattice-based cryptography.

2014.02.05: Entropy Attacks! The conventional wisdom says that hash outputs can't be controlled; the conventional wisdom is simply wrong.

2015.11.20: Break a dozen secret keys, get a million more for free: Batch attacks are often much more cost-effective than single-target attacks. #batching #economics #keysizes #aes #ecc #rsa #dh #logjam

[An in-depth blog post by Daniel J. Bernstein, crossposted between the ECRYPT blog and the cr.yp.to blog.]

For many years NIST has officially claimed that AES-128 has "comparable strength" to 256-bit ECC, namely 128 "bits of security". Ten years ago, in a talk "Is 2²⁵⁵−19 big enough?", I disputed this claim. The underlying attack algorithms had already been known for years, and it's not hard to see their impact on key-size selection; but somehow NIST hadn't gotten the memo, and still hasn't gotten it today.

What the attacker hopes to find inside the AES attack is a key collision. This means that a key guessed by the attack matches a key chosen by a user. Any particular guessed key has chance only 1/2¹²⁸ of matching any particular user key, but the attack ends up merging costs across a batch of 2⁴⁰ user keys, amplifying the effectiveness of each guess by a factor 2⁴⁰. A paper by Chatterjee–Menezes–Sarkar presents many more attacks along these lines, including a quite serious 80-bit collision attack on a version of HMAC that's fully compliant with the NIST HMAC standard:

The single-key ECC attack also looks for key collisions, and on top of this exploits another attack tool called a "worst-case-to-average-case reduction", specifically a "random self-reduction". This extra attack tool expands the original user key (a curve point kP) into a pool of many modified user keys (points kP+rP, where r is chosen randomly by the attacker). Any collision between any of the guessed keys (points gP, where g is known to the attacker) and any of the modified user keys (i.e., any equation kP+rP = gP) immediately reveals the original user's secret key. Breaking 2⁴⁰ user keys in the same way needs 2⁴⁰ collisions, and this takes only 2²⁰ times as long as finding one collision.

The number of modified user keys is now chosen by the attacker, rather than being naturally limited to the number of legitimate user keys. This sounds much more destructive than the AES-128 attack, and it would be much more destructive if we used 128-bit ECC: the attacker would expand the original key into 2⁶⁴ modified keys and find a collision after just 2⁶⁴ guessed keys, for a total attack cost on the scale of 2⁶⁴. Rivest's "distinguished points" idea again eliminates issues regarding data volume, communication, parallelization, etc.; the classic paper on this topic is "Parallel collision search with cryptanalytic applications" by van Oorschot and Wiener.

In 2007 NIST revised its "Recommendation for Key Management" to prohibit use of 160-bit ECC after 2010: the minimum ECC key size allowed is 224 bits after 2010 (and 256 bits after 2030). Evidently NIST understands that a 2⁸⁰ attack breaking somebody's 160-bit ECC key is enough of a problem to warrant moving to larger ECC keys. But NIST is foolishly continuing to recommend AES-128 keys, even though a 2⁸⁰ attack will break somebody's AES-128 key out of a batch of 2⁴⁸ keys.

The security section of my Curve25519 paper includes a subsection "Batch discrete logarithms" that summarizes the cost of attacking multiple targets and concludes that Curve25519 is big enough: "The attacker is likely to wait for 2¹²⁵ additions before finding the first key ... Curve25519 is at a comfortable security level where finding the first key is, conjecturally, far out of reach, so the reduced cost of finding subsequent keys is not a threat. The attacker can perform only 2¹²⁵ε additions for small ε, so the attacker's chance of success—of finding any keys—is only about ε²." The paper continues by comparing this to the cost of "brute-force search" for finding "a batch of u keys" or "finding some key" in a secret-key system such as AES. Ten years later this type of analysis is still missing from NIST's latest key-size recommendations.

Cutting things too close: the "attacker economist" philosophy. There are three common approaches to choosing cryptographic key sizes:

For some attacks, notably the number-field sieve (NFS) for integer factorization, the success probability jumps very quickly from 0 to 1 as the cost increases past a particular threshold. It's then clear what the "attacker will probably fail" approach says, namely that the key size has to be chosen so that this threshold is beyond the attacker's resources.

However, for many other attacks the success probability grows much more gradually, and then it's not clear what the "attacker will probably fail" approach means. If "the cost of breaking" is defined as the median cost then the "attacker will probably fail" approach ends up saying that the attacker has chance below 50% of breaking the system; but cryptographic users aren't happy to hear that they've been given a cryptographic system that the attacker can break with probability 40%. NIST seems to focus on the average cost ("an algorithm that provides X bits of security would, on average, take 2^X-1 T units of time to attack"), and with this definition the attacker's success probability could be 60% or even larger.

The "attacker economist" approach provides a rationale for focusing on averages. It views the attacker as rationally deciding whether to carry out attacks, saying "I will carry out this attack because its average benefit to me is larger than its average cost" or "I will skip this attack because its average benefit to me is smaller than its average cost". Of course, if the attacker carries out an attack of the second type, the attacker might still end up luckily gaining something (maybe the benefit turns out to be unusually large or the cost turns out to be unusually small), but after a long series of such attacks the attacker expects to lose.

The attraction of the "attacker economist" approach is that it produces smaller key sizes. The attack cost is limited to min{B,C}, where B is the attacker's benefit and C is the attacker's resources; for comparison, in the "attacker will definitely fail" approach, the attack cost is limited to C, which is usually much larger than min{B,C}. In other words, the "attacker economist" approach excludes not merely infeasible attacks, but also uneconomical attacks.

There are, however, several problems with this approach. First, even if the attacker's expected cost obviously outweighs the attacker's expected benefit, sometimes the attacker goes ahead and carries out the attack anyway. The "attacker economist" approach assumes that rational attackers won't carry out uneconomical attacks, so users don't have to worry about such attacks; but attackers in the real world do not always act on the basis of rational economic analyses. The damage done by the attack, beyond its benefits to the attacker, is invisible to the "attacker economist" philosophy.

Second, evaluating the attacker's benefit requires understanding all the consequences of, e.g., successfully forging a signature. How exactly is the signature being used? What exactly does the attacker gain from the forgery? What is the monetary value of this benefit? The answers are complicated and constantly changing; they have a dramatic dependence on the details of how cryptography is integrated into real applications. A single forgery might not sound so bad, but it might also allow the attacker to take complete control over a critical computer. Sure, we can tell cryptographic users that they should evaluate the consequences of the cryptography failing, and that they should limit the attacker's gain from a successful attack, but the reality is that nobody gets this right: people are constantly surprised by how much damage an attack can do. For comparison, there's at least some hope of accurately assessing the resources available to attackers and of building cryptographic systems that are infeasible to break.

Third, even in situations where it's clear how much damage is caused by each forgery, "attacker economist" proponents are constantly overestimating the attacker's cost per forgery. The typical process here might sound reasonable at first: take what everybody says is the state-of-the-art forgery algorithm, and look at what everybody says is the cost of that algorithm. The problem is that the algorithm designers are practically always optimizing and reporting costs in a metric that's different from, and much larger than, the cost per forgery. Specifically, the usual metric is the cost of forging one message under one key, while the attacker is trying to forge many messages under many keys.

Some examples of difficulties for the "attacker economist" philosophy. People typically say that NFS is the state-of-the-art algorithm for forging RSA signatures, and that NFS has cost somewhere around 2⁸⁰ for RSA-1024. However, NFS produces much more than a single forgery: it produces the RSA secret key. The attacker then uses this key for any number of forgeries. The cost per forgery is the cost of the usual signing algorithm plus a small fraction of 2⁸⁰, depending on the number of forgeries. The cost of one forgery thus wildly overestimates the per-forgery cost of many forgeries under the same key.

As another example, consider the first draft of the UMAC RFC (October 2000), which claims that 64-bit UMAC tags are "appropriate for most security architectures and applications". This claim is backed by the following deceptive statement: "A 1/2⁶⁰ forging probability means that if an attacker could try out 2⁶⁰ MACs, then the attacker would probably get one right." The correct statement is that an attacker trying out 2⁶⁰ MACs would probably get at least 2⁵⁸ right. I had already pointed out this mistake as part of a paper (see also the latest version) posted several months earlier:

Several years later, after I objected, the UMAC authors tweaked their text as follows: "If an attacker could have N forgery attempts, then the attacker would have no more than N/2⁶⁰ probability of getting one or more of them right." The actual situation, when there are successful forgeries, is that the number of successful forgeries is very likely to be on the scale of N; describing this as "one or more" forgeries gives the reader a very wrong idea.

So, okay, instead of looking at the cost of one forgery, we should look at the cost of breaking one key, right? No: the cost of breaking one key often wildly overestimates the per-key cost of breaking many keys. I'll again take an ECC example: people typically say that breaking 160-bit ECC costs 2⁸⁰; but breaking a batch of a million 160-bit ECC keys costs a total of just 2⁹⁰, i.e., 2⁷⁰ per key.

Both of these mistakes—confusing the cost of one forgery with the cost per forgery, and confusing the cost of breaking one key with the cost per key—cause serious problems for the "attacker economist" approach.

The situation is completely different if key sizes are chosen so that forging one message under one key is infeasible. Forging more messages is certainly no easier than forging one message, and is therefore also infeasible; the reduced cost per forgery doesn't matter. Of course, one still has to be careful to check whether attacking a batch increases the success probability of feasible attacks, exactly the AES-128 issue described at the start of this blog post.

Can an attacker actually carry out 2⁸⁰ or 2⁹⁰ or 2¹⁰⁰ operations? Here are some back-of-the-envelope numbers that help put attack possibilities into perspective. Mass-market GPUs use state-of-the-art chip technology, are optimized for floating-point operations, and perform about 2⁵⁸ floating-point multiplications per watt-year. The number of floating-point multiplications that the attacker can carry out in a year with this technology is limited by the number of watts available:

Of course, the operations in a typical cryptanalytic attack are more complicated than floating-point multiplications. On the other hand, serious attackers can build special-purpose chips streamlined for their attacks, and computer technology continues to improve. There's no substitute for detailed analyses of actual attack costs, such as papers at the various SHARCS workshops.

Anyway, the overall scale of these numbers explains why cryptographers distinguish between 2⁸⁰ security and 2¹²⁸ security. 2⁸⁰ might sound like a big number, but it's not that big compared to the 2⁵⁵ nanoseconds in a year. Serious attackers today have enough computer equipment to be performing more than 2²⁵ useful operations each nanosecond, for a total of more than 2⁸⁰ useful operations in a year. On the other hand, this is still much, much, much smaller than 2¹²⁸.

Batch factorization. Suppose that someone ignores my advice and deploys a system providing "2⁸⁰ security". Probably the security claim is based on an attack algorithm that was optimized for breaking a single key. How much better can the attacker do by attacking a large batch of keys, say 2²⁰ keys?

Earlier I explained how well batch algorithms do for guessing secret cipher keys and for breaking ECC. Here are the results for 80-bit cipher keys and 160-bit ECC keys:

Attack cost	Average number of keys found out of 2²⁰ secret 80-bit cipher keys	Average number of keys found out of 2²⁰ secret 160-bit ECC keys
2⁵⁰	1/2¹⁰	1/2⁶⁰
2⁶⁰	1	1/2⁴⁰
2⁷⁰	2¹⁰	1/2²⁰
2⁸⁰	2²⁰	1
2⁹⁰	overkill	2²⁰
2¹⁰⁰	overkill	overkill

RSA is vulnerable to a sophisticated attack strategy called combining congruences. The basic idea of combining congruences isn't hard to explain, but there's a massive literature introducing an apparently neverending series of speedups to the basic idea and combining these speedups in all sorts of tricky ways. The state-of-the-art algorithms have so many adjustable knobs and hard-to-analyze traps that there's no consensus on how they will perform beyond the range of academic experiments.

If you want to know how big your RSA key needs to be for, say, 2¹²⁸ security, you might ask the summary site https://www.keylength.com/en/compare/, which collects recommendations from several sources. These recommendations range from 2048 bits to 6974 bits. What's the right answer?

I mentioned earlier that people typically report security around 2⁸⁰ for RSA-1024. What exactly does this 2⁸⁰ mean? Shamir and Tromer wrote in 2003 that "it appears possible to break a 1024-bit RSA key in one year using a device whose cost is about $10M (previous predictions were in the trillions of dollars)"; Franke, Kleinjung, Paar, Pelzl, Priplata, and Stahlke wrote in 2005 that there were "concerns about the feasibility" of the Shamir–Tromer design and presented a different design costing twenty times as much. What's the right answer?

The only real consensus is that the number of "operations" in NFS scales asymptotically as L^{1.901...+o(1)}, where

The difference between o(1) and 0 is important. The L^{1.901...+o(1)} formula can't be safely used to predict cost for any particular size of N, and can't be safely used to extrapolate from one size of N to another. If someone finds a new algorithm replacing 1.901... with, say, 1.8, then that algorithm might be worse for every size of N relevant to cryptography. However, there will be some size of N where the new algorithm is faster, and maybe that size is small enough to be relevant to cryptography, so obviously the performance of the new algorithm has to be explored in detail.

In a 1993 paper, Coppersmith introduced a variant of NFS that uses only L^{1.638...+o(1)} "operations" to factor N, after a precomputation that uses L^{2.006...+o(1)} "operations". The precomputation doesn't depend on the target N; it depends only on roughly how big N is. Coppersmith described the precomputation as the construction of a "factorization factory" that can then be applied to any target. If you have a batch of (say) L^0.5+o(1) targets N, then the factory-construction cost is negligible, and the total number of "operations" in Coppersmith's algorithm is L^{1.638...+o(1)} per target.

I've been writing "operations" in quotes here because this is counting random access x[i]=j as a single "operation", no matter how large the array x is: bigger than cache, bigger than DRAM, bigger than your 2TB disk, etc. This metric does a poor job of predicting real-world algorithm performance. Any attacker who can afford a gigantic storage array will also invest in many parallel processing elements (such as GPUs), and will find that parallel computations are much less expensive than random access to storage.

In a 2001 paper, I reoptimized NFS "sieving" and "linear algebra" for a realistic circuit model of computation, and concluded that the cost of single-target NFS actually scales as L^{1.976...+o(1)}. In a 2014 paper, Lange and I reoptimized Coppersmith's ideas for a realistic circuit model of computation, and concluded that the cost per integer of factoring L^0.5+o(1) integers actually scales as L^{1.704...+o(1)}. It turns out that Coppersmith's data flow, first precomputing a "factory" and then applying the "factory" to one target N at a time, isn't realistic: our new "batch NFS" algorithm scraps Coppersmith's "factory" and handles a large batch of targets simultaneously.

Let me again emphasize that these exponents don't say anything about any particular size of N. What's clear is merely that batching reduces cost for sufficiently large N, i.e., all N past some cutoff; it isn't clear what the cutoff is, and it isn't clear what the actual cost reduction is for any specific size. Lange and I have started digging into the more subtle question of what this means for the huge pile of 1024-bit keys that attackers have been collecting.

More examples of difficulties for the "attacker economist" philosophy. "It makes no sense for an adversary to spend (say) $10 million breaking a key if recovering the key will only net (say) $10 thousand," Silverman wrote in 2000. "There are many applications for which 768 bits is more than adequate."

Silverman estimated that spending $10 million on 4300 "500-MIPS machines" with "4.4 Gbytes per machine" would let the attacker factor a 768-bit key in "about 670 months", i.e., 56 years, i.e., 2^71.7 instructions. Silverman predicted "a date of 2019 for breaking such a key by public effort."

There's one important point that Silverman got right: accessing large volumes of memory is expensive. But what does this mean for an algorithm that was optimized for the number of "operations", counting (expensive) memory access as one "operation" and also counting (cheap) arithmetic as one "operation"? Silverman emphasized the expense of the memory access, apparently not realizing the possibility of reoptimizing algorithms for actual cost. Silverman also got many other important details wrong.

The factorization of the RSA-768 challenge was in fact publicly announced in 2009, ten years earlier than Silverman's "2019" prediction. The paper reported that the factorization had taken two years on "many hundreds of machines"; that it would have taken about 1500 core-years on 2.2GHz Opterons with 2GB of RAM, i.e., 2^66.5 cycles; and that "it is not unreasonable to expect that 1024-bit RSA moduli can be factored well within the next decade by an academic effort such as ours".

Of course, any serious attacker can be presumed to have built efficient special-purpose hardware tuned for integer factorization, while an "academic effort" does not have this luxury. Why does Silverman define security by the costs of a "public effort", if this is overstating the costs incurred by the actual attackers?

Furthermore, if 768 bits are beyond the cutoff for batch NFS, then the attacker can factor many 768-bit keys more cost-effectively than factoring just one. As I said before, it isn't known yet what the cutoff actually is, so one can't justifiably claim that there's a cost advantage, but one also can't justifiably claim that there isn't a cost advantage. As Lange and I wrote in our paper on batch NFS:

Why was Silverman comparing the costs and benefits of "breaking a key" if the actual job facing the attacker is to break many keys? There's an implicit assumption that each key has to be attacked separately; batch algorithms challenge this assumption.

Now let's move up from 768-bit keys to 1024-bit keys. Silverman claimed that it would require "at least 27 years to reach a point where 1024-bit keys are vulnerable to a public effort" and specifically predicted a date of "around 2037" for factoring a 1024-bit key. Since the time from Silverman's paper to the public RSA-768 factorization was only half of what Silverman had predicted, and the people announcing that factorization had said that it was "not unreasonable to expect" a public RSA-1024 factorization before 2020, one would think that anyone quoting Silverman's 1024-bit predictions would have some serious explaining to do. But Silverman's "around 2037" was repeated—even bumped to "around 2040"—in a 2011 paper by Grigg and Gutmann:

Grigg and Gutmann don't mention Silverman's "public effort" hypothesis; don't discuss the advantages of special-purpose hardware; don't discuss the problems with Silverman's extrapolation methodology; don't mention Silverman's highly inaccurate 768-bit extrapolation; and don't mention that Silverman's 1024-bit extrapolation has been disputed. But what I find particularly interesting in the Grigg–Gutmann quote is the leap from one key to many keys, ignoring the possibility of more cost-effective batch attacks. Even if it's true that factoring a 1024-bit key is a "massive effort" that won't happen before 2040, there's no justification for the claim that factoring N keys will require N times the effort.

To summarize, here's what's actually known about the per-key cost of breaking RSA keys: "It's complicated."

Batch non-elliptic discrete logs. What's even more complicated is the cost of solving the classic discrete-logarithm problem. I'm not talking about elliptic curves here: I'm talking about discrete logs in the multiplicative group of a finite field. I'll focus on the simple case of a prime field, exactly the problem of finding a secret key given a public key in the original DH system.

In the literature on combining congruences, there is a long history of speedup ideas being adapted from factorization algorithms to "index calculus" discrete-log algorithms (and occasionally vice versa). NFS is no exception. A 1992 paper by Gordon explained how to use NFS to compute discrete logs mod a prime in L^{2.080...+o(1)} "operations"; a 1993 paper by Schirokauer reduced 2.080... to 1.922...; a 2003 paper by Matyukhin reduced 1.922... to 1.901..., the same exponent as factorization.

Typical index-calculus algorithms have two stages. The first stage takes the prime as input and computes discrete logs of many small numbers. The second stage takes a target as an extra input and computes the discrete log of that target. The usual picture before NFS was that the second stage was much faster than the first stage. For NFS this picture was restored in a 2003 paper by Joux and Lercier. A 2006 paper by Commeine and Semaev improved and analyzed the Joux–Lercier idea, obtaining exponent 1.901... for the prime and exponent just 1.442... for each target. If the attacker is targeting a batch of L^0.5+o(1) discrete logs modulo the same prime then the cost of each discrete log with the Commeine–Semaev algorithm is just L^{1.442...+o(1)}, much faster than factorization.

Furthermore, the first stage handles the prime p very similarly to how NFS factors N, and it's easy to see that Coppersmith's precomputation ideas are also applicable here. Barbulescu's thesis in 2013 presented a discrete-log algorithm using the following number of "operations":

The techniques in my 2014 "Batch NFS" paper with Lange, replacing "operations" with a realistic cost metric, should be combinable with the batch version of this algorithm. In other words, computing a batch of many discrete logs mod many primes should cost much less than handling each discrete log separately. Of course, as in the case of integer factorization, figuring out the costs for (say) 1024-bit primes is a harder problem than figuring out the asymptotic cost exponents.

The Logjam attack. A paper this year from Adrian, Bhargavan, Durumeric, Gaudry, Green, Halderman, Heninger, Springall, Thomé, Valenta, VanderSloot, Wustrow, Zanella-Béguelin, and Zimmermann reports an attack called "Logjam" allowing man-in-the-middle attackers to "compromise connections to over 7% of Top Million HTTPS sites". The attack combines several unnecessary features of HTTPS:

The attacker now computes the server's 512-bit discrete log, and easily fools the client into thinking that the attacker is the server.

Obstacle: If computing a 512-bit discrete log takes too long, the client will give up on the connection, and the attack will fail. Grigg and Gutmann say that breaking a "dinky 512-bit key" will take "months" or "weeks"; maybe they were talking about RSA, but everyone says that 512-bit discrete-log keys are at least as hard to break as 512-bit RSA keys, right?

Solution: The Logjam paper reports 512-bit discrete-log computations "in about a minute", and it's clear that this can be improved with more hardware. "An adversary who performs a large precomputation for a prime p can then quickly calculate arbitrary discrete logs in that group, amortizing the cost over all targets that share this parameter," the paper says. "Although this fact is well known among mathematical cryptographers, it seems to have been lost among practitioners deploying cryptosystems."

There are clearly many different ways that the attack could have been prevented. I'll focus on three responses that involve changing the cryptographic primitives:

From a security perspective, the first response misses the point. Yes, there's a batch attack against multiple discrete logs for the same prime, but there are also batch attacks against many other cryptographic systems. What we actually need to know is whether attacks are feasible; there's no substitute for quantitative analysis of attacks against particular systems. The superficial statement "primes are shared" is a poor predictor of "batch attacks exist", and "batch attacks exist" is a poor predictor of "attacks are feasible".

The second response sounds much more reasonable from a security perspective, but security ends up being compromised by performance problems. The Logjam paper found that 84% of servers in 2015 used a "1024-bit or smaller group", despite NIST's recommendation to move up to 2048-bit groups by 2010; if it weren't for performance problems then nobody would ever have been using 1024-bit groups in the first place.

The third response produces a reasonable level of security: ECDHE, as actually implemented, doesn't allow any small groups. Properly implementing ECDHE with NIST P-256 isn't easy, but all of the critical pitfalls here are eliminated by next-generation ECC. The performance of NIST P-256 is already acceptable for many sites, and I expect the performance of next-generation ECC to be acceptable for almost all sites.

One can mix and match responses: for example, use ECDHE with a server-specific elliptic curve. But it's better to use ECDHE with a larger shared curve. Compared to a server-specific curve, a larger shared curve saves bandwidth; in most situations ends up saving CPU time; and has larger benefits against all known attacks.

Are we accurately simulating the attacker? Experts have publicly proposed and analyzed a huge number of ideas for attacking the problem of computing an elliptic-curve discrete logarithm. For most elliptic curves these ideas have produced only marginal improvements: the best attacks publicly known today against "conservative" elliptic curves (NIST P-256, Curve25519, etc.) aren't much better than the best attacks that were publicly known at the dawn of ECC thirty years ago. Is it safe to conclude that no much better algorithm exists, and in particular that the attacker doesn't know a much better algorithm?

Maybe, maybe not. Maybe the public search for a better algorithm hasn't been sufficiently comprehensive, or it happened to wander off in the wrong directions. Maybe there's a much better algorithm lurking just beyond the public search, an algorithm that required too big a leap for anyone to find. Maybe the attacker has hired smarter people, or searched longer, or simply gotten lucky.

The same risk is even more obvious for some other famous problems. For example, the problem of factoring a given 1024-bit RSA modulus is an amazingly productive research topic with a long and continuing history of public improvements. Very few experts are willing to point to a factorization algorithm and say "I bet this factorization algorithm is within a factor 2 of the best possible".

Of course, we try hard to minimize the risk for cryptographic users. We focus efforts on a small number of cryptanalytic problems, and we advertise these problems to a huge community of algorithm designers. We aim for overkill in studying every conceivable solution to these problems. We track how thoroughly these problems have actually been studied, and we track the remaining prospects for improvement. We direct users towards the problems with the strongest track records.

But what happens if the algorithmic problems facing the attackers aren't actually the algorithmic problems we're studying? In particular, what happens if the attack problems are easier than the problems we're studying?

Maybe we're starting from the wrong cost metric for algorithms. For example, maybe we're using the oversimplified "operations" cost metric that I mentioned before, counting random access to a large array as a single "operation" when in fact it's much more expensive than an arithmetic operation. (Sometimes people say that counting "operations" is an underestimate of actual cost and is therefore safe; but this is part of what went wrong with the Silverman predictions.) Maybe we're optimizing algorithms for mass-market easy-to-program Intel CPUs, while the attacker is optimizing for special-purpose chips with much more parallelism and with fancy networks on chip. Maybe the attacker has a quantum computer, making some operations feasible while we normally think of those operations as being insanely expensive.

Or maybe we're setting our cost limits too low. Maybe we're studying the problem of setting public factorization records using academic computer clusters, while the attacker is studying the problem of factoring 1024-bit keys as quickly as possible using billion-dollar clusters.

Or maybe, in an effort to advertise attractively simple problems, we've oversimplified the problem of attacking the actual cryptosystem. Maybe we're studying the problem of recovering a signature key while the attacker is studying the actual problem of forging signatures. Maybe we're studying the problem of attacking one key while the attacker is studying the actual problem of attacking a large batch of keys. This oversimplification is surprisingly common, and is exactly the reason for this blog post.

The cr.yp.to blog

2015.11.20: Break a dozen secret keys, get a million more for free: Batch attacks are often much more cost-effective than single-target attacks. #batching #economics #keysizes #aes #ecc #rsa #dh #logjam