cr.yp.to: 2019.10.24: Why EdDSA held up better than ECDSA against Minerva

Newer (Access-K): 2020.12.06: Optimizing for the wrong metric, part 1: Microsoft Word: Review of "An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development" by Knauff and Nejasmic. #latex #word #efficiency #metrics

Older (Access-J): 2019.04.30: An introduction to vectorization: Understanding one of the most important changes in the high-speed-software ecosystem. #vectorization #sse #avx #avx512 #antivectors

Table of contents (Access-I for index page)

2025.04.23: McEliece standardization: Looking at what's happening, and analyzing rationales. #nist #iso #deployment #performance #security

2025.01.18: As expensive as a plane flight: Looking at some claims that quantum computers won't work. #quantum #energy #variables #errors #rsa #secrecy

2024.10.28: The sins of the 90s: Questioning a puzzling claim about mass surveillance. #attackers #governments #corporations #surveillance #cryptowars

2024.08.03: Clang vs. Clang: You're making Clang angry. You wouldn't like Clang when it's angry. #compilers #optimization #bugs #timing #security #codescans

2024.06.12: Bibliography keys: It's as easy as [1], [2], [3]. #bibliographies #citations #bibtex #votemanipulation #paperwriting

2024.01.02: Double encryption: Analyzing the NSA/GCHQ arguments against hybrids. #nsa #quantification #risks #complexity #costs

2023.11.25: Another way to botch the security analysis of Kyber-512: Responding to a recent blog post. #nist #uncertainty #errorbars #quantification

2023.10.23: Reducing "gate" counts for Kyber-512: Two algorithm analyses, from first principles, contradicting NIST's calculation. #xor #popcount #gates #memory #clumping

2023.10.03: The inability to count correctly: Debunking NIST's calculation of the Kyber-512 security level. #nist #addition #multiplication #ntru #kyber #fiasco

2023.06.09: Turbo Boost: How to perpetuate security problems. #overclocking #performancehype #power #timing #hertzbleed #riskmanagement #environment

2022.08.05: NSA, NIST, and post-quantum cryptography: Announcing my second lawsuit against the U.S. government. #nsa #nist #des #dsa #dualec #sigintenablingproject #nistpqc #foia

2022.01.29: Plagiarism as a patent amplifier: Understanding the delayed rollout of post-quantum cryptography. #pqcrypto #patents #ntru #lpr #ding #peikert #newhope

2020.12.06: Optimizing for the wrong metric, part 1: Microsoft Word: Review of "An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development" by Knauff and Nejasmic. #latex #word #efficiency #metrics

2019.10.24: Why EdDSA held up better than ECDSA against Minerva: Cryptosystem designers successfully predicting, and protecting against, implementation failures. #ecdsa #eddsa #hnp #lwe #bleichenbacher #bkw

2019.04.30: An introduction to vectorization: Understanding one of the most important changes in the high-speed-software ecosystem. #vectorization #sse #avx #avx512 #antivectors

2017.11.05: Reconstructing ROCA: A case study of how quickly an attack can be developed from a limited disclosure. #infineon #roca #rsa

2017.10.17: Quantum algorithms to find collisions: Analysis of several algorithms for the collision problem, and for the related multi-target preimage problem. #collision #preimage #pqcrypto

2017.07.23: Fast-key-erasure random-number generators: An effort to clean up several messes simultaneously. #rng #forwardsecrecy #urandom #cascade #hmac #rekeying #proofs

2017.07.19: Benchmarking post-quantum cryptography: News regarding the SUPERCOP benchmarking system, and more recommendations to NIST. #benchmarking #supercop #nist #pqcrypto

2016.10.30: Some challenges in post-quantum standardization: My comments to NIST on the first draft of their call for submissions. #standardization #nist #pqcrypto

2016.06.07: The death of due process: A few notes on technology-fueled normalization of lynch mobs targeting both the accuser and the accused. #ethics #crime #punishment

2016.05.16: Security fraud in Europe's "Quantum Manifesto": How quantum cryptographers are stealing a quarter of a billion Euros from the European Commission. #qkd #quantumcrypto #quantummanifesto

2016.03.15: Thomas Jefferson and Apple versus the FBI: Can the government censor how-to books? What if some of the readers are criminals? What if the books can be understood by a computer? An introduction to freedom of speech for software publishers. #censorship #firstamendment #instructions #software #encryption

2015.11.20: Break a dozen secret keys, get a million more for free: Batch attacks are often much more cost-effective than single-target attacks. #batching #economics #keysizes #aes #ecc #rsa #dh #logjam

2015.03.14: The death of optimizing compilers: Abstract of my tutorial at ETAPS 2015. #etaps #compilers #cpuevolution #hotspots #optimization #domainspecific #returnofthejedi

2015.02.18: Follow-You Printing: How Equitrac's marketing department misrepresents and interferes with your work. #equitrac #followyouprinting #dilbert #officespaceprinter

2014.06.02: The Saber cluster: How we built a cluster capable of computing 3000000000000000000000 multiplications per year for just 50000 EUR. #nvidia #linux #howto

2014.05.17: Some small suggestions for the Intel instruction set: Low-cost changes to CPU architecture would make cryptography much safer and much faster. #constanttimecommitment #vmul53 #vcarry #pipelinedocumentation

2014.04.11: NIST's cryptographic standardization process: The first step towards improvement is to admit previous failures. #standardization #nist #des #dsa #dualec #nsa

2014.03.23: How to design an elliptic-curve signature system: There are many choices of elliptic-curve signature systems. The standard choice, ECDSA, is reasonable if you don't care about simplicity, speed, and security. #signatures #ecc #elgamal #schnorr #ecdsa #eddsa #ed25519

2014.02.13: A subfield-logarithm attack against ideal lattices: Computational algebraic number theory tackles lattice-based cryptography.

2014.02.05: Entropy Attacks! The conventional wisdom says that hash outputs can't be controlled; the conventional wisdom is simply wrong.

2019.10.24: Why EdDSA held up better than ECDSA against Minerva: Cryptosystem designers successfully predicting, and protecting against, implementation failures. #ecdsa #eddsa #hnp #lwe #bleichenbacher #bkw

The Minerva research team's web page claims "practical recovery of the long-term private key" from various "implementations of ECDSA/EdDSA in programmable smart cards and cryptographic software libraries". Concretely:

Wait a minute. If this was supposed to be an "ECDSA/EdDSA" attack, a shared "problem in the ECDSA and EdDSA algorithms", then why do these examples say just "ECDSA" and "ECDSA" and "ECDSA"?

Here's one theory you might come up with: "You're being selective in your summary. I looked through the Minerva page the day it was originally announced, and it also claimed to break the EdDSA implementation in libgcrypt."

Facts: The team did make this claim, but withdrew the claim two days later, for reasons that I'll explain below. I'm not being selective in my summary.

Here's another theory you might come up with: "Fundamentally, the attack applies equally to ECDSA and EdDSA. The only reason that there are more ECDSA attack reports is that ECDSA is more widely supported."

Facts: I looked at MatrixSSL, JDK, Crypto++, and wolfSSL/wolfCrypt. I noticed EdDSA (specifically Ed25519) implementations in everything except JDK. The Minerva team does not claim to break the EdDSA implementations in libgcrypt, MatrixSSL, Crypto++, and wolfSSL/wolfCrypt.

The Minerva web page also says that OpenSSL, BouncyCastle, BoringSSL, libtomcrypt, Botan, Microsoft CNG, mbedTLS, Intel IPP-Crypto, and 11 cards were "tested and found not vulnerable". I noticed EdDSA implementations in OpenSSL, BouncyCastle, BoringSSL, libtomcrypt, and Botan. In total, the Minerva team is claiming to break ECDSA in 4 or possibly 5 out of 13 crypto libraries supporting ECDSA, and EdDSA in 0 out of 9 crypto libraries supporting EdDSA. The bottom line is that the attack doesn't apply equally to ECDSA and EdDSA.

I'm not saying that it's impossible to implement EdDSA insecurely. Of course it's possible to implement EdDSA insecurely. Maybe there are people using insecure EdDSA implementations today. But the Minerva data points support the theory that EdDSA implementations are less likely to be broken than ECDSA implementations. This isn't an accident: it's how EdDSA was designed in the first place.

In the rest of this blog post, I'll take a closer look at the information exploited by the Minerva attack, and at the EdDSA implementations mentioned above. Thanks to Tanja Lange for help with this blog post.

Discrete logarithms. In ElGamal's signature system, the signer's public key is g^x mod p. Here p is a standard prime, g is a standard generator mod p, and x is the signer's long-term signing key, chosen uniformly at random from {0,1,...,p−2}.

Computing (g^x)^(p−1)/2 mod p reveals x mod 2: it's 1 if x is even, or p−1 if x is odd. More generally, for any divisor q of p−1, one can check a guess for x mod q as follows: x is congruent to (say) 7 modulo q if and only if (g^x)^(p−1)/q = (g⁷)^(p−1)/q. This isn't helpful when q is large: any particular guess for x mod q is very unlikely to be correct, and ruling out one guess still leaves many more possibilities.

The Pohlig–Hellman discrete-logarithm algorithm is faster, finding x in only about sqrt(q) operations where q is the largest prime dividing p−1, but let's assume that all prime divisors of (p−1)/2 are large. There are also discrete-logarithm algorithms that take time subexponential in the number of bits of p, but let's assume that p is so large that this isn't a threat. Shor's quantum discrete-logarithm algorithm takes polynomial time, but I'll assume throughout this blog post that the attacker doesn't have a quantum computer. It then seems very difficult to learn anything about x mod (p−1)/2 from g^x mod p.

Linear equations in discrete logarithms. The first step in signing a message in ElGamal's system is to choose a "random number k, uniformly between 0 and p−1, such that gcd(k,p−1) = 1." The gcd condition forces k to be odd. We're assuming that all prime divisors of (p−1)/2 are large, so the gcd condition is very unlikely to have any other effect.

ElGamal's signatures reveal the coefficients M,R,S of a linear equation M=xR+kS modulo p−1. This is equivalent to an equation M=xR+kS modulo each of the prime-power divisors of p−1. In particular, M=xR+kS=xR+S modulo 2, revealing x mod 2 if R is odd; but we already knew x mod 2, as explained above.

The rest of the equation is M=xR+kS modulo (p−1)/2, or equivalently M=xR+kS modulo q for each prime-power divisor q of (p−1)/2. This doesn't seem helpful for the attacker, since k mod q could be almost anything. For each prime q dividing (p−1)/2, the attacker knows that k isn't 0 modulo q, i.e., that M isn't xR modulo q, which (assuming R is invertible modulo q) is the same as ruling out M/R as a possibility for x mod q; but, again, ruling out one guess for x mod q still leaves many more possibilities.

Another signature from the same signer reveals another equation M₂=xR₂+k₂S₂. ElGamal emphasizes that the signer must choose a new k₂ for M₂, rather than taking k₂=k, since taking k₂=k allows the attacker to solve these two linear equations for x. Repeating k is what destroyed the security of the Sony PS3 many years later.

Taking a new k₂ seems to eliminate this problem: each possible x is compatible with some solution (k,k₂) to the two equations, except for the invertibility issue described above. More generally, ElGamal concludes that these equations for any number of signatures are compatible with a huge number of possibilities for x as long as "each signature uses a different k".

Schnorr's use of a prime-order subgroup. There's a lot to be said about improvements from ElGamal's system to the Schnorr system and EdDSA. One of the improvements, simplifying the security analysis, is to take g as a generator of a subgroup of prime order q rather than as a generator of the entire group of order p−1.

Each Schnorr signature reveals the coefficients M,R,S of a linear equation M=xR+kS modulo q. Almost certainly S is nonzero modulo q, so let's simplify this linear equation to Y=xX+k modulo q, where Y=M/S and X=R/S. Schnorr actually obtains X and Y in a simpler way, eliminating all divisions and in particular eliminating the need to require gcd{k,q}=1. Schnorr takes k as a random number in the set {1,...,q}.

How NSA biased the linear equations in DSA. NSA's original DSA proposal uses prime-order subgroups as in Schnorr's system, revealing an equation Y=xX+k modulo q for each signature. Other DSA details are closer to ElGamal's system, in particular using divisions by k modulo q, so k is specified to be in the set {1,...,q−1}. The same comments apply to the subsequent NIST standards for DSA and ECDSA.

NSA's proposal included a specific prime q, namely 0xb20db0b101df0c6624fc1392ba55f77d577481e5, about 0.7 times 2¹⁶⁰; and a specific prime p, which I won't repeat here, and which is around 0.8 times 2⁵¹². A subsequent update included another prime q, namely 0xc773218c737ec8ee993b4f2ded30f48edace915f, about 0.8 times 2¹⁶⁰; and another 512-bit prime p. All of these primes were generated by hashing in a way that seems to make it very difficult to target particular choices of p and q. (As a side note, hashing was also used to generate shared parameters for various other cryptosystems, but was avoided in NSA's subsequent Dual EC DRBG proposal.) All of these primes are much too small to resist discrete-log attacks, but the problems I'm about to describe also show up for larger DSA parameters.

How do we generate a uniform random element k of {1,...,q−1}? If q is, say, 2¹⁶⁰+1 then this is easy: generate 160 random bits, interpret the result as an integer, and add 1. But 2¹⁶⁰+1 isn't prime, and, more to the point, isn't either of the random-looking primes q mentioned above.

The literature has many good answers to this question, and then it has NSA's answer, which I'll get to in a moment. One good answer, rejection sampling, is to generate a uniform random 160-bit integer (let's assume q is between 2¹⁵⁹ and 2¹⁶⁰) and start over if the integer is outside {1,...,q−1}. Another good answer, with more predictable performance, is to generate a uniform random 320-bit integer and reduce that integer modulo q. The resulting distribution isn't exactly uniform but it's so close that you'll never be able to tell the difference. Technically, this doesn't guarantee that k is in {1,...,q−1}, but you'd have to be amazingly unlucky for your 320-bit integer to be a multiple of q.

NSA's answer is to take a much shorter integer, a 160-bit (not 320-bit) hash of some new secret, and reduce the hash modulo q. Presumably the 160-bit hash is hard to distinguish from a uniform random element of {0,...,2¹⁶⁰−1}, but the hash modulo q looks quite different from a uniform random element of {1,...,q−1}. For example, if q is NSA's 0xb20db0b101df0c6624fc1392ba55f77d577481e5, then any integer up through 0x4df24f4efe20f399db03ec6d45aa0882a88b7e1a is hit by two elements of {0,...,2¹⁶⁰−1}, while any larger integer is hit by only one element of {0,...,2¹⁶⁰−1}.

Interlude: Understanding the importance of distributions for security. Attack algorithms often work better for some inputs than for others. For a proper security analysis one has to be clear about the distribution of inputs. Changing the distribution can increase or decrease security levels, and can easily make the difference between a broken system and an unbroken system.

Here's a non-DSA example. How difficult is it, by known attacks, to find all of the prime factors of an integer N in {1,...,2²⁰⁴⁸}? This question is incomplete, because the distribution of inputs N is unspecified. Here are four possibilities for the distribution:

These examples show how unsafe it can be to assume that different distributions have the same security level. Comparing the first, second, and third cases shows that reducing randomness can improve security. Comparing the second and third cases to the fourth case shows that reducing randomness can damage security.

Note that these examples rely critically on understanding which inputs are broken by various factorization algorithms. Don't believe people who try to replace serious algorithm analysis with some superficial rule that more randomness is always better. See also Section 11.1 of 2008 Koblitz–Koblitz–Menezes for more subtle examples.

Security analysis of DSA biases, part 1: lattice attacks. Back to DSA. The security of a single signature, and in particular the amount of information revealed by the equation Y=xX+k modulo q, can't be affected much by NSA's change in the distribution of k. An attack that works with a particular probability against NSA's uniform random 160-bit k also has at least half as much probability of working against ElGamal's uniform random nonzero k modulo q. The point is that NSA produces any particular integer modulo q with probability at most 1/2¹⁵⁹ (assuming q is at least 2¹⁵⁹), while ElGamal produces the same integer with probability 1/(q−1), which is at least 1/2¹⁶⁰ (and even somewhat larger than this for the choices of q mentioned above). Whichever NSA choices of k are affected by the attack have at least half as much chance of appearing in ElGamal's choices of k. (This is an example of a "divergence argument".)

The same argument also says that NSA can't lose more than 2 bits of security for 2 signatures, more than 3 bits of security for 3 signatures, etc. But this is a useless guarantee for a signer generating, say, 1000 signatures.

ElGamal went beyond a single equation and considered the whole system of equations revealed by the signer: Y₁=xX₁+k₁ modulo q, Y₂=xX₂+k₂ modulo q, Y₃=xX₃+k₃ modulo q, etc. Any particular k from NSA has nearly as much chance of having been generated by ElGamal, but the whole sequence (k₁,k₂,k₃,...) from NSA is much less likely to be generated by ElGamal. (The "divergence" grows rapidly with the number of signatures.) What effect does this gap have on security?

What we know about NSA's (k₁,k₂,k₃,...) is that each k_j favors elements of {0,...,2¹⁶⁰−1−q} over elements of {2¹⁶⁰−q,...,2¹⁶⁰−1}. Let's consider a simpler and larger bias, where each k_j is guaranteed to be in a smaller range {0,...,2^b−1}. The attacker knows that Y_j=xX_j+k_j modulo q, i.e., that the vector (Y₁,Y₂,Y₃,...) is the surprisingly short vector (k₁,k₂,k₃,...) plus x(X₁,X₂,X₃,...) plus a multiple of q in each coordinate. In other words, (Y₁,Y₂,Y₃,...) is surprisingly close to the public lattice consisting of all multiples of (X₁,X₂,X₃,...) plus all multiples of q in each coordinate. Finding the lattice vector surprisingly close to (Y₁,Y₂,Y₃,...) immediately reveals x.

Howgrave-Graham and Smart introduced this lattice view of biased DSA in 1999, and reported being able to quickly find x from "only 30 signed messages" with 152-bit k_j, using a simple algorithm to find the close lattice vector. Howgrave-Graham and Smart also pointed out that there's nothing special about having the k_j range start from 0. For example, if one knows that k₁ is in the interval {37*2¹⁵²,...,38*2¹⁵²−1}, then one can subtract 37*2¹⁵² from Y₁ and from k₁, reducing to the case that k₁ is in the interval {0,...,2¹⁵²−1}. Or one can reduce to the case that k₁ is in the balanced interval {−2¹⁵¹,...,2¹⁵¹−1}, which is even better for the lattice attacks.

"It is important that no bits of the ephemeral keys are leaked for whatever reason", Howgrave-Graham and Smart wrote. If you assume that "important" means "important for security against known attacks", then you can object that Howgrave-Graham and Smart were overstating their results. They had reported an attack using 8 bits of each k_j; this does not imply an attack using, say, 4 bits of each k_j. Followup papers plugged fancier lattice algorithms into the same ideas, showing that an attacker knowing 4 bits of each k_j could quickly find x, but you can still object that this doesn't apply to NSA's leak of roughly 0.6 bits of information about each k_j.

On the other hand, as a cryptosystem designer, I don't want security reviewers to have to spend time analyzing fancy attacks that I could have eliminated by tweaking my design. Security review is the best protection that users have against attacks; security failures come primarily from having security review spread too thin. From this perspective, it is important to eliminate the leak of information about k.

Security analysis of DSA biases, part 2: Bleichenbacher's attack. In October 2001, NIST issued a new DSA standard with a change notice saying that an "unpublished attack" had broken NSA's biased DSA randomness using a "workfactor of 2⁶⁴" and "2²² known signatures". This statement doesn't make clear what the units are for measuring this "workfactor", i.e., how complicated each of the 2⁶⁴ steps is. The statement doesn't even make clear whether the attack is for NSA's 512-bit primes (which were already known to allow feasible discrete-logarithm attacks), or for 1024-bit primes, or both.

Earlier news had attributed the attack to Bleichenbacher, and had quoted "Edward Roback, chief of the Computer Security Division in NIST's Information Technology Laboratory" as saying that "those who are using DSA can continue to use it with confidence that DSA signatures done under the present standard will remain secure for many more years". I'm not aware of any publications explaining why NIST thought that attackers couldn't reach a "workfactor of 2⁶⁴".

Bleichenbacher privately reported his attack to an IEEE working group in November 2000, but as far as I know didn't immediately post the slides or any other attack details. A small amount of information about the attack appears in Section 5 of a 2001 report by Vaudenay. Bleichenbacher's slides (TIFF; here's a PDF version) were posted in late 2004. The first serious writeup I saw was a 2013 paper by De Mulder, Hutter, Marson, and Pearson.

Here's a concrete example to illustrate some of the techniques. Bleichenbacher numerically measures the bias of k as the average over k of the complex number exp(2πik/q). If k were uniformly distributed modulo q then this complex number would be equally spaced around the unit circle in the complex plane and the average would be 0. However, k isn't uniformly distributed modulo q. Let's take NSA's original q, around 0.7 times 2¹⁶⁰; a short calculation shows that the bias of k is around 0.05+0.21i.

If k₁ and k₂ are independent identically distributed variables then the bias of k₁+k₂ is the average of exp(2πi(k₁+k₂)/q) = exp(2πik₁/q)exp(2πik₂/q), which factors as the bias of k₁ times the bias of k₂. Similarly, the bias of k₁−k₂ is the bias of k₁ times the complex conjugate of the bias of k₂.

Consider many equations Y_j=xX_j+k_j modulo q from, say, 2⁴⁰ signatures under one key. Assume that each X_j is reduced modulo q to the range {0,1,...,q−1}. Sort the equations into increasing order of X_j. Consider a difference of two adjacent equations: Y_j+1−Y_j=x(X_j+1−X_j)+(k_j+1−k_j). Presumably each difference X_j+1−X_j will be on the scale of q/2⁴⁰; a more precise analysis would look at the distribution of these differences. Meanwhile the difference k_j+1−k_j would have bias (0.048+0.21i)(0.048−0.21i), around 0.047, if k_j+1 and k_j were independent; presumably the sorting of X_j doesn't noticeably change this bias.

Repeat this sort-and-difference process three times, reducing each X first to about 120 bits with k bias around 0.047, then to about 80 bits with k bias around 0.047², then to about 40 bits with k bias around 0.047⁴, i.e., around 2^−17.6. Write the resulting equations as Y'_j=xX'_j+k'_j, where each X'_j has about 40 bits, and each k'_j has bias around 2^−17.6.

Say we have a rough guess G for x, and we compute the average A(G) of exp(2πi(Y'_j−GX'_j)/q), i.e., the average of exp(2πi(x−G)X'_j/q) exp(2πik'_j/q). For most choices of G, this is an average of 2⁴⁰ random-looking points on the unit circle, so presumably it has size around 2⁻²⁰. However, if G is within 2¹¹⁵ of x then the differences (x−G)X'_j are bounded by 2¹⁵⁵ (this is where the smallness of X'_j is important), so exp(2πi(x−G)X'_j/q) is close to 1, so presumably A(G) is close to the bias of k'_j, around 2^−17.6, which is noticeably bigger than 2⁻²⁰.

Now try, e.g., G=0, G=q/2⁴⁵, G=2q/2⁴⁵, G=3q/2⁴⁵, etc. It's easy to use a fast Fourier transform to compute A(G) simultaneously for all such G with only about 2⁵⁰ multiplications of complex numbers. Hopefully the largest A(G) value will come from some G close to x, revealing the top bits of x, and then one can repeat to find the remaining bits of x. On the other hand, maybe 2⁻²⁰ is too much noise compared to 2^−17.6. One can try several of the largest A(G) values, or reduce the noise by using more equations, as I'll now explain.

How can we generate more equations without starting from more signatures? One approach is to go beyond adjacent differences X_j+1−X_j, also considering X_j+2−X_j and X_j+3−X_j and so on up to some size limit. Presumably there are about 2⁵⁰ differences below, say, q/2³⁰, and then about 2⁵⁰ differences below q/2⁸⁰ at the next level, and then about 2⁵⁰ differences below q/2¹³⁰ at the next level. It's tricky to analyze the randomness of lists that include, e.g., the difference X₁−X₀ and the difference X₂−X₁ and the difference X₂−X₀, which is the sum of X₁−X₀ and X₂−X₁; one should carry out experiments to see how well the algorithm actually works. (One should carry out experiments even for algorithms where someone claims to have a complete analysis.)

Another approach is to start with only 2³⁴ signatures and consider all sums X_j+X_i mod q. Presumably there will be almost 2⁶⁷ different sums, and more than 2⁵⁰ cases where two sums are within q/2⁸⁰ of each other, and more than 2⁵⁰ differences below q/2¹³⁰ at the next level. As Bleichenbacher notes in his slides, there are standard techniques to find nearby sums with far less memory than storing and sorting all the sums. (For each j, build a low-memory generator that outputs X_j+X_i for all i in increasing order, and then merge the outputs of these generators. Bleichenbacher credits 1981 Schroeppel–Shamir. My sortedsums paper cites a 1973 Knuth exercise crediting William S. Brown. It wouldn't be surprising if there are even earlier references.)

Or, starting with fewer signatures, we could consider all sums of four X values, and directly look for 2⁵⁰ sums within q/2¹³⁰ of each other. This is essentially the same as looking for 2⁵⁰ sums that collide on the top 130 bits. If you think it's obvious how quickly such searches can be carried out, then try to explain why the 2010 Howgrave-Graham–Joux subset-sum algorithm has exponent around 0.337n, and why the 2011 Becker–Coron–Joux subset-sum algorithm has exponent around 0.291n, and why the 2019 Esser–May subset-sum algorithm has exponent around 0.255n.

An even larger cloud of attacks. Back in October 2000, a month before Bleichenbacher's IEEE presentation, Blum, Kalai, and Wasserman (BKW) had introduced an algorithm to find a secret vector x of integers modulo 2, given noisy dot products xX₁+k₁, xX₂+k₂, etc. Each X_j is a random public vector of the same length as x, and each k_j is a bit set with some specified probability smaller than 1/2.

The BKW algorithm sorts X_j according to their leading bits, and then computes successive differences X_j+1−X_j, obtaining noisier dot products x(X_j+1−X_j)+(k_j+1−k_j). The BKW algorithm repeats this sort-and-difference process to clear more and more bits of X, at the expense of more noise. Eventually it finds many noisy dot products x(0,0,...,0,1)+k'_j, takes the majority vote of these dot products to see the last bit of x, and similarly finds the other bits of x.

The problem considered by BKW, where x is a vector modulo 2 and some k_i bits are set, is called the "learning parity with noise" (LPN) problem. In the "learning with errors" (LWE) problem, x is a vector modulo q and each k_i is a small integer. LWE for length-1 vectors, the target of Bleichenbacher's attack, is called the "hidden-number problem" (HNP), an example of how scientific terminology can be selected to minimize the amount of information communicated to the uninitiated.

Followups to the BKW paper adapted the BKW idea to the LWE problem and to various lattice problems, and eventually added further speedup ideas that Bleichenbacher had used, such as fast Fourier transforms ("fast Walsh transforms" in the mod-2 case) and non-adjacent differences. An analysis in 2015 by Herold, Kirshanova, and May showed that one of the latest BKW variants sometimes had better asymptotics than basis reduction as an attack against reasonable-looking LWE parameters for lattice-based cryptography; look at Figure 1 in the paper to get a quick idea of how close the competition is. Papers are continuing to appear with basis-reduction speedups and with speedups for various BKW/Bleichenbacher-type algorithms. The literature is complicated, and is continuing to become more complicated.

As a cryptosystem designer, I'm much happier with the literature on conservative prime-field ECC. There's a much simpler state-of-the-art attack algorithm that includes all known optimizations. There's a convincing high-precision analysis of the performance of the algorithm. The literature has clear explanations of quantitative obstacles to other attack ideas such as pairings and decomposition, and has no ideas for getting around these obstacles. But NSA's DSA biases spoil the simplicity of the attack picture, forcing the security reviewer to consider a much more complicated collection of worrisome attack strategies.

Overconfidence in limits on the number of signatures. NIST's 2001 change notice for the DSA standard provided two options for users to "defend" against Bleichenbacher's attack. The first option was for users to continue using NSA's biased DSA randomness but sign at most 2²¹ signatures per key, half of the "2²² known signatures" used by the "unpublished attack".

Why did NIST think that a limit of 2²¹ signatures would be safe? Was there some quantitative analysis saying that reducing 2²² signatures to 2²¹ signatures would increase the attack cost to something clearly infeasible? And what made NIST confident that a new "unpublished attack" was optimal?

Bleichenbacher needs enough equations to compensate for the noise. Sorting equations and then subtracting adjacent equations, as in the BKW paper, doesn't expand the number of equations available. But Bleichenbacher's slides were explicitly generating far more equations as sums of small non-adjacent subsets of the original equations. I don't understand how anyone could have been confident regarding the tradeoffs here, even before seeing this decade's breakthroughs in subset-sum algorithms.

Here are further examples to illustrate the lack of maturity of the analysis of these tradeoffs. In 2012, Damgård and Park proposed some cryptosystems based on the supposed hardness of the following example of LPN. There is a secret n-bit vector x, where n is 768. The attacker sees a limited number of noisy dot products xX+k, where X is public and the noise k is 1 with probability 1.5%. The limit is a small constant times n, where the constant depends on the cryptosystem variant. The original version of the paper stated that BKW needs as input a gigantic number of "samples" xX+k, while, for the paper's cryptosystems, "the adversary only gets O(n) samples, which is the hardest case, so this suggests that relatively small values of n might be sufficient for security". The paper was then revised to account for "recent attacks on LPN that invalidated the conclusions of the previous version". See, e.g., my Ring-LPN attack paper with Lange.

In the LWE context, a 2014 paper by Albrecht, Faugère, Fitzpatrick, and Perret stated that "one of the main remaining obstacles for applying the BKW algorithm to cryptographic constructions based on LWE is that it requires an unbounded number of samples to proceed. Lifting this requirement, if only heuristically, is hence a pressing research question." Followup papers then pointed out that one could use far fewer samples.

Safe fixes for DSA. Instead of trying to pretend that the limits of these attacks are clear, let's eliminate the applicability of the attacks to DSA, by fixing the problem at its source. The problem is that—for NSA's choices of q around, say, 0.7 or 0.8 times 2¹⁶⁰—taking 160 bits of randomness modulo q produces a biased result.

If we modify q, moving it extremely close to 2¹⁶⁰ or to 2¹⁵⁹, then the bias rapidly disappears. This is also naturally accomplished by ECDSA if we choose a prime p extremely close to 2¹⁶⁰ (rather than, say, the Brainpool "random" p) and choose an elliptic curve with a power-of-2 cofactor such as 1 or 2 or 4 or 8: q will then be extremely close to 2¹⁶⁰ or 2¹⁵⁹ or 2¹⁵⁸ or 2¹⁵⁷ respectively.

The bias also rapidly disappears if we take more bits of randomness. As I mentioned above, if q is between 2¹⁵⁹ and 2¹⁶⁰, then you'll never be able to tell the difference between a uniform random 320-bit integer mod q and a uniform random element of {0,...,q−1}. This was NIST's second option for users to protect themselves: concatenate two 160-bit hashes to obtain a 320-bit integer.

Ed25519 has both of these defenses. It uses Curve25519, with a subgroup of prime order extremely close to 2²⁵². It specifies k as a deterministic, easily testable SHA-512 hash of a secret and the message being signed. Of course, Ed25519 also moves from NSA's dangerously small cryptosystem sizes up to a safe security level.

Timing attacks against DSA. The reason this isn't the end of the story is that mathematical biases in the construction of the secret k aren't the only way to leak bits of k. For example, a 2011 paper by Brumley and Tuveri steals "the private key of a TLS server where the server authenticates with ECDSA signatures". The attack was demonstrated against OpenSSL using the NIST B-163 curve. The choice of q here isn't the issue; q is extremely close to a power of 2.

The paper title is "Remote timing attacks are still practical". OpenSSL computed the kth multiple of an elliptic-curve point with an algorithm whose time revealed the position of the top bit in k: the algorithm took the maximum time if k was between 2^b−1 and 2^b, less time if k was between 2^b−2 and 2^b−1, etc. Sometimes timings would show, e.g., that k is below 2^b−8, putting the attacker into the situation of the fast Howgrave-Graham–Smart attack.

How to stop timing attacks. A programmer who knows enough about CPU timing can systematically write software that avoids all data flow from secrets to timings. For example, all of the C and assembly implementations from the Ed25519 team are written in this way, as are the NaCl and BearSSL crypto libraries.

(One can also randomize computations to try to obscure the information leaked through timings: e.g., add a random 512-bit multiple of q to k before scalar multiplication. However, it's very difficult for the implementor, and for reviewers, to figure out how secure randomization is against timing attacks. A better argument for randomization is that attackers sometimes have access to more invasive side channels such as power measurements: truly constant power seems very difficult to achieve, and randomization seems to be our best bet. I advise against randomization as a replacement for constant-time software, but I don't object to randomization as an extra line of defense whenever the user can afford it.)

Saying that implementors can write constant-time software doesn't mean that they do write constant-time software. I see three basic reasons for this. First, there's a huge amount of variable-time software in existing crypto libraries, and getting rid of all of this software is a correspondingly huge effort. Second, many popular cryptographic functions produce unacceptable slowdowns when they're implemented in constant time. Third, and most importantly, many popular cryptographic functions are much simpler to implement in variable time.

I noticed many years ago that the evolution of computer architecture was making it easier and easier to resolve the tension between security and speed: there's a synergy between what I want to do to avoid timing leaks and what chip designers want to do for performance. I've set various cryptographic speed records using constant-time software. Sometimes there are still some minor speedups possible with variable-time software, but the tension is much smaller than it was, making it hard for people to argue that these speedups should outweigh security—especially since computation in general is becoming less and less expensive. I'm not saying that all cryptographic designs have such small penalties for constant-time implementations; having the freedom to throw away NSA's designs has been an important part of this work.

I've also tried to resolve the tension between security and simplicity. Software for X25519, Ed25519, etc. does everything the typical user needs, is much simpler than traditional software aiming at the same security goals, and would gain very little in simplicity from using variable-time algorithms.

But now consider a crypto library that for years has included variable-time software for NSA's elliptic-curve cryptosystems, and that wants to keep supporting those cryptosystems for whatever reason, but that also wants to add support for Ed25519. Different notions of simplicity then produce different results. It's simple for the library to copy an existing constant-time Ed25519 implementation; this implementation might be close to the simplest possible code for Ed25519; but the library author might think that there are simpler ways to simultaneously support Ed25519 and NSA's systems.

Case study #1: libgcrypt. In libgcrypt, there's a function _gcry_mpi_ec_mul_point that computes kP on an elliptic curve E, given k and P and E as inputs. The main loop inside the function has length mpi_get_nbits(k), a simple example of what one must not do in constant-time software. Actually, the function has different loops for different types of curves, and a complete description of the loop length is more complicated, but what matters is that a smaller k runs measurably faster.

The real fix, the constant-time approach, would start by changing the interface to replace mpi_get_nbits(k) with a maxscalarbits specified by the caller. But this would require going through dozens of functions that call _gcry_mpi_ec_mul_point and figuring out the appropriate maxscalarbits for each. This is an example of tension between simplicity and security. Even a small tension can push implementors to do something insecure. The tension increases as more and more elliptic-curve protocols are implemented on top of the existing _gcry_mpi_ec_mul_point interface. It's not surprising that libgcrypt still has the old variable-time interface.

At this point one might think that, well, libgcrypt is leaking the number of bits of k, so the standard lattice attacks apply, whether the protocol is ElGamal or Schnorr or DSA or ECDSA or EdDSA. This is also what Minerva claimed at first. But let's take a closer look at how large k is.

The DSA and ECDSA standards say that k is between 0 and q. By now there are several different standard methods for generating k. libgcrypt's _gcry_dsa_gen_k uses one of these methods, namely rejection sampling, which I described above.

EdDSA uses a different name "r" for k, and says that "r" is SHA-512 applied to various specified inputs. This is what libgcrypt implements: hashing the inputs with _gcry_md_hash_buffers, then reading the 64-byte result into a multiprecision integer r with _gcry_mpi_set_buffer, then calling _gcry_mpi_ec_mul_point.

Now _gcry_mpi_ec_mul_point leaks whether "r" has 512 bits, 511 bits, 510 bits, etc. But this is useless for the attacker. Say "r" has only 500 bits; the attacker still has no idea what "r" mod q might be. This is why Minerva works against libgcrypt's implementation of ECDSA and doesn't work against libgcrypt's implementation of EdDSA.

The situation would have been different if libgcrypt had inserted an extra line of code to reduce modulo q before calling _gcry_mpi_ec_mul_point. But this wouldn't have been as simple, so it's not such a surprise that this didn't happen. (The extra reduction work would have produced a speedup, but copying existing constant-time Ed25519 software would have produced a much larger speedup.)

libgcrypt is in the scenario of the first paragraph: the libgcrypt timing reveals the number of leading zeros in Ed25519's 512-bit hash, but this doesn't break the system. libgcrypt doesn't reduce the nonces, so it doesn't get into the trouble described in the second paragraph.

There have been many other timing-attack vulnerabilities in libgcrypt, and clearly there will be more. We have to throw away variable-time crypto code without waiting for attacks to be demonstrated. For example, libgcrypt should use the constant-time ladders supported by Curve25519. But this doesn't mean that Minerva broke libgcrypt's Ed25519 implementation: libgcrypt was saved by another Ed25519 feature, the double-size hash.

Case studies #2, #3, #4, #5, #6, #7, and #8. A quick skim suggests that BoringSSL, Botan, Crypto++, libtomcrypt, MatrixSSL, OpenSSL, and wolfSSL/wolfCrypt all started with upstream Ed25519 implementations that advertise being constant-time, such as ref10 from the Ed25519 team, donna from Adam Langley and Andrew Moon, tweetnacl from the TweetNaCl team, and fiat from the Fiat Crypto team.

Of course, a quick skim isn't the same as a serious security review. The sheer volume of code is horrifying for the reviewer. Unlike the upstream implementations, these seven crypto libraries don't declare time variability to be a bug (unless the bug is shown to be exploitable and, in the case of OpenSSL, remotely exploitable), so a reviewer who finds time variability isn't automatically rewarded with an acknowledgment of a valid bug report. The libraries have so much existing variable-time code that they're scared to declare time variability to be a bug.

So I'm certainly not guaranteeing that timing attacks will fail against all these Ed25519 implementations. But, given how much of the code in these Ed25519 implementations is coming from people who say they were engineering it to be immune to all timing attacks, it wouldn't be surprising if it is immune to all timing attacks.

Does the prevalence of constant-time implementations of a primitive mean that the primitive encourages such implementations? Superficial answer: Yes. New implementations can simply copy the existing constant-time implementations. This is what these seven libraries did, protecting the users. (But, again, one has to verify that the upstream implementations really are constant-time, and that this wasn't screwed up by whatever modifications happened as part of the library integration.)

Deeper answer: Not necessarily. Maybe the primitive is painful to implement in constant time. Maybe the first implementors suffered through doing this, but new implementations will cut corners.

In the case of Ed25519, it's clear from a detailed design comparison that Ed25519 is much friendlier to constant-time implementations than NSA ECC is. This doesn't guarantee constant-time implementations, but it does make them more likely.

Case study #9. Finally, I looked briefly at BouncyCastle, which wrote everything from scratch in Java. BouncyCastle is full of variable-time code. I don't immediately see why BouncyCastle wasn't broken by Minerva. BouncyCastle uses scalars smaller than q for scalar multiplication for ECDSA and EdDSA, unlike libgcrypt, so BouncyCastle isn't protected by the double-size hash.

If BouncyCastle's implementations of ECDSA and EdDSA are broken by timing attacks then the tallies turn into 13 crypto libraries supporting ECDSA with timing attacks on 5 or possibly 6, and 9 crypto libraries supporting EdDSA with timing attacks on just 1. This still sounds like a big win for EdDSA overall, but it's interesting to contemplate ways that we could have saved BouncyCastle too.

Why didn't BouncyCastle use a simple constant-time ladder? This is clear from the code: BouncyCastle has many complications that are aiming for speed at the expense of simplicity. In particular, BouncyCastle sacrifices a small amount of code simplicity to reduce k mod q, and sacrifices even more code simplicity to use precomputed tables of various multiples of the fixed base point P. These steps obviously make signatures several times faster.

Precomputed tables are still compatible with constant-time code, as shown in previous Ed25519 implementations. Why didn't BouncyCastle use this approach? This isn't as clear. Variable-time fixed-base scalar multiplication is at best marginally faster than constant-time fixed-base scalar multiplication. Perhaps there haven't been enough tutorials showing how to implement constant-time fixed-base scalar multiplication.

We don't want reviewers to have to analyze the security consequences of any timing leaks, so, at a lower level, it's also important to perform field arithmetic in constant time. BouncyCastle doesn't do this: it implements prime-field arithmetic on top of a big-integer library that computes the size of each integer at run time. But why didn't BouncyCastle copy constant-time arithmetic mod 2²⁵⁵−19 from previous implementations, as a fast path for Ed25519 separate from the existing big-integer code?

One possible answer is along the following lines: "A crypto library should support 82 elliptic curves but, for simplicity, shouldn't separately implement 82 finite fields." I would say that, for simplicity and security, a crypto library should not support 82 elliptic curves. I think we've made some progress in convincing the community of this. There has also been considerable progress in building automatic generators for finite-field code, although there is still much more work to be done.

[2019.11.01 correction: BouncyCastle does have a separate implementation of arithmetic modulo 2²⁵⁵−19, and higher-level implementations of X25519 and Ed25519 on top of this. BouncyCastle's EdDSA implementation consists of this Ed25519 implementation and a similar implementation for Ed448. These implementations are designed to run in constant time (except for clearly marked Var subroutines applied to public data). Looks like another win for EdDSA. BouncyCastle has also been making some progress in removing variable-time code elsewhere: for example, for ECDSA, BouncyCastle 1.59 removed some variable-time array lookups, although it is still using the variable-time big-integer library. Thanks to Peter Dettman for the information.]

More fundamentally, why didn't BouncyCastle call existing high-speed code in other languages for arithmetic mod 2²⁵⁵−19, or for the full Ed25519 operations? Sure, some people might want to use BouncyCastle on platforms that don't have this code and that don't make it easy to install this code, but calling the code whenever it is available would give BouncyCastle much better Ed25519 speeds—and, as a side effect, would stop timing attacks.

The Ed25519 team could have provided a constant-time Java implementation from the outset. To me this feels like the wrong direction: every extra piece of crypto code is an extra problem for reviewers and shouldn't be created in the first place unless there's a clear argument that it's needed. A few implementations of a few cryptographic primitives have been computer-verified, and improved tools will cover much more in the foreseeable future, but verifying a Java implementation is surely a lower priority than verifying a faster implementation that can be called from Java and from many other languages.

I see cryptography as a basic operating-system service, just like access to network cards. We don't, and we shouldn't, write network-card drivers in many different application languages; we write them in a few network-card-driver languages, and on top of this we provide simple networking abstractions to the designers of other languages, who in turn provide various higher-level networking abstractions to applications. There's much more to be said about the design of network-card-driver languages, and about the design of languages for cryptography, but that's a topic for another blog post.

The cr.yp.to blog

2019.10.24: Why EdDSA held up better than ECDSA against Minerva: Cryptosystem designers successfully predicting, and protecting against, implementation failures. #ecdsa #eddsa #hnp #lwe #bleichenbacher #bkw