The cr.yp.to blog



2026.06.30: Understanding lattice risks: Many differences between marketing and reality. #lattices #software #looseness #modules #asymptotics #worstcase

I have a short new page giving general context for the following and links to further information, so I'll just jump straight into the specific topic here.

Here's a paragraph that appeared on 29 June 2026 as supposed justification for using solo ML-KEM rather than ECC+ML-KEM: "I do not believe the risk of ML-KEM (and ML-DSA) to be severe: there is no known cryptanalysis currently exploiting rank >=2 module structure at these parameters that performs better than generic lattice reduction. Module-LWE also has a (granted, an asymptotic) worst-case-to-average-case reduction - something neither RSA nor ECDLP had."

My reaction to this is: wow, so many mistakes packed together! The two sentences (1) erroneously conflate lattice risks with a narrow slice of those risks, (2) use jargon in a way that tends to hide the narrowing from readers, and (3) still manage to each be simply false. What I'll do in this blog post is unpack the flaws.

"Known". This part of the narrowing is something that I think readers will typically notice. It's a glaring risk-management error, asking us to merely react to known failures rather than proactively protect against unknown failures.

But the second sentence doesn't have this narrowing, and I think readers will understand the second sentence as talking about some sort of proactive protection. There are also more problems with both sentences, so let's move along.

"Cryptanalysis". How many readers will realize that this word is another narrowing of the risk surface?

The reference software for Kyber (ML-KEM) has already gone through three rounds of emergency security patches for timing attacks: KyberSlash 1, KyberSlash 2, and Clangover. The reference software isn't an isolated example: the majority of Kyber/ML-KEM libraries have issued KyberSlash patches. The KyberSlash paper won the best-paper award at CHES 2025. However, cryptographers typically don't classify timing attacks as "cryptanalysis". Even those who do will usually emphasize that it's analysis "of the ML-KEM software"; it's not cryptanalysis "of ML-KEM", meaning the ML-KEM specification.

Similarly, attacks exploiting bugs, such as the bugs highlighted in my new paper on ML-DSA, don't qualify as cryptanalysis.

For scientists writing different types of papers, it's useful to have words to describe those differences. But users need cryptographic software to be secure. Security compromises often come from mathematical attacks against specs, but they also often come from software problems that aren't visible in the specs.

For years I've been pointing out risks of failures in PQ specs and in PQ software, and I've been connecting these risks to my recommendations of ECC+PQ. Look, for example, at how I described NISTPQC as "the largest regression ever in the quality of cryptographic software" and said this "will not be easy to fix", and at how I wrote that "bugs in post-quantum software" warrant "a blanket rule of always upgrading from ECC to PQ+ECC, not discarding the ECC layer".

The software argument is a really tough argument for proponents of solo PQ to respond to. Readers know that software problems happen all the time, can easily find examples of problems in ML-KEM and ML-DSA software, and can even find demos exploiting some of those problems. There's a basic lack of credibility if a proponent claims, e.g., that there will be "exceedingly few bugs", while dodging basic questions about how many "exceedingly few" is, what the justification is supposed to be for that number, and why that number is supposed to be low enough to justify throwing away a broadly deployed low-cost mitigation. It's much easier for proponents to skip the software issue and focus on some other aspect of security.

But, ok, let's also stop talking about software now. There are other problems with the sentences I quoted at the top.

"Exploiting ... module structure". How many readers will realize that the "module" jargon here is focusing on just one aspect of the attack surface provided by the ML-KEM spec?

The official Kyber security analysis includes the following statement: "The best known attacks against the underlying MLWE problem in Kyber do not make use of the structure in the lattice. We therefore analyze the hardness of the MLWE problem as an LWE problem." The "M" in "MLWE" (and in "ML-KEM") means "module"; the statement here is that the module structure doesn't lose security.

But there's more to the Kyber/ML-KEM attack surface. For example, even though the official Kyber documentation claimed at the top of Section 1 that the "security of Kyber is based on the hardness of solving the learning-with-errors problem in module lattices (MLWE problem [67])", page 19 admits that the theorem relating Kyber's security to MLWE security is a "non-tight reduction". That's jargon for admitting that the theorem allows Kyber's security level to be many bits lower than the security level of "the underlying MLWE problem". The documentation doesn't quantify the gap, presumably because doing so would show a frighteningly large gap.

I have a paper that exploits a simpler tightness gap in another lattice-based cryptosystem, FrodoKEM. For example, the paper shows that if you send 2^40 ciphertexts to a frodokem640 public key then one of the ciphertexts will be decrypted by a large-scale attack that's feasible today. This attack is beyond the scale of an academic demo, but it does disprove an official FrodoKEM security claim. That version of FrodoKEM was then officially renamed "ephemeral FrodoKEM" (which I think means we're supposed to forget this version ever existed) and was officially replaced with a revised "FrodoKEM".

A survey by Koblitz and Menezes includes more examples of cryptographic attacks exploiting tightness gaps.

There's a risk that Kyber's tightness gaps will turn out to be exploitable too. This is just one example of the spec risks excluded by the words "exploiting ... module structure".

"Rank >=2". This is yet another narrowing. Let me give some context here and then connect the dots.

Craig Gentry introduced an FHE system at STOC 2009 in a paper "Fully homomorphic encryption using ideal lattices". The paper has been cited more than 14000 times and is often labeled as a breakthrough. The standard choice of structure for that system (as in ML-KEM) uses polynomials modulo x^n+1 where n is a power of 2. It's much more likely for readers to have heard of Gentry's system, and to have heard of this standard structure, than to have heard that Gentry's system with this structure is vulnerable to a quantum polynomial-time attack.
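For readers who haven't seen this structure written out, here is a minimal sketch of multiplication of polynomials modulo x^n+1 with coefficients reduced modulo q. The function name and the toy sizes are assumptions made for illustration; ML-KEM works with n = 256 and q = 3329, and real implementations use much faster multiplication methods than this schoolbook loop.

```python
def negacyclic_mul(a, b, q):
    """Multiply polynomials a and b modulo (x^n + 1) with coefficients mod q.

    Polynomials are coefficient lists of length n, lowest degree first.
    """
    n = len(a)
    assert len(b) == n
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                # x^n = -1, so x^(n+m) wraps around to -x^m.
                out[k - n] = (out[k - n] - ai * bj) % q
    return out


# Toy example with n = 4, q = 17: x^3 times x is x^4, which equals -1.
print(negacyclic_mul([0, 0, 0, 1], [0, 1, 0, 0], 17))  # [16, 0, 0, 0]
```

The wraparound with a sign flip is the extra algebraic structure that separates these lattices from unstructured ones.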

I had a February 2014 blog post pointing out some weaknesses in the underlying ideal-lattice problems, and then subsequent work took the attack ideas much further. A 2025 paper by Jean-François Biasse and Fang Song presents details of the quantum polynomial-time attack. (Technically, the speed analysis for the attack relies on a number-theoretic conjecture, but there's overwhelming evidence for that conjecture.)

There are complicated debates about the security of some ideal lattices beyond the ones used in Gentry's system, but the details don't matter here. What matters is that enough damage has been done to ideal lattices that no expert today would advocate relying on ideal lattices as the foundation of security.

This is radically different from the picture painted in a 2012 paper "On ideal lattices and learning with errors over rings" by Vadim Lyubashevsky, Chris Peikert, and Oded Regev. That paper claims to prove "very strong hardness guarantees" for "ring-LWE". But this proof starts from the assumption "that worst-case problems on ideal lattices are hard", exactly what I'm saying no expert would advocate relying on today.

A 2014 paper "Lattice cryptography for the Internet" by Peikert similarly claimed that "both ring-SIS and ring-LWE enjoy strong provable hardness guarantees" and that this is "good theoretical evidence that ring-SIS and ring-LWE are a solid foundation on which to design cryptosystems". Kyber's direct predecessor NewHope, introduced in 2015 and submitted in 2017 to the NIST post-quantum competition, repeated this evidence as the final step in its "Provable security reductions" for its "Justification of security strength".

Maybe ring-LWE is strong. Maybe not. Either way, we now know that pointing to ideal lattices is a poor argument for the strength of ring-LWE.

Kyber was introduced in 2017, was also submitted to the NIST post-quantum competition, and, after a series of modifications, was standardized as ML-KEM. The most obvious difference between NewHope and Kyber is that NewHope uses ideal lattices, also known as "rank-1 module lattices", while Kyber uses module lattices of larger rank. Kyber's 2017 documentation cites various advances in attacks against rank 1 and says that higher rank has "somewhat reduced structure".
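To make "rank" concrete, here is a rough sketch of how a module-LWE instance b = As + e is assembled from a k-by-k matrix of ring elements. All names are hypothetical, the sizes are toy sizes, and the noise sampler is a simple stand-in rather than the distribution specified for ML-KEM. Setting k = 1 gives the ring-LWE shape; ML-KEM uses k = 2, 3, or 4 with n = 256.

```python
import secrets


def ring_mul(a, b, q):
    """Multiply two polynomials modulo (x^n + 1, q); lists, lowest degree first."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1  # x^n = -1 flips the sign on wraparound
            out[(i + j) % n] = (out[(i + j) % n] + sign * a[i] * b[j]) % q
    return out


def ring_add(a, b, q):
    return [(x + y) % q for x, y in zip(a, b)]


def small_poly(n, q):
    """Stand-in noise sampler (an assumption): coefficients in {-1, 0, 1} mod q."""
    return [(secrets.randbelow(3) - 1) % q for _ in range(n)]


def mlwe_sample(k, n, q):
    """Return (A, s, e, b) with b = A*s + e over the ring, A being k-by-k."""
    A = [[[secrets.randbelow(q) for _ in range(n)] for _ in range(k)]
         for _ in range(k)]
    s = [small_poly(n, q) for _ in range(k)]
    e = [small_poly(n, q) for _ in range(k)]
    b = []
    for row in range(k):
        acc = e[row]
        for col in range(k):
            acc = ring_add(acc, ring_mul(A[row][col], s[col], q), q)
        b.append(acc)
    return A, s, e, b


# Rank 2 over a toy ring with n = 8: the secret has k*n = 16 coefficients.
A, s, e, b = mlwe_sample(2, 8, 3329)
print(len(s) * len(s[0]))  # 16
```

The total lattice dimension is k times n, so the rank changes how much of the problem is covered by a single ring element, not merely how big the problem is.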

Any competent risk assessment will pay attention to this history. Experts proposed lattice systems that ended up being broken; that's worrisome! Kyber's 2017 usage of higher-rank modules was explicitly in response to advances in attacks. Continued developments of the same line of attacks have already broken a variety of supposed "barriers" and "bounds" for those attacks. Will the line between rank 1 and higher rank hold up, or will it turn out to be another of these broken "barriers"?

"Better than generic lattice reduction". This is another way of narrowing the risk analysis. How many readers realize that there are continual improvements in generic lattice attacks against LWE? Saying that we'll assume there's nothing better than an LWE attack isn't helpful if LWE itself isn't strong enough!

Let's look again at FrodoKEM, supposedly the "most conservative" lattice system.

FrodoKEM says it's based on "the algebraically unstructured, plain LWE problem with conservative parameterizations", and more specifically that it's a modified "instantiation and implementation" of a 2010 paper. But FrodoKEM is much quieter about one of those modifications, namely the fact that FrodoKEM drastically increased sizes compared to the 2010 paper.

The 2010 paper proposed dimension-256 lattices as supposedly taking "about 2^150 operations" to break. The reason FrodoKEM moved to much larger dimensions is that there were a bunch of attack papers chopping more and more bits out of lattice security levels. It's not that one paper suddenly did a bunch of damage: each paper chopped out far fewer bits, but the cumulative damage from many papers meant that, no, dimension-256 is nowhere near 2^150 security.

Readers who have heard that "ML-KEM was fully vetted" during the NIST competition would imagine that attack improvements have come to an end. But, wait, then how do we explain an October 2025 lattice-attack speedup? Or a December 2025 lattice-attack speedup? Each of these is another paper claiming to cut out a few bits of security.

When is the cliff going to stop crumbling? Are the lattice dimensions used for ML-KEM and FrodoKEM today going to sound as ignorant in 15 years as dimension 256 from the 2010 paper? And what happens if attackers find the improvements before the public does? How can it make any sense to narrow the risk analysis of lattice-based cryptosystems in a way that excludes every improvement in generic lattice attacks?

Even after all this narrowing, the first sentence is wrong. Let's look again at the claim that "there is no known cryptanalysis currently exploiting rank >=2 module structure at these parameters that performs better than generic lattice reduction".

So far I've been emphasizing how the words here are narrowing the risk analysis in a way that excludes a bunch of attacks: "cryptanalysis" excludes software problems such as KyberSlash, "exploiting ... module structure" excludes tightness problems such as the FrodoKEM flaw, "rank >=2" excludes ideal-lattice attacks such as the break of Gentry's original STOC 2009 FHE system, and "better than generic lattice reduction" excludes a neverending series of speedups in generic lattice attacks.

It's content-free to come up with a statement saying "there are no known attacks meeting the following criteria: ..." if those criteria are chosen to exclude every attack that is known. This becomes actively misleading if it's accompanied by not even citing those attacks.

But here's the funny thing: this is an error-prone process when the attack picture keeps changing. It's not that the list of criteria is something stable and well known and well studied. Someone hears about an attack and writes down a criterion that excludes the attack, but that criterion is flimsy and is punctured by the next attack.

A paper appeared in February 2026 under the title "On the concrete hardness gap between MLWE and LWE". The paper says that it saves a few bits in attacks against ML-KEM, compared to generic lattice attacks, by exploiting the structure of the modules used in ML-KEM.

This means it's not true that "there is no known cryptanalysis currently exploiting rank >=2 module structure at these parameters that performs better than generic lattice reduction". Oops.

Maybe the author of that statement will say "sorry, I meant performs much better". But if your answer to every attack is to come up with an ad-hoc excuse for ignoring the attack then you aren't evaluating risks.

"Asymptotic". Let's move on to the second sentence: "Module-LWE also has a (granted, an asymptotic) worst-case-to-average-case reduction - something neither RSA nor ECDLP had."

What the word "asymptotic" is actually saying is that if ML-KEM were replaced with something much larger then the underlying MLWE problems would have a "worst-case-to-average-case reduction".

Some years ago, Peikert gave a presentation to a National Academies committee where he claimed without evidence that dimensions of "a few thousand" would be enough for a worst-case-to-average-case reduction in the simpler context of FrodoKEM. I was there, and I think open-source science is important as an error-correction mechanism, so I noted his claim online and wrote that I would "love to see a complete proof handling 10000".

In 2021, Peikert highlighted a claim that dimension 1460 was sufficient. But this claim, which unfortunately has never been withdrawn, arises from an embarrassing mistake (conflating an "approximate" lattice problem with a stronger "exact" lattice problem).

This mistake was pointed out in a 2023 paper from Joel Gärtner, which also invested a lot of effort into writing down a complete proof and trying to reduce the dimension as far as possible. It still doesn't manage to get the dimension down to 10000; not even close.

The maximum ML-KEM dimension is 1024. Nobody has a worst-case-to-average-case reduction for such a small lattice dimension. How many readers will understand that the jargon "asymptotic" is making a statement about a different, larger, cryptosystem?

"Worst-case-to-average-case reduction - something neither RSA nor ECDLP had". Sorry, no, completely wrong.

Let me first review what the jargon means. A "reduction" from problem P to problem Q means a way to use a solution to problem Q to solve problem P. In particular, a "worst-case-to-average-case" reduction means a way to use a solver for random examples of problem Q as a way to solve an arbitrary example of problem P, with no P inputs being immune.

As a concrete example, let's solve ECDLP on Curve25519. The input is some curve point sG, where G is the standard Curve25519 generator. Our task is to compute s modulo the order of G.

To reduce this to the average case, simply compute sG+rG where r is chosen randomly modulo the order of G; then use the average-case solver to find s+r modulo the order of G; then subtract r to find s. Done.
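The same rerandomization can be sketched in a few lines of code. To keep the example self-contained, this sketch swaps Curve25519 for a toy multiplicative group modulo a small prime, and it invents a hypothetical average-case solver that refuses many inputs. Both are assumptions for illustration only; the group is far too small for real use.

```python
import secrets

# Toy parameters (assumption): P is a safe prime, Q = (P - 1) // 2 is prime,
# and G = 4 generates the subgroup of order Q in the integers modulo P.
P = 2039
Q = 1019
G = 4

# Lookup table standing in for whatever method the average-case solver uses.
_LOG_TABLE = {pow(G, e, P): e for e in range(Q)}


def average_case_solver(h):
    """Hypothetical solver: returns the log of h, but fails on many inputs."""
    if h % 2 == 1:
        return None  # refuses every element whose representative is odd
    return _LOG_TABLE[h]


def worst_case_solver(h):
    """Find s for an arbitrary subgroup element h = G^s by rerandomizing h."""
    while True:
        r = secrets.randbelow(Q)
        blinded = (h * pow(G, r, P)) % P  # G^(s+r), uniform in the subgroup
        t = average_case_solver(blinded)
        if t is not None:
            return (t - r) % Q  # subtract r to recover s


# The element 1 = G^0 is rejected by the average-case solver directly,
# but the rerandomized version still recovers its log.
print(average_case_solver(1))  # None
print(worst_case_solver(1))    # 0
```

Because the blinded element is uniformly random no matter which h was supplied, no input is immune: every h is solved after a few retries.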

(As a side note, this is actually a much more powerful reduction than the worst-case-to-average-case reduction for lattices. It's much more efficient. It doesn't require the cheat of replacing the problem with a bigger problem; see above regarding asymptotic. It's a "self-reduction" between the worst case and average case of the same problem; it doesn't inflict a new problem upon people reviewing security.)

Let me also give an example of a worst-case-to-average-case reduction for RSA. The input is some RSA modulus pq, where p and q are secret primes. Our task is to find p and q.

Here's one way to reduce this to the average case. Randomly generate another RSA key. Apply the average-case solver to that. Throw the results away. Then use Shor's algorithm to factor the original input pq into p and q. Done.

This RSA example might seem to be cheating since it's a quantum worst-case-to-average-case reduction for a cryptosystem that was never supposed to resist quantum computers. But the literature includes a huge pile of cryptosystems broken by non-quantum attacks, and each of those breaks gives a non-quantum worst-case-to-average-case reduction for the same cryptosystem.

The correlation between worst-case-to-average-case reductions and attacks isn't just for the extreme case of broken cryptosystems. For example, my 2015 blog post on multi-target attacks included comments on how to use an "attack tool called a 'worst-case-to-average-case reduction'" to build a square-root discrete-log attack. This is one of the easiest ways to explain in starting courses on cryptography that discrete-log problems have a square-root attack.
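One way to see the connection, sketched here under assumptions and not necessarily matching the construction in the 2015 post: a table of precomputed discrete logs is a solver that works on a small fraction of inputs, and rerandomizing the target turns it into a solver for every input. The toy group and the function names below are hypothetical choices for illustration.

```python
import secrets

# Toy group (assumption): subgroup of prime order Q generated by G modulo P.
P, Q, G = 2039, 1019, 4


def build_table(size):
    """Precompute `size` distinct group elements whose logs are known."""
    table = {}
    while len(table) < size:
        e = secrets.randbelow(Q)
        table[pow(G, e, P)] = e
    return table


def dlog_via_table(h, table):
    """Rerandomize h until it lands in the table; return (log, tries)."""
    tries = 0
    while True:
        tries += 1
        r = secrets.randbelow(Q)
        blinded = (h * pow(G, r, P)) % P  # G^(s+r) for the unknown s
        if blinded in table:
            return (table[blinded] - r) % Q, tries


# About sqrt(Q) table entries: 32 entries for Q = 1019.
table = build_table(32)
s_found, tries = dlog_via_table(pow(G, 777, P), table)
print(s_found)  # 777
```

With a table of about sqrt(Q) entries, each try hits with probability about 1/sqrt(Q), so building the table and waiting for a hit each take around sqrt(Q) steps. This sketch spends a full exponentiation per step; the point is only the square-root count.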

From a risk-analysis perspective, these connections between worst-case-to-average-case reductions and attacks mean that worst-case-to-average-case reductions are an alarm bell, something to investigate closely as a risk, even if they're not fatal. Statements such as "Module-LWE also has a (granted, an asymptotic) worst-case-to-average-case reduction - something neither RSA nor ECDLP had" are getting the risk analysis wrong by getting the basic facts wrong.

A procedural note. Mistakes happen. Part of our job as cryptographers is to protect users against those mistakes.

Many new PQ designs have been broken, plus there are many further problems with PQ software. We're certainly not going to catch all the problems before deployment, so we also keep ECC around as a negligible-cost part of ECC+PQ to mitigate the damage of PQ security failures, like wearing seatbelts in a car to mitigate the damage of car crashes, rather than throwing ECC away in favor of solo PQ.

This common-sense decision to use ECC+PQ rather than solo PQ is threatened if the risk analyses are so thoroughly botched that people are blinded to the risks of PQ. How do we protect users against these meta-level mistakes in risk analyses?

The literature on computer security, like the broader literature on many other safety topics, gives us an answer: risk analysis is a first-class topic of papers, and risk-analysis errors are corrected the same way that other errors in papers are corrected. This process takes time, but investing that time helps reduce the number of mistakes.

What I find truly horrifying about the paragraph that I've been commenting on in this blog post is the procedural context. The paragraph wasn't part of a collaborative community process of analyzing risks. The paragraph instead showed up as a last-moment talking point during a limited-time vote on a controversial proposal for IETF to standardize solo ML-KEM in TLS, a weakened form of the widely deployed ECC+ML-KEM in TLS. There's just one week left in the voting period.

IETF doublespeak says that this isn't a "vote" and that the document won't be a "standard", but the reality is that it's a vote, and if the vote passes then corporate purchasing managers will understand the resulting document as IETF endorsement of solo ML-KEM.

Standards organizations aren't supposed to be endorsing controversial proposals. Each document issued by an IETF working group is labeled as "consensus of the IETF community". IETF says that disagreements "must be resolved by a process of open review and discussion". But simply charting the debate shows that the proponents of solo PQ have responded only to minor objections while ignoring every fundamental objection. The mandated process of discussion to reach consensus has been replaced by a voting process.

The same document lost the previous vote. So now there's a new vote with supporters trying to pack the room. Explicitly in response to that, I've called for volunteers to speak up in opposition.

One good reason to oppose is recognizing that solo PQ creates unnecessary dangers compared to ECC+PQ. But another good reason to oppose is simply to say that, procedurally, disagreements have to be resolved.


Version: This is version 2026.06.30 of the 20260630-risk.html web page.