The cr.yp.to blog



2023.11.25: Another way to botch the security analysis of Kyber-512: Responding to a recent blog post. #nist #uncertainty #errorbars #quantification

A blog post a week ago from Google's Sophie Schmieg is titled "Kyber512’s security level". Some of the claims in that blog post are unjustified, and some of the claims in that blog post are simply wrong.

The most important claim in the blog post is that Kyber-512 is as hard to break as AES-128. ("Kyber512 is not less secure than required for NIST security level 1.") That claim is unjustified and, given current knowledge of attack performance, indefensible, even if it's understood as referring only to published attacks.

The unfortunate reality is that analyses of published attacks against Kyber-512 leave very wide uncertainty intervals. For example, because of "known unknowns", the round-3 Kyber submission reported an astonishing 30-bit uncertainty interval for Kyber-512 attack costs measured by "gates". (Page 27 says "a factor of up to 2^16 in either direction"; later text says 16 bits smaller or 14 bits larger.) The high end of the interval is above AES-128, but the low end is below AES-128. Subsequent papers have found various attack speedups. New "known unknowns" are appearing faster than researchers can narrow down the existing ones.

The blog post that I'm responding to takes a point estimate (from a "convenient table"), and fails to recognize the uncertainty intervals surrounding that estimate. The blog post then concludes that Kyber-512 is safe. Failing to recognize the uncertainty intervals is a critical error, hiding all of the risks from the low end of the uncertainty intervals.

To be clear, I don't blame Dr. Schmieg for this error. I think NIST has done a terrible job of communicating the uncertainties to its audience. NIST should have prominently issued a document for public review saying the following:

If NIST doesn't have these numbers, why is NIST claiming that Kyber-{512,768,1024} are as hard to break as AES-{128,192,256}? ("ML-KEM-512 is claimed to be in security category 1, ML-KEM-768 is claimed to be in security category 3, and ML-KEM-1024 is claimed to be in security category 5.") What does that claim even mean, if NIST refuses to answer simple yes/no questions about the cost metric NIST is using for these comparisons?

I'm not saying NIST has been completely silent about the uncertainties. In particular, in a 2022 posting, NIST stated that Kyber-512 "seems unlikely" to be easier to break than AES-128 by published attacks.

See the word "unlikely"? That's not claiming something definite. NIST's subsequent rationale mentions various sources of uncertainties. These uncertainties, including quantification, have to be much more prominently reported, so that NIST's audience is warned about how poorly understood the attack costs are.

NIST's rationale has a different problem: NIST combined the underlying (uncertain) numbers in a way that's simply wrong, inflating the low and high ends of the uncertainty interval and thus making Kyber-512 look less risky than it actually is. I wrote a blog post going through the history and pointing out the error. I then figured out a simpler way to explain the error from first principles (by which I mean things taught to first-year CS students), and I wrote a second blog post explaining that.

There's a further problem with using claims about published attacks as justification for standardizing Kyber-512. The problem is that lattice attacks have not stabilized. It'll be completely unsurprising to see further speedups; I don't claim to know how much farther they'll go. There have been suggestions that there won't be big speedups, but an easy way to see that those suggestions are fundamentally untrustworthy is to observe that they avoid committing to any numbers.

If continued speedups show that Kyber-512 is feasible to break, will NIST take responsibility for having recklessly standardized an algorithm with an unclear, unstable, unquantified security margin? Will NIST apologize to users who trusted NIST's standards to be safe and who ended up deploying Kyber-512? Of course not. Instead we'll see something like this: "Research is so exciting! There was no way to predict this security disaster! NIST standardization was critical for encouraging this scientific breakthrough! NIST needs more money to write new standards!"

For the rest of this blog post, I'll go line by line through the blog post that I'm responding to.

Sigh. I really didn’t want to have to write this blog post. There is a story going around, claiming that the NSA somehow unduly influenced NIST to choose Kyber over NTRU, despite Kyber not being secure enough.

This is not even close to an accurate description of the linked New Scientist article. The article is titled "Mathematician warns US spies may be weakening next-gen encryption" and never mentions NTRU. The topics covered in the article are instead (1) NSA's involvement in cryptographic standardization and (2) NIST's miscalculation of Kyber-512's security level.

Ten years ago, NSA's budget specifically for cryptographic sabotage was a quarter billion dollars a year. One of the nine classified goals within NSA's sabotage budget was to "influence policies, standards and specification for commercial public key technologies". NSA's goal of making cryptographic standards weak enough for NSA to break had been established forty years earlier, according to an internal NSA history book:

Narrowing the encryption problem to a single, influential algorithm might drive out competitors, and that would reduce the field that NSA had to be concerned about. Could a public encryption standard be made secure enough to protect against everything but a massive brute force attack, but weak enough to still permit an attack of some nature using very sophisticated (and expensive) techniques?

NSA sabotaged DES by limiting the DES key size. ("Ultimately, they compromised on a 56-bit key.") NIST's original DSA proposal, which also secretly came from NSA, was limited to DSA-512; the eventual NIST standard allowed larger DSA sizes but, unfortunately, continued to allow DSA-512. NSA's Dual EC had a secret back door, and NSA's Clipper Chip had a public back door.

Today, it's deeply concerning to see the combination of

The New Scientist article is misreporting when it says "Bernstein, who coined the term post-quantum cryptography in 2003 to refer to these kinds of algorithms, says the NSA is actively engaged in putting secret weaknesses into new encryption standards that will allow them to be more easily cracked with the right knowledge". I did coin the term "post-quantum cryptography" in 2003, but I never said what the article portrays me here as saying. I find the back-door hypothesis regarding Kyber much less plausible than the hypothesis that NSA is once again promoting algorithms whose security levels are low enough for NSA to break.

The New Scientist article correctly quotes me as saying "NIST isn't following procedures designed to stop NSA from weakening PQC" and as saying "People choosing cryptographic standards should be transparently and verifiably following clear public rules so that we don't need to worry about their motivations. NIST promised transparency and then claimed it had shown all its work, but that claim simply isn't true".

The story is based on this blog post, by famous cryptographer Daniel J. Bernstein (also known as djb),

That blog post has title "The inability to count correctly" and subtitle "Debunking NIST's calculation of the Kyber-512 security level".

In that post, I went through the context and, in particular, did say a lot about NTRU: I gave one example after another of NIST manipulating its evaluations to favor the Kyber submission over the NTRU submission. For example, NIST counted memory-access costs for Kyber-512 and not for NTRU-509; on this basis, NIST kept Kyber-512 while discarding NTRU-509. NTRU-509 has considerably smaller keys and ciphertexts than Kyber-512. This isn't a mystery in which NIST somehow ended up choosing Kyber over NTRU; public documents show the manipulation.

The connection between the Kyber-vs.-NTRU comparison and Kyber-512 is that large parts of NIST's Kyber-beats-NTRU narrative collapse without Kyber-512. Consider, for example, NIST's claim that NTRU has "somewhat larger public keys and ciphertexts" than Kyber. This claim is obviously wrong if Kyber-512 is thrown away: NTRU-677 has considerably smaller keys and ciphertexts than Kyber-768.
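For concreteness, here's a quick tally of the public-key and ciphertext sizes commonly reported for the round-3 parameter sets being compared here; this is an illustrative sketch, and the submission documents remain the authoritative source for these figures:

```python
# Round-3 public-key / ciphertext sizes in bytes (as commonly reported;
# consult the submission documents for authoritative figures).
sizes = {
    "ntruhps2048509 (NTRU-509)": (699, 699),
    "kyber512 (Kyber-512)":      (800, 768),
    "ntruhps2048677 (NTRU-677)": (930, 930),
    "kyber768 (Kyber-768)":      (1184, 1088),
}
for name, (pk, ct) in sizes.items():
    print(f"{name:28} pk={pk:5}  ct={ct:5}  pk+ct={pk+ct:5}")
```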

who happens to be one of the authors of NTRU.

No.

Dr. Schmieg's text here invites the reader to conclude that, as an NTRU author, I started criticizing NSA and NIST to retaliate for NIST choosing Kyber over NTRU. This is disproven by a long list of easily verifiable facts.

I am, first of all, not an NTRU author. NTRU was published by Hoffstein, Pipher, and Silverman in 1998.

Back then, I was successfully suing NSA for one of its more obvious cryptographic-sabotage efforts, namely its unconstitutional use of export controls to censor cryptographic software. Not all cryptographic software: NSA had set up special exceptions for low-security cryptography, creating security problems that remained broadly exploitable decades later.

Earlier in the 1990s, I had organized a letter-writing campaign that stopped NIST's plan to assign the NSA/NIST DSA patent to a private company. The government ended up being forced to give the patent away for free.

I'm on record objecting to many other NSA actions, many other NIST actions, and many other patents. Here's a quote from my 2005 paper "Cache-timing attacks on AES": "So what went wrong? Answer: NIST failed to recognize that table lookups do not take constant time. 'Table lookup: not vulnerable to timing attacks,' NIST stated in [19, Section 3.6.2]. NIST’s statement was, and is, incorrect." As another example, Tanja Lange and I wrote a paper in 2016 whose title was "Failures in NIST's ECC standards". There are many more examples.

Of course there were people saying things like "You're just criticizing the NIST/NSA curves to promote your own curve". But, no, the history shows that, years before the introduction of Curve25519, I wrote software for one of the NIST/NSA curves. I gave a series of three talks about various implementation issues I had encountered for that curve, and mentioned some ways that those issues would be resolved by better curve choices.

Back to NTRU. Many of the encryption systems submitted to NIST's post-quantum competition were lattice systems. Most of those reused critical ideas from NTRU: most importantly, the idea that it's hard to recover small elements a,e of (for example) the ring (Z/q)[x]/(x^n-1) given large G and aG+e. Some people will tell you that this is a "Ring-LWE" problem introduced in the 21st century, but it's actually the message-recovery problem for NTRU studied in Section 3.4.2 of the 1998 NTRU paper.
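Here's a toy Python sketch of that message-recovery problem, with hypothetical miniature parameters (n = 16, q = 257) chosen purely for illustration; real systems use far larger sizes:

```python
# A toy instance of the 1998 NTRU message-recovery problem described above.
# In (Z/q)[x]/(x^n - 1), small a and e are hidden behind the large value a*G + e.
import random

n, q = 16, 257                      # hypothetical toy sizes, illustration only

def mul(f, g):
    """Cyclic convolution: f*g in (Z/q)[x]/(x^n - 1)."""
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] = (h[(i + j) % n] + f[i] * g[j]) % q
    return h

G = [random.randrange(q) for _ in range(n)]             # public, large
a = [random.choice([-1, 0, 1]) % q for _ in range(n)]   # secret, small
e = [random.choice([-1, 0, 1]) % q for _ in range(n)]   # secret, small
B = [(x + y) % q for x, y in zip(mul(a, G), e)]         # public: a*G + e

# The attacker sees (G, B) and wants the small (a, e); at real sizes this is
# the lattice problem whose cost the whole Kyber-512 dispute is about.
print("G =", G[:4], "...")
print("B =", B[:4], "...")
```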

Three of the lattice-encryption submissions—including one that I coauthored—chose names to give credit to NTRU: NTRUEncrypt, NTRU-HRSS, and NTRU Prime.

The NTRUEncrypt and NTRU-HRSS submissions merged into a submission somewhat confusingly named "NTRU". NTRU Prime is different: for example, NTRU Prime avoids "cyclotomics", and NTRU Prime provides equal support for a "Quotient NTRU" option and a "Product NTRU" option. The original 1998 NTRU, NTRUEncrypt, and NTRU-HRSS use cyclotomics and Quotient NTRU. Kyber uses cyclotomics and Product NTRU, although obviously Kyber's name isn't crediting NTRU. (See Section 8 of "Comparing proofs of security for lattice-based encryption" for a direct comparison of Quotient NTRU and Product NTRU using unified DH-like notation.)
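To make the Quotient-vs-Product contrast concrete, here's another toy Python sketch with hypothetical miniature parameters. The Product shape is exactly the (G, aG+e) instance shown in the previous sketch; the Quotient shape hides the small secrets inside a single ratio:

```python
# Toy contrast of the two public-key shapes (hypothetical sizes, illustration only).
import random

n, q = 16, 257

def mul(f, g):
    """Cyclic convolution: f*g in (Z/q)[x]/(x^n - 1)."""
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] = (h[(i + j) % n] + f[i] * g[j]) % q
    return h

def small():
    return [random.choice([-1, 0, 1]) % q for _ in range(n)]

def inverse(a):
    """Invert a in (Z/q)[x]/(x^n - 1) (q prime) via Gauss-Jordan elimination
    on the circulant matrix of a; returns None if a is not invertible."""
    aug = [[a[(i - j) % n] for j in range(n)] +
           [1 if k == i else 0 for k in range(n)] for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col]), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], -1, q)
        aug[col] = [x * inv % q for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % q for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]   # first column of the inverse matrix

# Quotient NTRU shape (1998 NTRU, NTRUEncrypt, NTRU-HRSS, the NTRU submission):
# the public key is a single ring element G = e/a for small secret a, e.
while True:
    a, e = small(), small()
    a_inv = inverse(a)
    if a_inv is not None:
        break
G = mul(e, a_inv)
assert mul(a, G) == e                 # the hidden small relation a*G = e

# Product NTRU shape (Kyber, SABER, NTRU LPRime): the public key is a pair
# (G, A) with G uniform random and A = a*G + e, as in the previous sketch.
print("quotient-style public key:", G[:4], "...")
```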

NIST downgraded NTRU Prime in 2020. I was already on record before that objecting to NIST's non-transparent handling of the post-quantum competition: "After NIST's Dual EC standard was revealed in 2013 to be an actual (rather than just potential) NSA back door, NIST promised more transparency. Why does NIST keep soliciting private #NISTPQC input? (The submissions I'm involved in seem well positioned; that's not the point.)"

After that, NIST's lattice-encryption decision was explicitly between just three submissions: Kyber, NTRU, and SABER. ("As CRYSTALS-KYBER, NTRU, and SABER are all structured lattice schemes, NIST intends to select, at most, one of these finalists to be standardized.") The fundamental reason that NIST should have picked the NTRU submission is that Product NTRU, including Kyber and SABER, is a patent minefield, while Quotient NTRU, including the NTRU submission, is not.

NIST announced a deadline of October 2021 for input. Shortly after the deadline, NIST should have told everyone to roll out NTRU. Deployment would have rapidly picked up steam.

Instead NIST ended up telling everyone to wait until 2024, when NIST's patent license activates, and then roll out Kyber. That's years of user data given away to attackers to decrypt with future quantum computers.

When the rules for the competition say that security is job #1, how can NIST make a decision that does such glaring damage to security?

We don't even know if Kyber will be patent-free in 2024. For example, Yunlei Zhao wrote "Kyber is covered by our patents (not only the two patents mentioned in the KCL proposal, but also more patent afterforwards)". Zhao's first two patents were filed a month before the publication of "NewHope without reconciliation", Kyber's direct predecessor. Maybe there's some reason that Zhao is wrong and that these patents don't actually cover Kyber, but then it's important to have a public analysis convincingly saying why; I haven't seen such an analysis. NIST has never commented on Zhao's patents, or on the other potentially applicable patents, or on the obvious security damage from delaying rollout.

Of course the same sort of blindness that leads to claims such as "You're just criticizing the NIST/NSA curves to promote your own curve" also leads to claims such as "You're just criticizing Kyber to promote your NTRU submission". If the accusation is taken as axiomatic, then working backwards from the accusation says what the facts have to be: see, nobody could possibly be laying out detailed objections to NIST's choice of Kyber over NTRU without being an NTRU submitter. This blindness is so powerful that it stops people from taking ten seconds to check the list of NTRU submitters.

I think most of my readers aren't the sort of people who insist on seeing everything through a cui-bono lens. But maybe you, dear reader, are an exception, in which case I have a suggestion for you. Read about the history of frequently racist abuses of surveillance, and about how opponents of surveillance were themselves targeted. Then go talk to people today who desperately need confidentiality, such as journalists reporting news from whistleblowers. Listen to what they have to say. And then ask yourself who benefits when public evaluation of the merits of cryptographic systems is replaced with personal attacks.

I gather that there were earlier comments on Dr. Schmieg's "happens to be one of the authors of NTRU" claim. She then changed her blog post to say "who happens to be one of the authors of the NTRU variant NTRU-prime, which was also competing in the NIST competition".

If "NTRU" here is referring to the NTRU submission as in my blog post, then, no, describing NTRU Prime as a variant of that is misrepresenting the history. If "NTRU" here is instead referring to the original 1998 NTRU, then this is disconnected from the "choose Kyber over NTRU" context; Kyber and the NTRU submission are also variants of the original 1998 NTRU.

Either way, Dr. Schmieg's revised text insinuates that choosing the NTRU submission over Kyber would benefit NTRU Prime. How exactly is that supposed to work, Dr. Schmieg?

Back in the real world, my sole benefit from NIST doing its job in 2021 and promptly selecting NTRU would have been the happiness of seeing fast rollout. Hopefully that rollout would have stopped future quantum attacks.

I was similarly happy to see Google deploying New Hope back in 2016. I was very unhappy to see patent issues appearing, and Google aborting the deployment. I'm happy to see Google's more recent work on deploying NTRU and Kyber; I hope the Kyber deployment doesn't end up turning into another patent disaster. I'm happy to see TinySSH and OpenSSH deploying NTRU Prime, and various VPNs deploying Classic McEliece. In the case of NTRU Prime and Classic McEliece, sure, those deployments are also good for my CV since I'm a coauthor; but that simply doesn't apply to New Hope, Kyber, or NTRU.

This has confused a bunch of people leading to some asking for a writeup and explanation of what happened.

What precisely is the confusion supposed to be?

I tried to procrastinate until the problem went away, but people keep asking, so here we go.

What precisely is "the problem" referring to here?

Summary
Kyber512 is not less secure than required for NIST security level 1.

As noted above, this claim is unjustified and, given current knowledge of attack performance, indefensible, even if it's understood as referring only to published attacks.

In an attack model that significantly favors symmetric schemes over asymmetric schemes, there is a small discrepancy of less than 6 bits, or a factor 64 that Kyber512 appears weaker than AES-128.

No. This is misrepresenting the literature: it's giving a point estimate where the literature has a very wide uncertainty range.

The attack model here is called "gates" in the round-3 Kyber submission. That submission says that the number of "gates" to break Kyber-512 is somewhere between 2^135 and 2^165. This is the 30-bit uncertainty interval that I mentioned above. For comparison, NIST estimated 2^143 for AES-128.
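Restating those numbers as an interval rather than a point estimate (all counts are log2 of "gates"):

```python
# The round-3 interval vs. NIST's AES-128 estimate, in log2("gates").
kyber512_low, kyber512_high = 135, 165   # round-3 Kyber submission
aes128 = 143                             # NIST's estimate for AES-128

print("interval width:", kyber512_high - kyber512_low, "bits")   # 30
print("low end vs AES-128:", kyber512_low - aes128, "bits")      # -8
print("high end vs AES-128:", kyber512_high - aes128, "bits")    # +22
# i.e., the round-3 analysis is consistent with Kyber-512 being anywhere from
# roughly 2^8 times easier to roughly 2^22 times harder to break than AES-128.
```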

That submission was in 2020. Subsequent improvements include, e.g., a 2022 paper reporting an order-of-magnitude speedup from tweaking the "BKZ" layer inside attacks, and clumping, which chops almost 10 bits out of the Kyber-512 "gate" counts (and far fewer bits out of AES-128 "gate" counts). These are just two examples of what's going on.

The uncertainty range for "gates" in the Kyber documentation hasn't been updated to account for newer papers. Is the low end now in the 120s? 110s? 100s? If NIST stays quiet enough about the uncertainties and developments then people end up believing and claiming that the "gate" counts are just a few bits below 143.

When changing the model to be more realistic, the problem goes away.

This time "the problem" seems to be referring to the appearance of Kyber-512 being easier to break than AES-128 by published attacks. The statement here is that Kyber-512 appears to be easier to break than AES-128 when one counts "gates", but no longer appears to be easier to break than AES-128 when one accounts for the cost of memory access.

People starting with the idea that security level measured by "gates" is specifically 137, compared to 143 for AES-128, can easily believe that memory access will make up the difference, and that in any case 137 isn't small enough to really be an issue, so it's not important to quantify the memory-access costs. But, again, the starting idea here is misrepresenting the literature.

NIST's 2022 rationale says "Taking Matzov's estimates of the attack cost to be accurate, only 6 bits of security from memory access costs are required for Kyber512 to meet category 1". That's the main point in the blog post that I'm responding to. But NIST's rationale goes on to "acknowledge there is some additional uncertainty in the exact complexity of the MATZOV attack (and all other sieving-based lattice attacks)", and to consider "the most paranoid values for these known-unknowns (16 bits of security loss)". This uncertainty regarding the attack cost is invisible in the blog post I'm responding to.

NIST did not make basic mathematical errors in their analysis.

NIST did make a basic mathematical error in its analysis. Specifically, NIST incorrectly multiplied the "real cost of memory access" by the total number of bit operations in an attack, rather than merely by the number of memory accesses in the attack.

The way that NIST wrote this calculation ("40 bits of security more than would be suggested by the RAM model ... approximately 40 bits of additional security quoted as the 'real cost of memory access' by the NTRUprime submission") is explicitly multiplying the "real cost of memory access" (estimated as 2^40) by a number in "the RAM model". Seeing that this is an error by NIST requires looking at a lot of context (e.g., to see that "the RAM model" is counting all bit operations rather than just memory accesses, and to see that this is not something the NTRU Prime submission said), but my previous posts went through the necessary work of publicly tracing this.

None of the claimed defenses for NIST's calculation have even quoted the NIST text at issue, let alone tried to justify its content.

One of the pernicious side effects of NIST's miscalculation is that it makes it sound as if, once one has figured out the "real cost of memory access", tallying the total memory-access costs in an attack reduces to counting bit operations. The reality is that bit operations are a poor predictor of memory-access costs. My first blog post explained that "a change of cost metric also forces reoptimization of the entire stack of attack subroutines, along with all applicable parameters". My second blog post goes through clumping as a simple example of quantifying the disconnect between metrics.
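Here's a toy calculation showing the shape of the error; the split between bit operations and memory accesses below is a hypothetical placeholder, not a claim about any particular attack:

```python
# All quantities are log2. The attack numbers are hypothetical placeholders.
log2_bitops = 137        # hypothetical: total bit operations in an attack
log2_accesses = 100      # hypothetical: memory accesses, far fewer than bitops
log2_access_cost = 40    # the "real cost of memory access" estimate at issue

# Charging the memory-access cost only to the memory accesses (roughly
# log2(2^bitops + 2^(accesses + cost))):
combined = max(log2_bitops, log2_accesses + log2_access_cost)    # ~140
# Multiplying the memory-access cost into *all* bit operations (the error):
inflated = log2_bitops + log2_access_cost                        # 177
print(combined, inflated)
# Even the "combined" line is only a first cut: as explained above, changing
# the cost metric forces reoptimization of the whole attack, so the real
# number has to come from re-analyzing the attack, not from a formula.
```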

Defining Security Levels
There are two pieces to this discussion, and to understand what’s at going on, we need to first talk about security levels.
Level 1: The applied definition: A scheme has a security level of n bits, if it is as hard to break as an ideal block cipher with an n bit key on a hypothetical computer that can evaluate the block cipher once every cycle. In other words, security level n means an adversary has to use at least 2^n cycles to break the scheme.

How are the costs of attacking Kyber supposed to be evaluated if there's no statement of what this hypothetical computer can do in a cycle beyond evaluating an "ideal block cipher"? For example, can the computer carry out single-cycle additions, subtractions, multiplications, and divisions, at which point it can run Shamir's linear-time factorization algorithm?

This is a very useful definition, and it’s usually what we think of when talking about security levels.

Doesn't look useful to me. Also, I doubt that this is what most people are thinking.

You can use it to make some back of the envelope calculations, by sticking in some tact frequencies and core numbers or maybe some energy cost and solar outputs, to convince yourself that 128 bit ciphers are probably good enough in many scenarios.

Certainly 2^128 calls to AES-128 would be very expensive with current technology, roughly a billion times harder than the 2^111 bit operations carried out by Bitcoin last year. It's important to note, however, that (1) many ways of using AES-128 allow attacks that break one out of 2^40 ciphertexts with just 2^88 calls to AES-128; (2) computer technology is continuing to improve; and (3) the risks of a cryptosystem being much weaker than AES-128 are on top of the risks of AES-128 not being strong enough.
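As a back-of-the-envelope restatement of that comparison, using only the numbers quoted in this post (all quantities are log2):

```python
aes128_keys = 128     # single-target brute force: 2^128 AES-128 calls
aes128_gates = 143    # NIST's "gate" estimate for that brute-force attack
bitcoin_year = 111    # bit operations carried out by Bitcoin last year

print("gates per AES-128 call: about 2^%d" % (aes128_gates - aes128_keys))
print("brute force vs a year of Bitcoin: 2^%d = %.1e times more work"
      % (aes128_gates - bitcoin_year, 2.0 ** (aes128_gates - bitcoin_year)))

# Multi-target caveat from the text: with 2^40 target ciphertexts available,
# about 2^88 AES-128 calls suffice per broken target, since 88 + 40 = 128.
print("calls per target with 2^40 targets: 2^%d" % (aes128_keys - 40))
```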

It is also obviously deeply flawed: Ideal block ciphers don’t exist, no computer can evaluate a full AES block encryption in a single cycle, and no adversary would use a Turing complete, general CPU to brute force, when special purpose hardware would be far more powerful, not wasting time and energy on the control flow of a problem that doesn’t really need such a thing.

The definition is flawed, certainly.

Enter the gate model:
Level 2: NIST’s definition (paraphrased) A scheme has a security level of n bits if it is as hard to break as AES-n

It's reasonable to say "as hard to break as AES-n", specifically for n=128, as a summary of NIST's minimum allowed security level.

However, there's an important extra element in how the minimum security level is described in NIST's call for submissions: "Here, computational resources may be measured using a variety of different metrics (e.g., number of classical elementary operations, quantum circuit size, etc.). In order for a cryptosystem to satisfy one of the above security requirements, any attack must require computational resources comparable to or greater than the stated threshold, with respect to all metrics that NIST deems to be potentially relevant to practical security."

There are many different ways to measure attack cost. Within that, there are supposed to be some metrics that NIST "deems to be potentially relevant to practical security". A cryptosystem is then required to be at least as hard to break as AES-128 in each of those metrics.

What metrics is NIST now using for comparing Kyber-512 to AES-128? I asked some simple yes/no questions, starting with "Is there a definition of the NIST metrics?"; NIST responded with an ad-hominem attack and refused to answer the questions.

NIST’s actual definition is a bit more complicated, in that it also mentions SHA-2 in various levels, but since the discussion is only about NIST’s level 1, i.e. at least as hard as AES-128, we don’t really need to get into that.

Right. AES-128 is critical here: it's the minimum allowed security level in the NIST competition, the "floor" for "category 1" in the call for submissions.

At first, this definition doesn’t seem to be all that different from the first one, only replacing the ideal block cipher with AES. But this change makes a huge difference, since now we do not need a hypothetical computer, but can actually concretely compute the gate cost of the brute forcing attack. Which is what people did. The main paper people are arguing about is this one by Matzov.

There are many recent advances in lattice attacks. The Matzov paper is just one of those. By the way, that link is to v1 of the Matzov paper; there's also a v2.

It comes with a convenient table: ... 137.5 is smaller than 143, so Kyber512 falls 5.5 bits, or a factor of ~50 short of what NIST required. This isn’t a lot.

It's correct that the Matzov paper estimated 2^137.5 "gates" for a Kyber-512 attack, not far below NIST's estimated 2^143 "gates" for an AES-128 attack.

But remember that the round-3 Kyber submission had said that, because of "known unknowns", the number of "gates" to break Kyber-512 was somewhere between 2^135 and 2^165. Matzov wasn't magically eliminating the uncertainties; it was taking a point estimate in the middle of this range and estimating that the Matzov speedups were moving that estimate down by more than 10 bits. Furthermore, Section 8 of the Matzov paper points out further speedup ideas that need analysis, extending the list of "known unknowns".

Taking 137 from the "convenient" Matzov table, without reporting the uncertainties, is again misrepresenting the literature. This error is critical for the blog post that I'm responding to. The way the blog post uses the 137 (see below) is by arguing that memory-access costs don't have to contribute many bits to reach 143; this argument collapses when one recognizes that there's actually a very wide uncertainty interval for the starting number, including numbers far below 137. The uncertainty interval in the Kyber documentation hasn't been updated to account for attack speedups, but remember that it was already including numbers below 137 even without those speedups.

It's also important to note that many errors have been discovered in analyses of lattice attacks. Unsurprisingly, there's a followup paper disputing an important point in the Matzov analysis. On the other hand, that paper doesn't claim there's zero speedup, and doesn't claim any particular intermediate number; also, further followups have suggested algorithm tweaks to avoid the issue. For references and further comments, see the section on the "chaos beyond Core-SVP" in my first blog post.

Data locality
Now is the time to introduce the second concept I hinted at above: Data locality. If you have any experience with distributed computing, or even just micro optimizations, you have likely run into this: Even though on paper algorithm A should be faster than algorithm B, in practice algorithm B wins almost all of the time.

That often happens, yes, but it's important to note that sometimes real algorithm performance is in line with predictions.

A linear scan outperforms a tree map, which in turn outperforms a hash map, for example. There are usually two reasons for such a behavior: constant overhead, and data locality. I will focus here on the latter, although the former has also been cited as a counterargument to the claimed attack. The reason a linear scan can outperform a tree or a hash on small to medium size lookups is due to the CPU being able to stream in data from the RAM for the scan, making it so the data is available to be compared constantly, and we don’t have to waste cycles waiting for data to arrive from the RAM. One good way of visualizing this wait time is to calculate that at 3 GHz, data traveling at the speed of light will only make it 10 cm (~62 micromiles) per cycle. It doesn’t matter how fast your RAM is, if you cannot stream your data, you will be wasting several cycles for data to arrive, which in the case of a tree map or a hash collision can add up quickly.

Memory-access time can be very important, definitely, depending on the algorithm.

Of course this effect will not save the linear scan forever, and if you add enough elements, the asymptotic complexity will take over, but note that this is not merely a constant term effect either, as the required space of a circuit grows when scaling up. How we access data matters, and there are tons of tradeoffs to be had, for example whether to force your circuit to synchronize to allow for things like streaming data, but also forces you onto the same cycle based machine we wanted to avoid for extra efficiency above, etc. Since the length of wires and the cost having a synchronization signal travel through our circuit is not accounted for in the gate model, it makes sense that we might need to adjust values a bit for that, even more so when we start distributing algorithms over multiple machines, where these effects start to dominate in almost all practically relevant computations, as in it is usually more important where your data is than how many compute nodes you have to process it.

No, not "almost all practically relevant computations".

The topic at hand is cryptanalysis. The attack targets are structured mathematical problems, not Internet searches. Some cryptanalytic algorithms build large databases internally, but people focusing on memory-access costs for one important cryptanalytic algorithm after another have found optimizations that take advantage of the problem structure and turn memory-access costs into a very small, often unnoticeable, effect.

A classic pre-quantum example is the baby-step-giant-step algorithm for discrete logs. That algorithm has terrible slowdowns from memory-access costs, but those slowdowns are eliminated by Pollard's rho algorithm. van Oorschot and Wiener, using Rivest's idea of "distinguished points", showed how to parallelize Pollard's rho algorithm across many processors, again with negligible communication costs.
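For readers who haven't seen it, here's a toy-scale sketch of the point: Pollard's rho computes discrete logs with essentially no memory, unlike baby-step-giant-step. This is the serial version with Floyd cycle-finding; the van Oorschot-Wiener distinguished-points method parallelizes the same constant-memory walk. The tiny group below is a hypothetical example, chosen only so the script runs instantly:

```python
# Constant-memory discrete-log attack: Pollard's rho (toy scale, illustration only).
import math, random

p = 1019                  # prime, p = 2*509 + 1
order = 509               # prime order of the subgroup generated by g
g = 4                     # generates the order-509 subgroup of (Z/p)*
secret = random.randrange(1, order)
h = pow(g, secret, p)     # the attacker sees g and h and wants secret

def step(y, a, b):
    """One pseudorandom step of the rho walk; maintains y = g^a * h^b."""
    if y % 3 == 0:
        return y * y % p, 2 * a % order, 2 * b % order
    if y % 3 == 1:
        return y * g % p, (a + 1) % order, b
    return y * h % p, a, (b + 1) % order

def rho():
    while True:
        a0, b0 = random.randrange(order), random.randrange(order)
        start = (pow(g, a0, p) * pow(h, b0, p) % p, a0, b0)
        tort = hare = start
        while True:
            tort = step(*tort)
            hare = step(*step(*hare))
            if tort[0] == hare[0]:
                break
        # g^(a_t) h^(b_t) = g^(a_h) h^(b_h)  =>  secret*(b_t - b_h) = a_h - a_t
        db = (tort[2] - hare[2]) % order
        da = (hare[1] - tort[1]) % order
        if math.gcd(db, order) == 1:
            return da * pow(db, -1, order) % order
        # rare degenerate collision: retry from a fresh random start

print(rho() == secret)    # True
```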

As another pre-quantum example, RSA Labs claimed that memory-access costs were a gigantic obstacle to scaling up integer factorization. I found optimizations that reduced the asymptotic NFS exponent on a realistic two-dimensional mesh to about 1.976; this isn't quite as small as the exponent when communication costs are ignored (about 1.902), but it's within 4%.

Now let’s quickly look at the data dependencies of our brute force algorithm. Quickly because there simply aren’t any. Brute force computations are lazily parallel, you simply divide the search space up arithmetically, and have completely independent brute forcing units happily brute forcing over their assigned chunk of work, until one of them is successful.

That's the simple approach, yes. However, in the common situation of a brute-force attack applied to multiple targets at once, people use more sophisticated algorithms to get rid of communication costs.

In other words, the gate model will overestimate the cost of a brute force attack compared to any attack that requires data dependencies. And, as it turns out, lattice reduction attacks have data dependencies. So much so that parallelizing lattice reduction on a single machine is non-trivial, and parallelizing it over multiple machines is an area of active research.

No. There are many different lattice attacks. Some of them, notably "enumeration", fit in low memory and very easily parallelize. A 2020 paper estimates that quantum enumeration outperforms "sieving" below "a crossover rank of k = 547". On the other hand, as pointed out in the NTRU Prime documentation, the cutoff between sieving and enumeration is highly sensitive to uncertainties in the underlying numbers.

As for sieving, there are many interesting opportunities for tradeoffs that reduce memory-access costs without much overhead. For example, a long time ago, Lange and I suggested an idea that's now called "tuple lattice sieving" to combine sieving with enumeration. The name "tuple lattice sieving" comes from a 2016 paper studying the idea; some followups have obtained better results, and I'm sure that continued research in this direction will be productive.

Unfortunately, for most of the post-quantum competition, NIST discouraged research into these low-memory options:

At the end of 2022, NIST suddenly switched from "classical gates" to accounting for memory-access costs, and on this basis claimed that Kyber-512 is "unlikely" to be easier to break than AES-128. How is a cryptanalyst supposed to write a paper disproving this, when NIST refuses to say what metric it's now using?

Lots of numbers have been thrown around to adjust gate counts for these estimates,

Well, yes. For example, there's NIST's miscalculation, which multiplies "gate" counts by a 2^40 estimate for the "real cost of memory access", while ignoring the facts that (1) the number of memory accesses is far below the number of "gates" and (2) there are many options available for further reducing the number of memory accesses.

None of this is addressed in the blog post I'm responding to. The blog post selects a point estimate from the literature, something not far below 2^143 "gates", and concludes that the small gap settles the matter, with no need to quantify memory-access costs. But taking a point estimate, and ignoring uncertainty intervals, is misrepresenting the literature.

but it seems fairly reasonable that a factor of 50 is well within the margin of error here.

See above.

In the end the difference in Matzov’s paper is small enough, that NIST’s confidence in Kyber512 still meeting NIST’s definition being well placed, and accepted by all but one of the cryptographers I know of.

The level of certainty claimed here (1) goes beyond NIST's "unlikely" claim (which in turn rests on a calculation error) and (2) has no justification beyond the underlying misrepresentation of the literature. The argumentum ad populum doesn't deserve a response.

In particular, NIST’s analysis is very much careful, grounded in research,

This assessment is based on misrepresenting the literature.

and does not contain basic math errors.

The blog post I'm responding to repeatedly asserts that there wasn't a math error. However, it doesn't quote the NIST text at issue ("40 bits of security more than would be suggested by the RAM model ... approximately 40 bits of additional security quoted as the 'real cost of memory access' by the NTRUprime submission"), and doesn't address the content of that text.

Closing thoughts
Lastly, my personal opinion on the technical question at stake here (I don’t care much for the personal attacks and polemic that has been thrown around).

A blog post starts with a factually incorrect insinuation of a personal conflict of interest and then complains about personal attacks. Fascinating.

In my opinion, when you can afford it, you should be using Kyber768 instead of Kyber512.

What exactly is the evidence of people not being able to afford Kyber-768? Or Kyber-1024?

This is not due to any currently existing attacks, and I do not think NIST should change their planned standardizations and recommendations, but purely a precautionary step, and in situations where Kyber512’s performance and overhead is required, it is a solid choice.

See above.

My reasoning to prefer Kyber768 comes from the fact that there are two types of attacks it might be facing:

The former family of attacks is likely to occur, and likely to chip small amounts of the security level of Kyber,

FrodoKEM, which is frequently claimed to be the most conservative lattice submission, says it's an "instantiation and implementation" of a 2010 paper from Lindner and Peikert. That paper proposed lattice dimension 256, saying this was as hard to break as AES-128. That wasn't an unreasonable assessment of the lattice attacks published at that point.

Today the situation is very different. It's not at all clear that Kyber-512 is as hard to break as AES-128. It would be absurd to propose dimension 256.

This change is the cumulative effect of many papers. Each of those papers, in isolation, can be described as chipping a small amount off the security level, but the combined impact is dramatic.

Given the likelihood of further attacks, isn't it obviously important to quantify the number of bits of Kyber security margin and to assess the risk of attacks overcoming that number?

without fundamentally threatening the soundness of the algorithm itself.

What exactly does this "soundness" mean, and what exactly does it deliver to Kyber-512 users if their ciphertexts are broken?

This is what we have seen happening to e.g. RSA in the case of the general number field sieve, where small improvements over time pushed RSA-1024 within reach of modern hardware to factor. Lattice reduction similarly has a tuning space

This analogy relies on a lack of quantification ("small", "tuning", etc.). Quantification shows that NFS, while certainly much less stable than ECC attacks, is much more stable than lattice attacks.

For example, the aforementioned asymptotic exponent around 1.902 for breaking pre-quantum RSA was established in 1993 (along with an asymptotic exponent around 1.639 for the cost per key for breaking many RSA keys at once). For lattice sieving, the current asymptotic exponent is from 2015; asymptotic exponents in 2010 were almost 50% higher. For lattice enumeration, the current asymptotic exponent is from 2020; asymptotic exponents in 2019 were almost 50% higher.

and while the approach is unlikely to break Kyber, it might reduce the security level of Kyber512 below what I’m comfortable with.

Quantitatively, what does "unlikely to break" mean here? Where are NIST's statements claiming that Kyber-512 won't drop below security level X, for some clearly defined value of X that's beyond what's feasible for attackers?

The second type of attack is far scarier, as increasing the security level slightly is not guaranteed to help much here, and I hope that such an attack does not exist.

"Far scarier" is unevaluatable in the absence of quantification of the Kyber-512 risk.

I agree that it's reasonable to model an increase of sizes as reducing risks more for the first type of attack than for the second type of attack.

While it is possible to construct a Kyber640 variant relatively easily,

No. Kyber's definition, including features specifically advertised in the Kyber documentation as "unique advantages of Kyber" regarding "ease of implementation" and "scalability", forces the dimension to be a multiple of 256. This stops Kyber from having anything as small as NTRU-677 if Kyber-512 is eliminated. Trying to eliminate this disadvantage by labeling a new cryptosystem as Kyber-640 would mean throwing away Kyber's advertised "unique advantages".

in my experience, Kyber768 is pretty much always acceptable in practice, much simpler, and has a far more comfortable security margin.

Certainly Kyber-768 has a more comfortable security margin than Kyber-512.

The same consideration holds for NTRU, which faces the exact same attacks, and I would aim for NIST level 3 there as well.

Certainly taking higher security levels for the NTRU submission increases comfort, for the same reason that taking higher security levels for the Kyber submission increases comfort. The fastest published attack algorithms are basically the same for both submissions.

But "faces the exact same attacks" makes it sound as if Kyber and NTRU have the same attack surface, i.e., the same space of potential attacks. That's not true.

This is nicely illustrated by an October 2023 incident in which NSA publicly recommended changing how Kyber was using hash functions to generate Kyber's public matrix. Mike Hamburg responded: "I’m not completely sure of it, but the particular instantiation you suggested might, uh, actually contain a weakness. ... if the input to Hashgen ... were really ... then (if I’m reading right) a disaster would occur: Hashgen increments its state, and if I understand correctly it’s big-endian, so ... colliding with another call to hashgen and likely wrecking the security of the entire KEM."

It's easy to write down changes to the matrix generation that destroy all security. These collisions show that NSA's proposal is at least moving in that direction. (Whether it actually gets there in one step is a different question, and one that doesn't matter here.)

Now here's the scary part. If you look at Kyber's "security proofs" and ask which step is violated by NSA's proposal, then the only answer is that the proofs require the hash function to be a "random oracle", while NSA's proposal isn't a "random oracle". But the hash function that Kyber chose also isn't a "random oracle"! If you have an attack that works for all hash functions, "random oracle" proofs might say something about the cost of that attack; but if you have an attack against a specific hash function, "random oracle" proofs don't say anything.

A careful security reviewer has to investigate Kyber's choice of hash function used to generate the matrix, and has to ask whether attacks exploiting this choice can save time. These are "non-QROM 2" attacks in the taxonomy of "Risks of lattice KEMs".

Quotient NTRU, as in the NTRU submission, is structurally immune to these attacks. More broadly, the taxonomy from "Risks of lattice KEMs" identifies four known attack avenues against Kyber that aren't present in NTRU, and one known attack avenue against NTRU that isn't present in Kyber.

As a side note, Damien Stehlé's invited talk at Asiacrypt 2022 falsely characterized a particular theorem as saying "if you can break Ring-LWE-based schemes, you can also break NTRU-based schemes". The hard way to see that this is false is to look at the theorem statement and observe that, no, it doesn't say this; see Sections 5.1 and 5.2 of "Risks of lattice KEMs". The easy way is to observe that, if there were such a theorem, then applying that theorem to a weak-matrix Ring-LWE system, which is something easy to write down, would give an efficient attack against NTRU, which is something nobody has written down.

Kyber scales quadratically in time, which is favorable to the cubic scaling behavior of elliptic curves, and makes bumping up the security level less painful.

Actually, Kyber starts running into serious problems with decryption failures after dimension 1024. The NTRU submission doesn't have this problem: it already provides higher-security options such as NTRU-1229, and its scaling is essentially linear.

Lastly, a few thoughts on comparing MLWE (Kyber) vs. NTRU more broadly. From my perspective as a mathematician, MLWE is the more beautiful problem.

Kyber has complications and security dangers not present in NTRU, and vice versa. For example, there's no proof that QROM IND-CCA2 for Kyber is as hard to break as the MLWE problem inside Kyber (the best available proofs allow a large gap in security levels), whereas NTRU has a much tighter security connection to its internal PKE. As another example, see the October 2023 NSA matrix incident described above.

I find it concerning to see praise for one component of Kyber, specifically the MLWE component, which, again, could be stronger than Kyber is. Thorough security review has to check the whole package.

I'm also puzzled by the content of the praise. The most directly comparable component of NTRU that isn't in Kyber is a homogeneous slice of an MLWE problem, and I don't see why a mathematician would describe solving inhomogeneous linear equations (find small a,e given G and given aG+e) as a more beautiful problem than solving homogeneous linear equations (find small nonzero a,e given G and given aG+e=0). Meanwhile the form of MLWE needed for Kyber has further complications such as extra "samples". (See Section 6.2 of the NTRU Prime submission.) Is the comparison here actually between an abstract MLWE component and the full NTRU system?

This is a fairly useless observation, as mathematical beauty is fairly subjective, and it is not clear whether the more beautiful problem is more secure, but I cannot deny that I find the rather clunky feeling nature of NTRU somewhat offputting.

There are many virtues of simplicity, but if claims about simplicity are playing a role in cryptographic selection then those claims need to be clearly stated and publicly reviewed.

From my probably more relevant perspective as an applied cryptographer, MLWE is easier to implement

Kyber has complications for the implementor that don't appear in NTRU, and vice versa. [2023.12.23 update: See my recent paper "Analyzing the complexity of reference post-quantum software" for quantification of these complications.]

and promises greater performance compared to NTRU at a very comparable security level.

Kyber beats NTRU at some security levels but loses at others. See my first blog post for numbers and graphs.

While these might seem like “nice to haves”, these considerations tend to matter greatly in practice, and are often what makes or breaks a cryptographic algorithm.

People certainly care about performance, and about ease of implementation, and about staying away from patents.

The easier the algorithm is to implement correctly, the less I have to worry about side channel attacks bypassing the entire cryptosystem.

There are easy-to-use tools to check constant-time behavior, such as TIMECOP inside SUPERCOP. The official NTRU software passes TIMECOP. The official Kyber software doesn't, although this wouldn't be difficult to fix.

Most of these tools, including TIMECOP, don't check for divisions, which are variable-time on most CPUs, and multiplications, which are variable-time on some CPUs. (A tool that does check for these is saferewrite.) I would expect people not familiar with these issues to use variable-time divisions for Kyber as often as for NTRU, although compilers will sometimes replace divisions with multiplications. On CPUs where multiplications are variable-time, fixing Kyber is considerably more annoying than fixing NTRU, since Kyber really needs multiplications with both inputs being full size while NTRU doesn't.

[2023.12.23 update: It turns out that a particular variable-time division appears in most Kyber libraries, specifically for handling a component of Kyber that doesn't appear in NTRU. This doesn't prove that Kyber has more traps for the implementor than NTRU, but I see no justification for the idea that it has fewer traps.]
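As an aside on the division issue, here's a sketch of the standard multiply-and-shift replacement for division by a fixed constant, written in Python for readability; Kyber's modulus 3329 and 16-bit inputs are used as the example (the modulus isn't stated above, so treat it as an assumption of the sketch):

```python
# Replace x // q (a division instruction, variable-time on most CPUs) by a
# multiply and a shift with a precomputed reciprocal. Python sketch only;
# the constant-time concern is of course about compiled code, and on CPUs
# with variable-time multipliers even this needs further care.
q = 3329                          # Kyber's modulus, used here as the example
m = (1 << 32) // q + 1            # precomputed rounded-up reciprocal

def quotient(x):
    return (x * m) >> 32          # no division instruction at run time

# Exhaustive check that the trick is exact for all 16-bit inputs:
assert all(quotient(x) == x // q for x in range(1 << 16))
print("x % q = x - q*quotient(x) for all 16-bit x, without dividing")
```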

In the end, attackers rarely try to brute force keys,

What does "rarely" mean here, and what's the evidence for this claim?

For comparison, an internal 2007 NSA document says "since the middle of the last century, automation has been used as a way to greatly ease the making and breaking of codes ... Since it possesses the world's largest supercomputing facility at its headquarters, NSA therefore has a great demand for microchips".

but quite frequently exploit much more common problems in implementations.

What does "quite frequently" mean here, and what's the evidence for this claim?

It's interesting to note that Matthew Green has repeatedly made the opposite claim. See my Turbo Boost blog post for references and further comments.

Anyway, we do know that implementation problems are frequently exploitable. Rather than trying to guess which of the exploitable problems are actually being exploited, we should systematically protect everything. This means ensuring mathematical security, and ensuring implementation security, and ensuring security for processes of selecting cryptographic standards.

Performance and overhead are similarly important to the security of the overall system: If algorithms are too slow, they often end up being underused and put into more fragile, but less expensive protocol constructions.

Sometimes happens, yes.

Any analysis that is based on a holistic threat model of the entire system will have to take these considerations into account,

Certainly, but it's important to get the details right.

and from that perspective alone Kyber, including Kyber512, are the better algorithm choice.

No. For example, NTRU-509 has smaller keys and ciphertexts than Kyber-512. If Kyber-512 is supposed to have such a solid security margin that we don't even have to bother quantifying it, then why not use NTRU-509?

The right answer is that the risk of NTRU-509 and Kyber-512 being easier to break than AES-128 is high enough that both of them should be eliminated. But then Kyber structurally offers only two security levels, namely Kyber-768 and Kyber-1024, and again has trouble competing with NTRU at other security levels.

There's ample evidence that the costs of NTRU and Kyber are dominated by communication costs. NTRU has smaller keys and ciphertexts than Kyber at the lowest security levels and at various intermediate security levels. Kyber isn't even trying to compete with NTRU's highest-security options. There are some security levels where Kyber wins, but those don't justify flat claims that Kyber is more efficient than NTRU.

NIST has learned this lesson the hard way, with SHA3, despite being the better construction, struggling to gain on SHA2, due to having a far too conservative security margin and a CPU architecture unfriendly implementation.

How is this supposed to be analogous to NTRU and Kyber? Yes, performance often factors into cryptographic decisions (even when it doesn't actually matter), but this favors NTRU in more ways than Kyber.


Version: This is version 2023.12.23 of the 20231125-kyber.html web page.