Hi Matúš, Marek, and Petr,
Thanks for the feedback, and in particular pointing out the limited
range of primes shown in your 2016 paper. We still think this counts as
operating from public information. :-)
Here are some speedups to the "divisors in residue classes" calculation.
This Sage script is now slightly faster than your announced Haswell
speed: typically 5% to 25% faster, depending on N. We used Sage 8.0
compiled under Ubuntu 16.04. Sage 8.0 uses fplll 5.1.0 for LLL, and uses
MPIR 3.0.0 for integer arithmetic.
One of the speedups implemented here is "chaining". This means taking
the LLL-reduced matrix for powers of uw+Hx and multiplying by a matrix
of binomial coefficients (or actually the square). This corresponds to
replacing u with u+vH (or actually u+2vH), moving from the interval
[u-vH,u+vH] to a subsequent interval. Applying LLL to the product of
matrices is very fast: the product is almost LLL-reduced already.
Taking larger H reduces the number of intervals, saving time. Of course
if H grows too large then the algorithm might fail. If the "gap"
calculation in this script is correct then "guaranteed True" in the
output means that H is guaranteed to work for this N. Experimentally,
taking slightly larger H is still quite reliable.
The dependence on N arises because comparing N to the limited range of
primes usually implies a more limited range. The graphs in your 2016
USENIX paper say that the primes are usually in an even narrower
interval; focusing on factoring those N's would allow further speedups.
As one expects for this type of algorithm, the matrix columns are known
to be multiples of 1,H,H^2,... Arithmetic would be considerably faster
if these were divided out. This would require an API tweak to the LLL
library to specify these weights, and a minor tweak inside the library.
At a lower level, the integer arithmetic is not optimized to the extent
demonstrated in papers on, e.g., fast ECC. Further speedups are clearly
possible. As a starting point, simply replacing MPIR with GMP might save
some time (or lose some time---we haven't benchmarked these).
The values of "smooth" and "k" below are obtained from a search through
many possibilities, taking account of the experimentally observed cost
of LLL etc., so the values should be readjusted if the cost changes.
We didn't implement the outer search through powers of 65537 modulo v.
To guarantee checking only 50% of the powers in the worst case, it's
important to first compute the (easy) discrete log of N modulo v and
then search exponents between floor(log/2) and ceil((log+order)/2). The
"avgyearsx4" output by the script is for running through all powers, so
the worst case is 50% of that and the average case is 25%.
---Dan and Tanja
from sage.doctest.util import Timer
ttry = Timer()
t = Timer()
L = 27771430913146044712156219115012732149015337058745243774375474371978395728107173008782747458575903820497344261101333156469136833289328084229401057505005215261077328417649807720533310592783171487952296983742789708502518237023426083874832018749447215424764928016413509553872836856095214672430
L *= 701 # if 701 is included
g = Mod(65537,L)
pmin = 12*2^1020
pmax = 13*2^1020
proof.arithmetic(False) # do not bother proving primality
t.start()
u = lift(g^randrange(L))
while True:
p = u + randrange(ceil(pmin/L),floor(pmax/L)) * L
if p.is_prime(): break
print 'time for first prime',t.stop().cputime
t.start()
u = lift(g^randrange(L))
while True:
q = u + randrange(ceil(pmin/L),floor(pmax/L)) * L
if q.is_prime(): break
print 'time for second prime',t.stop().cputime
n = p * q
print 'public key',n
smooth,k,H = 41902660800,5,7*2^461
m = 2*k+1
print 'smooth',smooth
print 'multiplicity',k
print 'lattice rank',m
print 'Hbits',H.nbits()
print 'H',H
def lg(x): return log(1.0 * x) / log(2.0)
lggamma = lg(m)/(2.0*k) + lg(2*H)*(m-1.0)/(2.0*k) + lg(n)*((k+1.0)/(2.0*m)-1)
gap = lg(pmin) - lg(n) - lggamma
print 'gap',gap
print 'guaranteed',gap > 0
def smoothorder(l):
return smooth % Mod(g,l).multiplicative_order() == 0
v = prod(l for l,e in factor(L) if smoothorder(l))
print 'v',v
u = p % v
print 'p residue class',u
ttry.start()
pmin = max(pmin,ceil(n / pmax))
pmax = min(pmax,floor(n / pmin))
pradius = (pmax - pmin) // 2
print 'n-dependent pradius bits',pradius.nbits()
R. = ZZ[]
u += floor((pmin + H * v - u) / v) * v
# now pmin-v < u-H*v <= pmin
t.start()
wu = lift(u/Mod(v,n))
f = wu+H*x
fpowers = [f^0]
for j in range(k): fpowers += [fpowers[j] * f]
print 'fpowers time',t.stop().cputime,'bits',[fi.nbits() for fi in fpowers[k].coefficients(sparse=False)]
X = matrix([n])
for i in range(1,k):
t.start()
X = X.augment(vector(ZZ,i))
X = X.stack(vector(ZZ,fpowers[i].coefficients(sparse=False)))
X = X.LLL(delta=0.3,early_red=True,use_siegel=True)
print 'miniLLL time',t.stop().cputime,'bits',[[Mji.nbits() for Mji in Mi] for Mi in X]
X = n * X
t.start()
for i in range(k,m):
X = X.augment(vector(ZZ,i))
X = X.stack(vector(ZZ,[0]*(i - k) + (H^(i-k)*fpowers[k]).coefficients(sparse=False)))
X = X.LLL(delta=0.6,early_red=True,use_siegel=True)
print 'bigLLL time',t.stop().cputime,'bits',[[Mji.nbits() for Mji in Mi] for Mi in X]
numLLL = 1
shift1 = matrix([[binomial(j,i) for i in range(m)] for j in range(m)])
shift2 = shift1*shift1
t.start()
factored = false
while True:
# search u-Hv,...,u+Hv in steps of v
# i.e. search u+vs with s being -H,...,H
M0 = X[0]
Q0 = sum(ZZ(z/H^i)*x^i for i,z in enumerate(M0))
Qroots = Q0.roots(ring=ZZ)
for r,multiplicity in Qroots:
if u+v*r > 0:
g = gcd(n,u+v*r)
if g > 1 and g < n:
if not factored:
print '----- successful factorization',[g,n/g]
factored = True
# could abort at this point but want to benchmark failure case
u += 2*H*v
if u-H*v > pmax: break
X *= shift2
X = X.LLL(delta=0.6,early_red=True,use_siegel=True)
numLLL += 1
print 'scan time',t.stop().cputime,'numLLL',numLLL
timetry = ttry.stop().cputime
print 'avgyearsx4',timetry * smooth / (86400 * 365.25)