Fast prime factorization module

Question 1

Fast prime factorization module

python algorithm prime-factoring

orlp · Jan 10, 2011 · Viewed 60.7k times · Source

Answer

Answer

If you don't want to reinvent the wheel, use the library sympy

pip install sympy

Use the function sympy.ntheory.factorint

>>> from sympy.ntheory import factorint
>>> factorint(10**20+1)
{73: 1, 5964848081: 1, 1676321: 1, 137: 1}

You can factor some very large numbers:

>>> factorint(10**100+1)
{401: 1, 5964848081: 1, 1676321: 1, 1601: 1, 1201: 1, 137: 1, 73: 1, 129694419029057750551385771184564274499075700947656757821537291527196801: 1}

Question 2

I am looking for an implementation or clear algorithm for getting the prime factorization of N in either python, pseudocode or anything else well-readable. There are a few demands/facts:

N is between 1 and ~20 digits
No pre-calculated lookup table, memoization is fine though.
Need not to be mathematically proven (e.g. could rely on the Goldbach conjecture if needed)
Need not to be precise, is allowed to be probabilistic/deterministic if needed

I need a fast prime factorization algorithm, not only for itself, but for usage in many other algorithms like calculating the Euler phi(n).

I have tried other algorithms from Wikipedia and such but either I couldn't understand them (ECM) or I couldn't create a working implementation from the algorithm (Pollard-Brent).

I am really interested in the Pollard-Brent algorithm, so any more information/implementations on it would be really nice.

Thanks!

EDIT

After messing around a little I have created a pretty fast prime/factorization module. It combines an optimized trial division algorithm, the Pollard-Brent algorithm, a miller-rabin primality test and the fastest primesieve I found on the internet. gcd is a regular Euclid's GCD implementation (binary Euclid's GCD is much slower then the regular one).

Bounty

Oh joy, a bounty can be acquired! But how can I win it?

Find an optimalization or bug in my module.
Provide alternative/better algorithms/implementations.

The answer which is the most complete/constructive gets the bounty.

And finally the module itself:

import random

def primesbelow(N):
    # http://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
    #""" Input N>=6, Returns a list of primes, 2 <= p < N """
    correction = N % 6 > 1
    N = {0:N, 1:N-1, 2:N+4, 3:N+3, 4:N+2, 5:N+1}[N%6]
    sieve = [True] * (N // 3)
    sieve[0] = False
    for i in range(int(N ** .5) // 3 + 1):
        if sieve[i]:
            k = (3 * i + 1) | 1
            sieve[k*k // 3::2*k] = [False] * ((N//6 - (k*k)//6 - 1)//k + 1)
            sieve[(k*k + 4*k - 2*k*(i%2)) // 3::2*k] = [False] * ((N // 6 - (k*k + 4*k - 2*k*(i%2))//6 - 1) // k + 1)
    return [2, 3] + [(3 * i + 1) | 1 for i in range(1, N//3 - correction) if sieve[i]]

smallprimeset = set(primesbelow(100000))
_smallprimeset = 100000
def isprime(n, precision=7):
    # http://en.wikipedia.org/wiki/Miller-Rabin_primality_test#Algorithm_and_running_time
    if n < 1:
        raise ValueError("Out of bounds, first argument must be > 0")
    elif n <= 3:
        return n >= 2
    elif n % 2 == 0:
        return False
    elif n < _smallprimeset:
        return n in smallprimeset


    d = n - 1
    s = 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for repeat in range(precision):
        a = random.randrange(2, n - 2)
        x = pow(a, d, n)

        if x == 1 or x == n - 1: continue

        for r in range(s - 1):
            x = pow(x, 2, n)
            if x == 1: return False
            if x == n - 1: break
        else: return False

    return True

# https://comeoncodeon.wordpress.com/2010/09/18/pollard-rho-brent-integer-factorization/
def pollard_brent(n):
    if n % 2 == 0: return 2
    if n % 3 == 0: return 3

    y, c, m = random.randint(1, n-1), random.randint(1, n-1), random.randint(1, n-1)
    g, r, q = 1, 1, 1
    while g == 1:
        x = y
        for i in range(r):
            y = (pow(y, 2, n) + c) % n

        k = 0
        while k < r and g==1:
            ys = y
            for i in range(min(m, r-k)):
                y = (pow(y, 2, n) + c) % n
                q = q * abs(x-y) % n
            g = gcd(q, n)
            k += m
        r *= 2
    if g == n:
        while True:
            ys = (pow(ys, 2, n) + c) % n
            g = gcd(abs(x - ys), n)
            if g > 1:
                break

    return g

smallprimes = primesbelow(1000) # might seem low, but 1000*1000 = 1000000, so this will fully factor every composite < 1000000
def primefactors(n, sort=False):
    factors = []

    for checker in smallprimes:
        while n % checker == 0:
            factors.append(checker)
            n //= checker
        if checker > n: break

    if n < 2: return factors

    while n > 1:
        if isprime(n):
            factors.append(n)
            break
        factor = pollard_brent(n) # trial division did not fully factor, switch to pollard-brent
        factors.extend(primefactors(factor)) # recurse to factor the not necessarily prime factor returned by pollard-brent
        n //= factor

    if sort: factors.sort()

    return factors

def factorization(n):
    factors = {}
    for p1 in primefactors(n):
        try:
            factors[p1] += 1
        except KeyError:
            factors[p1] = 1
    return factors

totients = {}
def totient(n):
    if n == 0: return 1

    try: return totients[n]
    except KeyError: pass

    tot = 1
    for p, exp in factorization(n).items():
        tot *= (p - 1)  *  p ** (exp - 1)

    totients[n] = tot
    return tot

def gcd(a, b):
    if a == b: return a
    while b > 0: a, b = b, a % b
    return a

def lcm(a, b):
    return abs((a // gcd(a, b)) * b)

Fast prime factorization module

Bounty

Answer

Related questions