I'm writing some moderately performance-critical code in numpy. This code will be in the innermost loop of a computation whose runtime is measured in hours. A quick calculation suggests that this code will be executed something like 10^12 times, in some variations of the calculation.
So one function is to calculate sigmoid(X) and another to calculate its derivative (gradient).
Sigmoid has the property that for y = sigmoid(x), dy/dx = y(1 - y).
In Python with numpy this looks like:

import numpy as np

sigmoid = np.vectorize(lambda x: 1.0 / (1.0 + np.exp(-x)))
grad_sigmoid = np.vectorize(lambda x: sigmoid(x) * (1 - sigmoid(x)))
As can be seen, both functions are pure (without side effects), so they are ideal candidates for memoization, at least in the short term. Longer term, though, I have some worries about caching every single call to sigmoid ever made: storing 10^12 floats would take several terabytes of RAM.
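To make the idea concrete, the kind of memoization I have in mind is something like this sketch (hypothetical, using functools.lru_cache on a scalar version; it keeps one cache entry per distinct float argument, which is exactly where the terabytes worry comes from):

from functools import lru_cache
import math

@lru_cache(maxsize=None)  # one cache entry per distinct float argument
def sigmoid_memo(x):
    return 1.0 / (1.0 + math.exp(-x))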
Is there a good way to optimise this?
Will Python pick up that these are pure functions and cache the results for me, as appropriate?
Am I worrying over nothing?
These functions already exist in scipy. The sigmoid function is available as scipy.special.expit.
In [36]: from scipy.special import expit
Compare expit to the vectorized sigmoid function:
In [38]: x = np.linspace(-6, 6, 1001)
In [39]: %timeit y = sigmoid(x)
100 loops, best of 3: 2.4 ms per loop
In [40]: %timeit y = expit(x)
10000 loops, best of 3: 20.6 µs per loop
expit is also faster than implementing the formula yourself:
In [41]: %timeit y = 1.0 / (1.0 + np.exp(-x))
10000 loops, best of 3: 27 µs per loop
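As a quick sanity check (a sketch, not part of the timings above), expit agrees with the hand-rolled formula to floating point precision:

import numpy as np
from scipy.special import expit

x = np.linspace(-6, 6, 1001)
# expit and the explicit formula should agree elementwise
assert np.allclose(expit(x), 1.0 / (1.0 + np.exp(-x)))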
The CDF of the logistic distribution is the sigmoid function. It is available as the cdf method of scipy.stats.logistic, but cdf eventually calls expit, so there is no point in using that method. You can use the pdf method to compute the derivative of the sigmoid function, or the _pdf method, which has less overhead, but "rolling your own" is faster:
In [44]: def sigmoid_grad(x):
   ....:     ex = np.exp(-x)
   ....:     y = ex / (1 + ex)**2
   ....:     return y
Timing (x has length 1001):
In [45]: from scipy.stats import logistic
In [46]: %timeit y = logistic._pdf(x)
10000 loops, best of 3: 73.8 µs per loop
In [47]: %timeit y = sigmoid_grad(x)
10000 loops, best of 3: 29.7 µs per loop
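Both compute the same values; here is a minimal self-contained check (a sketch, redefining sigmoid_grad from above):

import numpy as np
from scipy.stats import logistic

def sigmoid_grad(x):
    ex = np.exp(-x)
    y = ex / (1 + ex)**2
    return y

x = np.linspace(-6, 6, 1001)
# logistic._pdf is the derivative of the logistic CDF, i.e. the sigmoid gradient
assert np.allclose(logistic._pdf(x), sigmoid_grad(x))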
Be careful with your implementation if you are going to use values that are far into the tails. The exponential function can overflow pretty easily. logistic._pdf is a bit more robust than my quick implementation of sigmoid_grad:
In [60]: sigmoid_grad(-500)
/home/warren/anaconda/bin/ipython:3: RuntimeWarning: overflow encountered in double_scalars
import sys
Out[60]: 0.0
In [61]: logistic._pdf(-500)
Out[61]: 7.1245764067412855e-218
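One way to avoid the overflow (a sketch, not timed here) is to exploit the fact that the gradient is an even function of x and evaluate the formula at -|x|, so the exponential's argument is never positive:

import numpy as np

def sigmoid_grad_stable(x):
    # sigmoid_grad(x) == sigmoid_grad(-x), so use exp(-|x|) <= 1,
    # which cannot overflow
    ex = np.exp(-np.abs(x))
    return ex / (1 + ex)**2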
An implementation using sech**2 (i.e. 1/cosh**2) is a bit slower than the above sigmoid_grad:
In [101]: def sigmoid_grad_sech2(x):
   .....:     y = (0.5 / np.cosh(0.5*x))**2
   .....:     return y
   .....:
In [102]: %timeit y = sigmoid_grad_sech2(x)
10000 loops, best of 3: 34 µs per loop
But it handles the tails better:
In [103]: sigmoid_grad_sech2(-500)
Out[103]: 7.1245764067412855e-218
In [104]: sigmoid_grad_sech2(500)
Out[104]: 7.1245764067412855e-218
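Finally, since the question already computes sigmoid(x), the identity dy/dx = y*(1 - y) lets you reuse the forward-pass result and skip a second exponential entirely (a sketch, assuming y comes from expit; as shown above, this loses precision far in the tails):

import numpy as np
from scipy.special import expit

x = np.linspace(-6, 6, 1001)
y = expit(x)       # sigmoid, computed once
dy = y * (1 - y)   # gradient reuses y; no extra call to exp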