Learning to Optimize with SciPy

Vivek Kumar

  • May 9, 2019
  • 9 Min read

Data
SciPy

Introduction

In the previous guide in this series, you were introduced to the SciPy stats module to perform statistical analysis in Python. In this guide, you will learn about various optimization algorithms available in SciPy.

By the end of this guide you'll have a strong understanding of the following topics:

  1. Minimize method
  2. Global optimization
  3. Least-squares minimization
  4. Root finding

The Baseline

Throughout this guide, we will be using the following libraries:

import numpy as np
import scipy as sp
from scipy import optimize
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
python

Minimize Method

The minimize function provides a common interface to the unconstrained and constrained minimization algorithms for multivariate scalar functions in scipy.optimize. Let us use the Rosenbrock function to demonstrate minimization of a function of N variables.

# ============================
# Minimizing the Rosenbrock Function
# ============================
def rosenbrock(x):    
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

x0 = np.array([1.3, 0.7, 1.8, 2.9, 1.8])
res = optimize.minimize(rosenbrock, x0, method='nelder-mead',
               options={'xatol': 1e-8, 'disp': True})

# Optimization terminated successfully.
#          Current function value: 0.000000
#          Iterations: 527
#          Function evaluations: 867

res.x

# array([1.        , 1.        , 1.        , 1.        , 0.99999999])
python
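
As an aside, SciPy ships the Rosenbrock function and its derivatives as ready-made helpers (optimize.rosen, optimize.rosen_der, and optimize.rosen_hess), so the hand-written version above is mainly for illustration. A minimal sketch using the built-in function, which should reproduce essentially the same result:

# Minimizing SciPy's built-in Rosenbrock function instead of our own
from scipy.optimize import rosen

res_builtin = optimize.minimize(rosen, x0, method='nelder-mead',
                                options={'xatol': 1e-8})
res_builtin.x  # expected to approach a vector of ones, as above
python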

The Nelder-Mead method does not use gradient evaluations, so it can sometimes take longer to find the minimum.

To converge in fewer iterations, a gradient evaluation routine can be supplied, as shown in the code below:

# ============================
# Function to compute gradient
# ============================
def rosen_grad(x):
    xm = x[1:-1]
    xm_m1 = x[:-2]
    xm_p1 = x[2:]
    der = np.zeros_like(x)
    der[1:-1] = 200*(xm-xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1-xm)
    der[0] = -400*x[0]*(x[1]-x[0]**2) - 2*(1-x[0])
    der[-1] = 200*(x[-1]-x[-2]**2)
    return der

# The gradient information is specified in the minimize
# method and passed to the jac argument as shown below

res = optimize.minimize(rosenbrock, x0, method='BFGS', jac=rosen_grad,
               options={'disp': True})

# Optimization terminated successfully.
#          Current function value: 0.000000
#          Iterations: 40
#          Function evaluations: 47
#          Gradient evaluations: 47

res.x
# array([1.        , 1.00000002, 1.00000004, 1.00000007, 1.00000014])
python
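
Before trusting an analytic gradient, it is worth verifying it against a finite-difference approximation. The sketch below uses optimize.check_grad, which returns the norm of the difference between the two; the built-in optimize.rosen_der could be checked the same way.

# Comparing the analytic gradient with a finite-difference approximation
err = optimize.check_grad(rosenbrock, rosen_grad, x0)
print(err)  # a small value (relative to the gradient's magnitude) indicates agreement
python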

Global Optimization

Global optimization aims to find the global minimum of a function within given bounds, in the presence of potentially many local minima. Typically global minimizers efficiently search the parameter space, while using a local minimizer (e.g. minimize) under the hood.

Let us define a function with many local minima and visualize it before running the global optimizers:

# Defining a function with many local minima and one global minimum
def peaks(x):
    return (-(x[1] + 47) * np.sin(np.sqrt(abs(x[0]/2 + (x[1]  + 47))))
            -x[0] * np.sin(np.sqrt(abs(x[0] - (x[1]  + 47)))))

bounds = [(-512, 512), (-512, 512)]

# Visualizing the data
x = np.arange(-512, 513)
y = np.arange(-512, 513)
xgrid, ygrid = np.meshgrid(x, y)
xy = np.stack([xgrid, ygrid])

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.view_init(45, -45)
ax.plot_surface(xgrid, ygrid, peaks(xy), cmap='terrain')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('Peaks(x, y)')
plt.show()
python

[Plot: 3D surface of the peaks function]

We will now use the global optimizers to obtain the minimum and the function value at the minimum.

# Storing results in a dictionary
results = dict()

# Various Global Optimizers
results['SHGO'] = optimize.shgo(peaks, bounds)
results['SHGO_SOBOL'] = optimize.shgo(peaks, bounds, n=200, iters=5, sampling_method='sobol')
results['DA'] = optimize.dual_annealing(peaks, bounds)
results['DE'] = optimize.differential_evolution(peaks, bounds)
# basinhopping takes an initial guess x0 rather than bounds;
# the starting point here is arbitrary
results['BH'] = optimize.basinhopping(peaks, x0=[0.0, 0.0])

# Visualizing the minima for each optimizer
fig = plt.figure()
ax = fig.add_subplot(111)
im = ax.imshow(peaks(xy), interpolation='bilinear', origin='lower',
               cmap='gray')
ax.set_xlabel('x')
ax.set_ylabel('y')

def plot_point(res, marker='o', color=None):
    ax.plot(512+res.x[0], 512+res.x[1], marker=marker, color=color, ms=10)

plot_point(results['BH'], color='y')  # basinhopping           - yellow
plot_point(results['DE'], color='c')  # differential_evolution - cyan
plot_point(results['DA'], color='w')  # dual_annealing         - white

# SHGO produces multiple minima, plot them all (with a smaller marker size)
plot_point(results['SHGO'], color='r', marker='+')
plot_point(results['SHGO_SOBOL'], color='r', marker='x')
for i in range(results['SHGO_SOBOL'].xl.shape[0]):
    ax.plot(512 + results['SHGO_SOBOL'].xl[i, 0],
            512 + results['SHGO_SOBOL'].xl[i, 1],
            'ro', ms=2)

ax.set_xlim([-4, 514*2])
ax.set_ylim([-4, 514*2])
plt.show()
python

[Plot: minima located by each global optimizer]
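
A quick, text-only way to compare the optimizers is to print the location and value of each minimum found; every entry in results is an OptimizeResult exposing x and fun. A minimal sketch:

# Printing the minimum location and function value for each optimizer
for name, res in results.items():
    print(f'{name:12s} x = {res.x}  f(x) = {res.fun}')
python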

Least-squares Minimization

One of the main applications of nonlinear least squares is nonlinear regression, or curve fitting. Let us take a model example to illustrate this behaviour:

# Defining data generator
def generate_data(t, A, sigma, omega, noise=0, n_outliers=0, random_state=0):
    y = A * np.exp(-sigma * t) * np.sin(omega * t)
    rnd = np.random.RandomState(random_state)
    error = noise * rnd.randn(t.size)
    outliers = rnd.randint(0, t.size, n_outliers)
    error[outliers] *= 35
    return y + error

# Model parameters
A = 2
sigma = 0.1
omega = 0.1 * 2 * np.pi
x_true = np.array([A, sigma, omega])
noise = 0.1
t_min = 0
t_max = 30

# Generating training data with outliers
t_train = np.linspace(t_min, t_max, 30)
y_train = generate_data(t_train, A, sigma, omega, noise=noise, n_outliers=4)

# Function to compute residuals of least-squares
def fun(x, t, y):
    return x[0] * np.exp(-x[1] * t) * np.sin(x[2] * t) - y

# Initial estimate
x0 = np.ones(3)

# Running standard least-squares
res_lsq = optimize.least_squares(fun, x0, args=(t_train, y_train))

# Running robust least-squares
res_robust = optimize.least_squares(fun, x0, loss='soft_l1', f_scale=0.1, args=(t_train, y_train))

# Visualization
t_test = np.linspace(t_min, t_max, 300)
y_test = generate_data(t_test, A, sigma, omega)
y_lsq = generate_data(t_test, *res_lsq.x)
y_robust = generate_data(t_test, *res_robust.x)
plt.figure(figsize=(8, 5))
plt.plot(t_train, y_train, 'o', label='data')
plt.plot(t_test, y_test, label='true')
plt.plot(t_test, y_lsq, label='lsq')
plt.plot(t_test, y_robust, label='robust lsq')
plt.xlabel('$t$')
plt.ylabel('$y$')
plt.legend()
plt.show()
python

[Plot: training data with the true curve, the standard least-squares fit, and the robust fit]
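
Since the data were generated from known parameters, the fitted values can be compared directly against x_true; with outliers present, the robust fit is typically the closer of the two. A quick check:

# Comparing the fitted parameters (A, sigma, omega) against the true values
print('true       :', x_true)
print('lsq        :', res_lsq.x)
print('robust lsq :', res_robust.x)
python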

Root Finding

If you have a single-variable equation, there are several root-finding algorithms that can be tried. Most of these algorithms require the endpoints of an interval in which a root is expected (because the function changes sign there). In general, brentq is the best choice, but the other methods may be useful in certain circumstances or for academic purposes. When a bracket is not available but one or more derivatives are available, then newton (or halley, secant) may be applicable. This is especially the case if the function is defined on a subset of the complex plane and the bracketing methods cannot be used.

Let us take a single variable equation and try to arrive at its roots:

# Defining a single variable equation
# f(x) = x + 2*cos(x)
def func(x):
    return x + 2 * np.cos(x)

# Finding root
sol = optimize.root(func, 0.3)

#     fjac: array([[-1.]])
#      fun: array([-6.66133815e-16])
#  message: 'The solution converged.'
#     nfev: 10
#      qtf: array([-1.20746968e-09])
#        r: array([-2.71445911])
#   status: 1
#  success: True
#        x: array([-1.02986653])
            
sol.x, sol.fun
# (array([-1.02986653]), array([-6.66133815e-16]))
python
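
For comparison with optimize.root above, the bracketing and derivative-based methods mentioned earlier can be applied to the same equation. A minimal sketch, assuming the bracket [-2, 2] (over which func changes sign) and a starting guess of -1.0 for newton:

# brentq requires an interval [a, b] where func changes sign
root_brentq = optimize.brentq(func, -2, 2)

# newton requires a reasonable starting guess; without a derivative
# it falls back to the secant method
root_newton = optimize.newton(func, -1.0)

root_brentq, root_newton
# both should agree with the root found above, approximately -1.0299
python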

Conclusion

In this guide, you have learned about various optimization algorithms and their implementation using the scipy.optimize subpackage.

To learn more about SciPy, you can refer to the earlier guides in this series.
