Numerical methods are essential for solving optimization problems that cannot be solved analytically. These methods provide algorithmic approaches to find optimal solutions through iterative procedures.

Motivation

Most real-world optimization problems don’t have closed-form solutions due to:

  • Complex, nonlinear objective functions
  • Multiple variables with intricate relationships
  • Constraints that complicate the search space
  • Large-scale problems with thousands or millions of variables

Numerical methods address these challenges by using iterative approaches to converge to an optimal or near-optimal solution.

General Structure of Numerical Methods

Most numerical optimization methods follow this general framework:

  1. Initialization: Start with an initial guess $x_0$
  2. Iteration: At each step $k$:
    • Evaluate the objective function $f(x_k)$ (and possibly derivatives)
    • Determine a search direction $d_k$
    • Determine a step size $\alpha_k$
    • Update: $x_{k+1} = x_k + \alpha_k d_k$
  3. Termination: Stop when convergence criteria are satisfied

The primary differences between methods lie in how they determine the search direction and step size.
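
To make this framework concrete, here is a minimal Python sketch (NumPy assumed, names illustrative) in which the only method-specific pieces are the direction and step-size choices:

```python
import numpy as np

def iterative_minimize(f, grad, x0, direction_fn, step_fn, tol=1e-6, max_iter=1000):
    """Generic skeleton: methods differ only in direction_fn and step_fn."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # termination: gradient norm criterion
            break
        d = direction_fn(x, g)            # search direction
        alpha = step_fn(f, x, d, k)       # step size
        x = x + alpha * d                 # update
    return x

# Steepest-descent direction with a fixed step recovers plain gradient descent.
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_opt = iterative_minimize(f, grad, [0.0, 0.0],
                           direction_fn=lambda x, g: -g,
                           step_fn=lambda f, x, d, k: 0.1)
print(x_opt)  # approximately [1, -2]
```

Plugging different choices into `direction_fn` and `step_fn` yields the methods discussed below.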

Classification of Numerical Methods

By Information Used

  1. Zeroth-Order Methods (Derivative-Free)

    • Use only function evaluations
    • Examples: Grid search, random search, Nelder-Mead simplex
    • Advantages: No need for derivatives, robust for non-smooth functions
    • Disadvantages: Typically slow convergence, inefficient for high-dimensional problems
  2. First-Order Methods

    • Use function evaluations and gradients
    • Examples: Gradient descent, conjugate gradient, stochastic gradient descent
    • Advantages: Moderate convergence speed, reasonable computational cost
    • Disadvantages: May struggle with ill-conditioned problems
  3. Second-Order Methods

    • Use function evaluations, gradients, and Hessians
    • Examples: Newton’s method, quasi-Newton methods (BFGS, L-BFGS)
    • Advantages: Rapid convergence, scale-invariant
    • Disadvantages: Higher computational cost per iteration

By Search Strategy

  1. Line Search Methods

    • Determine a direction, then find appropriate step size along that direction
    • Examples: Steepest descent with backtracking line search, Newton’s method with line search
    • Key components: Direction selection and step size determination (a minimal backtracking sketch follows this list)
  2. Trust Region Methods

    • Build a model of the objective within a “trusted” region and minimize the model over that region
    • Examples: Trust region Newton method, Levenberg-Marquardt algorithm
    • Key components: Region size adjustment and model accuracy evaluation
  3. Direct Search Methods

    • Search without explicitly using derivatives
    • Examples: Pattern search, simplex methods, genetic algorithms
    • Key components: Sampling strategy and pattern adaptation
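
As a concrete example of the step-size component referenced in the line search item above, here is a minimal backtracking sketch enforcing the Armijo sufficient-decrease condition (function and parameter names are illustrative):

```python
import numpy as np

def backtracking_line_search(f, grad_x, x, d, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo condition holds:
    f(x + alpha*d) <= f(x) + c * alpha * (grad_x . d)."""
    alpha = alpha0
    fx = f(x)
    slope = np.dot(grad_x, d)          # directional derivative (negative for a descent direction)
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= rho                   # backtrack: shrink the step geometrically
    return alpha

# Example: one steepest-descent step on f(x) = x1^2 + 10*x2^2 from x = (1, 1)
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
x = np.array([1.0, 1.0])
g = np.array([2.0, 20.0])              # gradient at x
alpha = backtracking_line_search(f, g, x, -g)
print(alpha, x - alpha * g)
```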

Key Numerical Methods

Gradient Descent

The most fundamental first-order method, which follows the negative gradient direction:

$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$$

The step size $\alpha_k$ can be:

  • Fixed (simple but often inefficient)
  • Determined by line search (Armijo, Wolfe conditions)
  • Schedule-based (e.g., a decaying schedule such as $\alpha_k = \alpha_0 / (1 + k)$)

Convergence: Sublinear ($O(1/k)$ in objective value) for smooth convex functions with Lipschitz continuous gradients; linear for strongly convex functions.
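
A small numerical illustration of the linear (geometric) error decay, assuming a strongly convex quadratic and a fixed step of $1/L$ (NumPy assumed, values illustrative):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x^T A x with curvatures 1 and 10 (so L = 10).
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x = np.array([5.0, 5.0])
for k in range(30):
    x = x - (1.0 / 10.0) * grad(x)     # fixed step 1/L
    if k % 5 == 0:
        print(k, np.linalg.norm(x))    # error shrinks by a roughly constant factor
```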

Newton’s Method

A powerful second-order method that uses the Hessian to determine both direction and step size:

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$$

Convergence: Quadratic convergence near the solution for smooth functions with positive definite Hessian.
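
A minimal sketch of the pure Newton iteration on a strictly convex test function (names illustrative; a production implementation would add damping or a line search):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration: solve H(x_k) d_k = -grad f(x_k), then x_{k+1} = x_k + d_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)  # solve a linear system rather than inverting H
        x = x + d                         # full Newton step
    return x

# Example: strictly convex f(x, y) = (x - 1)^2 + (y + 2)^2 + exp(x + y)
grad = lambda v: np.array([2 * (v[0] - 1) + np.exp(v[0] + v[1]),
                           2 * (v[1] + 2) + np.exp(v[0] + v[1])])
hess = lambda v: np.array([[2 + np.exp(v[0] + v[1]), np.exp(v[0] + v[1])],
                           [np.exp(v[0] + v[1]), 2 + np.exp(v[0] + v[1])]])
print(newton_minimize(grad, hess, [0.0, 0.0]))
```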

Quasi-Newton Methods

Methods that approximate the Hessian or its inverse to avoid the computational cost of computing second derivatives:

  • BFGS (Broyden-Fletcher-Goldfarb-Shanno): Updates an approximation of the inverse Hessian using gradient information

  • L-BFGS (Limited-memory BFGS): Stores only a limited history of updates for large-scale problems

Convergence: Superlinear convergence rate, between linear and quadratic.
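
One way to try both variants without implementing the update formulas is SciPy's `minimize` interface, shown here on the built-in Rosenbrock test function:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# BFGS maintains a dense inverse-Hessian approximation.
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

# L-BFGS-B keeps only a short history of updates, suitable for many variables.
res_lbfgs = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")

print(res_bfgs.x, res_bfgs.nit)
print(res_lbfgs.x, res_lbfgs.nit)
```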

Conjugate Gradient Method

An efficient first-order method that uses conjugate directions to improve convergence:

  1. Set initial direction $d_0 = -\nabla f(x_0)$
  2. At each iteration $k$:
    • Determine step size $\alpha_k$ via line search
    • Update $x_{k+1} = x_k + \alpha_k d_k$
    • Compute new gradient $\nabla f(x_{k+1})$
    • Update direction: $d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k$
    • Typical choice for $\beta_k$: $\beta_k = \dfrac{\|\nabla f(x_{k+1})\|^2}{\|\nabla f(x_k)\|^2}$ (Fletcher-Reeves)

Convergence: For quadratic objectives, converges in at most $n$ steps (where $n$ is the dimension).
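
For the quadratic case the line search and $\beta_k$ have closed forms, which gives the familiar linear CG iteration; a minimal sketch (NumPy assumed, names illustrative):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Linear CG for f(x) = 0.5 x^T A x - b^T x, with A symmetric positive definite.
    Converges in at most n steps in exact arithmetic."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                         # gradient of the quadratic
    d = -g                                # initial direction: steepest descent
    for _ in range(len(b)):
        alpha = (g @ g) / (d @ A @ d)     # exact line search along d
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
        if np.linalg.norm(g) < tol:
            break
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))  # matches np.linalg.solve(A, b) in <= 2 steps
```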

Convergence Analysis

Convergence Rates

  • Sublinear: $\|x_k - x^*\| = O(1/k^p)$ for some $p > 0$
  • Linear: $\|x_{k+1} - x^*\| \le r \, \|x_k - x^*\|$ with $0 < r < 1$
  • Superlinear: $\|x_{k+1} - x^*\| / \|x_k - x^*\| \to 0$
  • Quadratic: $\|x_{k+1} - x^*\| \le C \, \|x_k - x^*\|^2$

Where $x^*$ is the optimal solution and $C$ is a constant.
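
A toy comparison makes the difference tangible: starting from an error of 0.1, a linear rate with $r = 0.5$ removes a constant factor per step, while a quadratic rate roughly doubles the number of correct digits per step.

```python
e_lin, e_quad = 0.1, 0.1
for k in range(1, 6):
    e_lin *= 0.5           # linear: error shrinks by a constant factor r = 0.5
    e_quad = e_quad ** 2   # quadratic: error is squared each step
    print(k, e_lin, e_quad)
```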

Convergence Criteria

Common stopping conditions include:

  • Gradient norm below threshold: $\|\nabla f(x_k)\| < \epsilon$
  • Parameter change below threshold: $\|x_{k+1} - x_k\| < \epsilon$
  • Function value change below threshold: $|f(x_{k+1}) - f(x_k)| < \epsilon$
  • Maximum iterations reached: $k \geq k_{\max}$
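
In practice these tests are usually combined with a logical OR, as in this small helper (names and tolerances illustrative):

```python
import numpy as np

def converged(x_new, x_old, f_new, f_old, g_new, k,
              tol_grad=1e-6, tol_x=1e-8, tol_f=1e-10, max_iter=1000):
    """Return True if any common stopping condition is satisfied."""
    return (np.linalg.norm(g_new) < tol_grad           # small gradient norm
            or np.linalg.norm(x_new - x_old) < tol_x   # small parameter change
            or abs(f_new - f_old) < tol_f              # small function-value change
            or k >= max_iter)                          # iteration budget exhausted
```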

Implementation Challenges

Numerical Stability

  • Ill-conditioning: When the condition number of the Hessian is large
  • Scaling: Different variables having different magnitudes
  • Mitigation: Parameter scaling, preconditioning, regularization
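
A toy illustration of the preconditioning idea, assuming a diagonal quadratic whose inverse curvature is known exactly (NumPy assumed):

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x with curvatures 1 and 1000.
A = np.diag([1.0, 1000.0])
grad = lambda x: A @ x

# Plain gradient descent would need a step <= 2/1000 to remain stable.
# Rescaling the gradient by the inverse curvature (a diagonal preconditioner)
# makes the problem perfectly conditioned; here a single unit step suffices.
P_inv = np.diag(1.0 / np.diag(A))
x = np.array([1.0, 1.0])
x = x - P_inv @ grad(x)
print(x)  # [0, 0], the minimizer
```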

Computational Efficiency

  • Memory usage: Storing dense $n \times n$ matrices (e.g., Hessians or their approximations) for large-scale problems
  • Computational complexity:
    • Function evaluation: Problem-dependent
    • Gradient evaluation: A small constant multiple of a function evaluation with (reverse-mode) automatic differentiation
    • Hessian evaluation: $O(n^2)$ at best
    • Matrix operations: Up to $O(n^3)$ for naive implementations
  • Strategies: Sparse matrices, iterative linear solvers, parallelization
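
As a sketch of the sparse/iterative strategy, SciPy's conjugate gradient solver handles a symmetric positive definite system in 100,000 variables without ever forming a dense matrix (the matrix here is chosen purely for illustration):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 100_000
A = diags([-1.0, 3.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x, info = cg(A, b)                       # info == 0 signals convergence
print(info, np.linalg.norm(A @ x - b))   # residual of the iterative solve
```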

Selecting the Appropriate Method

The choice of numerical method depends on several factors:

  1. Problem characteristics:
    • Size (number of variables)
    • Smoothness and continuity
    • Convexity
    • Conditioning
  2. Available information:
    • Whether analytical gradients are available or must be approximated numerically
    • Ability to compute or approximate Hessian
  3. Computational resources:
    • Memory constraints
    • Time constraints
    • Parallel computing capability
  4. Accuracy requirements:
    • Whether high precision is needed or an approximate solution suffices
    • Whether a global or only a local optimum is required

Software Tools

Many software packages implement these numerical methods:

  • Python: SciPy, PyTorch, TensorFlow, JAX
  • MATLAB: Optimization Toolbox, fminunc, fmincon
  • R: optim, nlminb
  • C/C++: ALGLIB, NLopt, dlib
  • Specialized: IPOPT, SNOPT, CPLEX, Gurobi
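
For example, SciPy exposes several of the methods above through a single interface; the sketch below runs a derivative-free and a gradient-based method on the built-in Rosenbrock function (method choices illustrative):

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.2, 1.0])

# Derivative-free simplex method vs. BFGS with finite-difference gradients.
res_nm = minimize(rosen, x0, method="Nelder-Mead")
res_bfgs = minimize(rosen, x0, method="BFGS")   # gradient approximated numerically

print(res_nm.x, res_nm.nfev)     # solution and number of function evaluations
print(res_bfgs.x, res_bfgs.nfev)
```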