Numerical methods are essential for solving optimization problems that cannot be solved analytically. These methods provide algorithmic approaches to find optimal solutions through iterative procedures.

Motivation

Most real-world optimization problems don’t have closed-form solutions due to:

  • Complex, nonlinear objective functions
  • Multiple variables with intricate relationships
  • Constraints that complicate the search space
  • Large-scale problems with thousands or millions of variables

Numerical methods address these challenges by using iterative approaches to converge to an optimal or near-optimal solution.

General Structure of Numerical Methods

Most numerical optimization methods follow this general framework:

  1. Initialization: Start with an initial guess $x_0$
  2. Iteration: At each step $k$:
    • Evaluate the objective function $f(x_k)$ (and possibly derivatives)
    • Determine a search direction $d_k$
    • Determine a step size $\alpha_k$
    • Update: $x_{k+1} = x_k + \alpha_k d_k$
  3. Termination: Stop when convergence criteria are satisfied

The primary differences between methods lie in how they determine the search direction and step size.
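
To make this framework concrete, here is a minimal Python sketch (NumPy assumed, names illustrative) in which the only method-specific pieces are the direction and step-size choices:

```python
import numpy as np

def iterative_minimize(f, grad, x0, direction_fn, step_fn, tol=1e-6, max_iter=1000):
    """Generic skeleton: methods differ only in direction_fn and step_fn."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:       # termination: gradient norm criterion
            break
        d = direction_fn(x, g)            # search direction
        alpha = step_fn(f, x, d, k)       # step size
        x = x + alpha * d                 # update
    return x

# Steepest-descent direction with a fixed step recovers plain gradient descent.
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 2)])
x_opt = iterative_minimize(f, grad, [0.0, 0.0],
                           direction_fn=lambda x, g: -g,
                           step_fn=lambda f, x, d, k: 0.1)
print(x_opt)  # approximately [1, -2]
```

Plugging different choices into `direction_fn` and `step_fn` yields the methods discussed below.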

Classification of Numerical Methods

By Information Used

  1. Zeroth-Order Methods (Derivative-Free)

    • Use only function evaluations
    • Examples: Grid search, random search, Nelder-Mead simplex
    • Advantages: No need for derivatives, robust for non-smooth functions
    • Disadvantages: Typically slow convergence, inefficient for high-dimensional problems
  2. First-Order Methods

    • Use function evaluations and gradients
    • Examples: Gradient descent, conjugate gradient, stochastic gradient descent
    • Advantages: Moderate convergence speed, reasonable computational cost
    • Disadvantages: May struggle with ill-conditioned problems
  3. Second-Order Methods

    • Use function evaluations, gradients, and Hessians
    • Examples: Newton’s method, quasi-Newton methods (BFGS, L-BFGS)
    • Advantages: Rapid convergence, scale-invariant
    • Disadvantages: Higher computational cost per iteration

By Search Strategy

  1. Line Search Methods

    • Determine a direction, then find appropriate step size along that direction
    • Examples: Steepest descent with backtracking line search, Newton’s method with line search
    • Key components: Direction selection and step size determination (a minimal backtracking sketch follows this list)
  2. Trust Region Methods

    • Build a model of the objective within a “trusted” region and minimize the model over that region
    • Examples: Trust region Newton method, Levenberg-Marquardt algorithm
    • Key components: Region size adjustment and model accuracy evaluation
  3. Direct Search Methods

    • Search without explicitly using derivatives
    • Examples: Pattern search, simplex methods, genetic algorithms
    • Key components: Sampling strategy and pattern adaptation
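
As a concrete example of the step-size component referenced in the line search item above, here is a minimal backtracking sketch enforcing the Armijo sufficient-decrease condition (function and parameter names are illustrative):

```python
import numpy as np

def backtracking_line_search(f, grad_x, x, d, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until the Armijo condition holds:
    f(x + alpha*d) <= f(x) + c * alpha * (grad_x . d)."""
    alpha = alpha0
    fx = f(x)
    slope = np.dot(grad_x, d)          # directional derivative (negative for a descent direction)
    while f(x + alpha * d) > fx + c * alpha * slope:
        alpha *= rho                   # backtrack: shrink the step geometrically
    return alpha

# Example: one steepest-descent step on f(x) = x1^2 + 10*x2^2 from x = (1, 1)
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
x = np.array([1.0, 1.0])
g = np.array([2.0, 20.0])              # gradient at x
alpha = backtracking_line_search(f, g, x, -g)
print(alpha, x - alpha * g)
```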

Key Numerical Methods

Gradient Descent

The most fundamental first-order method, which follows the negative gradient direction:

$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$$

The step size $\alpha_k$ can be:

  • Fixed (simple but often inefficient)
  • Determined by line search (Armijo, Wolfe conditions)
  • Schedule-based (e.g., a decaying schedule such as $\alpha_k = \alpha_0 / (1 + k)$)

Convergence: Sublinear ($O(1/k)$ in objective value) for smooth convex functions with Lipschitz continuous gradients; linear for strongly convex functions.
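
A small numerical illustration of the linear (geometric) error decay, assuming a strongly convex quadratic and a fixed step of $1/L$ (NumPy assumed, values illustrative):

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x^T A x with curvatures 1 and 10 (so L = 10).
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x = np.array([5.0, 5.0])
for k in range(30):
    x = x - (1.0 / 10.0) * grad(x)     # fixed step 1/L
    if k % 5 == 0:
        print(k, np.linalg.norm(x))    # error shrinks by a roughly constant factor
```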

Newton’s Method

A powerful second-order method that uses the Hessian to determine both direction and step size:

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$$

Convergence: Quadratic convergence near the solution for smooth functions with positive definite Hessian.
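
A minimal sketch of the pure Newton iteration on a strictly convex test function (names illustrative; a production implementation would add damping or a line search):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration: solve H(x_k) d_k = -grad f(x_k), then x_{k+1} = x_k + d_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)  # solve a linear system rather than inverting H
        x = x + d                         # full Newton step
    return x

# Example: strictly convex f(x, y) = (x - 1)^2 + (y + 2)^2 + exp(x + y)
grad = lambda v: np.array([2 * (v[0] - 1) + np.exp(v[0] + v[1]),
                           2 * (v[1] + 2) + np.exp(v[0] + v[1])])
hess = lambda v: np.array([[2 + np.exp(v[0] + v[1]), np.exp(v[0] + v[1])],
                           [np.exp(v[0] + v[1]), 2 + np.exp(v[0] + v[1])]])
print(newton_minimize(grad, hess, [0.0, 0.0]))
```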

Quasi-Newton Methods

Methods that approximate the Hessian or its inverse to avoid the computational cost of computing second derivatives:

  • BFGS (Broyden-Fletcher-Goldfarb-Shanno): Updates an approximation of the inverse Hessian using gradient information

  • L-BFGS (Limited-memory BFGS): Stores only a limited history of updates for large-scale problems

Convergence: Superlinear convergence rate, between linear and quadratic.
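
One way to try both variants without implementing the update formulas is SciPy's `minimize` interface, shown here on the built-in Rosenbrock test function:

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# BFGS maintains a dense inverse-Hessian approximation.
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

# L-BFGS-B keeps only a short history of updates, suitable for many variables.
res_lbfgs = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")

print(res_bfgs.x, res_bfgs.nit)
print(res_lbfgs.x, res_lbfgs.nit)
```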

Conjugate Gradient Method

An efficient first-order method that uses conjugate directions to improve convergence:

  1. Set initial direction $d_0 = -\nabla f(x_0)$
  2. At each iteration $k$:
    • Determine step size $\alpha_k$ via line search
    • Update $x_{k+1} = x_k + \alpha_k d_k$
    • Compute new gradient $\nabla f(x_{k+1})$
    • Update direction: $d_{k+1} = -\nabla f(x_{k+1}) + \beta_k d_k$
    • Typical choice for $\beta_k$: $\beta_k = \dfrac{\|\nabla f(x_{k+1})\|^2}{\|\nabla f(x_k)\|^2}$ (Fletcher-Reeves)

Convergence: For quadratic objectives, converges in at most $n$ steps (where $n$ is the dimension).
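
For the quadratic case the line search and $\beta_k$ have closed forms, which gives the familiar linear CG iteration; a minimal sketch (NumPy assumed, names illustrative):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    """Linear CG for f(x) = 0.5 x^T A x - b^T x, with A symmetric positive definite.
    Converges in at most n steps in exact arithmetic."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                         # gradient of the quadratic
    d = -g                                # initial direction: steepest descent
    for _ in range(len(b)):
        alpha = (g @ g) / (d @ A @ d)     # exact line search along d
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
        if np.linalg.norm(g) < tol:
            break
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))  # matches np.linalg.solve(A, b) in <= 2 steps
```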

Convergence Analysis

Convergence Rates

  • Sublinear: $\|x_k - x^*\| = O(1/k^p)$ for some $p > 0$
  • Linear: $\|x_{k+1} - x^*\| \le r \, \|x_k - x^*\|$ with $0 < r < 1$
  • Superlinear: $\|x_{k+1} - x^*\| / \|x_k - x^*\| \to 0$
  • Quadratic: $\|x_{k+1} - x^*\| \le C \, \|x_k - x^*\|^2$

Where $x^*$ is the optimal solution and $C$ is a constant.
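
A toy comparison makes the difference tangible: starting from an error of 0.1, a linear rate with $r = 0.5$ removes a constant factor per step, while a quadratic rate roughly doubles the number of correct digits per step.

```python
e_lin, e_quad = 0.1, 0.1
for k in range(1, 6):
    e_lin *= 0.5           # linear: error shrinks by a constant factor r = 0.5
    e_quad = e_quad ** 2   # quadratic: error is squared each step
    print(k, e_lin, e_quad)
```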

Convergence Criteria

Common stopping conditions include:

  • Gradient norm below threshold: $\|\nabla f(x_k)\| < \epsilon$
  • Parameter change below threshold: $\|x_{k+1} - x_k\| < \epsilon$
  • Function value change below threshold: $|f(x_{k+1}) - f(x_k)| < \epsilon$
  • Maximum iterations reached: $k \geq k_{\max}$
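
In practice these tests are usually combined with a logical OR, as in this small helper (names and tolerances illustrative):

```python
import numpy as np

def converged(x_new, x_old, f_new, f_old, g_new, k,
              tol_grad=1e-6, tol_x=1e-8, tol_f=1e-10, max_iter=1000):
    """Return True if any common stopping condition is satisfied."""
    return (np.linalg.norm(g_new) < tol_grad           # small gradient norm
            or np.linalg.norm(x_new - x_old) < tol_x   # small parameter change
            or abs(f_new - f_old) < tol_f              # small function-value change
            or k >= max_iter)                          # iteration budget exhausted
```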

Implementation Challenges

Numerical Stability

  • Ill-conditioning: When the condition number of the Hessian is large
  • Scaling: Different variables having different magnitudes
  • Mitigation: Parameter scaling, preconditioning, regularization
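
A toy illustration of the preconditioning idea, assuming a diagonal quadratic whose inverse curvature is known exactly (NumPy assumed):

```python
import numpy as np

# Ill-conditioned quadratic f(x) = 0.5 * x^T A x with curvatures 1 and 1000.
A = np.diag([1.0, 1000.0])
grad = lambda x: A @ x

# Plain gradient descent would need a step <= 2/1000 to remain stable.
# Rescaling the gradient by the inverse curvature (a diagonal preconditioner)
# makes the problem perfectly conditioned; here a single unit step suffices.
P_inv = np.diag(1.0 / np.diag(A))
x = np.array([1.0, 1.0])
x = x - P_inv @ grad(x)
print(x)  # [0, 0], the minimizer
```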

Computational Efficiency

  • Memory usage: Storing dense $n \times n$ matrices (e.g., Hessians or their approximations) for large-scale problems
  • Computational complexity:
    • Function evaluation: Problem-dependent
    • Gradient evaluation: A small constant multiple of a function evaluation with (reverse-mode) automatic differentiation
    • Hessian evaluation: $O(n^2)$ at best
    • Matrix operations: Up to $O(n^3)$ for naive implementations
  • Strategies: Sparse matrices, iterative linear solvers, parallelization
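
As a sketch of the sparse/iterative strategy, SciPy's conjugate gradient solver handles a symmetric positive definite system in 100,000 variables without ever forming a dense matrix (the matrix here is chosen purely for illustration):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

n = 100_000
A = diags([-1.0, 3.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x, info = cg(A, b)                       # info == 0 signals convergence
print(info, np.linalg.norm(A @ x - b))   # residual of the iterative solve
```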

Selecting the Appropriate Method

The choice of numerical method depends on several factors:

  1. Problem characteristics:
    • Size (number of variables)
    • Smoothness and continuity
    • Convexity
    • Conditioning
  2. Available information:
    • Whether analytical gradients are available or must be approximated numerically
    • Ability to compute or approximate Hessian
  3. Computational resources:
    • Memory constraints
    • Time constraints
    • Parallel computing capability
  4. Accuracy requirements:
    • Whether high precision is needed or an approximate solution suffices
    • Whether a global or only a local optimum is required

Software Tools

Many software packages implement these numerical methods:

  • Python: SciPy, PyTorch, TensorFlow, JAX
  • MATLAB: Optimization Toolbox, fminunc, fmincon
  • R: optim, nlminb
  • C/C++: ALGLIB, NLopt, dlib
  • Specialized: IPOPT, SNOPT, CPLEX, Gurobi
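
For example, SciPy exposes several of the methods above through a single interface; the sketch below runs a derivative-free and a gradient-based method on the built-in Rosenbrock function (method choices illustrative):

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.2, 1.0])

# Derivative-free simplex method vs. BFGS with finite-difference gradients.
res_nm = minimize(rosen, x0, method="Nelder-Mead")
res_bfgs = minimize(rosen, x0, method="BFGS")   # gradient approximated numerically

print(res_nm.x, res_nm.nfev)     # solution and number of function evaluations
print(res_bfgs.x, res_bfgs.nfev)
```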