The Gauss-Newton method is a specialized optimization algorithm primarily used for solving nonlinear least squares problems. It’s particularly effective for parameter estimation in curve fitting and model fitting applications, where we need to minimize the sum of squared residuals between observed data and model predictions.

Problem Formulation

Nonlinear Least Squares

Consider a nonlinear model function $f(x; \boldsymbol{\theta})$ with parameters $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)^T$, and a set of observed data points $(x_i, y_i)$ for $i = 1, \ldots, m$. The residuals are:

$$r_i(\boldsymbol{\theta}) = y_i - f(x_i; \boldsymbol{\theta}), \qquad i = 1, \ldots, m$$

The nonlinear least squares problem aims to find the parameters that minimize the sum of squared residuals:

$$\min_{\boldsymbol{\theta}} \; S(\boldsymbol{\theta}) = \frac{1}{2} \sum_{i=1}^{m} r_i(\boldsymbol{\theta})^2 = \frac{1}{2} \|\mathbf{r}(\boldsymbol{\theta})\|^2$$

where $\mathbf{r}(\boldsymbol{\theta}) = (r_1(\boldsymbol{\theta}), \ldots, r_m(\boldsymbol{\theta}))^T$ is the residual vector.
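
As a concrete sketch, the residual vector and objective for a hypothetical exponential-decay model $f(x; \boldsymbol{\theta}) = \theta_1 e^{-\theta_2 x}$ (the model choice here is illustrative, not part of the formulation above) can be coded as:

import numpy as np

def model(x, theta):
    # Hypothetical model: f(x; theta) = theta_1 * exp(-theta_2 * x)
    return theta[0] * np.exp(-theta[1] * x)

def residuals(theta, x_data, y_data):
    # r_i(theta) = y_i - f(x_i; theta)
    return y_data - model(x_data, theta)

def objective(theta, x_data, y_data):
    # S(theta) = 0.5 * sum of squared residuals
    r = residuals(theta, x_data, y_data)
    return 0.5 * np.sum(r**2)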

Algorithm Description

The Gauss-Newton method is an iterative procedure that approximates the nonlinear least squares problem with a linear least squares problem at each iteration.

Linearization of the Residual

At each iteration $k$, we linearize the residual function around the current parameter estimate $\boldsymbol{\theta}^{(k)}$:

$$r_i(\boldsymbol{\theta}) \approx r_i(\boldsymbol{\theta}^{(k)}) + \nabla r_i(\boldsymbol{\theta}^{(k)})^T (\boldsymbol{\theta} - \boldsymbol{\theta}^{(k)})$$

where $\nabla r_i(\boldsymbol{\theta}^{(k)})$ is the gradient of the $i$-th residual with respect to $\boldsymbol{\theta}$, evaluated at $\boldsymbol{\theta}^{(k)}$.

Jacobian Matrix

The gradients of all residuals form the Jacobian matrix $\mathbf{J}(\boldsymbol{\theta}) \in \mathbb{R}^{m \times p}$:

$$\mathbf{J}(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{\partial r_1}{\partial \theta_1} & \cdots & \dfrac{\partial r_1}{\partial \theta_p} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial r_m}{\partial \theta_1} & \cdots & \dfrac{\partial r_m}{\partial \theta_p} \end{bmatrix}$$

For models of the form $r_i(\boldsymbol{\theta}) = y_i - f(x_i; \boldsymbol{\theta})$, the Jacobian components are:

$$J_{ij} = \frac{\partial r_i}{\partial \theta_j} = -\frac{\partial f(x_i; \boldsymbol{\theta})}{\partial \theta_j}$$

Update Rule

At each iteration, the Gauss-Newton step is:

$$\Delta\boldsymbol{\theta} = -(\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T \mathbf{r}$$

where both $\mathbf{J}$ and $\mathbf{r}$ are evaluated at the current parameter estimate $\boldsymbol{\theta}^{(k)}$.

The new parameter estimate becomes:

$$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} + \Delta\boldsymbol{\theta}$$

Algorithm Steps

  1. Choose an initial parameter estimate $\boldsymbol{\theta}^{(0)}$
  2. For iteration $k = 0, 1, 2, \ldots$ until convergence:
    • Compute the residual vector $\mathbf{r}(\boldsymbol{\theta}^{(k)})$
    • Compute the Jacobian matrix $\mathbf{J}(\boldsymbol{\theta}^{(k)})$
    • Solve the normal equations $\mathbf{J}^T \mathbf{J}\, \Delta\boldsymbol{\theta} = -\mathbf{J}^T \mathbf{r}$ for $\Delta\boldsymbol{\theta}$
    • Update: $\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} + \Delta\boldsymbol{\theta}$
    • Check convergence criteria
  3. Return the final parameter estimate
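
A minimal sketch of this loop (a fuller implementation with a numerical Jacobian appears in the Implementation Example section) could solve the linearized subproblem directly with np.linalg.lstsq rather than forming the normal equations, which is numerically gentler when $\mathbf{J}$ is ill-conditioned; the function and argument names are illustrative:

import numpy as np

def gauss_newton_sketch(residual_fn, jacobian_fn, theta0, tol=1e-6, max_iter=50):
    # residual_fn(theta) -> residual vector r(theta), shape (m,)
    # jacobian_fn(theta) -> Jacobian of the residuals, shape (m, p)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        r = residual_fn(theta)
        J = jacobian_fn(theta)
        # Solve the linearized problem min ||r + J * delta||^2
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        theta = theta + delta
        if np.linalg.norm(delta) < tol:
            break
    return theta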

Theoretical Justification

Relationship to Newton’s Method

The Gauss-Newton method can be derived as an approximation to Newton's method for minimizing the least squares objective function:

$$S(\boldsymbol{\theta}) = \frac{1}{2} \sum_{i=1}^{m} r_i(\boldsymbol{\theta})^2$$

The gradient of $S$ is:

$$\nabla S(\boldsymbol{\theta}) = \mathbf{J}^T \mathbf{r}$$

The Hessian of $S$ is:

$$\nabla^2 S(\boldsymbol{\theta}) = \mathbf{J}^T \mathbf{J} + \sum_{i=1}^{m} r_i(\boldsymbol{\theta})\, \nabla^2 r_i(\boldsymbol{\theta})$$

The Gauss-Newton method approximates the Hessian by dropping the second term:

$$\nabla^2 S(\boldsymbol{\theta}) \approx \mathbf{J}^T \mathbf{J}$$
This approximation is accurate when:

  • The residuals $r_i$ are small, or
  • The functions are nearly linear, making the second-derivative terms $\nabla^2 r_i$ small

Interpretation as Linear Least Squares

Each Gauss-Newton iteration solves a linear least squares problem:

$$\min_{\Delta\boldsymbol{\theta}} \; \frac{1}{2} \left\| \mathbf{r}(\boldsymbol{\theta}^{(k)}) + \mathbf{J}(\boldsymbol{\theta}^{(k)})\, \Delta\boldsymbol{\theta} \right\|^2$$

This is obtained by linearizing the residuals and substituting the linearization into the original objective function.

Convergence Properties

Convergence Rate

  • When the starting point is sufficiently close to the solution and the residuals at the solution are zero (or very small), Gauss-Newton exhibits quadratic (or near-quadratic) convergence, similar to Newton's method
  • With larger residuals, convergence is linear
  • May not converge if the starting point is far from the solution or if the problem is ill-conditioned

Comparison with Other Methods

| Method | Convergence Rate | Residual Size | Second Derivatives | Matrix Inversion |
|---|---|---|---|---|
| Gauss-Newton | Quadratic / Linear | Small / Large | Not needed | Required |
| Newton | Quadratic | Any | Required | Required |
| Gradient Descent | Linear | Any | Not needed | Not required |
| Levenberg-Marquardt | Varies | Any | Not needed | Required |

Limitations and Challenges

Singular or Ill-Conditioned Jacobian

If $\mathbf{J}^T \mathbf{J}$ is singular or nearly singular, the update equation becomes numerically unstable. This can occur due to:

  • Overparameterization
  • Highly correlated parameters
  • Lack of sensitivity to certain parameters

Solutions include:

  • Regularization (adding a small value to the diagonal of $\mathbf{J}^T \mathbf{J}$)
  • Using pseudoinverse instead of inverse
  • Singular value decomposition (SVD)
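
As a brief sketch of the regularization option (the shift size eps is an arbitrary illustrative choice; this is essentially a fixed-damping preview of the Levenberg-Marquardt variant described later):

import numpy as np

def regularized_step(J, r, eps=1e-8):
    # Solve (J^T J + eps * I) delta = -J^T r instead of the raw normal equations
    JTJ = J.T @ J
    A = JTJ + eps * np.eye(JTJ.shape[0])
    return np.linalg.solve(A, -J.T @ r)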

Divergence

Gauss-Newton may diverge when:

  • The initial guess is far from the solution
  • The objective function has significant nonlinearity
  • The model is poorly specified relative to the data

Large Residuals

When residuals are large at the solution, the Hessian approximation $\nabla^2 S \approx \mathbf{J}^T \mathbf{J}$ becomes poor, leading to:

  • Slower convergence
  • Potential oscillation or divergence
  • Suboptimal solutions

Enhancements and Variants

Line Search

Incorporate a line search to determine the step size $\alpha_k$:

$$\boldsymbol{\theta}^{(k+1)} = \boldsymbol{\theta}^{(k)} + \alpha_k\, \Delta\boldsymbol{\theta}$$

where $\alpha_k \in (0, 1]$ is chosen to ensure sufficient decrease in the objective function.
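
A simple backtracking sketch that halves $\alpha$ until the objective decreases (the constants and the objective_fn argument are illustrative assumptions):

import numpy as np

def damped_update(theta, delta, objective_fn, alpha0=1.0, shrink=0.5, max_halvings=20):
    # Backtracking line search along the Gauss-Newton direction delta
    alpha = alpha0
    f0 = objective_fn(theta)
    for _ in range(max_halvings):
        candidate = theta + alpha * delta
        if objective_fn(candidate) < f0:
            return candidate   # accept the first step that reduces the objective
        alpha *= shrink        # step too long: shrink it and try again
    return theta               # no decrease found; keep the current estimate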

Trust Region Methods

Restrict the step to a region where the linear approximation is trusted:

$$\min_{\Delta\boldsymbol{\theta}} \; \frac{1}{2} \|\mathbf{r} + \mathbf{J}\, \Delta\boldsymbol{\theta}\|^2 \quad \text{subject to} \quad \|\Delta\boldsymbol{\theta}\| \le \delta_k$$

where $\delta_k$ is the trust region radius, adjusted based on how well the linear model predicts the actual decrease.

Levenberg-Marquardt Algorithm

A hybrid of Gauss-Newton and gradient descent that solves a damped version of the normal equations:

$$(\mathbf{J}^T \mathbf{J} + \lambda \mathbf{I})\, \Delta\boldsymbol{\theta} = -\mathbf{J}^T \mathbf{r}$$

where $\lambda \ge 0$ is a damping parameter:

  • When $\lambda$ is large, the algorithm behaves like gradient descent (robust but slow)
  • When $\lambda$ is small, the algorithm behaves like Gauss-Newton (fast but potentially unstable)
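
A minimal sketch of one damped step with the customary accept/reject adaptation of $\lambda$ (the factor-of-10 updates and the helper names are illustrative assumptions):

import numpy as np

def lm_step(theta, residual_fn, jacobian_fn, lam):
    # Solve (J^T J + lam * I) delta = -J^T r for the damped step
    r = residual_fn(theta)
    J = jacobian_fn(theta)
    A = J.T @ J + lam * np.eye(len(theta))
    delta = np.linalg.solve(A, -J.T @ r)

    # Accept the step only if it reduces the sum of squared residuals
    if np.sum(residual_fn(theta + delta)**2) < np.sum(r**2):
        return theta + delta, lam / 10.0   # good step: trust the Gauss-Newton model more
    return theta, lam * 10.0               # bad step: lean toward gradient descent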

Practical Implementation

Computing the Jacobian

Three main approaches:

  1. Analytical Jacobian: Derive and implement explicit formulas for derivatives
    • Most accurate
    • Potentially complex for complicated models
    • Error-prone for large models
  2. Numerical Approximation: Use finite differences
    • Forward difference: $\dfrac{\partial r_i}{\partial \theta_j} \approx \dfrac{r_i(\boldsymbol{\theta} + \epsilon \mathbf{e}_j) - r_i(\boldsymbol{\theta})}{\epsilon}$
    • Central difference: $\dfrac{\partial r_i}{\partial \theta_j} \approx \dfrac{r_i(\boldsymbol{\theta} + \epsilon \mathbf{e}_j) - r_i(\boldsymbol{\theta} - \epsilon \mathbf{e}_j)}{2\epsilon}$
    • Simple to implement, but less accurate than analytical derivatives and more computationally expensive
  3. Automatic Differentiation: Leverage specialized software
    • Combines accuracy of analytical with convenience of numerical methods
    • Modern libraries like TensorFlow, PyTorch, JAX support this
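
As a sketch of the third approach, assuming JAX is available, jax.jacfwd can differentiate the residual function of a hypothetical exponential-decay model directly (the model and variable names are illustrative):

import jax
import jax.numpy as jnp

def residuals(theta, x_data, y_data):
    # Residuals of a hypothetical model f(x; theta) = theta_1 * exp(-theta_2 * x)
    return y_data - theta[0] * jnp.exp(-theta[1] * x_data)

def make_jacobian(x_data, y_data):
    # Forward-mode autodiff of the residual vector with respect to theta
    return jax.jacfwd(lambda theta: residuals(theta, x_data, y_data))

x = jnp.linspace(0.0, 5.0, 50)
y = 2.0 * jnp.exp(-1.5 * x)
J = make_jacobian(x, y)(jnp.array([1.0, 1.0]))   # Jacobian of shape (50, 2)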

Solving the Normal Equations

Methods to solve the normal equations $\mathbf{J}^T \mathbf{J}\, \Delta\boldsymbol{\theta} = -\mathbf{J}^T \mathbf{r}$:

  1. Cholesky Decomposition: If $\mathbf{J}^T \mathbf{J}$ is positive definite
    • Decompose $\mathbf{J}^T \mathbf{J} = \mathbf{L} \mathbf{L}^T$
    • Solve $\mathbf{L} \mathbf{z} = -\mathbf{J}^T \mathbf{r}$ for $\mathbf{z}$
    • Solve $\mathbf{L}^T \Delta\boldsymbol{\theta} = \mathbf{z}$ for $\Delta\boldsymbol{\theta}$
  2. QR Decomposition: More stable but more expensive
    • Decompose $\mathbf{J} = \mathbf{Q} \mathbf{R}$
    • Solve $\mathbf{R}\, \Delta\boldsymbol{\theta} = -\mathbf{Q}^T \mathbf{r}$
  3. SVD (Singular Value Decomposition): Most stable but most expensive
    • Decompose $\mathbf{J} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T$
    • Compute $\Delta\boldsymbol{\theta} = -\mathbf{V} \boldsymbol{\Sigma}^{+} \mathbf{U}^T \mathbf{r}$
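
A hedged sketch of the three solvers using NumPy and SciPy routines (cho_factor, cho_solve, and solve_triangular are from scipy.linalg); each function returns the Gauss-Newton step given the residual Jacobian J and residual vector r:

import numpy as np
from scipy.linalg import cho_factor, cho_solve, solve_triangular

def step_cholesky(J, r):
    # Factor J^T J = L L^T, then perform the two triangular solves
    factor = cho_factor(J.T @ J)
    return cho_solve(factor, -J.T @ r)

def step_qr(J, r):
    # Reduced QR of J, then solve R * delta = -Q^T r
    Q, R = np.linalg.qr(J)
    return solve_triangular(R, -Q.T @ r)

def step_svd(J, r):
    # Thin SVD of J, then apply the pseudoinverse: delta = -V Sigma^+ U^T r
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    return -Vt.T @ ((U.T @ r) / s)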

Convergence Criteria

Common stopping conditions:

  1. Small parameter change: $\|\Delta\boldsymbol{\theta}\| < \epsilon_1$
  2. Small residual change: $|S(\boldsymbol{\theta}^{(k+1)}) - S(\boldsymbol{\theta}^{(k)})| < \epsilon_2$
  3. Small gradient: $\|\mathbf{J}^T \mathbf{r}\| < \epsilon_3$
  4. Maximum iterations reached: $k \ge k_{\max}$
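
A small sketch combining these checks (the threshold values are illustrative defaults):

import numpy as np

def converged(delta_theta, obj_old, obj_new, gradient, eps1=1e-8, eps2=1e-10, eps3=1e-8):
    # Stop when the step, the objective change, or the gradient norm is small enough
    return (np.linalg.norm(delta_theta) < eps1
            or abs(obj_new - obj_old) < eps2
            or np.linalg.norm(gradient) < eps3)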

Applications

Curve Fitting

Fitting parameterized curves to data points:

  • Exponential decay: $f(x; \boldsymbol{\theta}) = \theta_1 e^{-\theta_2 x}$
  • Power laws: $f(x; \boldsymbol{\theta}) = \theta_1 x^{\theta_2}$
  • Gaussian peaks: $f(x; \boldsymbol{\theta}) = \theta_1 \exp\!\left(-\dfrac{(x - \theta_2)^2}{2 \theta_3^2}\right)$
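
Written as Python model functions (the parameter orderings follow the formulas above and are otherwise illustrative):

import numpy as np

def exponential_decay(x, theta):
    # f(x; theta) = theta_1 * exp(-theta_2 * x)
    return theta[0] * np.exp(-theta[1] * x)

def power_law(x, theta):
    # f(x; theta) = theta_1 * x ** theta_2
    return theta[0] * x ** theta[1]

def gaussian_peak(x, theta):
    # f(x; theta) = theta_1 * exp(-(x - theta_2)^2 / (2 * theta_3^2))
    return theta[0] * np.exp(-(x - theta[1])**2 / (2 * theta[2]**2))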

Model Calibration

Adjusting model parameters to match experimental data:

  • Chemical reaction kinetics
  • Mechanical system parameters
  • Electrical circuit models

System Identification

Determining mathematical models of dynamic systems:

  • Transfer function identification
  • State-space model parameter estimation
  • Time-series modeling

Implementation Example

Python implementation of the Gauss-Newton method for curve fitting:

import numpy as np
 
def gauss_newton(func, x_data, y_data, theta0, jac=None, tol=1e-6, max_iter=100):
    """
    Gauss-Newton algorithm for nonlinear least squares.
    
    Parameters:
    - func: Function that computes the model prediction
    - x_data: Independent variable data
    - y_data: Dependent variable data (observations)
    - theta0: Initial parameter estimate
    - jac: Function to compute the Jacobian (if None, use finite differences)
    - tol: Convergence tolerance
    - max_iter: Maximum number of iterations
    
    Returns:
    - theta: Optimal parameter estimate
    - info: Information about the optimization
    """
    theta = np.asarray(theta0, dtype=float)
    n_params = len(theta)
    n_samples = len(x_data)
    
    # If no Jacobian function is provided, create a numerical approximation
    if jac is None:
        def numerical_jacobian(theta):
            J = np.zeros((n_samples, n_params))
            epsilon = 1e-8  # Small step for finite differences
            
            for j in range(n_params):
                theta_plus = theta.copy()
                theta_plus[j] += epsilon
                
                # Forward difference approximation
                J[:, j] = -(func(x_data, theta_plus) - func(x_data, theta)) / epsilon
            
            return J
        
        jacobian_func = numerical_jacobian
    else:
        jacobian_func = jac
    
    # Gauss-Newton iterations
    for iteration in range(max_iter):
        # Compute residuals
        y_pred = func(x_data, theta)
        residuals = y_data - y_pred
        
        # Compute objective function value
        obj_value = 0.5 * np.sum(residuals**2)
        
        # Compute Jacobian
        J = jacobian_func(theta)
        
        # Gradient of the objective function: grad S = J^T r (not used in the update; kept for inspection)
        gradient = J.T @ residuals
        
        # Compute Gauss-Newton step
        try:
            # Using normal equations: (J^T J) delta_theta = -J^T residuals
            JTJ = J.T @ J
            delta_theta = np.linalg.solve(JTJ, -J.T @ residuals)
        except np.linalg.LinAlgError:
            # If matrix is singular, use pseudoinverse
            delta_theta = -np.linalg.pinv(J) @ residuals
        
        # Update parameters
        theta_new = theta + delta_theta
        
        # Check convergence
        if np.linalg.norm(delta_theta) < tol:
            theta = theta_new
            break
        
        theta = theta_new
    
    # Compute final residuals and objective value
    y_pred = func(x_data, theta)
    residuals = y_data - y_pred
    final_obj_value = 0.5 * np.sum(residuals**2)
    
    info = {
        'iterations': iteration + 1,
        'obj_value': final_obj_value,
        'residuals': residuals,
        'jacobian': jacobian_func(theta)
    }
    
    return theta, info
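
A short usage sketch of the gauss_newton function above on synthetic exponential-decay data (the model form, true parameters, noise level, and starting point are illustrative):

# Example usage: fit f(x; theta) = theta_1 * exp(-theta_2 * x) to noisy synthetic data
def model(x, theta):
    return theta[0] * np.exp(-theta[1] * x)

rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 5.0, 50)
true_theta = np.array([2.5, 1.3])
y_data = model(x_data, true_theta) + 0.02 * rng.standard_normal(x_data.size)

theta_hat, info = gauss_newton(model, x_data, y_data, theta0=[1.0, 1.0])
print("Estimated parameters:", theta_hat)
print("Iterations:", info['iterations'], "Final objective:", info['obj_value'])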