The Levenberg-Marquardt (LM) method is a robust optimization algorithm for solving nonlinear least squares problems. It combines the advantages of gradient descent and the Gauss-Newton method, making it one of the most widely used algorithms for parameter estimation in curve fitting, computer vision, and machine learning.

Problem Formulation

Nonlinear Least Squares Problem

Given a nonlinear model function $f(x; \boldsymbol{\beta})$ with parameters $\boldsymbol{\beta} \in \mathbb{R}^n$ and observed data points $(x_i, y_i)$ for $i = 1, \dots, m$, the residuals are:

$$r_i(\boldsymbol{\beta}) = y_i - f(x_i; \boldsymbol{\beta}), \qquad i = 1, \dots, m$$

The nonlinear least squares problem aims to find the parameters that minimize the sum of squared residuals:

$$S(\boldsymbol{\beta}) = \sum_{i=1}^{m} r_i(\boldsymbol{\beta})^2 = \|\mathbf{r}(\boldsymbol{\beta})\|^2$$

where $\mathbf{r}(\boldsymbol{\beta}) = \big(r_1(\boldsymbol{\beta}), \dots, r_m(\boldsymbol{\beta})\big)^\top$ is the residual vector.
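
To fix notation, here is a short Python sketch of the residual vector and the objective $S(\boldsymbol{\beta})$ for a hypothetical exponential model; the model choice and the function names are illustrative only, not taken from the text.

```python
import numpy as np

# Hypothetical model f(x; beta) = beta0 * exp(beta1 * x); the text does not
# fix a specific model, so this choice is purely illustrative.
def model(x, beta):
    return beta[0] * np.exp(beta[1] * x)

def residuals(beta, x, y):
    # r_i(beta) = y_i - f(x_i; beta)
    return y - model(x, beta)

def objective(beta, x, y):
    # S(beta) = sum of squared residuals = ||r(beta)||^2
    r = residuals(beta, x, y)
    return np.dot(r, r)
```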

The Algorithm

Motivation and Intuition

The Levenberg-Marquardt method addresses the limitations of the two methods it combines:

  1. Gradient Descent: Robust far from the solution but slow near the solution
  2. Gauss-Newton Method: Fast near the solution but can be unstable far from it

LM combines these approaches by adaptively adjusting a damping parameter based on the progress of the optimization: when a step reduces the error, the update behaves more like Gauss-Newton; when it does not, the update behaves more like gradient descent.

Update Equation

The core of the LM algorithm is the damped update equation:

$$\left(J^\top J + \lambda D\right) \Delta\boldsymbol{\beta} = -J^\top \mathbf{r}$$

where:

  • $J$ is the Jacobian matrix of the residuals, with entries $J_{ij} = \partial r_i / \partial \beta_j$
  • $\mathbf{r}$ is the residual vector
  • $\lambda > 0$ is the damping parameter
  • $D$ is a scaling matrix, often the identity matrix or the diagonal of $J^\top J$
  • $\Delta\boldsymbol{\beta}$ is the parameter update

The new parameter estimate becomes:

$$\boldsymbol{\beta}_{\text{new}} = \boldsymbol{\beta} + \Delta\boldsymbol{\beta}$$
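
For concreteness, here is a minimal NumPy sketch of a single damped step under the equation above, assuming the Jacobian and residual vector are already available as arrays; the name lm_step and the switch between Marquardt's scaling and the identity are illustrative choices, not prescribed by the text.

```python
import numpy as np

def lm_step(J, r, lam, use_marquardt_scaling=True):
    """One damped update: solve (J^T J + lam * D) delta = -J^T r.

    J   : (m, n) Jacobian of the residuals at the current parameters
    r   : (m,)  residual vector at the current parameters
    lam : damping parameter (lambda > 0)
    """
    JtJ = J.T @ J
    Jtr = J.T @ r
    if use_marquardt_scaling:
        D = np.diag(np.diag(JtJ))   # Marquardt's choice: diag(J^T J)
    else:
        D = np.eye(JtJ.shape[0])    # Levenberg's choice: identity
    delta = np.linalg.solve(JtJ + lam * D, -Jtr)
    return delta
```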

Extra

Consider the Gauss-Newton method for solving nonlinear least squares problems.

(a) Derive the Gauss-Newton algorithm for solving nonlinear least squares problems of the form:

$$\min_{\boldsymbol{\beta}} \; S(\boldsymbol{\beta}) = \sum_{i=1}^{m} r_i(\boldsymbol{\beta})^2$$

where $r_i(\boldsymbol{\beta})$ are the residual functions. Show how this leads to the iterative update formula:

$$\boldsymbol{\beta}_{k+1} = \boldsymbol{\beta}_k - \left(J_k^\top J_k\right)^{-1} J_k^\top \mathbf{r}_k$$

where $J_k$ is the Jacobian of the residuals evaluated at $\boldsymbol{\beta}_k$ and $\mathbf{r}_k$ is the vector of residuals. [5]

(b) Consider the nonlinear curve fitting problem: Given the following data points: , , ,

Starting with initial parameters , perform one iteration of the Gauss-Newton method. Calculate the residuals, Jacobian, and the parameter update. [7]

(c) Compare the Gauss-Newton method with Newton’s method for optimization. Discuss the advantages and limitations of the Gauss-Newton method, particularly when the residuals are significant at the solution. [3]

(a) For a nonlinear least squares problem:

$$\min_{\boldsymbol{\beta}} \; S(\boldsymbol{\beta}) = \sum_{i=1}^{m} r_i(\boldsymbol{\beta})^2 = \mathbf{r}(\boldsymbol{\beta})^\top \mathbf{r}(\boldsymbol{\beta})$$

The gradient of $S$ is:

$$\nabla S(\boldsymbol{\beta}) = 2\, J(\boldsymbol{\beta})^\top \mathbf{r}(\boldsymbol{\beta})$$

where $J$ is the Jacobian matrix with elements $J_{ij} = \partial r_i / \partial \beta_j$.

The Hessian of $S$ is:

$$\nabla^2 S(\boldsymbol{\beta}) = 2\, J^\top J + 2 \sum_{i=1}^{m} r_i(\boldsymbol{\beta})\, \nabla^2 r_i(\boldsymbol{\beta})$$

The Gauss-Newton method approximates the Hessian by ignoring the second term (which involves second derivatives of the residuals):

$$\nabla^2 S(\boldsymbol{\beta}) \approx 2\, J^\top J$$

Using Newton's method with this approximation:

$$\boldsymbol{\beta}_{k+1} = \boldsymbol{\beta}_k - \left(J_k^\top J_k\right)^{-1} J_k^\top \mathbf{r}_k$$

This is the Gauss-Newton update formula.
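
To make the derivation concrete, the following Python sketch runs the plain Gauss-Newton iteration, assuming the caller supplies residual and Jacobian functions; the names gauss_newton, residual_fn, and jacobian_fn are illustrative, not from the source.

```python
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, beta0, max_iter=50, tol=1e-8):
    """Plain Gauss-Newton iteration: beta <- beta - (J^T J)^{-1} J^T r.

    residual_fn(beta) -> (m,) residual vector
    jacobian_fn(beta) -> (m, n) Jacobian of the residuals
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual_fn(beta)
        J = jacobian_fn(beta)
        # Solve the normal equations (J^T J) delta = -J^T r
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        beta = beta + delta
        if np.linalg.norm(delta) < tol:
            break
    return beta
```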

(b) For the given model, the residuals are $r_i = y_i - f(x_i; \boldsymbol{\beta})$ at each data point.

With the initial parameters $\boldsymbol{\beta}_0$ specified in the question, first evaluate the residual vector $\mathbf{r}(\boldsymbol{\beta}_0)$.

Next, calculate the Jacobian. Its entries are the partial derivatives $J_{ij} = \partial r_i / \partial \beta_j$, evaluated at $\boldsymbol{\beta}_0$.

From these, compute $J^\top J$ and $J^\top \mathbf{r}$.

Solving the system $(J^\top J)\,\Delta\boldsymbol{\beta} = -J^\top \mathbf{r}$ gives the parameter update $\Delta\boldsymbol{\beta}$.

Therefore, the parameters after one Gauss-Newton iteration are $\boldsymbol{\beta}_1 = \boldsymbol{\beta}_0 + \Delta\boldsymbol{\beta}$.
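
Since the concrete model, data points, and initial guess from the question are not reproduced above, the following Python sketch only illustrates the same one-iteration procedure on placeholder values: an assumed exponential model and made-up data. None of these numbers come from the original exercise.

```python
import numpy as np

# Placeholder data and initial guess (illustrative only, not from the exercise)
x = np.array([0.0, 1.0, 2.0])
y = np.array([2.0, 2.7, 3.8])
beta = np.array([1.0, 0.5])                  # assumed initial parameters beta_0

def f(x, beta):
    return beta[0] * np.exp(beta[1] * x)     # assumed model f(x; beta)

r = y - f(x, beta)                           # residuals r_i = y_i - f(x_i; beta)
J = -np.column_stack([
    np.exp(beta[1] * x),                     # dr_i / dbeta_0
    beta[0] * x * np.exp(beta[1] * x),       # dr_i / dbeta_1
])
JtJ = J.T @ J
Jtr = J.T @ r
delta = np.linalg.solve(JtJ, -Jtr)           # one Gauss-Newton step
beta_new = beta + delta
print(r, JtJ, Jtr, delta, beta_new, sep="\n")
```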

(c) Comparison of Gauss-Newton and Newton’s Method:

Advantages of Gauss-Newton:

  • Only requires first derivatives (Jacobian) while Newton’s method needs second derivatives (Hessian)
  • More computationally efficient for least squares problems
  • Works well when residuals are small at the solution

Limitations:

  • Can converge slowly or diverge when residuals are large at the solution
  • The approximation of the Hessian (ignoring second-order terms) becomes poor when residuals are significant
  • Doesn’t handle rank deficiency in the Jacobian well

When residuals are significant at the solution, the ignored second-order term in the Hessian approximation becomes important. In such cases, modified approaches like the Levenberg-Marquardt algorithm (which adds damping to Gauss-Newton) perform better by improving stability and convergence.
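
As a rough sketch of that idea, the following Python function adds a simple damping schedule to the Gauss-Newton step: increase $\lambda$ when a trial step fails to reduce the sum of squares, decrease it when the step succeeds. The multiply-by-10 schedule, the stopping rule, and the function names are assumptions made for illustration, not prescribed by the text.

```python
import numpy as np

def levenberg_marquardt(residual_fn, jacobian_fn, beta0,
                        lam=1e-3, factor=10.0, max_iter=100, tol=1e-8):
    """Gauss-Newton with adaptive damping (one common LM heuristic).

    If a trial step reduces the sum of squares, accept it and decrease lambda
    (behaving more like Gauss-Newton); otherwise reject it and increase lambda
    (behaving more like scaled gradient descent).
    """
    beta = np.asarray(beta0, dtype=float)
    r = residual_fn(beta)
    cost = r @ r
    for _ in range(max_iter):
        J = jacobian_fn(beta)
        JtJ = J.T @ J
        Jtr = J.T @ r
        D = np.diag(np.diag(JtJ))            # Marquardt scaling (assumes diag > 0)
        delta = np.linalg.solve(JtJ + lam * D, -Jtr)
        trial = beta + delta
        r_trial = residual_fn(trial)
        cost_trial = r_trial @ r_trial
        if cost_trial < cost:                # step accepted: move toward Gauss-Newton
            beta, r, cost = trial, r_trial, cost_trial
            lam /= factor
            if np.linalg.norm(delta) < tol:
                break
        else:                                # step rejected: move toward gradient descent
            lam *= factor
    return beta
```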