The Newton-Raphson method (often simply called Newton’s method) is a powerful optimization technique that uses both first and second derivatives to find the minimum or maximum of a function. It is named after Isaac Newton and Joseph Raphson.
Principle
Newton’s method is based on using a quadratic approximation of the objective function at each iteration. For a function $f(\mathbf{x})$, the quadratic approximation around the current point $\mathbf{x}_k$ is:

$$f(\mathbf{x}) \approx f(\mathbf{x}_k) + \nabla f(\mathbf{x}_k)^T (\mathbf{x} - \mathbf{x}_k) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_k)^T H(\mathbf{x}_k) (\mathbf{x} - \mathbf{x}_k)$$

where $\nabla f(\mathbf{x}_k)$ is the gradient and $H(\mathbf{x}_k)$ is the Hessian matrix, both evaluated at $\mathbf{x}_k$.
Algorithm
Univariate Case (D=1)
For a function $f(x)$ with scalar variable $x$:
- Start with an initial guess $x_0$
- At each iteration $k$:
  - Compute the first derivative $f'(x_k)$
  - Compute the second derivative $f''(x_k)$
  - Update: $x_{k+1} = x_k - \dfrac{f'(x_k)}{f''(x_k)}$
- Stop when $|f'(x_k)| < \epsilon$ or $|x_{k+1} - x_k| < \epsilon$
The geometric interpretation is that we find the minimum of the parabola that approximates the function at the current point.
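A minimal sketch of the univariate iteration is shown below. It assumes the first and second derivatives are available as callables; the function name, tolerance, and iteration cap are illustrative choices rather than any standard API.

```python
import math

def newton_univariate(fprime, fsecond, x0, tol=1e-8, max_iter=100):
    """Minimize a univariate function via x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        g = fprime(x)
        if abs(g) < tol:              # stationary point reached: stop
            break
        x_new = x - g / fsecond(x)    # jump to the vertex of the local parabola
        if abs(x_new - x) < tol:      # step is negligible: stop
            return x_new
        x = x_new
    return x

# Example: f(x) = exp(x) - 2x has its minimizer at x = ln 2
x_star = newton_univariate(lambda x: math.exp(x) - 2.0,   # f'(x)
                           lambda x: math.exp(x),         # f''(x)
                           x0=0.0)
print(x_star, math.log(2))  # both approximately 0.6931
```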
Multivariate Case (D≥2)
For a function $f(\mathbf{x})$ with vector variable $\mathbf{x} \in \mathbb{R}^D$:
- Start with an initial guess $\mathbf{x}_0$
- At each iteration $k$:
  - Compute the gradient $\nabla f(\mathbf{x}_k)$
  - Compute the Hessian $H(\mathbf{x}_k)$
  - Solve the linear system $H(\mathbf{x}_k)\,\mathbf{d}_k = -\nabla f(\mathbf{x}_k)$ for the Newton direction $\mathbf{d}_k$
  - Update: $\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{d}_k$
- Stop when $\|\nabla f(\mathbf{x}_k)\| < \epsilon$ or $\|\mathbf{x}_{k+1} - \mathbf{x}_k\| < \epsilon$
This can also be written as $\mathbf{x}_{k+1} = \mathbf{x}_k - H(\mathbf{x}_k)^{-1}\nabla f(\mathbf{x}_k)$, though directly computing the inverse is generally avoided in practice due to numerical considerations.
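A corresponding multivariate sketch, assuming NumPy and user-supplied gradient and Hessian callables (the names `grad` and `hess` are placeholders, not a fixed interface):

```python
import numpy as np

def newton_multivariate(grad, hess, x0, tol=1e-8, max_iter=100):
    """Pure Newton iteration: solve H(x_k) d_k = -grad f(x_k), then step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # gradient small enough: stop
            break
        d = np.linalg.solve(hess(x), -g)     # Newton direction, no explicit inverse
        x = x + d
    return x

# Example: f(x, y) = exp(x) + exp(y) - x - 2y has its minimum at (0, ln 2)
grad = lambda v: np.array([np.exp(v[0]) - 1.0, np.exp(v[1]) - 2.0])
hess = lambda v: np.diag([np.exp(v[0]), np.exp(v[1])])
print(newton_multivariate(grad, hess, x0=[1.0, 0.0]))  # ~[0.0, 0.6931]
```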
Derivation
The derivation comes from setting the gradient of the quadratic approximation to zero:

$$\nabla f(\mathbf{x}_k) + H(\mathbf{x}_k)(\mathbf{x} - \mathbf{x}_k) = 0$$

Solving for $\mathbf{x}$ gives:

$$\mathbf{x} = \mathbf{x}_k - H(\mathbf{x}_k)^{-1}\nabla f(\mathbf{x}_k)$$

This becomes the update rule for the next iterate $\mathbf{x}_{k+1}$.
Convergence Properties
Newton’s method has excellent convergence properties when started sufficiently close to a solution:
- Quadratic convergence: the error approximately squares at each iteration, i.e. $\|\mathbf{x}_{k+1} - \mathbf{x}^*\| \le C\,\|\mathbf{x}_k - \mathbf{x}^*\|^2$ near the minimizer $\mathbf{x}^*$ (illustrated numerically below)
- Typically requires fewer iterations than gradient-based methods
- Invariant to linear transformations of parameters (unlike gradient descent)
However, convergence is only guaranteed under certain conditions:
- The Hessian must be positive definite at each iteration (for minimization)
- The initial point must be sufficiently close to a local minimum
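The quadratic convergence claim can be checked numerically. The snippet below is a small demo under the assumption that $f(x) = e^x - 2x$, with minimizer $\ln 2$, is an acceptable test function; the printed errors roughly square from one iteration to the next.

```python
import math

x, x_star = 0.0, math.log(2)    # start at 0; the true minimizer is ln 2
for k in range(5):
    print(f"iteration {k}: error = {abs(x - x_star):.2e}")
    x -= (math.exp(x) - 2.0) / math.exp(x)   # Newton update for f(x) = exp(x) - 2x
```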
Modified Newton Methods
To address potential issues with the basic Newton method:
Line Search
Introduce a step size parameter $\alpha_k$:

$$\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\,\mathbf{d}_k, \qquad H(\mathbf{x}_k)\,\mathbf{d}_k = -\nabla f(\mathbf{x}_k)$$

The step size $\alpha_k$ is chosen to ensure:
- Descent property: $f(\mathbf{x}_{k+1}) < f(\mathbf{x}_k)$
- Appropriate step length via the Wolfe conditions or the Armijo rule
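A sketch of Newton’s method with a backtracking Armijo line search follows. It assumes the Hessian is positive definite so the Newton direction is a descent direction; the constants `c1` and `shrink` are conventional but otherwise arbitrary choices.

```python
import numpy as np

def damped_newton(f, grad, hess, x0, tol=1e-8, max_iter=100, c1=1e-4, shrink=0.5):
    """Newton direction plus backtracking line search (Armijo sufficient decrease)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        d = np.linalg.solve(hess(x), -g)          # Newton direction
        alpha = 1.0                               # always try the full step first
        # Shrink alpha until f decreases by at least c1 * alpha * g.d
        while f(x + alpha * d) > f(x) + c1 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= shrink
        x = x + alpha * d
    return x
```

Trying $\alpha_k = 1$ first matters: near the solution the full Newton step is eventually accepted, which preserves the quadratic convergence rate.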
Hessian Modification
When the Hessian is not positive definite:
- Add a positive diagonal matrix: use $H(\mathbf{x}_k) + \lambda I$ with $\lambda > 0$, as in the Levenberg-Marquardt method (sketched after this list)
- Perform eigenvalue modification to ensure positive definiteness
- Use trust region approaches
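The first strategy, adding a multiple of the identity until the matrix factorizes, can be sketched as follows; the starting shift and growth factor are illustrative.

```python
import numpy as np

def make_positive_definite(H, beta=1e-3, max_tries=60):
    """Return H + lam*I for the smallest lam in a geometric schedule
    for which the Cholesky factorization succeeds."""
    lam = 0.0
    I = np.eye(H.shape[0])
    for _ in range(max_tries):
        try:
            np.linalg.cholesky(H + lam * I)   # succeeds iff H + lam*I is positive definite
            return H + lam * I
        except np.linalg.LinAlgError:
            lam = beta if lam == 0.0 else 10.0 * lam   # increase the shift and retry
    return H + lam * I
```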
Advantages
- Rapid convergence near the solution
- Scale-invariant (unlike gradient descent)
- Typically requires fewer iterations than first-order methods
- Particularly effective for quadratic or nearly quadratic functions
Disadvantages
- Requires computation of second derivatives
- Expensive for large-scale problems (O(D³) for the linear solve)
- May diverge if started far from the solution
- Not guaranteed to converge for non-convex functions
Example: Newton’s Method for a Quadratic Function
For a quadratic function $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c$:
- Gradient: $\nabla f(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$
- Hessian: $H(\mathbf{x}) = A$ (constant)
Newton’s method becomes:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - A^{-1}(A\mathbf{x}_k + \mathbf{b}) = -A^{-1}\mathbf{b}$$

For a positive definite $A$, Newton’s method finds the exact minimum in a single step!
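The one-step behavior is easy to verify numerically; the matrix and starting point below are arbitrary choices.

```python
import numpy as np

# f(x) = 0.5 x^T A x + b^T x with a symmetric positive definite A
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

x0 = np.array([10.0, -10.0])              # any starting point
x1 = x0 - np.linalg.solve(A, A @ x0 + b)  # one Newton step

print(x1)                      # equals the exact minimizer...
print(np.linalg.solve(A, -b))  # ...namely -A^{-1} b
```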
Implementation Considerations
- For large-scale problems, avoid explicit Hessian computation and inversion
- Use iterative methods such as conjugate gradient to solve the Newton linear system
- Consider quasi-Newton methods such as BFGS that approximate the Hessian (see the sketch after this list)
- Use careful line search to ensure convergence
- Monitor the conditioning of the Hessian
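In practice these considerations are often delegated to a library. Assuming SciPy is available, the sketch below compares Newton-CG, which solves the Newton system iteratively with conjugate gradients, and BFGS, a quasi-Newton method that builds a Hessian approximation from gradient differences, on the Rosenbrock function.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])

# Newton-CG: the Hessian is only used through the Newton linear system, solved by CG.
res_newton_cg = minimize(rosen, x0, method='Newton-CG', jac=rosen_der, hess=rosen_hess)

# BFGS: no second derivatives required; the Hessian is approximated internally.
res_bfgs = minimize(rosen, x0, method='BFGS', jac=rosen_der)

print(res_newton_cg.x)  # both should approach the minimizer [1., 1.]
print(res_bfgs.x)
```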