Least squares is a mathematical optimization technique that seeks to find the best fit to a set of data by minimizing the sum of the squared differences between observed values and predicted values. It is one of the most widely used approaches in data fitting and parameter estimation.

Mathematical Formulation

Basic Formulation

Given a set of observations $(x_i, y_i)$ for $i = 1, \dots, m$ and a model function $f(x; \boldsymbol{\beta})$ parameterized by a vector $\boldsymbol{\beta} \in \mathbb{R}^n$, the least squares problem is:

$$\min_{\boldsymbol{\beta}} \; S(\boldsymbol{\beta}) = \sum_{i=1}^{m} \left( y_i - f(x_i; \boldsymbol{\beta}) \right)^2$$

This can be rewritten in vector form by defining the residual vector $\mathbf{r}(\boldsymbol{\beta})$ with components $r_i(\boldsymbol{\beta}) = y_i - f(x_i; \boldsymbol{\beta})$:

$$\min_{\boldsymbol{\beta}} \; \|\mathbf{r}(\boldsymbol{\beta})\|_2^2 = \mathbf{r}(\boldsymbol{\beta})^\top \mathbf{r}(\boldsymbol{\beta})$$

Weighted Least Squares

When observations have different uncertainties, a weighted least squares approach is used:

$$\min_{\boldsymbol{\beta}} \; \sum_{i=1}^{m} w_i \left( y_i - f(x_i; \boldsymbol{\beta}) \right)^2$$

where $w_i > 0$ are weights, typically chosen as $w_i = 1/\sigma_i^2$, where $\sigma_i$ is the standard deviation of the $i$-th observation.
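
As a minimal sketch (assuming NumPy and a linear model in matrix form, $\mathbf{y} \approx X\boldsymbol{\beta}$, introduced in the Linear Least Squares section below; the helper name is illustrative), a weighted fit can be reduced to an ordinary least squares problem by scaling each row by $\sqrt{w_i}$:

```python
import numpy as np

# Weighted linear least squares sketch: scale each row of X and y by sqrt(w_i),
# then solve the resulting ordinary least squares problem.
def weighted_lstsq(X, y, sigma):
    w = 1.0 / sigma**2                 # weights from per-observation standard deviations
    sw = np.sqrt(w)
    Xw = X * sw[:, None]               # scale the rows of the design matrix
    yw = y * sw
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta
```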

Regularized Least Squares

To prevent overfitting or to handle ill-posed problems, a regularization term can be added:

$$\min_{\boldsymbol{\beta}} \; \|\mathbf{r}(\boldsymbol{\beta})\|_2^2 + \lambda \|\boldsymbol{\beta}\|_2^2$$

where $\lambda > 0$ controls the strength of the penalty. This is known as Tikhonov regularization, or ridge regression, when the $\ell_2$ norm of $\boldsymbol{\beta}$ is used.
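
For a linear model, the regularized problem has the closed-form solution $\hat{\boldsymbol{\beta}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$. A minimal sketch, assuming NumPy and a design matrix $X$ as defined in the next section (the helper name and default $\lambda$ are illustrative):

```python
import numpy as np

# Ridge regression sketch: solve (X^T X + lambda * I) beta = X^T y.
def ridge(X, y, lam=1e-3):
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```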

Linear Least Squares

Formulation

In linear least squares, the model function is a linear combination of the parameters:

$$f(x; \boldsymbol{\beta}) = \sum_{j=1}^{n} \beta_j \phi_j(x)$$

where $\phi_j(x)$ are basis functions. This can be written in matrix form as:

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:

  • $\mathbf{y} \in \mathbb{R}^m$ is the vector of observations
  • $X \in \mathbb{R}^{m \times n}$ is the design matrix with elements $X_{ij} = \phi_j(x_i)$
  • $\boldsymbol{\varepsilon}$ is the error vector

The objective function becomes:

$$S(\boldsymbol{\beta}) = \|\mathbf{y} - X\boldsymbol{\beta}\|_2^2$$

Normal Equations

The optimal solution $\hat{\boldsymbol{\beta}}$ to the linear least squares problem satisfies the normal equations:

$$X^\top X \hat{\boldsymbol{\beta}} = X^\top \mathbf{y}$$

If $X^\top X$ is invertible, the solution is:

$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$$

This is the closed-form solution to the linear least squares problem.
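
A minimal sketch in NumPy, comparing the normal-equations solution with the library's SVD-based solver (the synthetic data and variable names are illustrative):

```python
import numpy as np

# Fit y ≈ b0 + b1 * x to synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(x.size)

# Design matrix with basis functions phi_1(x) = 1 and phi_2(x) = x.
X = np.column_stack([np.ones_like(x), x])

# Closed-form solution via the normal equations (fine for small, well-conditioned problems).
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Preferred in practice: np.linalg.lstsq, which uses an SVD-based solver.
beta_lstsq, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)  # both should be close to [2.0, 3.0]
```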

Numerical Methods for Linear Least Squares

Several numerical methods can be used to solve linear least squares problems, especially when they are large-scale or ill-conditioned:

  1. QR Decomposition:

    • Decompose $X = QR$, where $Q$ is orthogonal and $R$ is upper triangular
    • Solution: solve $R\hat{\boldsymbol{\beta}} = Q^\top \mathbf{y}$ by back substitution (see the sketch after this list)
    • More stable than solving the normal equations directly
  2. Singular Value Decomposition (SVD):

    • Decompose $X = U\Sigma V^\top$
    • Solution: $\hat{\boldsymbol{\beta}} = X^{+}\mathbf{y}$, where $X^{+} = V\Sigma^{+}U^\top$ is the pseudoinverse of $X$
    • Most stable, but computationally expensive
    • Handles rank-deficient problems
  3. Cholesky Decomposition:

    • Decompose $X^\top X = LL^\top$, where $L$ is lower triangular
    • Solution via forward and backward substitution
    • Efficient for positive definite normal equations
  4. Iterative Methods:

    • Conjugate Gradient (CG)
    • LSQR (for sparse problems)
    • Suitable for very large-scale problems
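
A minimal sketch of the QR- and SVD-based solution routes, assuming NumPy (the helper names are illustrative):

```python
import numpy as np

# Solve min ||X beta - y||_2 via a reduced QR decomposition (assumes X has full column rank).
def solve_ls_qr(X, y):
    Q, R = np.linalg.qr(X)                   # X = Q R, Q with orthonormal columns, R upper triangular
    return np.linalg.solve(R, Q.T @ y)       # triangular system R beta = Q^T y

# Solve via the SVD-based pseudoinverse; this also handles rank-deficient X.
def solve_ls_svd(X, y, rcond=1e-12):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)   # invert only the significant singular values
    return Vt.T @ (s_inv * (U.T @ y))
```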

Nonlinear Least Squares

Formulation

When the model function is nonlinear in the parameters, we have a nonlinear least squares problem:

$$\min_{\boldsymbol{\beta}} \; S(\boldsymbol{\beta}) = \sum_{i=1}^{m} r_i(\boldsymbol{\beta})^2$$

where $r_i(\boldsymbol{\beta}) = y_i - f(x_i; \boldsymbol{\beta})$ and $f$ is nonlinear in $\boldsymbol{\beta}$.

Gauss-Newton Method

The Gauss-Newton method is a specialized algorithm for nonlinear least squares problems:

  1. Start with an initial guess $\boldsymbol{\beta}^{(0)}$
  2. At each iteration $k$:
    • Compute the Jacobian matrix $J_k$ with elements $(J_k)_{ij} = \partial r_i / \partial \beta_j$, evaluated at $\boldsymbol{\beta}^{(k)}$
    • Solve the linear system: $J_k^\top J_k \, \Delta\boldsymbol{\beta} = -J_k^\top \mathbf{r}(\boldsymbol{\beta}^{(k)})$
    • Update: $\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \Delta\boldsymbol{\beta}$
  3. Repeat until convergence

This can be viewed as applying Newton’s method to the minimization of $S(\boldsymbol{\beta})$, with the Hessian approximated as $\nabla^2 S \approx 2 J^\top J$ (dropping the second-derivative terms of the residuals).
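
A minimal Gauss-Newton sketch in NumPy, assuming the user supplies the residual function and its Jacobian (the helper names, test model, and tolerances are illustrative):

```python
import numpy as np

def gauss_newton(residual, jacobian, beta0, max_iter=50, tol=1e-10):
    """Minimize sum_i r_i(beta)^2 with the Gauss-Newton iteration."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)                       # residual vector r(beta)
        J = jacobian(beta)                       # Jacobian of r, shape (m, n)
        # The Gauss-Newton step solves J^T J delta = -J^T r; a least-squares
        # solve on J itself avoids forming the normal equations explicitly.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        beta = beta + delta
        if np.linalg.norm(delta) < tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta

# Example: fit y = a * exp(-b x), with parameters beta = (a, b).
x = np.linspace(0.0, 4.0, 30)
y = 2.5 * np.exp(-1.3 * x) + 0.01 * np.random.default_rng(1).standard_normal(x.size)
res = lambda b: b[0] * np.exp(-b[1] * x) - y
jac = lambda b: np.column_stack([np.exp(-b[1] * x), -b[0] * x * np.exp(-b[1] * x)])
beta_hat = gauss_newton(res, jac, beta0=[1.0, 1.0])
```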

Levenberg-Marquardt Algorithm

The Levenberg-Marquardt algorithm adds damping to the Gauss-Newton method to improve stability:

$$\left( J_k^\top J_k + \lambda_k I \right) \Delta\boldsymbol{\beta} = -J_k^\top \mathbf{r}(\boldsymbol{\beta}^{(k)})$$

where $\lambda_k > 0$ is a damping parameter that is adjusted dynamically:

  • If the step reduces the error, decrease $\lambda_k$ (more like Gauss-Newton)
  • If the step increases the error, increase $\lambda_k$ (more like gradient descent)

This can be viewed as a trust region method where the damping parameter controls the size of the trust region.
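
In practice, a library implementation is typically used. A minimal sketch using SciPy's least_squares with method="lm", which wraps the MINPACK Levenberg-Marquardt routines (the synthetic data and parameter names are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

# Fit y = a * exp(-b x) to synthetic data with Levenberg-Marquardt.
x = np.linspace(0.0, 4.0, 30)
y = 2.5 * np.exp(-1.3 * x) + 0.01 * np.random.default_rng(2).standard_normal(x.size)

def residual(beta):
    a, b = beta
    return a * np.exp(-b * x) - y

result = least_squares(residual, x0=[1.0, 1.0], method="lm")
print(result.x)  # fitted (a, b), close to (2.5, 1.3)
```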

Applications of Least Squares

Regression Analysis

Least squares is the foundation of regression analysis, used to model relationships between variables:

  • Linear regression
  • Polynomial regression
  • Multiple regression
  • Generalized linear models

Curve Fitting

Fitting mathematical functions to data points:

  • Interpolation
  • Approximation of complex functions
  • Smoothing noisy data

Parameter Estimation

Estimating parameters in physical models:

  • System identification in control theory
  • Estimating chemical reaction rates
  • Determining material properties

Signal Processing

Processing and analyzing signals:

  • Filter design
  • System identification
  • Spectrum analysis

Geodesy and Surveying

Determining positions and distances:

  • GPS position estimation
  • Surveying measurements
  • Geodetic network adjustments

Statistical Interpretation

Least squares estimation has important statistical interpretations:

  1. Maximum Likelihood Estimation: Under the assumption of independent, identically distributed Gaussian errors, the least squares estimate is equivalent to the maximum likelihood estimate.

  2. Best Linear Unbiased Estimator (BLUE): Under the Gauss-Markov assumptions, the least squares estimator is the best linear unbiased estimator.

  3. Minimum Variance Unbiased Estimator: If the errors are normally distributed, the least squares estimator is the minimum variance unbiased estimator.

Error Analysis and Diagnostics

Several metrics help evaluate the quality of least squares fits:

  1. Residual Analysis:

    • Plot residuals to check for patterns
    • Test for normality of residuals
    • Check for autocorrelation
  2. Goodness of Fit Measures (computed in the sketch after this list):

    • Coefficient of determination ($R^2$)
    • Adjusted $R^2$
    • Root mean squared error (RMSE)
  3. Leverage and Influence:

    • Hat matrix diagonal elements (leverage)
    • Cook’s distance (influence)
    • DFFITS and DFBETAS
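
A minimal diagnostics sketch in NumPy for a linear fit (the helper name is illustrative; forming the hat matrix explicitly is only practical for small problems):

```python
import numpy as np

def fit_diagnostics(X, y, beta):
    """Basic quality measures for a linear least squares fit y ≈ X beta."""
    resid = y - X @ beta
    rmse = np.sqrt(np.mean(resid**2))                 # root mean squared error
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((y - y.mean())**2)
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination R^2
    H = X @ np.linalg.solve(X.T @ X, X.T)             # hat matrix; its diagonal gives leverage
    leverage = np.diag(H)
    return {"rmse": rmse, "r2": r2, "leverage": leverage}
```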

Implementation Considerations

Numerical Stability

Linear least squares problems can become ill-conditioned when:

  • The columns of $X$ are nearly linearly dependent
  • There are large differences in scale among variables
  • The number of parameters is large compared to the number of data points

Solutions include:

  • Regularization
  • Using stable decomposition methods (SVD)
  • Data preprocessing (centering, scaling)

Computational Efficiency

For large-scale problems, considerations include:

  • Memory requirements for storing matrices
  • Computational complexity of matrix operations
  • Specialized algorithms for sparse systems
  • Iterative methods for very large systems