Least squares is a mathematical optimization technique that seeks to find the best fit to a set of data by minimizing the sum of the squared differences between observed values and predicted values. It is one of the most widely used approaches in data fitting and parameter estimation.

Mathematical Formulation

Basic Formulation

Given a set of observations $(x_i, y_i)$ for $i = 1, \dots, m$ and a model function $f(x; \boldsymbol{\beta})$ parameterized by a vector $\boldsymbol{\beta} \in \mathbb{R}^n$, the least squares problem is:

$$\min_{\boldsymbol{\beta}} \; S(\boldsymbol{\beta}) = \sum_{i=1}^{m} \left( y_i - f(x_i; \boldsymbol{\beta}) \right)^2$$

This can be rewritten in vector form by defining the residual vector $\mathbf{r}(\boldsymbol{\beta})$ with components $r_i(\boldsymbol{\beta}) = y_i - f(x_i; \boldsymbol{\beta})$:

$$\min_{\boldsymbol{\beta}} \; \|\mathbf{r}(\boldsymbol{\beta})\|_2^2 = \mathbf{r}(\boldsymbol{\beta})^\top \mathbf{r}(\boldsymbol{\beta})$$

Weighted Least Squares

When observations have different uncertainties, a weighted least squares approach is used:

$$\min_{\boldsymbol{\beta}} \; \sum_{i=1}^{m} w_i \left( y_i - f(x_i; \boldsymbol{\beta}) \right)^2$$

where $w_i > 0$ are weights, typically chosen as $w_i = 1/\sigma_i^2$, where $\sigma_i$ is the standard deviation of the $i$-th observation.
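
As a minimal sketch (assuming NumPy and a linear model in matrix form, $\mathbf{y} \approx X\boldsymbol{\beta}$, introduced in the Linear Least Squares section below; the helper name is illustrative), a weighted fit can be reduced to an ordinary least squares problem by scaling each row by $\sqrt{w_i}$:

```python
import numpy as np

# Weighted linear least squares sketch: scale each row of X and y by sqrt(w_i),
# then solve the resulting ordinary least squares problem.
def weighted_lstsq(X, y, sigma):
    w = 1.0 / sigma**2                 # weights from per-observation standard deviations
    sw = np.sqrt(w)
    Xw = X * sw[:, None]               # scale the rows of the design matrix
    yw = y * sw
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta
```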

Regularized Least Squares

To prevent overfitting or to handle ill-posed problems, a regularization term can be added:

$$\min_{\boldsymbol{\beta}} \; \|\mathbf{r}(\boldsymbol{\beta})\|_2^2 + \lambda \|\boldsymbol{\beta}\|_2^2$$

where $\lambda > 0$ controls the strength of the penalty. This is known as Tikhonov regularization, or ridge regression, when the $\ell_2$ norm of $\boldsymbol{\beta}$ is used.
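
For a linear model, the regularized problem has the closed-form solution $\hat{\boldsymbol{\beta}} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$. A minimal sketch, assuming NumPy and a design matrix $X$ as defined in the next section (the helper name and default $\lambda$ are illustrative):

```python
import numpy as np

# Ridge regression sketch: solve (X^T X + lambda * I) beta = X^T y.
def ridge(X, y, lam=1e-3):
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```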

Linear Least Squares

Formulation

In linear least squares, the model function is a linear combination of the parameters:

$$f(x; \boldsymbol{\beta}) = \sum_{j=1}^{n} \beta_j \phi_j(x)$$

where $\phi_j(x)$ are basis functions. This can be written in matrix form as:

$$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:

  • $\mathbf{y} \in \mathbb{R}^m$ is the vector of observations
  • $X \in \mathbb{R}^{m \times n}$ is the design matrix with elements $X_{ij} = \phi_j(x_i)$
  • $\boldsymbol{\varepsilon}$ is the error vector

The objective function becomes:

$$S(\boldsymbol{\beta}) = \|\mathbf{y} - X\boldsymbol{\beta}\|_2^2$$

Normal Equations

The optimal solution $\hat{\boldsymbol{\beta}}$ to the linear least squares problem satisfies the normal equations:

$$X^\top X \hat{\boldsymbol{\beta}} = X^\top \mathbf{y}$$

If $X^\top X$ is invertible, the solution is:

$$\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$$

This is the closed-form solution to the linear least squares problem.
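
A minimal sketch in NumPy, comparing the normal-equations solution with the library's SVD-based solver (the synthetic data and variable names are illustrative):

```python
import numpy as np

# Fit y ≈ b0 + b1 * x to synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(x.size)

# Design matrix with basis functions phi_1(x) = 1 and phi_2(x) = x.
X = np.column_stack([np.ones_like(x), x])

# Closed-form solution via the normal equations (fine for small, well-conditioned problems).
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Preferred in practice: np.linalg.lstsq, which uses an SVD-based solver.
beta_lstsq, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal, beta_lstsq)  # both should be close to [2.0, 3.0]
```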

Numerical Methods for Linear Least Squares

Several numerical methods can be used to solve linear least squares problems, especially when they are large-scale or ill-conditioned:

  1. QR Decomposition:

    • Decompose $X = QR$, where $Q$ is orthogonal and $R$ is upper triangular
    • Solution: solve $R\hat{\boldsymbol{\beta}} = Q^\top \mathbf{y}$ by back substitution (see the sketch after this list)
    • More stable than solving the normal equations directly
  2. Singular Value Decomposition (SVD):

    • Decompose $X = U\Sigma V^\top$
    • Solution: $\hat{\boldsymbol{\beta}} = X^{+}\mathbf{y}$, where $X^{+} = V\Sigma^{+}U^\top$ is the pseudoinverse of $X$
    • Most stable, but computationally expensive
    • Handles rank-deficient problems
  3. Cholesky Decomposition:

    • Decompose $X^\top X = LL^\top$, where $L$ is lower triangular
    • Solution via forward and backward substitution
    • Efficient for positive definite normal equations
  4. Iterative Methods:

    • Conjugate Gradient (CG)
    • LSQR (for sparse problems)
    • Suitable for very large-scale problems
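
A minimal sketch of the QR- and SVD-based solution routes, assuming NumPy (the helper names are illustrative):

```python
import numpy as np

# Solve min ||X beta - y||_2 via a reduced QR decomposition (assumes X has full column rank).
def solve_ls_qr(X, y):
    Q, R = np.linalg.qr(X)                   # X = Q R, Q with orthonormal columns, R upper triangular
    return np.linalg.solve(R, Q.T @ y)       # triangular system R beta = Q^T y

# Solve via the SVD-based pseudoinverse; this also handles rank-deficient X.
def solve_ls_svd(X, y, rcond=1e-12):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)   # invert only the significant singular values
    return Vt.T @ (s_inv * (U.T @ y))
```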

Nonlinear Least Squares

Formulation

When the model function is nonlinear in the parameters, we have a nonlinear least squares problem:

$$\min_{\boldsymbol{\beta}} \; S(\boldsymbol{\beta}) = \sum_{i=1}^{m} r_i(\boldsymbol{\beta})^2$$

where $r_i(\boldsymbol{\beta}) = y_i - f(x_i; \boldsymbol{\beta})$ and $f$ is nonlinear in $\boldsymbol{\beta}$.

Gauss-Newton Method

The Gauss-Newton method is a specialized algorithm for nonlinear least squares problems:

  1. Start with an initial guess $\boldsymbol{\beta}^{(0)}$
  2. At each iteration $k$:
    • Compute the Jacobian matrix $J_k$ with elements $(J_k)_{ij} = \partial r_i / \partial \beta_j$, evaluated at $\boldsymbol{\beta}^{(k)}$
    • Solve the linear system: $J_k^\top J_k \, \Delta\boldsymbol{\beta} = -J_k^\top \mathbf{r}(\boldsymbol{\beta}^{(k)})$
    • Update: $\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \Delta\boldsymbol{\beta}$
  3. Repeat until convergence

This can be viewed as applying Newton’s method to the minimization of $S(\boldsymbol{\beta})$, with the Hessian approximated as $\nabla^2 S \approx 2 J^\top J$ (dropping the second-derivative terms of the residuals).
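
A minimal Gauss-Newton sketch in NumPy, assuming the user supplies the residual function and its Jacobian (the helper names, test model, and tolerances are illustrative):

```python
import numpy as np

def gauss_newton(residual, jacobian, beta0, max_iter=50, tol=1e-10):
    """Minimize sum_i r_i(beta)^2 with the Gauss-Newton iteration."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = residual(beta)                       # residual vector r(beta)
        J = jacobian(beta)                       # Jacobian of r, shape (m, n)
        # The Gauss-Newton step solves J^T J delta = -J^T r; a least-squares
        # solve on J itself avoids forming the normal equations explicitly.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        beta = beta + delta
        if np.linalg.norm(delta) < tol * (1.0 + np.linalg.norm(beta)):
            break
    return beta

# Example: fit y = a * exp(-b x), with parameters beta = (a, b).
x = np.linspace(0.0, 4.0, 30)
y = 2.5 * np.exp(-1.3 * x) + 0.01 * np.random.default_rng(1).standard_normal(x.size)
res = lambda b: b[0] * np.exp(-b[1] * x) - y
jac = lambda b: np.column_stack([np.exp(-b[1] * x), -b[0] * x * np.exp(-b[1] * x)])
beta_hat = gauss_newton(res, jac, beta0=[1.0, 1.0])
```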

Levenberg-Marquardt Algorithm

The Levenberg-Marquardt algorithm adds damping to the Gauss-Newton method to improve stability:

$$\left( J_k^\top J_k + \lambda_k I \right) \Delta\boldsymbol{\beta} = -J_k^\top \mathbf{r}(\boldsymbol{\beta}^{(k)})$$

where $\lambda_k > 0$ is a damping parameter that is adjusted dynamically:

  • If the step reduces the error, decrease $\lambda_k$ (more like Gauss-Newton)
  • If the step increases the error, increase $\lambda_k$ (more like gradient descent)

This can be viewed as a trust region method where the damping parameter controls the size of the trust region.
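
In practice, a library implementation is typically used. A minimal sketch using SciPy's least_squares with method="lm", which wraps the MINPACK Levenberg-Marquardt routines (the synthetic data and parameter names are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

# Fit y = a * exp(-b x) to synthetic data with Levenberg-Marquardt.
x = np.linspace(0.0, 4.0, 30)
y = 2.5 * np.exp(-1.3 * x) + 0.01 * np.random.default_rng(2).standard_normal(x.size)

def residual(beta):
    a, b = beta
    return a * np.exp(-b * x) - y

result = least_squares(residual, x0=[1.0, 1.0], method="lm")
print(result.x)  # fitted (a, b), close to (2.5, 1.3)
```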

Applications of Least Squares

Regression Analysis

Least squares is the foundation of regression analysis, used to model relationships between variables:

  • Linear regression
  • Polynomial regression
  • Multiple regression
  • Generalized linear models

Curve Fitting

Fitting mathematical functions to data points:

  • Interpolation
  • Approximation of complex functions
  • Smoothing noisy data

Parameter Estimation

Estimating parameters in physical models:

  • System identification in control theory
  • Estimating chemical reaction rates
  • Determining material properties

Signal Processing

Processing and analyzing signals:

  • Filter design
  • System identification
  • Spectrum analysis

Geodesy and Surveying

Determining positions and distances:

  • GPS position estimation
  • Surveying measurements
  • Geodetic network adjustments

Statistical Interpretation

Least squares estimation has important statistical interpretations:

  1. Maximum Likelihood Estimation: Under the assumption of independent, identically distributed Gaussian errors, the least squares estimate is equivalent to the maximum likelihood estimate.

  2. Best Linear Unbiased Estimator (BLUE): Under the Gauss-Markov assumptions, the least squares estimator is the best linear unbiased estimator.

  3. Minimum Variance Unbiased Estimator: If the errors are normally distributed, the least squares estimator is the minimum variance unbiased estimator.

Error Analysis and Diagnostics

Several metrics help evaluate the quality of least squares fits:

  1. Residual Analysis:

    • Plot residuals to check for patterns
    • Test for normality of residuals
    • Check for autocorrelation
  2. Goodness of Fit Measures (computed in the sketch after this list):

    • Coefficient of determination ($R^2$)
    • Adjusted $R^2$
    • Root mean squared error (RMSE)
  3. Leverage and Influence:

    • Hat matrix diagonal elements (leverage)
    • Cook’s distance (influence)
    • DFFITS and DFBETAS
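
A minimal diagnostics sketch in NumPy for a linear fit (the helper name is illustrative; forming the hat matrix explicitly is only practical for small problems):

```python
import numpy as np

def fit_diagnostics(X, y, beta):
    """Basic quality measures for a linear least squares fit y ≈ X beta."""
    resid = y - X @ beta
    rmse = np.sqrt(np.mean(resid**2))                 # root mean squared error
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((y - y.mean())**2)
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination R^2
    H = X @ np.linalg.solve(X.T @ X, X.T)             # hat matrix; its diagonal gives leverage
    leverage = np.diag(H)
    return {"rmse": rmse, "r2": r2, "leverage": leverage}
```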

Implementation Considerations

Numerical Stability

Linear least squares problems can become ill-conditioned when:

  • The columns of $X$ are nearly linearly dependent
  • There are large differences in scale among variables
  • The number of parameters is large compared to the number of data points

Solutions include:

  • Regularization
  • Using stable decomposition methods (SVD)
  • Data preprocessing (centering, scaling)

Computational Efficiency

For large-scale problems, considerations include:

  • Memory requirements for storing matrices
  • Computational complexity of matrix operations
  • Specialized algorithms for sparse systems
  • Iterative methods for very large systems