Least squares is a mathematical optimization technique that seeks to find the best fit to a set of data by minimizing the sum of the squared differences between observed values and predicted values. It is one of the most widely used approaches in data fitting and parameter estimation.
Mathematical Formulation
Basic Formulation
Given a set of observations $(x_i, y_i)$ for $i = 1, \dots, m$ and a model function $f(x, \boldsymbol{\beta})$ parameterized by a vector $\boldsymbol{\beta} \in \mathbb{R}^n$, the least squares problem is:

$$\min_{\boldsymbol{\beta}} \; S(\boldsymbol{\beta}) = \sum_{i=1}^{m} \bigl(y_i - f(x_i, \boldsymbol{\beta})\bigr)^2$$

This can be rewritten in vector form by defining the residual vector $\mathbf{r}(\boldsymbol{\beta})$ with components $r_i(\boldsymbol{\beta}) = y_i - f(x_i, \boldsymbol{\beta})$:

$$\min_{\boldsymbol{\beta}} \; \|\mathbf{r}(\boldsymbol{\beta})\|_2^2 = \mathbf{r}(\boldsymbol{\beta})^\top \mathbf{r}(\boldsymbol{\beta})$$
Weighted Least Squares
When observations have different uncertainties, a weighted least squares approach is used:

$$\min_{\boldsymbol{\beta}} \; \sum_{i=1}^{m} w_i \bigl(y_i - f(x_i, \boldsymbol{\beta})\bigr)^2$$

where $w_i > 0$ are weights, typically chosen as $w_i = 1/\sigma_i^2$, where $\sigma_i$ is the standard deviation of the $i$-th observation.
Regularized Least Squares
To prevent overfitting or handle ill-posed problems, regularization terms can be added:

$$\min_{\boldsymbol{\beta}} \; \sum_{i=1}^{m} \bigl(y_i - f(x_i, \boldsymbol{\beta})\bigr)^2 + \lambda \|\boldsymbol{\beta}\|_2^2$$

This is known as Tikhonov regularization, or ridge regression, when the penalty is the squared $\ell_2$ norm of $\boldsymbol{\beta}$; the parameter $\lambda > 0$ controls the strength of the regularization.
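For the linear models introduced in the next section, this regularized objective still has a closed-form solution, $\hat{\boldsymbol{\beta}} = (A^\top A + \lambda I)^{-1} A^\top \mathbf{y}$. A minimal NumPy sketch (the design matrix X, observations y, and the choice of lam are placeholders for illustration):

```python
import numpy as np

def ridge_solve(X, y, lam):
    """Tikhonov / ridge estimate: minimize ||y - X @ beta||^2 + lam * ||beta||^2."""
    n = X.shape[1]
    # Regularized normal equations: (X^T X + lam * I) beta = X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```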
Linear Least Squares
Formulation
In linear least squares, the model function is a linear combination of parameters:

$$f(x, \boldsymbol{\beta}) = \sum_{j=1}^{n} \beta_j \phi_j(x)$$

where $\phi_1, \dots, \phi_n$ are basis functions. This can be written in matrix form as:

$$\mathbf{y} = A\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where:
- $\mathbf{y} \in \mathbb{R}^m$ is the vector of observations
- $A \in \mathbb{R}^{m \times n}$ is the design matrix with elements $A_{ij} = \phi_j(x_i)$
- $\boldsymbol{\varepsilon}$ is the error vector

The objective function becomes:

$$S(\boldsymbol{\beta}) = \|\mathbf{y} - A\boldsymbol{\beta}\|_2^2$$
Normal Equations
The optimal solution $\hat{\boldsymbol{\beta}}$ to the linear least squares problem satisfies the normal equations:

$$A^\top A \,\hat{\boldsymbol{\beta}} = A^\top \mathbf{y}$$

If $A^\top A$ is invertible, the solution is:

$$\hat{\boldsymbol{\beta}} = (A^\top A)^{-1} A^\top \mathbf{y}$$
This is the closed-form solution to the linear least squares problem.
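As a concrete illustration (a minimal NumPy sketch; the straight-line model and the synthetic data are assumptions made for the example), the normal-equations solution can be checked against a library least squares solver:

```python
import numpy as np

# Hypothetical data: noisy samples of y = 2 + 3x.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=x.size)

# Design matrix for the basis functions phi_1(x) = 1, phi_2(x) = x.
A = np.column_stack([np.ones_like(x), x])

# Closed-form solution of the normal equations: (A^T A) beta = A^T y.
beta_normal = np.linalg.solve(A.T @ A, A.T @ y)

# SVD-based library solver, numerically safer for ill-conditioned A.
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta_normal)   # roughly [2, 3]
print(beta_lstsq)    # agrees with the normal-equations solution here
```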
Numerical Methods for Linear Least Squares
Several numerical methods can be used to solve linear least squares problems, especially for large-scale or ill-conditioned problems:
- QR Decomposition:
  - Decompose $A = QR$, where $Q$ is orthogonal and $R$ is upper triangular
  - Solution: solve $R\hat{\boldsymbol{\beta}} = Q^\top \mathbf{y}$ by back substitution (see the sketch after this list)
  - More stable than directly solving the normal equations
- Singular Value Decomposition (SVD):
  - Decompose $A = U\Sigma V^\top$
  - Solution: $\hat{\boldsymbol{\beta}} = A^{+}\mathbf{y}$, where $A^{+} = V\Sigma^{+}U^\top$ is the pseudoinverse of $A$
  - Most stable, but computationally expensive
  - Handles rank-deficient problems
- Cholesky Decomposition:
  - Decompose $A^\top A = LL^\top$, where $L$ is lower triangular
  - Solution via forward and backward substitution
  - Efficient when the normal equations matrix is positive definite
- Iterative Methods:
  - Conjugate Gradient (CG)
  - LSQR (for sparse problems)
  - Suitable for very large-scale problems
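A brief sketch of the QR route in NumPy/SciPy (the random test system is made up for illustration):

```python
import numpy as np
from scipy.linalg import solve_triangular

def solve_via_qr(A, y):
    """Solve min ||y - A @ beta||_2 using a (thin) QR factorization of A."""
    Q, R = np.linalg.qr(A)                    # A = Q R with R upper triangular
    return solve_triangular(R, Q.T @ y)       # back substitution on R beta = Q^T y

# Small random full-rank test system.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = A @ beta_true + rng.normal(scale=0.01, size=100)
print(solve_via_qr(A, y))                     # close to beta_true
```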
Nonlinear Least Squares
Formulation
When the model function is nonlinear in the parameters, we have a nonlinear least squares problem:

$$\min_{\boldsymbol{\beta}} \; \|\mathbf{r}(\boldsymbol{\beta})\|_2^2$$

where $\mathbf{r}(\boldsymbol{\beta}) \in \mathbb{R}^m$ has components $r_i(\boldsymbol{\beta}) = y_i - f(x_i, \boldsymbol{\beta})$ and $f$ is nonlinear in $\boldsymbol{\beta}$.
Gauss-Newton Method
The Gauss-Newton method is a specialized algorithm for nonlinear least squares problems:
- Start with an initial guess $\boldsymbol{\beta}^{(0)}$
- At each iteration $k$:
  - Compute the Jacobian matrix $J_k$ with elements $(J_k)_{ij} = \partial r_i / \partial \beta_j$, evaluated at $\boldsymbol{\beta}^{(k)}$
  - Solve the linear system: $J_k^\top J_k \,\boldsymbol{\delta}_k = -J_k^\top \mathbf{r}(\boldsymbol{\beta}^{(k)})$
  - Update: $\boldsymbol{\beta}^{(k+1)} = \boldsymbol{\beta}^{(k)} + \boldsymbol{\delta}_k$
- Repeat until convergence
This can be viewed as applying Newton's method to the least squares objective while approximating its Hessian by $2J^\top J$, i.e. dropping the second-order terms that involve the residuals.
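A bare-bones illustration of the iteration (a sketch only: the exponential model, tolerance, and data are assumptions; the code works with the Jacobian of the model rather than of the residuals, which flips the sign in the step equation; and no safeguards such as line search or damping are included):

```python
import numpy as np

def gauss_newton(f, jac, x, y, beta0, max_iter=50, tol=1e-10):
    """Minimize sum (y_i - f(x_i, beta))^2 with plain Gauss-Newton steps."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = y - f(x, beta)            # residuals r_i = y_i - f(x_i, beta)
        J = jac(x, beta)              # Jacobian of the model f w.r.t. beta
        # Gauss-Newton step: minimize ||r - J delta||_2, i.e. solve J delta ~= r
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)
        beta = beta + delta
        if np.linalg.norm(delta) < tol:
            break
    return beta

# Hypothetical model: f(x, beta) = beta0 * exp(beta1 * x)
f = lambda x, b: b[0] * np.exp(b[1] * x)
jac = lambda x, b: np.column_stack([np.exp(b[1] * x),
                                    b[0] * x * np.exp(b[1] * x)])

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 40)
y = f(x, [2.0, -1.5]) + rng.normal(scale=0.01, size=x.size)
print(gauss_newton(f, jac, x, y, beta0=[1.0, -1.0]))   # near [2.0, -1.5]
```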
Levenberg-Marquardt Algorithm
The Levenberg-Marquardt algorithm adds damping to the Gauss-Newton method to improve stability:

$$\bigl(J_k^\top J_k + \lambda_k I\bigr)\,\boldsymbol{\delta}_k = -J_k^\top \mathbf{r}(\boldsymbol{\beta}^{(k)})$$

where $\lambda_k > 0$ is a damping parameter that is adjusted dynamically:
- If the step reduces the error, decrease $\lambda$ (more like Gauss-Newton)
- If the step increases the error, increase $\lambda$ (more like gradient descent)
This can be viewed as a trust region method where the damping parameter controls the size of the trust region.
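In practice the algorithm is usually taken from a library rather than written by hand. One option (shown as a sketch; the exponential model and synthetic data are made up) is scipy.optimize.least_squares, which provides Levenberg-Marquardt as method="lm":

```python
import numpy as np
from scipy.optimize import least_squares

# Same hypothetical exponential model as in the Gauss-Newton sketch above.
def residuals(beta, x, y):
    return y - beta[0] * np.exp(beta[1] * x)

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * np.exp(-1.5 * x) + rng.normal(scale=0.01, size=x.size)

result = least_squares(residuals, x0=[1.0, -1.0], args=(x, y), method="lm")
print(result.x)       # fitted parameters, close to [2.0, -1.5]
print(result.cost)    # 0.5 * sum of squared residuals at the solution
```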
Applications of Least Squares
Regression Analysis
Least squares is the foundation of regression analysis, used to model relationships between variables:
- Linear regression
- Polynomial regression
- Multiple regression
- Generalized linear models
Curve Fitting
Fitting mathematical functions to data points:
- Interpolation
- Approximation of complex functions
- Smoothing noisy data (see the example after this list)
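For instance, smoothing noisy samples with a low-degree polynomial takes a single call (a minimal sketch; the cubic degree and the sine-shaped test data are arbitrary choices for illustration):

```python
import numpy as np

# Noisy samples of a smooth signal, here sin(x), smoothed with a cubic fit.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

coeffs = np.polyfit(x, y, deg=3)     # least squares fit of a degree-3 polynomial
y_smooth = np.polyval(coeffs, x)     # evaluate the fitted polynomial on the grid
```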
Parameter Estimation
Estimating parameters in physical models:
- System identification in control theory
- Estimating chemical reaction rates
- Determining material properties
Signal Processing
Processing and analyzing signals:
- Filter design
- System identification
- Spectrum analysis
Geodesy and Surveying
Determining positions and distances:
- GPS position estimation
- Surveying measurements
- Geodetic network adjustments
Statistical Interpretation
Least squares estimation has important statistical interpretations:
- Maximum Likelihood Estimation: Under the assumption of independent, identically distributed Gaussian errors, the least squares estimate is equivalent to the maximum likelihood estimate.
- Best Linear Unbiased Estimator (BLUE): Under the Gauss-Markov assumptions, the least squares estimator is the best linear unbiased estimator.
- Minimum Variance Unbiased Estimator: If the errors are normally distributed, the least squares estimator is the minimum variance unbiased estimator.
Error Analysis and Diagnostics
Several metrics help evaluate the quality of least squares fits:
- Residual Analysis:
  - Plot residuals to check for patterns
  - Test for normality of residuals
  - Check for autocorrelation
- Goodness of Fit Measures (see the sketch after this list):
  - Coefficient of determination ($R^2$)
  - Adjusted $R^2$
  - Root mean squared error (RMSE)
- Leverage and Influence:
  - Hat matrix diagonal elements (leverage)
  - Cook’s distance (influence)
  - DFFITS and DFBETAS
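Two of these measures are straightforward to compute from a fit (a sketch, assuming arrays y of observations and y_hat of fitted values are already available):

```python
import numpy as np

def fit_diagnostics(y, y_hat):
    """Return R^2 and RMSE for observations y and fitted values y_hat."""
    residuals = y - y_hat
    ss_res = np.sum(residuals**2)            # residual sum of squares
    ss_tot = np.sum((y - np.mean(y))**2)     # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean(residuals**2))
    return r_squared, rmse
```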
Implementation Considerations
Numerical Stability
Linear least squares problems can become ill-conditioned when:
- The columns of $A$ are nearly linearly dependent
- There are large differences in scale among variables
- The number of parameters is large compared to the number of data points
Solutions include:
- Regularization
- Using stable decomposition methods (SVD)
- Data preprocessing (centering, scaling), as sketched below
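A minimal illustration of the centering-and-scaling idea combined with an SVD-based solver (a sketch under the assumption that the design matrix A contains no constant column; names are illustrative):

```python
import numpy as np

def solve_scaled(A, y):
    """Standardize the columns of A before an SVD-based solve, then undo the scaling."""
    col_mean = A.mean(axis=0)
    col_scale = A.std(axis=0)
    A_std = (A - col_mean) / col_scale            # centered, unit-variance columns
    # lstsq uses an SVD internally, which copes better with ill-conditioning
    beta_std, *_ = np.linalg.lstsq(A_std, y - y.mean(), rcond=None)
    beta = beta_std / col_scale                   # coefficients on the original scale
    intercept = y.mean() - col_mean @ beta        # recovers the fitted intercept
    return intercept, beta
```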
Computational Efficiency
For large-scale problems, considerations include:
- Memory requirements for storing matrices
- Computational complexity of matrix operations
- Specialized algorithms for sparse systems
- Iterative methods for very large systems (see the sketch below)
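As one example, a sparse problem can be handed to an iterative solver directly (a sketch; the random sparse matrix and right-hand side are made up, and SciPy's LSQR is used as the representative method):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

# Hypothetical large, sparse, overdetermined system A beta ~= y.
rng = np.random.default_rng(5)
A = sp.random(10_000, 200, density=0.01, format="csr", random_state=42)
beta_true = rng.normal(size=200)
y = A @ beta_true + rng.normal(scale=0.01, size=10_000)

# LSQR solves the least squares problem iteratively, never forming A^T A.
beta_hat = lsqr(A, y)[0]
print(np.linalg.norm(beta_hat - beta_true))   # small if the problem is well conditioned
```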