Model fitting is the process of identifying mathematical or computational models that best explain observed data. Regression, a common type of model fitting, focuses specifically on estimating relationships between variables. These techniques form the foundation of data-driven engineering and scientific discovery.
Fundamental Concepts
Model Components
A typical model fitting process involves:
- Observed Data: A set of measurements or observations $(x_i, y_i)$ for $i = 1, \dots, n$
- Model Function: A mathematical relationship $\hat{y} = f(x; \theta)$ parameterized by $\theta$
- Error Measure: A quantification of the discrepancy between observations and model predictions
- Optimization Problem: Finding the parameters that minimize the error measure
Types of Models
Models can be classified based on their structure:
- Linear Models:
  - Linear in the parameters $\theta$, not necessarily linear in $x$
  - Examples: linear regression, polynomial regression, basis function models
- Nonlinear Models: $f$ is nonlinear in the parameters $\theta$
  - Examples: exponential models, logistic models, power laws, neural networks
- Parametric vs. Nonparametric Models:
  - Parametric: Fixed form with a finite number of parameters
  - Nonparametric: Flexible form, potentially infinite parameters (e.g., kernel methods)
Mathematical Formulation
General Problem Statement
Given data points $(x_i, y_i)$ for $i = 1, \dots, n$, find the parameter vector $\theta$ that minimizes:
$$E(\theta) = \sum_{i=1}^{n} \rho\bigl(y_i - f(x_i; \theta)\bigr)$$
where $\rho$ is an error function measuring the discrepancy between the observed $y_i$ and the predicted $f(x_i; \theta)$.
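As a concrete illustration, the objective above can be evaluated directly once a model function and an error function are chosen. The following minimal Python sketch (the names `sum_of_errors`, `line`, and `squared` are illustrative, not from the source) computes $E(\theta)$ for a straight-line model on a small synthetic dataset:

```python
import numpy as np

def sum_of_errors(theta, x, y, model, error):
    """Evaluate E(theta) = sum_i error(y_i - model(x_i; theta))."""
    residuals = y - model(x, theta)
    return np.sum(error(residuals))

# Example: a straight-line model with squared error
line = lambda x, theta: theta[0] + theta[1] * x
squared = lambda r: r ** 2

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.1, 2.9])
print(sum_of_errors(np.array([0.0, 1.0]), x, y, line, squared))
```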
Common Error Functions
- Squared Error: $\rho(r) = r^2$
  - Leads to least squares regression
  - Sensitive to outliers
  - Optimal for Gaussian noise
- Absolute Error: $\rho(r) = |r|$
  - Leads to least absolute deviations (LAD) regression
  - More robust to outliers
  - Optimal for Laplace-distributed noise
- Huber Loss: A hybrid approach
  - $\rho(r) = \tfrac{1}{2} r^2$ for $|r| \le \delta$
  - $\rho(r) = \delta \left( |r| - \tfrac{1}{2} \delta \right)$ for $|r| > \delta$
  - Combines the robustness of absolute error with the smoothness of squared error (the three losses are compared in the sketch below)
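A direct way to see how these error functions differ is to implement them side by side. The sketch below is a straightforward rendering of the formulas above; the Huber threshold `delta` is a free parameter chosen here for illustration:

```python
import numpy as np

def squared_error(r):
    return r ** 2

def absolute_error(r):
    return np.abs(r)

def huber_loss(r, delta=1.0):
    # Quadratic near zero, linear in the tails
    quadratic = 0.5 * r ** 2
    linear = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quadratic, linear)

residuals = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(squared_error(residuals))   # heavily penalizes the large residuals
print(absolute_error(residuals))  # penalizes all residuals proportionally
print(huber_loss(residuals))      # quadratic for |r| <= 1, linear beyond
```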
Linear Regression
Simple Linear Regression
The simplest form of regression models the relationship between two variables with a straight line:
$$y = \beta_0 + \beta_1 x + \epsilon$$
where $\epsilon$ represents random error.
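Under least squares, the slope and intercept have simple closed-form estimates in terms of sample means and deviations. A minimal sketch with made-up data (variable names are illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Closed-form least squares estimates for y = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # intercept and slope estimates
```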
Multiple Linear Regression
Extends simple linear regression to multiple independent variables:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon$$
Matrix Formulation
The multiple linear regression model can be expressed in matrix notation:
$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}$$
where:
- $\mathbf{y}$ is the vector of observed dependent variables
- $\mathbf{X}$ is the design matrix containing the independent variables
- $\boldsymbol{\beta}$ is the parameter vector
- $\boldsymbol{\epsilon}$ is the error vector
Least Squares Solution
Under the least squares criterion, the optimal parameter vector is:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{X}^{\top} \mathbf{y}$$
This closed-form solution exists when $\mathbf{X}^{\top} \mathbf{X}$ is invertible.
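The closed-form solution maps directly to a few lines of NumPy. In practice a dedicated solver such as `np.linalg.lstsq` is usually preferred over forming the normal equations explicitly, since it is numerically more stable; both are shown in this sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations (fine for small, well-conditioned problems)
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically preferable: dedicated least squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)
print(beta_lstsq)
```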
Polynomial Regression
A specific case of linear regression where the basis functions are powers of $x$:
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_m x^m + \epsilon$$
This model is linear in the parameters despite modeling nonlinear relationships in $x$.
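Because the model is linear in the parameters, it can be fit with the same machinery as ordinary least squares by building a Vandermonde-style design matrix; `np.polyfit` wraps the same idea. A brief sketch:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 30)
y = 0.5 - 2.0 * x + 3.0 * x**2 + 0.05 * np.random.default_rng(1).normal(size=x.size)

# Build the design matrix [1, x, x^2] and solve the linear least squares problem
X = np.vander(x, N=3, increasing=True)
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)                  # approximately [0.5, -2.0, 3.0]

# Equivalent convenience call (returns highest degree first)
print(np.polyfit(x, y, deg=2))
```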
Nonlinear Regression
Formulation
Nonlinear regression involves models where the parameters appear nonlinearly:
$$y = f(x; \theta) + \epsilon, \quad \text{with } f \text{ nonlinear in } \theta$$
Common examples include:
- Exponential models: $y = a e^{b x}$
- Growth models: $y = \dfrac{L}{1 + e^{-k(x - x_0)}}$
- Sinusoidal models: $y = A \sin(\omega x + \phi)$
Optimization Approaches
Unlike linear regression, nonlinear regression typically requires iterative numerical optimization:
- Gauss-Newton Method: Specialized for least squares problems
- Levenberg-Marquardt Algorithm: Robust modification of Gauss-Newton (used by default in the curve-fitting sketch after this list)
- Gradient Descent: Simple but potentially slow convergence
- Trust Region Methods: Restrict optimization steps to regions where the model is trusted
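As one concrete option, SciPy's `curve_fit` (which uses Levenberg-Marquardt for unconstrained problems) can fit the exponential model above. A sketch with synthetic data; note that nonlinear fits are sensitive to the initial guess `p0`:

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_model(x, a, b):
    return a * np.exp(b * x)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 40)
y = exponential_model(x, 2.0, 1.3) + 0.05 * rng.normal(size=x.size)

# p0 gives the initial guess for the iterative optimizer
params, covariance = curve_fit(exponential_model, x, y, p0=(1.0, 1.0))
print(params)                        # approximately [2.0, 1.3]
print(np.sqrt(np.diag(covariance)))  # standard errors of the estimates
```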
Regularization Techniques
Regularization addresses overfitting by adding penalties to the objective function:
Ridge Regression (L2 Regularization)
- Shrinks parameters toward zero
- Handles multicollinearity well
- Closed-form solution: $\hat{\boldsymbol{\beta}}_{\text{ridge}} = (\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I})^{-1} \mathbf{X}^{\top} \mathbf{y}$
Lasso Regression (L1 Regularization)
- Produces sparse solutions (feature selection)
- No closed-form solution; requires iterative methods such as coordinate descent or quadratic programming
- Effective for high-dimensional data
Elastic Net
Combines L1 and L2 regularization:
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \; \| \mathbf{y} - \mathbf{X} \boldsymbol{\beta} \|_2^2 + \lambda_1 \| \boldsymbol{\beta} \|_1 + \lambda_2 \| \boldsymbol{\beta} \|_2^2$$
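In practice these estimators are usually taken from a library rather than coded by hand. Assuming scikit-learn is available (the penalty strengths below are arbitrary illustrative values, not recommendations), a sketch covering all three penalties:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)                      # shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)                      # drives many coefficients exactly to zero
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)    # mixes the two penalties

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))
print(np.round(enet.coef_, 2))
```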
Model Evaluation and Selection
Performance Metrics
- Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Root Mean Squared Error (RMSE): $\text{RMSE} = \sqrt{\text{MSE}}$
- Mean Absolute Error (MAE): $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
- Coefficient of Determination (R²): $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
  - Represents the proportion of variance explained by the model
  - Ranges from 0 to 1 for linear least squares models with an intercept (can be negative for nonlinear models)
  - Higher values indicate better fit (all four metrics are computed in the sketch below)
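These metrics are one-liners with NumPy. A minimal sketch, where the predictions `y_hat` are assumed to come from any previously fitted model:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.5, 9.0])
y_hat = np.array([2.8, 5.3, 7.1, 9.4])   # predictions from some fitted model

mse = np.mean((y - y_hat) ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y - y_hat))
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(mse, rmse, mae, r2)
```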
Cross-Validation
Techniques to assess model performance on unseen data:
- k-fold Cross-Validation: Divides data into k subsets, trains on k-1 subsets and tests on the remaining one, rotating through all subsets (sketched after this list)
- Leave-One-Out Cross-Validation: Special case of k-fold where k equals the number of data points
- Train-Test Split: Divides data into training and testing sets (typically 70-30% or 80-20%)
- Hold-out Validation: Sets aside a portion of data for final evaluation
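A hand-rolled k-fold loop makes the procedure explicit (library helpers such as scikit-learn's `KFold` do the same bookkeeping). The sketch below cross-validates an ordinary least squares fit; the data and function name are illustrative:

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Estimate out-of-sample MSE of ordinary least squares by k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(4)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 3))])
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + 0.2 * rng.normal(size=60)
print(kfold_mse(X, y, k=5))
```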
Model Selection Criteria
Metrics that balance goodness-of-fit with model complexity:
- Akaike Information Criterion (AIC): $\text{AIC} = 2k - 2\ln \hat{L}$
  - $k$ is the number of parameters
  - $\hat{L}$ is the maximized value of the likelihood function
- Bayesian Information Criterion (BIC): $\text{BIC} = k \ln n - 2\ln \hat{L}$
  - Penalizes complexity more severely than AIC (sketched below for a least squares fit)
- Adjusted R²: $\bar{R}^2 = 1 - (1 - R^2) \dfrac{n - 1}{n - p - 1}$
  - Adjusts R² based on the number of predictors
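For a least squares fit with Gaussian errors, the log-likelihood term reduces to a function of the residual sum of squares, which gives the commonly used forms below. This is a sketch under that Gaussian assumption; both criteria are defined only up to an additive constant, so only differences between candidate models are meaningful:

```python
import numpy as np

def aic_bic_least_squares(y, y_hat, k):
    """AIC and BIC for a least squares fit, assuming Gaussian errors.

    Uses the residual-sum-of-squares form; values are comparable only
    across models fit to the same data.
    """
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    log_likelihood_term = n * np.log(rss / n)
    aic = log_likelihood_term + 2 * k
    bic = log_likelihood_term + k * np.log(n)
    return aic, bic

y = np.array([1.0, 2.1, 2.9, 4.2, 5.1])
y_hat = np.array([1.1, 2.0, 3.0, 4.0, 5.0])
print(aic_bic_least_squares(y, y_hat, k=2))  # e.g. a straight-line model with 2 parameters
```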
Parameter Uncertainty and Confidence
Parameter Covariance Matrix
For least squares estimation, the parameter covariance matrix is:
$$\text{Cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^{\top} \mathbf{X})^{-1}$$
where $\sigma^2$ is the variance of the error term.
Confidence Intervals
For parameter $\beta_j$, the 95% confidence interval is:
$$\hat{\beta}_j \pm t_{0.975,\, n-p-1} \, \text{SE}(\hat{\beta}_j)$$
where $\text{SE}(\hat{\beta}_j) = \sqrt{\left[\text{Cov}(\hat{\boldsymbol{\beta}})\right]_{jj}}$ and $t_{0.975,\, n-p-1}$ is the critical value of the t-distribution with $n - p - 1$ degrees of freedom.
Prediction Intervals
For a new observation at $\mathbf{x}_0$, the prediction interval is:
$$\hat{y}_0 \pm t_{0.975,\, n-p-1} \, \hat{\sigma} \sqrt{1 + \mathbf{x}_0^{\top} (\mathbf{X}^{\top} \mathbf{X})^{-1} \mathbf{x}_0}$$
where $\hat{y}_0 = \mathbf{x}_0^{\top} \hat{\boldsymbol{\beta}}$.
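The quantities above can be computed directly from a least squares fit. The sketch below estimates the error variance from the residuals with $n - p - 1$ degrees of freedom and uses SciPy's t-distribution for the critical value; data and names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.3 * rng.normal(size=n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

dof = n - (p + 1)                              # degrees of freedom
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / dof       # estimated error variance
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X) # parameter covariance matrix
se_beta = np.sqrt(np.diag(cov_beta))

t_crit = stats.t.ppf(0.975, dof)
print(np.column_stack([beta_hat - t_crit * se_beta,
                       beta_hat + t_crit * se_beta]))   # 95% confidence intervals

# 95% prediction interval for a new point x0
x0 = np.array([1.0, 0.5, -1.0])
y0_hat = x0 @ beta_hat
half_width = t_crit * np.sqrt(sigma2_hat * (1.0 + x0 @ np.linalg.inv(X.T @ X) @ x0))
print(y0_hat - half_width, y0_hat + half_width)
```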
Specialized Regression Techniques
Weighted Least Squares
Incorporates varying reliability of observations:
$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} w_i \left( y_i - \mathbf{x}_i^{\top} \boldsymbol{\beta} \right)^2$$
where $w_i$ are weights, often inversely proportional to the variance of the corresponding observations.
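Weighted least squares reduces to ordinary least squares after rescaling each row by the square root of its weight, which keeps the implementation short. A sketch where the second half of the data is deliberately noisier and therefore down-weighted:

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Solve the weighted least squares problem by rescaling rows with sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(6)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
noise_std = np.where(np.arange(n) < 25, 0.1, 1.0)   # second half of the data is noisier
y = X @ np.array([2.0, 1.5]) + noise_std * rng.normal(size=n)

w = 1.0 / noise_std**2   # weights inversely proportional to the error variance
print(weighted_least_squares(X, y, w))
```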
Robust Regression
Methods less sensitive to outliers:
- M-Estimation: Minimizes a robust loss function (e.g., the Huber loss; see the sketch after this list)
- MM-Estimation: Combines high breakdown point with efficiency
- Least Trimmed Squares: Minimizes the sum of the smallest squared residuals
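One way to realize an M-estimator in code is SciPy's `least_squares` with a robust loss. The sketch below contrasts a Huber-loss fit with an ordinary squared-error fit on data containing gross outliers; the `f_scale` value is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(7)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + 0.3 * rng.normal(size=x.size)
y[::10] += 15.0   # inject a few gross outliers

def residuals(beta):
    return y - (beta[0] + beta[1] * x)

# Ordinary least squares is pulled toward the outliers
ols = least_squares(residuals, x0=[0.0, 0.0])
# Huber loss (an M-estimator) down-weights large residuals
robust = least_squares(residuals, x0=[0.0, 0.0], loss="huber", f_scale=1.0)

print(ols.x)      # noticeably biased by the outliers
print(robust.x)   # close to the underlying [1.0, 2.0]
```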
Bayesian Regression
Incorporates prior knowledge about parameters:
- Prior: $p(\boldsymbol{\beta})$ represents knowledge before seeing data
- Likelihood: $p(\mathbf{y} \mid \boldsymbol{\beta})$ represents how well parameters explain data
- Posterior: $p(\boldsymbol{\beta} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \boldsymbol{\beta}) \, p(\boldsymbol{\beta})$ represents updated knowledge after seeing data (a conjugate-Gaussian sketch follows)
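A minimal sketch of the conjugate Gaussian case, assuming a zero-mean isotropic Gaussian prior on the coefficients and a known noise variance (both simplifying assumptions made here for illustration); in this case the posterior is Gaussian with a closed-form mean and covariance:

```python
import numpy as np

def bayesian_linear_regression(X, y, sigma2, tau2):
    """Posterior mean and covariance for prior beta ~ N(0, tau2*I)
    and Gaussian noise with known variance sigma2 (conjugate case)."""
    d = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(d) / tau2
    cov_post = np.linalg.inv(precision)
    mean_post = cov_post @ X.T @ y / sigma2
    return mean_post, cov_post

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, -2.0]) + 0.2 * rng.normal(size=30)

mean_post, cov_post = bayesian_linear_regression(X, y, sigma2=0.04, tau2=10.0)
print(mean_post)                   # posterior mean (a ridge-like shrinkage estimate)
print(np.sqrt(np.diag(cov_post)))  # posterior standard deviations
```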
Implementation with Optimization Algorithms
Model fitting is fundamentally an optimization problem. Common approaches include:
- Direct Methods: For linear models with closed-form solutions
- Iterative Methods: Required for nonlinear models
- Gauss-Newton: Efficient for moderate nonlinearity
- Levenberg-Marquardt: More robust, combines Gauss-Newton with steepest descent
- Trust Region Methods: Controls step size based on model reliability
- Stochastic Gradient Descent: Effective for large datasets (a minimal sketch follows)
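For large datasets the squared-error gradient can be followed one observation at a time instead of solving the normal equations. A compact sketch of stochastic gradient descent for a linear model; the learning rate and epoch count are arbitrary illustrative choices:

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=200, seed=0):
    """Fit y ~ X @ beta by stochastic gradient descent on the squared error."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):          # one observation at a time
            gradient = -2.0 * (y[i] - X[i] @ beta) * X[i]
            beta -= lr * gradient
    return beta

rng = np.random.default_rng(9)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([0.5, 1.0, -1.5]) + 0.1 * rng.normal(size=200)
print(sgd_linear_regression(X, y))   # approaches the least squares solution
```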
Applications in Engineering
- System Identification: Determining mathematical models of dynamic systems
- Empirical Modeling: Creating models based on experimental data
- Parameter Estimation: Determining physical parameters from measurements
- Response Surface Methodology: Optimizing processes using experimental designs
- Calibration: Adjusting simulation models to match observed behavior