Understanding Failures, Errors, and Faults
The Fault-Error-Failure Chain
- Fault: Hypothesized cause of an error
  - A defect in the system (e.g., a bug in code, a hardware defect)
  - Not all faults lead to errors
- Error: Deviation of the system state from the correct state
  - Manifestation of an activated fault
  - May exist without causing a failure
  - Examples: erroneous data, inconsistent internal behavior
- Failure: Deviation of the delivered service from its specification
  - Visible at the service interface
  - Caused by errors propagating to the service interface
  - Examples: crash, incorrect output, timing violation
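A minimal, hypothetical Python sketch of the chain: `buggy_max` contains a development fault (an off-by-one loop bound). Depending on the input and on how the caller uses the result, the fault produces no error, an error that is masked, or an error that propagates to the service interface as a failure. The function names and inputs are illustrative assumptions.

```python
def buggy_max(values):
    """Intended to return the largest element of a non-empty list.

    FAULT: the loop bound is off by one, so the last element is never examined.
    """
    best = values[0]
    for i in range(1, len(values) - 1):  # should be range(1, len(values))
        if values[i] > best:
            best = values[i]
    return best


def any_positive(values):
    """Service 1: report whether the list contains a positive number."""
    return buggy_max(values) > 0


def largest(values):
    """Service 2: report the largest number."""
    return buggy_max(values)


# Faulty code runs, but the skipped element was not the maximum: no error.
print(largest([5, 1, 2]))       # 5 (correct)

# Error: the internal result is 2 instead of 9, but 2 > 0 masks it.
print(any_positive([1, 2, 9]))  # True (correct service, no failure)

# The same error reaches the service interface: failure (spec requires 9).
print(largest([1, 2, 9]))       # 2
```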
Fault Classification
Faults can be classified along multiple dimensions:

Phase of Creation or Occurrence
- Development Faults: Introduced during system development
- Operational Faults: Occurring during system operation
System Boundaries
- Internal Faults: Originating from within the system
- External Faults: Originating from outside the system
Phenomenological Cause
- Natural Faults: Caused by natural phenomena
- Human-made Faults: Resulting from human actions
Intent
- Non-malicious Faults: Without harmful intent
- Malicious Faults: With harmful intent (attacks)
Capability/Competence
- Accidental Faults: Introduced inadvertently
- Incompetence Faults: Due to lack of skills/knowledge
Persistence
- Permanent Faults: Persisting until repaired
- Transient Faults: Appearing then disappearing
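The classification dimensions can be captured as a small data structure, with one enumeration per dimension. This is a sketch under the assumption that each fault takes exactly one value per dimension; the enum and field names are hypothetical and simply mirror the headings above.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    DEVELOPMENT = "development"
    OPERATIONAL = "operational"


class Boundary(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


class Cause(Enum):
    NATURAL = "natural"
    HUMAN_MADE = "human-made"


class Intent(Enum):
    NON_MALICIOUS = "non-malicious"
    MALICIOUS = "malicious"


class Capability(Enum):
    ACCIDENTAL = "accidental"
    INCOMPETENCE = "incompetence"


class Persistence(Enum):
    PERMANENT = "permanent"
    TRANSIENT = "transient"


@dataclass(frozen=True)
class FaultClass:
    """One fault described along each classification dimension."""
    phase: Phase
    boundary: Boundary
    cause: Cause
    intent: Intent
    capability: Capability
    persistence: Persistence


# Illustrative example: an off-by-one bug a programmer accidentally shipped.
software_bug = FaultClass(
    phase=Phase.DEVELOPMENT,
    boundary=Boundary.INTERNAL,
    cause=Cause.HUMAN_MADE,
    intent=Intent.NON_MALICIOUS,
    capability=Capability.ACCIDENTAL,
    persistence=Persistence.PERMANENT,  # stays until the code is fixed
)

print(software_bug.phase.value, software_bug.persistence.value)  # development permanent
```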
Failure Spectrum
Failure isn’t binary but exists on a spectrum:
- Optimal Service: Meeting functional requirements and balancing all quality attributes
- Partial Failure: Some parts of the system fail while others continue
- Degraded Service: System functions but with reduced performance
- Transient Failure: Temporary interruption with automatic recovery
- Complete Failure: System becomes unresponsive or produces incorrect results
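One way to make the spectrum concrete is a classifier that maps observed service conditions to a state. The inputs (healthy component count, latency against a target, output correctness, and whether recovery was automatic) and the order of the checks are hypothetical choices for illustration, not a standard.

```python
from enum import Enum


class ServiceState(Enum):
    OPTIMAL = "optimal service"
    DEGRADED = "degraded service"
    PARTIAL_FAILURE = "partial failure"
    TRANSIENT_FAILURE = "transient failure"
    COMPLETE_FAILURE = "complete failure"


def classify(components_up, components_total, latency_ms, latency_target_ms,
             output_correct, recovered_automatically=False):
    """Map observed conditions to a point on the failure spectrum (hypothetical rules)."""
    # No working components or wrong results: the service is down or wrong.
    if components_up == 0 or not output_correct:
        # An interruption that cleared without intervention counts as transient.
        if recovered_automatically:
            return ServiceState.TRANSIENT_FAILURE
        return ServiceState.COMPLETE_FAILURE
    # Some components failed while the rest keep serving.
    if components_up < components_total:
        return ServiceState.PARTIAL_FAILURE
    # Everything is up and correct, but slower than the target.
    if latency_ms > latency_target_ms:
        return ServiceState.DEGRADED
    return ServiceState.OPTIMAL


print(classify(3, 3, 120, 200, True))  # ServiceState.OPTIMAL
print(classify(3, 3, 450, 200, True))  # ServiceState.DEGRADED
print(classify(2, 3, 120, 200, True))  # ServiceState.PARTIAL_FAILURE
```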
Dependability Attributes
Dependability Tree

- Attributes
  - Availability: Readiness for correct service
  - Reliability: Continuity of correct service
  - Safety: Freedom from catastrophic consequences
  - Confidentiality: Absence of unauthorized disclosure
  - Integrity: Absence of improper system alterations
  - Maintainability: Ability to undergo repair and evolution
- Threats
  - Faults
  - Errors
  - Failures
- Means
  - Fault Prevention
  - Fault Tolerance
  - Fault Removal
  - Fault Forecasting
Availability and Reliability
Distinction
- Availability: System readiness for correct service when needed
  - Measured as the percentage of time the system is up
  - Focused on accessibility
- Reliability: System’s ability to deliver correct service without failure over time
  - Commonly expressed as Mean Time Between Failures (MTBF) or as the probability of failure-free operation over a given interval
  - Focused on continuity
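The two measures can be computed as follows. This sketch assumes the common steady-state formula A = MTBF / (MTBF + MTTR), where MTTR (mean time to repair) is not defined above, and models reliability with a constant failure rate λ = 1/MTBF, so that R(t) = e^(-t/MTBF); real systems may not follow that exponential model.

```python
import math


def availability(uptime_hours, downtime_hours):
    """Observed availability: fraction of the period the system was up."""
    return uptime_hours / (uptime_hours + downtime_hours)


def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run availability from MTBF and mean time to repair (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def reliability(t_hours, mtbf_hours):
    """Probability of no failure during t hours, assuming a constant
    failure rate lambda = 1 / MTBF (exponential model)."""
    return math.exp(-t_hours / mtbf_hours)


# Illustrative system: fails every 10 hours on average, recovers in 36 seconds.
mtbf, mttr = 10.0, 0.01
print(f"{steady_state_availability(mtbf, mttr):.4f}")  # 0.9990
print(f"{reliability(24, mtbf):.4f}")                  # 0.0907
print(f"{availability(8759, 1):.4f}")                  # 0.9999
```

With these illustrative numbers, frequent but very short outages keep availability near 99.9%, while the chance of a full day without failure is only about 9%: the gap between the two attributes in numeric form.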
Examples
- A system with 99.99% availability that occasionally produces incorrect results: High availability, low reliability
- A system that never crashes but is shut down for maintenance one week each year: High reliability, lower availability (about 98%)
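The arithmetic behind these examples, as a small sketch assuming a 365-day year of 8,760 hours and counting planned maintenance as downtime:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored (assumption)


def availability_pct(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Percentage of the period during which the system was available."""
    return 100 * (period_hours - downtime_hours) / period_hours


def allowed_downtime_minutes(availability_percent, period_hours=HOURS_PER_YEAR):
    """Downtime budget per period implied by an availability target."""
    return (1 - availability_percent / 100) * period_hours * 60


# One week (168 h) of maintenance per year.
print(f"{availability_pct(7 * 24):.2f}%")           # 98.08%
# A 99.99% target leaves roughly 53 minutes of downtime per year.
print(f"{allowed_downtime_minutes(99.99):.1f} min")  # 52.6 min
```

The downtime budget bounds only how long the system is unavailable; it says nothing about how often an available system returns wrong answers, which is why the first example can still have low reliability.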