What Is a Distributed System?
A distributed system can be defined in several ways:
- Tanenbaum and van Steen: “A collection of independent computers that appears to its users as a single coherent system”
- Coulouris, Dollimore and Kindberg: “One in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages”
- Lamport: “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable”
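The Coulouris et al. definition stresses that components coordinate *only* by passing messages, never by sharing memory. The sketch below illustrates that constraint inside a single process, so it is an analogy and not a real networked system. A thread stands in for a remote computer and a `queue.Queue` stands in for the network. The names `counter_node` and `run_demo` are hypothetical.

```python
import queue
import threading

def counter_node(inbox: queue.Queue) -> None:
    """A hypothetical node that owns a counter; others reach it only via messages."""
    count = 0  # private state: no other thread reads or writes it directly
    while True:
        kind, reply_to = inbox.get()
        if kind == "incr":
            count += 1
        elif kind == "get":
            reply_to.put(count)  # answer with a message, not via shared memory
        elif kind == "stop":
            return

def run_demo(increments: int) -> int:
    inbox: queue.Queue = queue.Queue()  # stands in for the network link to the node
    node = threading.Thread(target=counter_node, args=(inbox,))
    node.start()
    for _ in range(increments):
        inbox.put(("incr", None))
    replies: queue.Queue = queue.Queue()
    inbox.put(("get", replies))  # request the value, supplying a reply channel
    value = replies.get()  # block until the node's reply message arrives
    inbox.put(("stop", None))
    node.join()
    return value

print(run_demo(5))  # prints 5
```

The inbox is FIFO and has a single consumer, so all `incr` messages are handled before `get`. A real network gives no such ordering or delivery guarantee, which the fallacies listed later in this section make explicit.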
Motivations for Distributed Systems
- Geographic Distribution: Resources and users are naturally spread across locations
  - Example: Banking services are accessible from many locations while the account data is stored centrally
- Fault Tolerance: A failure at one location (power loss, fire, network cut) rarely affects other locations at the same time
  - Replicating a database on servers in different rooms means a single failure need not make the data unavailable
- Performance and Scalability: Combining the resources of many machines provides more computing power, storage, and throughput than any single machine
  - Examples: High Performance Computing, replicated web servers
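To see why replicas in different rooms improve reliability, consider a client that tries each replica in turn and uses the first one that answers. This is a minimal sketch, assuming a replica that is down raises `ConnectionError`. The replica functions, the data, and `read_with_failover` are all hypothetical.

```python
from typing import Callable, Sequence

class AllReplicasFailed(Exception):
    """Raised when no replica could serve the request."""

def read_with_failover(replicas: Sequence[Callable[[str], str]], key: str) -> str:
    errors = []
    for replica in replicas:
        try:
            return replica(key)  # the first replica that answers wins
        except ConnectionError as exc:  # this replica is unreachable: try the next
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replicas failed for key {key!r}")

# Hypothetical replicas in different rooms, each holding a copy of the same data.
DATA = {"balance:alice": "100"}

def room_a(key: str) -> str:
    raise ConnectionError("room A: power outage")

def room_b(key: str) -> str:
    return DATA[key]

def room_c(key: str) -> str:
    return DATA[key]

print(read_with_failover([room_a, room_b, room_c], "balance:alice"))  # prints 100
```

The read succeeds because the outage in room A does not affect rooms B and C. Keeping the copies identical when data is written is the harder problem, and it reappears below as the consistency challenge.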
Examples of Distributed Systems
- Financial trading platforms
- Web search engines (processing 50+ billion web pages)
- Social media platforms supporting billions of users
- Large Language Models (trained across clusters of GPUs or other accelerators)
- Scientific research (e.g., CERN with over 1 exabyte of data)
- Content Delivery Networks (CDNs)
- Online multiplayer games
Fallacies of Distributed Computing
Eight classic false assumptions, compiled by engineers at Sun Microsystems, that developers new to distributed computing tend to make; designs that rely on them break when the assumptions are violated:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- There is one administrator
- Transport cost is zero
- The network is homogeneous
- Topology doesn’t change
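Code that rejects the first two fallacies treats every remote call as something that can time out or fail, and bounds how long it will wait. The sketch below retries a call with exponential backoff. It assumes a failed request surfaces as `TimeoutError` or `ConnectionError`; in real code this might come from a socket or HTTP client configured with a timeout. `call_with_retries` and `FlakyLink` are hypothetical names, and the link simulates lost requests deterministically.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(op: Callable[[], T], attempts: int = 4,
                      base_delay: float = 0.01) -> T:
    """Retry op on TimeoutError/ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return op()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(base_delay * 2 ** attempt)  # wait 0.01s, 0.02s, 0.04s, ...
    raise ValueError("attempts must be >= 1")

class FlakyLink:
    """Hypothetical link that loses the first `failures` requests."""
    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def request(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise TimeoutError(f"request {self.calls} timed out")
        return "ok"

link = FlakyLink(failures=2)
print(call_with_retries(link.request), link.calls)  # prints: ok 3
```

Retrying is only safe when the operation is idempotent. A request that timed out may still have reached the server, so blindly retrying a non-idempotent operation such as a money transfer could apply it twice.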
Key Aspects of Distributed System Design
- System Function: The intended purpose (features and capabilities)
- System Behavior: How the system performs its functions
- Quality Attributes: Non-functional properties that determine whether the system succeeds:
  - Performance
  - Cost
  - Security
  - Dependability
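A worked example shows how replication affects dependability. Assume a hypothetical service is replicated on $n$ servers, each available with probability $a$. Assume the servers fail independently and the service is up whenever at least one of them is up. Under those assumptions its availability is:

$$A_{\text{sys}} = 1 - (1 - a)^{n}$$

For $a = 0.99$ and $n = 3$, this gives $A_{\text{sys}} = 1 - (0.01)^{3} = 1 - 10^{-6} = 0.999999$. The independence assumption rarely holds exactly, since replicas can share power, network links, or the same software bug. That is why replicas are placed in different rooms or sites.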
Challenges in Distributed Systems
Distributed systems introduce complexity in:
- Coordination
- Consistency
- Fault detection and recovery
- Security
- Performance optimization
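Fault detection is commonly built on heartbeats: each node periodically reports that it is alive, and a monitor suspects any node it has not heard from within a timeout. The sketch below is a minimal timeout-based detector, assuming the caller supplies the current time explicitly (which keeps it deterministic). `HeartbeatFailureDetector` and the node names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class HeartbeatFailureDetector:
    """Suspects a node once no heartbeat has arrived for more than `timeout` seconds."""
    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now  # record the latest sign of life from this node

    def suspected(self, now: float) -> set[str]:
        # A node is suspected if its last heartbeat is older than the timeout.
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}

fd = HeartbeatFailureDetector(timeout=3.0)
fd.heartbeat("db-1", now=0.0)
fd.heartbeat("db-2", now=0.0)
fd.heartbeat("db-1", now=4.0)  # db-2 stays silent
print(fd.suspected(now=5.0))  # prints {'db-2'}
```

The detector can only *suspect* a failure. Because latency is not zero and the network is not reliable, a silent node may simply be slow or cut off by a partition, and a timeout cannot tell these cases apart from a crash.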