Lab 6: Fault Tolerance
Objectives
- Understand and implement retry mechanisms and fallback strategies
- Improve application dependability in a cloud environment
- Learn to handle temporary failures, prolonged outages, and degraded modes
Key Components and Tasks
1. Flaky Service Implementation
Creating a simple web service whose failure behavior is set by the SERVICE_MODE environment variable (in normal mode about 30% of requests fail):
from flask import Flask, jsonify
import random
import os

app = Flask(__name__)

# Environment variable to control service behavior: "normal", "fallback", "failure"
SERVICE_MODE = os.environ.get("SERVICE_MODE", "normal")

@app.route('/')
def flaky_endpoint():
    if SERVICE_MODE == "failure":
        # Prolonged outage: every request fails
        return jsonify({"message": "Service Unavailable"}), 503
    elif SERVICE_MODE == "fallback":
        # Degraded mode: respond successfully, but with reduced data
        return jsonify({"message": "Service in degraded mode (fallback)", "data": [1, 2, 3]}), 200
    else:
        # "normal" mode; also covers unrecognized values so the view never returns None
        if random.random() < 0.3:  # Simulate a 30% failure rate
            return jsonify({"message": "Service Unavailable"}), 503
        return jsonify({"message": "Hello from the service!", "data": [1, 2, 3, 4, 5]}), 200

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5055)
2. Client with Retry Mechanism
Implementing a simple retry mechanism for handling temporary failures:
import requests
import time

def make_request_with_retry(url, max_retries=3, retry_delay=1):
    # One initial attempt plus up to max_retries retries
    for attempt in range(max_retries + 1):
        try:
            # Timeout (5 s is an arbitrary choice) so a hung connection counts as a failed attempt
            response = requests.get(url, timeout=5)
            response.raise_for_status()  # Raises HTTPError for 4xx/5xx responses
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries:
                print(f"Retrying in {retry_delay} seconds...")
                time.sleep(retry_delay)
            else:
                # All attempts failed: return a static fallback response
                return {"message": "Service unavailable (fallback)"}
3. Advanced Client with Fallback Strategy
- Implementing graceful degradation when services fail
- Detecting service in fallback mode and responding accordingly
- Using cached/limited data as a fallback mechanism
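The three points above can be sketched as a small client wrapper. This is a minimal sketch under stated assumptions, not the lab's reference solution: `FallbackClient`, its `get` method, the `fetch` callable, and the `"source"` labels are hypothetical names. It assumes `fetch` returns the decoded JSON body and raises an exception on failure, and that the service marks degraded responses with the word "fallback" in its `message` field, as the flaky service in step 1 does.

```python
from typing import Callable, Optional


class FallbackClient:
    """Hypothetical client wrapper that degrades gracefully."""

    def __init__(self) -> None:
        # Last full (non-degraded) response, kept in memory as a simple cache
        self.last_good: Optional[dict] = None

    def get(self, fetch: Callable[[], dict]) -> dict:
        try:
            body = fetch()
        except Exception as exc:  # broad on purpose in this sketch
            if self.last_good is not None:
                # Service is down: serve the cached data from the last good call
                return {"source": "cache",
                        "data": self.last_good.get("data", []),
                        "error": str(exc)}
            # Nothing cached yet: fall back to an empty default
            return {"source": "default", "data": [], "error": str(exc)}

        if "fallback" in body.get("message", ""):
            # Service answered in degraded mode: pass the limited data
            # through, but do not overwrite the cache with it
            return {"source": "degraded", "data": body.get("data", [])}

        self.last_good = body
        return {"source": "live", "data": body.get("data", [])}
```

Keeping the degraded response out of the cache is a design choice made for this sketch: a later outage is then served from the last complete data set rather than from the reduced one.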
4. Circuit Breaker Pattern (Optional)
- Implementing a circuit breaker to prevent overloading failing services
- Managing circuit states: CLOSED, OPEN, HALF-OPEN
- Implementing dynamic recovery behavior
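The three states listed above can be illustrated with a small state machine. This is a sketch only: the class name `CircuitBreaker`, the exception `CircuitOpenError`, and the parameters `failure_threshold`, `recovery_timeout`, and `clock` are assumptions made for illustration, and the default values are arbitrary.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=10.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.recovery_timeout = recovery_timeout    # seconds to stay OPEN
        self.clock = clock                          # injectable for testing
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def call(self, func):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                # Timeout elapsed: let one trial call through
                self.state = "HALF-OPEN"
            else:
                # Fail fast without contacting the failing service
                raise CircuitOpenError("circuit is open; call skipped")
        try:
            result = func()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, (re)opens the circuit
        if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self):
        # Any success closes the circuit and resets the failure count
        self.state = "CLOSED"
        self.failures = 0
```

A client could wrap each request as `breaker.call(lambda: make_request(url))`, where `make_request` stands for any function that raises on failure, and treat `CircuitOpenError` as a signal to use its fallback immediately.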
5. Testing and Observation
Testing the client-service interaction under different scenarios:
- Normal operation with occasional failures
- Complete service failure
- Service in degraded (fallback) mode
- Circuit breaker operation and recovery
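For the first scenario, a short calculation gives a rough idea of what to expect. It assumes that each request fails independently with probability 0.3 (the flaky service in normal mode) and that the client uses `max_retries=3`, i.e. up to four attempts. The function names are hypothetical.

```python
def fallback_probability(failure_rate: float, max_retries: int) -> float:
    """Probability that every attempt fails and the client falls back."""
    attempts = max_retries + 1
    return failure_rate ** attempts


def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Average number of requests sent per client call.

    Attempt k+1 is only made if the first k attempts all failed,
    which happens with probability failure_rate ** k.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


print(f"fallback probability: {fallback_probability(0.3, 3):.4f}")  # 0.0081
print(f"expected attempts:    {expected_attempts(0.3, 3):.3f}")     # 1.417
```

Under these assumptions the client should reach its fallback in fewer than 1% of calls, so seeing it frequently during the normal-operation test would point to a problem in the retry logic.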
Key Concepts Learned
- Resilient service design
- Fault tolerance patterns
- Graceful degradation
- Circuit breaker pattern
- Service health monitoring
Lab 7: Load Balancing
Objectives
- Understand the principles of load balancing
- Configure and test load balancing with microservices
- Set up Nginx as a reverse proxy and load balancer
Key Components and Tasks
1. Simple Service Implementation
Creating a service that identifies itself:
from flask import Flask
import os

app = Flask(__name__)

@app.route('/')
def hello():
    # SERVER_NAME is set per container (docker run -e) to tell the instances apart
    if "service1" in os.environ.get("SERVER_NAME", ""):
        return "Hello from Service 1"
    else:
        return "Hello from Service 2"

if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5055)
2. Docker Network and Service Deployment
- Creating a Docker network for container communication
- Running multiple instances of the service with different identities
- Exposing services on different host ports
docker network create my-network
docker build -t hello-service .
docker run -d -p 5056:5055 --name service1 -e SERVER_NAME="service1" --network my-network --network-alias service1 hello-service
docker run -d -p 5057:5055 --name service2 -e SERVER_NAME="service2" --network my-network --network-alias service2 hello-service
3. Nginx Load Balancer Configuration
Setting up Nginx as a reverse proxy and load balancer:
events {
    worker_connections 1024;
}
http {
    upstream backend {
        # round-robin load balancing
        server service1:5055;
        server service2:5055;
        # weighted load balancing
        # server service1:5055 weight=3;
        # server service2:5055 weight=1;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}
4. Testing and Observation
- Testing direct access to each service
- Testing access through the load balancer
- Observing round-robin load balancing behavior
- Testing service resilience by stopping one service
- Testing weighted load balancing
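To know what distribution to look for in the round-robin and weighted tests, the selection can be modelled in a few lines. This is a simplified model of a smooth weighted round-robin, written for illustration; it is not Nginx's code, and the exact order of responses observed through the load balancer may differ. The function name `weighted_round_robin` is hypothetical.

```python
from collections import Counter


def weighted_round_robin(weights: dict, n_requests: int) -> list:
    """Return the order in which servers would be picked.

    Each round, every server's running score grows by its weight; the
    server with the highest score is picked and its score is reduced
    by the sum of all weights.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    order = []
    for _ in range(n_requests):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)  # first server wins ties
        current[chosen] -= total
        order.append(chosen)
    return order


# Equal weights: plain round-robin, requests alternate between servers
print(Counter(weighted_round_robin({"service1": 1, "service2": 1}, 8)))

# weight=3 vs weight=1, as in the commented-out upstream lines
print(Counter(weighted_round_robin({"service1": 3, "service2": 1}, 8)))
```

In this model, equal weights split eight requests 4 to 4, while the 3:1 weighting sends six of the eight to service1. Repeated `curl` calls against the load balancer should show roughly the same proportions.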
Key Concepts Learned
- Load balancing techniques
- Reverse proxy configuration
- Docker networking
- Service discovery
- High availability through redundancy
- Load balancing algorithms (round-robin, weighted)
Common Lab Techniques
Docker and Containerization
- Dockerfile creation and best practices
- Container networking
- Environment variable configuration
- Container orchestration
API Design and Implementation
- RESTful API principles
- Flask for lightweight web services
- JSON for data exchange
- Status codes for error handling
Resilience Patterns
- Retry mechanisms
- Fallback strategies
- Circuit breakers
- Load balancing
- Service discovery
Testing and Debugging
- API testing with curl and browsers
- Debugging distributed systems
- Log analysis
- Service monitoring
Related Topics
- Container Fundamentals
- High Availability
- Fault Tolerance
- Load Balancing
- Microservices Architecture
- Docker
References:
- COMPSCI4106/5118 Cloud Systems Lab Materials