Lab 6: Fault Tolerance

Objectives

  • Understand and implement retry mechanisms and fallback strategies
  • Improve application dependability in a cloud environment
  • Learn to handle temporary failures, prolonged outages, and degraded modes

Key Components and Tasks

1. Flaky Service Implementation

Creating a simple web service whose behavior (normal, fallback, or failure) is selected by an environment variable, with a simulated failure rate in normal mode:

from flask import Flask, jsonify
import random
import os
 
app = Flask(__name__)
 
# Environment variable to control service behavior: "normal", "fallback", "failure"
SERVICE_MODE = os.environ.get("SERVICE_MODE", "normal")
 
@app.route('/')
def flaky_endpoint():
    if SERVICE_MODE == "failure":
        # Prolonged outage: always respond with an error
        return jsonify({"message": "Service Unavailable"}), 503
    elif SERVICE_MODE == "fallback":
        # Degraded mode: respond successfully but with reduced data
        return jsonify({"message": "Service in degraded mode (fallback)", "data": [1, 2, 3]}), 200
    else:
        # Normal mode (default): succeed most of the time
        if random.random() < 0.3:  # Simulate a 30% failure rate
            return jsonify({"message": "Service Unavailable"}), 503
        return jsonify({"message": "Hello from the service!", "data": [1, 2, 3, 4, 5]}), 200
 
if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5055)

2. Client with Retry Mechanism

Implementing a simple retry mechanism for handling temporary failures:

import requests
import time
 
def make_request_with_retry(url, max_retries=3, retry_delay=1):
    for attempt in range(max_retries + 1):
        try:
            response = requests.get(url, timeout=5)  # bound each attempt so a hung service does not block retries
            response.raise_for_status()  # raise HTTPError for 4xx/5xx responses
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries:
                print(f"Retrying in {retry_delay} seconds...")
                time.sleep(retry_delay)
            else:
                return {"message": "Service unavailable (fallback)"}

3. Advanced Client with Fallback Strategy

  • Implementing graceful degradation when services fail
  • Detecting service in fallback mode and responding accordingly
  • Using cached/limited data as a fallback mechanism (a sketch follows this list)
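
A minimal sketch of such a client, building on the retry idea above; the module-level cache and the check on the "message" field are illustrative assumptions rather than the lab's reference solution:

import requests

# Last known-good payload, used when the service is completely unreachable
_cached_data = {"message": "Cached response (stale)", "data": [1, 2, 3]}

def get_data_with_fallback(url):
    global _cached_data
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        payload = response.json()
        if "fallback" in payload.get("message", "").lower():
            # Service is up but degraded: return its limited data without refreshing the cache
            print("Service reports degraded (fallback) mode")
            return payload
        # Healthy response: refresh the cache and return the full data
        _cached_data = payload
        return payload
    except requests.exceptions.RequestException as e:
        # Service unreachable or failing: degrade gracefully to cached/limited data
        print(f"Request failed ({e}); returning cached data")
        return _cached_data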

4. Circuit Breaker Pattern (Optional)

  • Implementing a circuit breaker to prevent overloading failing services (a sketch follows this list)
  • Managing circuit states: CLOSED, OPEN, HALF-OPEN
  • Implementing dynamic recovery behavior
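
A minimal in-process sketch of the pattern; the thresholds, timeout, and class interface are illustrative assumptions:

import time
import requests

class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=10):
        self.failure_threshold = failure_threshold  # consecutive failures before opening the circuit
        self.recovery_timeout = recovery_timeout    # seconds to wait before allowing a trial request
        self.failure_count = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, url):
        if self.state == "OPEN":
            if time.time() - self.opened_at < self.recovery_timeout:
                # Fail fast without hitting the struggling service
                return {"message": "Circuit open (fallback)"}
            self.state = "HALF-OPEN"  # timeout elapsed: allow a single trial request
        try:
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            # Success: close the circuit and reset the failure count
            self.state = "CLOSED"
            self.failure_count = 0
            return response.json()
        except requests.exceptions.RequestException:
            self.failure_count += 1
            if self.state == "HALF-OPEN" or self.failure_count >= self.failure_threshold:
                # Trial request failed, or too many consecutive failures: open the circuit
                self.state = "OPEN"
                self.opened_at = time.time()
            return {"message": "Service unavailable (fallback)"}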

5. Testing and Observation

Testing the client-service interaction under different scenarios (a small test driver follows the list):

  • Normal operation with occasional failures
  • Complete service failure
  • Service in degraded (fallback) mode
  • Circuit breaker operation and recovery
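
One way to exercise the first three scenarios is to restart the service with SERVICE_MODE set to normal, fallback, or failure, then run a small driver such as the one below (the module name retry_client and the URL are assumptions):

from collections import Counter
from retry_client import make_request_with_retry  # hypothetical module holding the section 2 client

# Tally the messages returned over repeated calls to observe retry and fallback behavior
outcomes = Counter()
for _ in range(20):
    result = make_request_with_retry("http://localhost:5055/")
    outcomes[result.get("message", "unknown")] += 1
print(outcomes)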

Key Concepts Learned

  • Resilient service design
  • Fault tolerance patterns
  • Graceful degradation
  • Circuit breaker pattern
  • Service health monitoring

Lab 7: Load Balancing

Objectives

  • Understand the principles of load balancing
  • Configure and test load balancing with microservices
  • Set up Nginx as a reverse proxy and load balancer

Key Components and Tasks

1. Simple Service Implementation

Creating a service that identifies itself:

from flask import Flask
import os
 
app = Flask(__name__)
 
@app.route('/')
def hello():
    # SERVER_NAME is set per container so that each instance can identify itself
    if "service1" in os.environ.get("SERVER_NAME", ""):
        return "Hello from Service 1"
    else:
        return "Hello from Service 2"
 
if __name__ == '__main__':
    app.run(debug=True, host='0.0.0.0', port=5055)

2. Docker Network and Service Deployment

  • Creating a Docker network for container communication
  • Running multiple instances of the service with different identities
  • Exposing services on different host ports

docker network create my-network
docker build -t hello-service .
docker run -d -p 5056:5055 --name service1 -e SERVER_NAME="service1" --network my-network --network-alias service1 hello-service
docker run -d -p 5057:5055 --name service2 -e SERVER_NAME="service2" --network my-network --network-alias service2 hello-service

3. Nginx Load Balancer Configuration

Setting up Nginx as a reverse proxy and load balancer; the Nginx container joins the same Docker network (my-network) so that the upstream names service1 and service2 resolve via their network aliases:

events {
    worker_connections 1024;
}
 
http {
    upstream backend {
        # round-robin load balancing
        server service1:5055;
        server service2:5055;
        
        # weighted load balancing
        # server service1:5055 weight=3;
        # server service2:5055 weight=1;
    }
    
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}

4. Testing and Observation

  • Testing direct access to each service
  • Testing access through the load balancer
  • Observing round-robin load balancing behavior (a request-counting sketch follows this list)
  • Testing service resilience by stopping one service
  • Testing weighted load balancing
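
A small driver for observing how requests are distributed; the URL assumes the Nginx container's port 80 is published on localhost:8080, so adjust it to match the actual port mapping:

import requests
from collections import Counter

# Count which backend answers each request made through the load balancer
counts = Counter()
for _ in range(20):
    try:
        counts[requests.get("http://localhost:8080/", timeout=5).text] += 1
    except requests.exceptions.RequestException:
        counts["request failed"] += 1
print(counts)  # expect a roughly even split for round-robin, about 3:1 for the weighted config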

Key Concepts Learned

  • Load balancing techniques
  • Reverse proxy configuration
  • Docker networking
  • Service discovery
  • High availability through redundancy
  • Load balancing algorithms (round-robin, weighted)

Common Lab Techniques

Docker and Containerization

  • Dockerfile creation and best practices
  • Container networking
  • Environment variable configuration
  • Container orchestration

API Design and Implementation

  • RESTful API principles
  • Flask for lightweight web services
  • JSON for data exchange
  • Status codes for error handling

Resilience Patterns

  • Retry mechanisms
  • Fallback strategies
  • Circuit breakers
  • Load balancing
  • Service discovery

Testing and Debugging

  • API testing with curl and browsers
  • Debugging distributed systems
  • Log analysis
  • Service monitoring

References:

  • COMPSCI4106/5118 Cloud Systems Lab Materials