Modern cloud architectures are built on several key concepts that address the challenges of building large-scale, distributed, and reliable systems. This note provides an overview of the architectural approaches used in modern cloud systems.
Architectural Foundations
Modern cloud architectures are founded on two fundamental pillars:
- Vertical integration - Enhancing capabilities within individual tiers/services
- Horizontal scaling - Using multiple commodity computers working together
These pillars have led to significant shifts away from monolithic application architectures toward more distributed approaches.
Architectural Concepts
Layering
- Definition: Partitioning services vertically into layers
- Lower layers provide services to higher ones
- Higher layers unaware of underlying implementation details
- Low inter-layer dependency
- Examples:
- Network protocol stacks (OSI model)
- Operating systems (kernel, drivers, libraries, GUI)
- Games (engine, logic, AI, UI)
- Advantages:
- Abstraction
- Reusability
- Loose coupling
- Isolated management and testing
- Supports software evolution
Tiering
- Definition: Mapping layers, and the components within them, onto physical or virtual devices
- Implies physical location considerations
- Complements layering
- Classic Architectures:
- 2-tier (client-server): Split layers between client and server
- 3-tier: User Interface, Application Logic, Data tiers
- n-tier/multi-tier: Further division (e.g., microservices)
- Advantages:
- Scalability
- Availability
- Flexibility
- Easier management
Monolith vs. Distributed Architecture
Monolithic Architecture
- Definition: A single, tightly coupled block of code with all application components
- Advantages:
- Simple to develop and deploy
- Easy to test and debug in early stages
- Disadvantages:
- Increasing complexity as application grows
- Difficult to scale individual components
- Limited agility with slow and risky deployments
- Technology lock-in
Distributed Architecture
- Definition: Application divided into loosely coupled components running on separate servers
- Advantages:
- Independent scaling of components
- Fault isolation
- Technology diversity
- Better maintainability
- Disadvantages:
- Network communication overhead
- More complex to manage
- Distributed debugging challenges
Practical Application Guidelines
When designing cloud architectures:
- Foundation matters: Just as buildings need proper foundations, cloud architectures require robust infrastructure layers
- Consider scalability & modularity: Employ modular techniques for easier expansion and modification
- Focus on resource efficiency: Implement auto-scaling, serverless approaches, and efficient resource allocation
- Plan for evolution: Design systems that can adapt to new technologies while maintaining stability
Modern Cloud Architectures - Redundancy
Redundancy is a key design principle in modern cloud architectures that improves fault tolerance, availability, and performance.
Why Use Redundancy?
- Performance: Distribute workload across multiple replicas to improve response time
- Error Detection: Compare results when replicas disagree
- Error Recovery: Switch to backup resources when primary fails
- Fault Tolerance: System continues functioning despite component failures
Importance of Fault Models
The effectiveness of redundancy depends on how individual replicas fail:
For independent crash faults, the availability of a system with n replicas is:
Availability = 1 - p^n, where p is the probability of individual failure
Example: 5 servers each with 90% uptime → overall availability = 1-(0.10)^5 = 99.999%
This only holds if failures are truly independent, which requires consideration of common failure modes.
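As a quick sanity check, the availability formula can be evaluated directly. A minimal sketch (the function name is ours):

```python
def availability(p_fail: float, n: int) -> float:
    """Availability of a system of n replicas under independent
    crash faults: the system is down only when all n replicas fail
    at once, which happens with probability p_fail ** n."""
    return 1 - p_fail ** n

# 5 servers, each with 90% uptime (10% failure probability):
print(f"{availability(0.10, 5):.3%}")  # 99.999%
```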
Redundancy by Replication
Replication involves maintaining multiple copies of:
- Data
- Services
- Infrastructure components
Data Replication
Synchronous Replication: Write operations complete only after all replicas are updated
- Ensures consistency but increases latency
- Used for critical data where consistency is paramount
Asynchronous Replication: Primary replica acknowledges writes before secondaries are updated
- Better performance but may lose data if primary fails before replication
- Used when performance is prioritized over consistency
Quorum-based Replication: Write operations complete when a majority of replicas acknowledge
- Balances availability and consistency
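The quorum idea reduces to a simple arithmetic condition. A sketch assuming the classic read/write quorum intersection rules (names are ours):

```python
def quorum_ok(n: int, w: int, r: int) -> bool:
    """Classic quorum intersection rules for n replicas:
    every read quorum must overlap every write quorum (r + w > n),
    and any two write quorums must overlap (2w > n), so a read
    always sees the latest acknowledged write."""
    return r + w > n and 2 * w > n

print(quorum_ok(n=5, w=3, r=3))  # True  (majority quorums)
print(quorum_ok(n=5, w=2, r=2))  # False (a read can miss the latest write)
```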
Service Replication
Active-Passive Replication:
- One active instance handles all requests
- Passive instances ready to take over if active fails
- Lower resource utilization but potential downtime during failover
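A toy illustration of active-passive failover; every name and behavior here is invented for this note:

```python
class FailoverPair:
    """Active-passive sketch: the active instance handles every
    request; when it fails, the passive one is promoted and the
    request is retried."""

    def __init__(self, active, passive):
        self.active, self.passive = active, passive

    def handle(self, request):
        try:
            return self.active(request)
        except Exception:
            # Failover: promote the passive replica and retry once.
            self.active, self.passive = self.passive, self.active
            return self.active(request)

def primary(request):
    raise RuntimeError("primary down")   # simulate a crashed active node

def backup(request):
    return f"served {request} from backup"

pair = FailoverPair(primary, backup)
print(pair.handle("GET /"))  # served GET / from backup
```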
Active-Active Replication:
- Multiple active instances handle requests simultaneously
- No downtime during instance failure
- Requires more complex state management
Infrastructure Redundancy
Modern cloud data centers implement redundancy at multiple levels:
Hardware Redundancy
Geographic Redundancy:
- Data centers distributed across multiple regions
- Mitigates regional outages from natural disasters, power grid failures
- Data typically replicated across regions
Server Redundancy:
- Servers deployed in clusters with automatic failover
- If one server fails, another takes over seamlessly
Storage Redundancy:
- Data replicated across multiple devices and technologies
- RAID configurations protect against disk failures
Network Redundancy
Server-level Redundancy:
- Redundant Network Interface Cards (NICs)
- Dual or more power supplies
Network-level Redundancy:
- Redundant switches, routers, firewalls, load balancers
Link and Path-level Redundancy:
- Link aggregation (multiple links between devices)
- Spanning Tree Protocol to prevent network loops
- Load balancing across multiple paths
Network topologies designed for redundancy:
- Hierarchical/3-tier topology
- Fat-tree/Clos topology
Power Redundancy
- Multiple power feeds from different utility substations
- Uninterruptible Power Supplies (UPS) for temporary outages
- Backup generators for medium/long-term outages
- Power Distribution Units with dual inputs
Cooling Redundancy
- N+1 configuration (one more cooling unit than required)
- Multiple cooling technologies
- Redundant cooling loops (pipes, heat exchangers, pumps)
- Hot/cold aisle containment
Redundancy Challenges
- Cost: Redundant systems require additional hardware and management
- Complexity: More components mean more potential failure points
- Consistency: Maintaining consistent state across replicas
- Testing: Verifying redundancy actually works as expected
Modern Cloud Architectures - Scalability
Scaling Fundamentals
Scaling is the process of adding or removing resources to match workload demand. In cloud architectures, two primary scaling approaches are used:
Vertical Scaling (Scaling Up)
- Definition: Increasing the performance of a single node by adding more resources (CPU cores, memory, etc.)
- Advantages:
- Good speedup up to a particular point
- No application architecture changes required
- Simpler to implement
- Disadvantages:
- Beyond a certain point, speedup becomes very expensive
- Limited by hardware capabilities
- Single point of failure remains
- Potential downtime during scaling operations
Horizontal Scaling (Scaling Out)
- Definition: Increasing the number of nodes in the system
- Advantages:
- Cost-effective way to grow total resources
- Better fault tolerance through redundancy
- Virtually unlimited scaling potential
- Disadvantages:
- Requires coordination systems and load balancing
- Application must be designed for distributed operation
- More complex to efficiently utilize resources
Why Horizontal Scaling Dominates Cloud Architectures
- Hardware Trend: CPUs are no longer getting substantially faster year over year
- Economic Factor: Large sets of inexpensive commodity servers are more cost-effective
- Failure Reality: All hardware eventually fails
- Virtualization Advantage: VMs and containers make it easy to replicate services across nodes
Dynamic Scaling Architecture
Modern cloud systems implement dynamic scaling to automatically adjust resources:
- Monitoring: Track metrics like CPU usage, memory usage, request rates
- Thresholds: Define conditions that trigger scaling actions
- Scaling Actions: Add/remove resources when thresholds are crossed
- Stabilization: Implement cooldown periods to prevent oscillation
Example Process Flow:
- Consumers send more requests to a service
- Existing resources become overloaded, timeouts occur
- Auto-scaling detects the condition and deploys additional resources
- Traffic is redistributed across all available resources
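The monitor/threshold/cooldown loop above can be sketched in miniature; the class name, thresholds, and cooldown value below are invented for illustration:

```python
class AutoScaler:
    """Threshold-based scaler with a cooldown (stabilization)
    period to prevent oscillation between scale-up and scale-down."""

    def __init__(self, scale_up_at=0.8, scale_down_at=0.3, cooldown_s=300):
        self.scale_up_at = scale_up_at
        self.scale_down_at = scale_down_at
        self.cooldown_s = cooldown_s
        self.last_action_at = float("-inf")

    def decide(self, cpu_utilization: float, now: float) -> int:
        """Return +1 (add an instance), -1 (remove one), or 0."""
        # Stabilization: ignore metrics inside the cooldown window.
        if now - self.last_action_at < self.cooldown_s:
            return 0
        if cpu_utilization > self.scale_up_at:
            self.last_action_at = now
            return +1
        if cpu_utilization < self.scale_down_at:
            self.last_action_at = now
            return -1
        return 0

scaler = AutoScaler()
print(scaler.decide(0.95, now=0))   # +1: overload triggers scale-up
print(scaler.decide(0.95, now=60))  # 0: still inside the cooldown window
```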
Scaling and State
Scaling approaches differ based on whether components are stateless or stateful:
Stateless Components
- Definition: Maintain no internal state beyond processing a single request
- Examples: Web servers with static content, DNS servers, mathematical calculation services
- Scaling Approach: Simply create more instances and distribute requests via load balancing
Stateful Components
- Definition: Maintain state beyond a single request (prior state is required to process future requests)
- Examples: Database servers, mail servers, stateful web servers, session management
- Scaling Approach: More complex, typically requires partitioning and/or replication
Stateless Load Balancing
DNS-Level Load Balancing
- Implementation: DNS servers resolve domain names to different IP addresses
- Advantages: Simple, cost-effective, can use geographical location
- Disadvantages: Slow to react to failures due to DNS caching, limited health checks
IP-Level Load Balancing
- Implementation: Routers direct clients to different locations using IP anycast
- Advantages: Relatively simple, faster response to failures
- Disadvantages: Less granular, assumes all requests create equal load
Application-Level Load Balancing
- Implementation: Dedicated load balancer acting as a front end
- Advantages: Granular control, content-based routing, SSL offloading
- Disadvantages: Increased complexity, performance overhead, higher latency
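Because stateless replicas are interchangeable, the simplest application-level policy is round-robin. A minimal sketch (class and backend names are invented):

```python
import itertools

class RoundRobinBalancer:
    """Cycle requests over a fixed set of stateless backends:
    since no request depends on prior state, any replica can
    serve any request."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def next_backend(self) -> str:
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1:8080", "app-2:8080", "app-3:8080"])
print([lb.next_backend() for _ in range(4)])
# ['app-1:8080', 'app-2:8080', 'app-3:8080', 'app-1:8080']
```

Real load balancers add health checks and weighting on top of this; the round-robin core stays the same.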
Stateful Scaling
Scaling stateful services presents unique challenges:
Partitioning (Sharding)
- Definition: Dividing data into distinct, independent parts
- Purpose: Improves scalability (performance), but not availability
- Key Consideration: Each data item is stored in only one partition
Partitioning Schemes:
Per-Tenant Partitioning
- Put different tenants on different machines
- Good isolation and scalability
- Challenging when a tenant grows beyond one machine
Horizontal Sharding
- Split table by rows across different servers
- Each shard has same schema but contains subset of rows
- Easy to scale out; each shard maintains smaller indices
- Examples: Google BigTable, MongoDB
Vertical Partitioning
- Split table by columns, grouping related columns
- Improves performance for specific queries
- Doesn’t inherently support scaling across multiple servers
Distribution Strategies:
Distribution Strategies:
Range Partitioning
- Related data stored together
- Efficient for range queries
- Poor load balancing, requires manual adjustment
Hash Partitioning
- Uniform distribution
- Good load balancing
- Inefficient for range queries
- Requires reorganization when number of partitions changes
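The trade-off between the two strategies is easy to see in code. A sketch (function names and boundary values are ours):

```python
import hashlib

def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: a stable hash spreads keys uniformly,
    but changing num_partitions remaps almost every key."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % num_partitions

def range_partition(key: str, boundaries: list) -> int:
    """Range partitioning: a key goes to the first range whose
    upper boundary is >= the key, so related keys stay together."""
    for i, upper in enumerate(boundaries):
        if key <= upper:
            return i
    return len(boundaries)  # final, open-ended partition

# A range query such as "names from g to l" touches one range
# partition, but potentially every hash partition.
print(range_partition("harris", ["f", "m", "s"]))  # 1
```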
Modern Cloud Architectures - Microservices
Evolution from Monolith to Microservices
Traditional monolithic applications face challenges as they grow:
- Increasingly difficult to maintain
- Hard to scale specific components
- Complex to evolve with changing requirements
- Technology lock-in
Microservices architecture emerged as a solution to these challenges.
What Are Microservices?
Microservices architecture is an approach to developing a single application as a suite of small services, each:
- Running in its own process
- Communicating through lightweight mechanisms (often HTTP/REST APIs)
- Independently deployable
- Built around business capabilities
- Potentially implemented using different technologies
Key Characteristics of Microservices
- Loose coupling: Services interact through well-defined interfaces
- Independent deployment: Each service can be deployed without affecting others
- Technology diversity: Different services can use different technologies
- Focused on business capabilities: Services aligned with business domains
- Small size: Each service focuses on doing one thing well
- Decentralized data management: Each service manages its own data
- Automated deployment: CI/CD pipelines for each service
- Designed for failure: Resilience built in through isolation
Microservices Architecture Components
A typical microservices architecture includes:
- Core Services: Implement business functionality
- API Gateway: Provides a single entry point for clients
- Service Registry: Keeps track of service instances and locations
- Config Server: Centralized configuration management
- Monitoring and Tracing: Distributed system observability
- Load Balancer: Distributes traffic among service instances
Advantages of Microservices
Independent Development:
- Teams can work on different services simultaneously
- Faster development cycles
- Smaller codebases are easier to understand
Technology Flexibility:
- Each service can use the most appropriate tech stack
- Easier to adopt new technologies incrementally
Scalability:
- Services can be scaled independently based on demand
- More efficient resource utilization
Fault Isolation:
- Failures in one service don’t necessarily affect others
- Easier to implement resilience patterns
Maintainability:
- Smaller codebases are less complex
- Easier to understand and debug
- New team members can become productive faster
Reusability:
- Services can be reused in different contexts
- Example: Netflix Asgard, Eureka services used in multiple projects
Disadvantages of Microservices
Complexity:
- Increased operational overhead with more services to manage and monitor
- Distributed debugging challenges - tracing issues across multiple services
- Complexity of service interactions and dependencies
Performance Overhead:
- Latency due to network communication between services
- Serialization/deserialization costs
- Network bandwidth consumption
Operational Challenges:
- Microservice sprawl - could expand to hundreds or thousands of services
- Managing CI/CD pipelines for multiple services
- End-to-end testing becomes more difficult
Failure Patterns:
- Interdependency chains can cause cascading failures
- Death spirals (remaining replicas of a service overloaded as others fail)
- Retry storms (wasted resources on failed calls)
- Cascading QoS violations due to bottleneck services
- Failure recovery potentially slower than in monoliths
Microservice Communication
Synchronous Communication
- REST APIs (HTTP/HTTPS): Simple request-response pattern
- gRPC: Efficient binary protocol with bidirectional streaming
- GraphQL: Query-based, client specifies exactly what data it needs
Pros:
- Immediate response
- Simpler to implement
- Easier to debug
Cons:
- Tight coupling
- Higher latency
- Lower fault tolerance
Asynchronous Communication
- Message queues: RabbitMQ, ActiveMQ
- Event streaming: Apache Kafka, AWS Kinesis
- Pub/Sub pattern: Google Cloud Pub/Sub
Pros:
- Loose coupling
- Better scalability
- Higher fault tolerance
Cons:
- More complex to implement
- Harder to debug
- Eventually consistent
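The decoupling that asynchronous messaging provides can be illustrated with an in-process queue standing in for a real broker (RabbitMQ, Kafka, etc.); message contents and names below are invented:

```python
import queue
import threading

# In-process stand-in for a message broker: the producer returns as
# soon as the message is enqueued, and the consumer processes it on
# its own schedule (loose coupling, eventual consistency).
broker = queue.Queue()
results = []

def consumer():
    while True:
        msg = broker.get()
        if msg is None:          # sentinel: shut down the consumer
            break
        results.append(f"processed {msg}")

t = threading.Thread(target=consumer)
t.start()

broker.put("order-created")      # producer does not wait for processing
broker.put("payment-received")
broker.put(None)
t.join()
print(results)  # ['processed order-created', 'processed payment-received']
```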
Glueware and Support Infrastructure
Microservices require substantial supporting infrastructure (“glueware”) that often outweighs the core services:
- Monitoring and logging systems
- Service discovery mechanisms
- Load balancing services
- API gateways
- Message brokers
- Circuit breakers for resilience
- Distributed tracing tools
- Configuration management
According to the Cloud Native Computing Foundation’s 2022 survey, glueware now outweighs core microservices in most deployments.
Avoiding Microservice Sprawl
To prevent excessive complexity with microservices:
Start with a monolith design
- Gradually break it down into microservices as needed
- Identify natural boundaries and avoid over-decomposition
Focus on business capabilities
- Design around clear business purposes rather than technical functions
Establish clear governance
- Define guidelines and best practices for microservice development
- Create standards for naming conventions, communication protocols, etc.
Implement fault-tolerant design patterns
- Timeouts, bounded retries, circuit breakers
- Graceful degradation
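A minimal circuit-breaker sketch tying these patterns together; real breakers also add a half-open state with a recovery timeout, which is omitted here, and all names are invented:

```python
class CircuitBreaker:
    """After max_failures consecutive failures the circuit opens:
    further calls fail fast to a fallback (graceful degradation)
    instead of waiting on a service that is already down."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn, fallback):
        if self.failures >= self.max_failures:   # circuit open
            return fallback()                    # fail fast
        try:
            result = fn()
            self.failures = 0                    # success closes the circuit
            return result
        except Exception:
            self.failures += 1                   # count the failure
            return fallback()

breaker = CircuitBreaker(max_failures=2)

def flaky():
    raise RuntimeError("service unavailable")

for _ in range(4):
    print(breaker.call(flaky, fallback=lambda: "cached response"))
```

After the second failure the breaker stops calling `flaky` at all, so no resources are wasted on retry storms against a dead dependency.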