Scaling Fundamentals

Scaling is the process of adding or removing resources to match workload demand. In cloud architectures, two primary scaling approaches are used:

Vertical Scaling (Scaling Up)

  • Definition: Increasing the performance of a single node by adding more resources (CPU cores, memory, etc.)
  • Advantages:
    • Good speedup, up to a point
    • No application architecture changes required
    • Simpler to implement
  • Disadvantages:
    • Beyond a certain point, speedup becomes very expensive
    • Limited by hardware capabilities
    • Single point of failure remains
    • Potential downtime during scaling operations

Horizontal Scaling (Scaling Out)

  • Definition: Increasing the number of nodes in the system
  • Advantages:
    • Cost-effective way to grow total resources
    • Better fault tolerance through redundancy
    • Virtually unlimited scaling potential
  • Disadvantages:
    • Requires coordination systems and load balancing
    • Application must be designed for distributed operation
    • Harder to utilize resources efficiently

Why Horizontal Scaling Dominates Cloud Architectures

  • Hardware Trend: CPUs are no longer getting substantially faster the way they once did; single-core performance has largely plateaued
  • Economic Factor: Large sets of inexpensive commodity servers are more cost-effective
  • Failure Reality: All hardware eventually fails
  • Virtualization Advantage: VMs and containers make it easy to replicate services across nodes

Dynamic Scaling Architecture

Modern cloud systems implement dynamic scaling to automatically adjust resources:

  1. Monitoring: Track metrics like CPU usage, memory usage, request rates
  2. Thresholds: Define conditions that trigger scaling actions
  3. Scaling Actions: Add/remove resources when thresholds are crossed
  4. Stabilization: Implement cooldown periods to prevent oscillation
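The four steps above can be sketched as a small control loop. This is a minimal illustration, not any particular cloud provider's API; the metric (average CPU), thresholds, and cooldown length are all illustrative assumptions.

```python
import time

class AutoScaler:
    """Threshold-based scaler with a cooldown window for stabilization."""

    def __init__(self, min_instances=1, max_instances=10,
                 scale_up_at=0.80, scale_down_at=0.30, cooldown_s=300):
        self.instances = min_instances
        self.min_instances = min_instances
        self.max_instances = max_instances
        self.scale_up_at = scale_up_at      # e.g. scale out above 80% avg CPU
        self.scale_down_at = scale_down_at  # e.g. scale in below 30% avg CPU
        self.cooldown_s = cooldown_s        # stabilization window in seconds
        self.last_action = 0.0

    def evaluate(self, avg_cpu, now=None):
        """Return +1 (scaled out), -1 (scaled in), or 0 (no action)."""
        now = time.monotonic() if now is None else now
        if now - self.last_action < self.cooldown_s:
            return 0  # still in cooldown: prevents oscillation
        if avg_cpu > self.scale_up_at and self.instances < self.max_instances:
            self.instances += 1
            self.last_action = now
            return +1
        if avg_cpu < self.scale_down_at and self.instances > self.min_instances:
            self.instances -= 1
            self.last_action = now
            return -1
        return 0
```

Note how a sustained high metric during the cooldown window is deliberately ignored; without that, a scaler can thrash by adding and removing instances faster than new instances can absorb load.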

Example Process Flow:

  1. Consumers send more requests to a service
  2. Existing resources become overloaded and timeouts occur
  3. Auto-scaling detects the condition and deploys additional resources
  4. Traffic is redistributed across all available resources

Scaling and State

Scaling approaches differ based on whether components are stateless or stateful:

Stateless Components

  • Definition: Maintain no internal state beyond processing a single request
  • Examples: Web servers with static content, DNS servers, mathematical calculation services
  • Scaling Approach: Simply create more instances and distribute requests via load balancing
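Because stateless instances are interchangeable, the distribution step really can be this simple. A minimal round-robin sketch (instance names are illustrative):

```python
import itertools

class RoundRobinBalancer:
    """Hand each incoming request to the next instance in rotation."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [lb.pick() for _ in range(6)]
# Scaling out is just registering another interchangeable instance;
# no request needs to remember which instance served the previous one.
```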

Stateful Components

  • Definition: Maintain state beyond a single request (prior state is required to process future requests)
  • Examples: Database servers, mail servers, stateful web servers, session management
  • Scaling Approach: More complex, typically requires partitioning and/or replication

Stateless Load Balancing

DNS-Level Load Balancing

  • Implementation: DNS servers resolve domain names to different IP addresses
  • Advantages: Simple, cost-effective, can use geographical location
  • Disadvantages: Slow to react to failures due to DNS caching, limited health checks

IP-Level Load Balancing

  • Implementation: Routers direct clients to different locations using IP anycast
  • Advantages: Relatively simple, faster response to failures
  • Disadvantages: Less granular, assumes all requests create equal load

Application-Level Load Balancing

  • Implementation: Dedicated load balancer acting as a front end
  • Advantages: Granular control, content-based routing, SSL offloading
  • Disadvantages: Increased complexity, performance overhead, higher latency
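Content-based routing is what distinguishes this layer from the DNS- and IP-level approaches: the balancer can inspect the request itself. A hypothetical sketch, routing by URL prefix to separate backend pools (pool names and paths are assumptions for illustration):

```python
from itertools import cycle

# Each URL prefix maps to its own pool of backends, cycled round-robin.
ROUTES = {
    "/api/":    cycle(["api-1", "api-2"]),  # dynamic API traffic
    "/static/": cycle(["cdn-1"]),           # static assets
}
DEFAULT_POOL = cycle(["web-1", "web-2"])    # everything else

def route(path):
    """Pick a backend by inspecting the request path (layer-7 routing)."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return next(pool)
    return next(DEFAULT_POOL)
```

The same inspection point is where a real balancer would also terminate TLS (SSL offloading) and run health checks, which is exactly why it adds latency relative to the lower-level approaches.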

Stateful Scaling

Scaling stateful services presents unique challenges:

Partitioning (Sharding)

  • Definition: Dividing data into distinct, independent parts
  • Purpose: Improves scalability (performance), but not availability
  • Key Consideration: Each data item is stored in only one partition

Partitioning Schemes:

  1. Per-Tenant Partitioning

    • Put different tenants on different machines
    • Good isolation and scalability
    • Challenging when a tenant grows beyond one machine
  2. Horizontal Sharding

    • Split table by rows across different servers
    • Each shard has same schema but contains subset of rows
    • Easy to scale out; each shard maintains smaller indices
    • Examples: Google BigTable, MongoDB
  3. Vertical Partitioning

    • Split table by columns, grouping related columns
    • Improves performance for specific queries
    • Doesn’t inherently support scaling across multiple servers
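For horizontal sharding, the routing rule that sends each row to exactly one shard can be sketched in a few lines. Shard names and the choice of key are illustrative assumptions:

```python
# Every shard holds the full schema but only a disjoint subset of rows.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

def shard_for(user_id: int) -> str:
    """Deterministically map a row key to one shard, so all reads and
    writes for that row always go to the same place."""
    return SHARDS[user_id % len(SHARDS)]
```

The determinism matters: because each data item lives in only one partition, a query for a single key touches one server, which is where the scalability win comes from.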

Distribution Strategies:

  • Range Partitioning

    • Related data stored together
    • Efficient for range queries
    • Prone to hotspots (poor load balancing); may require manual rebalancing
  • Hash Partitioning

    • Uniform distribution
    • Good load balancing
    • Inefficient for range queries
    • Requires reorganization when the number of partitions changes
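Both distribution strategies, and the reorganization cost of the hash approach, can be demonstrated directly. A sketch with an illustrative key space and partition counts:

```python
def range_partition(key, boundaries):
    """Keys below boundaries[i] go to partition i. Keeps related keys
    together, so a range scan touches only a few partitions."""
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)

def hash_partition(key, n_partitions):
    """Spreads keys uniformly, but a range scan must now touch every
    partition."""
    return hash(key) % n_partitions

# Changing the partition count remaps most keys under modulo hashing:
keys = range(1000)
moved = sum(1 for k in keys
            if hash_partition(k, 4) != hash_partition(k, 5))
# The vast majority of keys change partitions, which is why resizing a
# hash-partitioned system requires large-scale data reorganization
# (or a scheme such as consistent hashing that limits movement).
```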