Scaling Fundamentals
Scaling is the process of adding or removing resources to match workload demand. In cloud architectures, two primary scaling approaches are used:
Vertical Scaling (Scaling Up)
- Definition: Increasing the performance of a single node by adding more resources (CPU cores, memory, etc.)
- Advantages:
- Good speedup up to a particular point
- No application architecture changes required
- Simpler to implement
- Disadvantages:
- Beyond a certain point, speedup becomes very expensive
- Limited by hardware capabilities
- Single point of failure remains
- Potential downtime during scaling operations
Horizontal Scaling (Scaling Out)
- Definition: Increasing the number of nodes in the system
- Advantages:
- Cost-effective way to grow total resources
- Better fault tolerance through redundancy
- Virtually unlimited scaling potential
- Disadvantages:
- Requires coordination systems and load balancing
- Application must be designed for distributed operation
- Harder to utilize resources efficiently
Why Horizontal Scaling Dominates Cloud Architectures
- Hardware Trend: CPUs are no longer getting substantially faster the way they once did
- Economic Factor: Large sets of inexpensive commodity servers are more cost-effective
- Failure Reality: All hardware eventually fails
- Virtualization Advantage: VMs and containers make it easy to replicate services across nodes
Dynamic Scaling Architecture
Modern cloud systems implement dynamic scaling to automatically adjust resources:
- Monitoring: Track metrics like CPU usage, memory usage, request rates
- Thresholds: Define conditions that trigger scaling actions
- Scaling Actions: Add/remove resources when thresholds are crossed
- Stabilization: Implement cooldown periods to prevent oscillation
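A minimal sketch of this control loop in Python, assuming hypothetical `get_cpu_utilization()` and `set_instance_count()` hooks in place of a real monitoring and orchestration API:

```python
import time

def get_cpu_utilization() -> float:
    """Placeholder: return the fleet's average CPU utilization in [0, 1]."""
    return 0.5

def set_instance_count(n: int) -> None:
    """Placeholder: ask the platform to run exactly n service instances."""
    print(f"scaling to {n} instances")

SCALE_UP_THRESHOLD = 0.80    # threshold: add capacity above 80% average CPU
SCALE_DOWN_THRESHOLD = 0.30  # threshold: remove capacity below 30% average CPU
COOLDOWN_SECONDS = 300       # stabilization: ignore triggers right after scaling

def autoscale(instances: int = 2, lo: int = 1, hi: int = 10) -> None:
    last_action = 0.0
    while True:
        cpu = get_cpu_utilization()                      # monitoring
        if time.time() - last_action >= COOLDOWN_SECONDS:
            if cpu > SCALE_UP_THRESHOLD and instances < hi:
                instances += 1                           # scaling action: add a node
                set_instance_count(instances)
                last_action = time.time()
            elif cpu < SCALE_DOWN_THRESHOLD and instances > lo:
                instances -= 1                           # scaling action: remove a node
                set_instance_count(instances)
                last_action = time.time()
        time.sleep(30)                                   # polling interval
```

Without the cooldown check, a brief load spike could trigger repeated scale-up followed by immediate scale-down, the oscillation the stabilization step is meant to prevent.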
Example Process Flow:
- Consumers send more requests to a service
- Existing resources become overloaded and timeouts occur
- Auto-scaling detects the condition and deploys additional resources
- Traffic is redistributed across all available resources
Scaling and State
Scaling approaches differ based on whether components are stateless or stateful:
Stateless Components
- Definition: Maintain no internal state beyond processing a single request
- Examples: Web servers with static content, DNS servers, mathematical calculation services
- Scaling Approach: Simply create more instances and distribute requests via load balancing
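Because a stateless handler's output depends only on the request, any copy can serve any request. A toy illustration (the handler and dispatch below are invented for this sketch):

```python
import itertools

def handle(request: str) -> str:
    # Stateless: the response depends only on the request itself, never on
    # anything remembered from earlier requests.
    return request.upper()

# "Scaling out" is just running more identical copies and spreading requests
# across them; round-robin works because the copies are interchangeable.
instances = itertools.cycle([handle, handle, handle])
for req in ("alpha", "beta", "gamma"):
    print(next(instances)(req))
```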
Stateful Components
- Definition: Maintain state beyond a single request (prior state is required to process future requests)
- Examples: Database servers, mail servers, stateful web servers, session management
- Scaling Approach: More complex, typically requires partitioning and/or replication
Stateless Load Balancing
DNS-Level Load Balancing
- Implementation: DNS servers resolve domain names to different IP addresses
- Advantages: Simple, cost-effective, can use geographical location
- Disadvantages: Slow to react to failures due to DNS caching, limited health checks
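The effect is visible from the client side with the standard library: a name backed by multiple A records resolves to several candidate addresses, and round-robin DNS rotates their order between lookups (example.com stands in for a real multi-homed name):

```python
import socket

# Each returned record is one server behind the same name; round-robin DNS
# rotates the order, so successive clients tend to pick different servers.
for family, type_, proto, canonname, sockaddr in socket.getaddrinfo(
        "example.com", 80, proto=socket.IPPROTO_TCP):
    print(sockaddr[0])
```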
IP-Level Load Balancing
- Implementation: Routers direct clients to different locations using IP anycast
- Advantages: Relatively simple, faster response to failures
- Disadvantages: Less granular, assumes all requests create equal load
Application-Level Load Balancing
- Implementation: Dedicated load balancer acting as a front end
- Advantages: Granular control, content-based routing, SSL offloading
- Disadvantages: Increased complexity, performance overhead, higher latency
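A sketch of the granular control this enables, with hypothetical backend pools and path-prefix routing layered on round-robin:

```python
import itertools

# Hypothetical backend pools; a real balancer (nginx, HAProxy, a cloud ALB)
# would also run health checks and terminate SSL here.
POOLS = {
    "/api/":    itertools.cycle(["10.0.0.1:8080", "10.0.0.2:8080"]),
    "/static/": itertools.cycle(["10.0.1.1:8080"]),
}

def choose_backend(path: str) -> str:
    # Content-based routing: inspect the request, pick a pool, then
    # round-robin within it.
    for prefix, pool in POOLS.items():
        if path.startswith(prefix):
            return next(pool)
    return next(POOLS["/api/"])  # default pool

print(choose_backend("/api/users"))    # 10.0.0.1:8080
print(choose_backend("/api/orders"))   # 10.0.0.2:8080
print(choose_backend("/static/logo"))  # 10.0.1.1:8080
```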
Stateful Scaling
Scaling stateful services presents unique challenges:
Partitioning (Sharding)
- Definition: Dividing data into distinct, independent parts
- Purpose: Improves scalability (performance), but not availability
- Key Consideration: Each data item is stored in only one partition
Partitioning Schemes (a routing sketch follows this list):
- Per-Tenant Partitioning
- Put different tenants on different machines
- Good isolation and scalability
- Challenging when a tenant grows beyond one machine
- Horizontal Sharding
- Split table by rows across different servers
- Each shard has same schema but contains subset of rows
- Easy to scale out; each shard's indexes are smaller and faster
- Examples: Google BigTable, MongoDB
- Vertical Partitioning
- Split table by columns, grouping related columns
- Improves performance for specific queries
- Doesn’t inherently support scaling across multiple servers
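A toy routing sketch for the first two schemes (the server names and the `user_id` shard key are invented for illustration):

```python
# Per-tenant partitioning: the shard key is the tenant itself, so all of a
# tenant's data lives on one machine.
TENANTS = {"acme": "db-1", "globex": "db-2"}

def server_for_tenant(tenant: str) -> str:
    return TENANTS[tenant]

# Horizontal sharding: rows of one table are split across servers by a row
# key; every shard has the same schema but holds a subset of the rows.
SHARDS = ["db-1", "db-2", "db-3"]

def shard_for_row(user_id: int) -> str:
    return SHARDS[user_id % len(SHARDS)]

print(server_for_tenant("acme"))  # db-1
print(shard_for_row(42))          # db-1 (42 % 3 == 0)
```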
Distribution Strategies (compared in the sketch after this list):
- Range Partitioning
- Related data stored together
- Efficient for range queries
- Prone to uneven load (hot spots); often requires manual rebalancing
- Hash Partitioning
- Uniform distribution
- Good load balancing
- Inefficient for range queries
- Requires reorganization when number of partitions changes
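The two strategies differ in how a key maps to a partition. A sketch with made-up boundaries, also showing why changing the partition count under hashing forces most keys to move:

```python
import bisect

# Range partitioning: keys are assigned by ordered boundaries, so adjacent
# keys land together and a range scan touches few partitions.
BOUNDARIES = ["g", "n", "t"]   # partitions: [..g), [g..n), [n..t), [t..]

def range_partition(key: str) -> int:
    return bisect.bisect_right(BOUNDARIES, key)

# Hash partitioning: keys spread uniformly, but a range scan must touch every
# partition, and resizing remaps most keys.
def hash_partition(key: str, n: int) -> int:
    return hash(key) % n

moved = sum(hash_partition(k, 4) != hash_partition(k, 5)
            for k in map(str, range(1000)))
print(f"{moved}/1000 keys move when going from 4 to 5 partitions")  # roughly 80%
```

In practice, consistent hashing is often used to limit this reshuffling, so that adding or removing a partition moves only a small fraction of the keys.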