Scaling Fundamentals

Scaling is the process of adding or removing resources to match workload demand. In cloud architectures, two primary scaling approaches are used:

Vertical Scaling (Scaling Up)

  • Definition: Increasing the performance of a single node by adding more resources (CPU cores, memory, etc.)
  • Advantages:
    • Good speedup, up to a point
    • No application architecture changes required
    • Simpler to implement
  • Disadvantages:
    • Beyond a certain point, speedup becomes very expensive
    • Limited by hardware capabilities
    • Single point of failure remains
    • Potential downtime during scaling operations

Horizontal Scaling (Scaling Out)

  • Definition: Increasing the number of nodes in the system
  • Advantages:
    • Cost-effective way to grow total resources
    • Better fault tolerance through redundancy
    • Virtually unlimited scaling potential
  • Disadvantages:
    • Requires coordination systems and load balancing
    • Application must be designed for distributed operation
    • Harder to utilize resources efficiently

Why Horizontal Scaling Dominates Cloud Architectures

  • Hardware Trend: CPUs are no longer getting substantially faster the way they once did; single-core performance has largely plateaued
  • Economic Factor: Large sets of inexpensive commodity servers are more cost-effective
  • Failure Reality: All hardware eventually fails
  • Virtualization Advantage: VMs and containers make it easy to replicate services across nodes

Dynamic Scaling Architecture

Modern cloud systems implement dynamic scaling to automatically adjust resources:

  1. Monitoring: Track metrics like CPU usage, memory usage, request rates
  2. Thresholds: Define conditions that trigger scaling actions
  3. Scaling Actions: Add/remove resources when thresholds are crossed
  4. Stabilization: Implement cooldown periods to prevent oscillation
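The four steps above can be sketched as a small control loop. This is a minimal illustration, not any particular cloud provider's API; the metric (average CPU), thresholds, and cooldown length are all illustrative assumptions.

```python
import time

class AutoScaler:
    """Threshold-based scaler with a cooldown window for stabilization."""

    def __init__(self, min_instances=1, max_instances=10,
                 scale_up_at=0.80, scale_down_at=0.30, cooldown_s=300):
        self.instances = min_instances
        self.min_instances = min_instances
        self.max_instances = max_instances
        self.scale_up_at = scale_up_at      # e.g. scale out above 80% avg CPU
        self.scale_down_at = scale_down_at  # e.g. scale in below 30% avg CPU
        self.cooldown_s = cooldown_s        # stabilization window in seconds
        self.last_action = 0.0

    def evaluate(self, avg_cpu, now=None):
        """Return +1 (scaled out), -1 (scaled in), or 0 (no action)."""
        now = time.monotonic() if now is None else now
        if now - self.last_action < self.cooldown_s:
            return 0  # still in cooldown: prevents oscillation
        if avg_cpu > self.scale_up_at and self.instances < self.max_instances:
            self.instances += 1
            self.last_action = now
            return +1
        if avg_cpu < self.scale_down_at and self.instances > self.min_instances:
            self.instances -= 1
            self.last_action = now
            return -1
        return 0
```

Note how a sustained high metric during the cooldown window is deliberately ignored; without that, a scaler can thrash by adding and removing instances faster than new instances can absorb load.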

Example Process Flow:

  1. Consumers send more requests to a service
  2. Existing resources become overloaded and timeouts occur
  3. Auto-scaling detects the condition and deploys additional resources
  4. Traffic is redistributed across all available resources

Scaling and State

Scaling approaches differ based on whether components are stateless or stateful:

Stateless Components

  • Definition: Maintain no internal state beyond processing a single request
  • Examples: Web servers with static content, DNS servers, mathematical calculation services
  • Scaling Approach: Simply create more instances and distribute requests via load balancing
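Because stateless instances are interchangeable, the distribution step really can be this simple. A minimal round-robin sketch (instance names are illustrative):

```python
import itertools

class RoundRobinBalancer:
    """Hand each incoming request to the next instance in rotation."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def pick(self):
        return next(self._cycle)

lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [lb.pick() for _ in range(6)]
# Scaling out is just registering another interchangeable instance;
# no request needs to remember which instance served the previous one.
```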

Stateful Components

  • Definition: Maintain state beyond a single request (prior state is required to process future requests)
  • Examples: Database servers, mail servers, stateful web servers, session management
  • Scaling Approach: More complex, typically requires partitioning and/or replication

Stateless Load Balancing

DNS-Level Load Balancing

  • Implementation: DNS servers resolve domain names to different IP addresses
  • Advantages: Simple, cost-effective, can use geographical location
  • Disadvantages: Slow to react to failures due to DNS caching, limited health checks

IP-Level Load Balancing

  • Implementation: Routers direct clients to different locations using IP anycast
  • Advantages: Relatively simple, faster response to failures
  • Disadvantages: Less granular, assumes all requests create equal load

Application-Level Load Balancing

  • Implementation: Dedicated load balancer acting as a front end
  • Advantages: Granular control, content-based routing, SSL offloading
  • Disadvantages: Increased complexity, performance overhead, higher latency
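Content-based routing is what distinguishes this layer from the DNS- and IP-level approaches: the balancer can inspect the request itself. A hypothetical sketch, routing by URL prefix to separate backend pools (pool names and paths are assumptions for illustration):

```python
from itertools import cycle

# Each URL prefix maps to its own pool of backends, cycled round-robin.
ROUTES = {
    "/api/":    cycle(["api-1", "api-2"]),  # dynamic API traffic
    "/static/": cycle(["cdn-1"]),           # static assets
}
DEFAULT_POOL = cycle(["web-1", "web-2"])    # everything else

def route(path):
    """Pick a backend by inspecting the request path (layer-7 routing)."""
    for prefix, pool in ROUTES.items():
        if path.startswith(prefix):
            return next(pool)
    return next(DEFAULT_POOL)
```

The same inspection point is where a real balancer would also terminate TLS (SSL offloading) and run health checks, which is exactly why it adds latency relative to the lower-level approaches.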

Stateful Scaling

Scaling stateful services presents unique challenges:

Partitioning (Sharding)

  • Definition: Dividing data into distinct, independent parts
  • Purpose: Improves scalability (performance), but not availability
  • Key Consideration: Each data item is stored in only one partition

Partitioning Schemes:

  1. Per-Tenant Partitioning

    • Put different tenants on different machines
    • Good isolation and scalability
    • Challenging when a tenant grows beyond one machine
  2. Horizontal Sharding

    • Split table by rows across different servers
    • Each shard has same schema but contains subset of rows
    • Easy to scale out; each shard maintains smaller indices
    • Examples: Google BigTable, MongoDB
  3. Vertical Partitioning

    • Split table by columns, grouping related columns
    • Improves performance for specific queries
    • Doesn’t inherently support scaling across multiple servers
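For horizontal sharding, the routing rule that sends each row to exactly one shard can be sketched in a few lines. Shard names and the choice of key are illustrative assumptions:

```python
# Every shard holds the full schema but only a disjoint subset of rows.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

def shard_for(user_id: int) -> str:
    """Deterministically map a row key to one shard, so all reads and
    writes for that row always go to the same place."""
    return SHARDS[user_id % len(SHARDS)]
```

The determinism matters: because each data item lives in only one partition, a query for a single key touches one server, which is where the scalability win comes from.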

Distribution Strategies:

  • Range Partitioning

    • Related data stored together
    • Efficient for range queries
    • Prone to hotspots (poor load balancing); may require manual rebalancing
  • Hash Partitioning

    • Uniform distribution
    • Good load balancing
    • Inefficient for range queries
    • Requires reorganization when the number of partitions changes
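Both distribution strategies, and the reorganization cost of the hash approach, can be demonstrated directly. A sketch with an illustrative key space and partition counts:

```python
def range_partition(key, boundaries):
    """Keys below boundaries[i] go to partition i. Keeps related keys
    together, so a range scan touches only a few partitions."""
    for i, upper in enumerate(boundaries):
        if key < upper:
            return i
    return len(boundaries)

def hash_partition(key, n_partitions):
    """Spreads keys uniformly, but a range scan must now touch every
    partition."""
    return hash(key) % n_partitions

# Changing the partition count remaps most keys under modulo hashing:
keys = range(1000)
moved = sum(1 for k in keys
            if hash_partition(k, 4) != hash_partition(k, 5))
# The vast majority of keys change partitions, which is why resizing a
# hash-partitioned system requires large-scale data reorganization
# (or a scheme such as consistent hashing that limits movement).
```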