Virtual Machines and Virtualization
Question 1
Question
Briefly describe what critical instructions are and why they presented a challenge for x86 system virtualization [2]
Answer
Critical instructions are sensitive instructions that are not privileged [1]. Because they do not trap, control never passes to the hypervisor, so they do not behave as expected when executed by guest OS code [1].
Question 2
Question
Briefly summarize why the physical main memory can simply be partitioned for Xen guests [3]
Answer
Guests are aware of running on a hypervisor, using only parts of the memory [1]. Large partitions of memory (~ GBs) are typically allocated to each of a few VMs [1]. Memory addresses for guest processes remain logical/virtual (preserving virtual memory benefits like paging) [1].
Question 3
Question
Explain the key difference between shadow page tables used in full virtualization and the memory management approach used in Xen. [3]
Answer
Shadow page tables require hypervisor-maintained duplicate tables combining guest virtual-to-physical and physical-to-machine mappings [1]. Xen lets guests maintain their own page tables [1], but the hypervisor validates every update so a guest can only map memory that has been allocated to it [1].
Question 4
Question
Which statement about hardware-assisted virtualization is correct?
a) Modifying the guest OS
b) Binary translation for critical instructions
c) Introduces CPU modes specifically for virtualization
d) Incompatible with legacy OS
Answer
c) It introduces new CPU modes specifically for virtualization [1]. Hardware-assisted virtualization (Intel VT-x, AMD-V) introduces root/non-root modes for guest OS operation and hypervisor control [1].
Containers and Container Management
Question 5
Question
Briefly explain what the chroot system call on Linux does and how it is useful for containerization [2]
Answer
Chroot changes the apparent root directory for a process and its children [1]; this supports containerization by isolating the container's binaries, libraries, configuration, etc. from the rest of the host filesystem [1].
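A minimal Python sketch of the idea (paths are hypothetical; requires root privileges and a jail directory already populated with a shell and its libraries):
```python
import os

jail = "/srv/jail"  # hypothetical directory containing the confined filesystem

pid = os.fork()
if pid == 0:  # child process
    os.chroot(jail)   # make the jail directory the new filesystem root
    os.chdir("/")     # ensure the working directory lies inside the new root
    # From here on, the child only sees files under /srv/jail.
    os.execv("/bin/sh", ["/bin/sh"])  # assumes a shell was copied into the jail
else:
    os.waitpid(pid, 0)
```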
Question 6
Question
Compare and contrast namespaces and cgroups in Linux containment. [4]
Answer
Namespaces isolate process views (PID, network, mount points) [1]; cgroups manage resource use (CPU, memory, I/O) [1]. Namespaces provide separate environments [1]; cgroups enforce resource limits/accounting [1].
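A rough sketch of both mechanisms using standard Linux interfaces (assumes root, a cgroup v2 hierarchy mounted at /sys/fs/cgroup, and the util-linux unshare tool; names and limits are illustrative):
```python
import subprocess
from pathlib import Path

# cgroups: create a group and cap its memory use.
cg = Path("/sys/fs/cgroup/demo")
cg.mkdir(exist_ok=True)
(cg / "memory.max").write_text("268435456\n")   # 256 MiB limit
# Writing a PID to (cg / "cgroup.procs") would place that process under the limit.

# namespaces: run a command with its own PID and mount namespaces,
# so it sees itself as PID 1 and cannot see host processes.
subprocess.run(["unshare", "--pid", "--fork", "--mount-proc", "ps", "aux"])
```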
Question 7
Question
Why are container images typically smaller than VM images? Give two reasons. [2]
Answer
Container images exclude the OS kernel, containing only the application and its dependencies [1], because containers share the host OS kernel at runtime [1].
Question 8
Question
Explain the relationship between Dockerfile, image, and container. [3]
Answer
Dockerfile contains build instructions [1]. An image is a read-only template [1]. A container is a running instance of an image [1].
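A small sketch of that relationship using the Docker SDK for Python (assumes the docker package is installed, a local Docker daemon is running, and the current directory contains a Dockerfile; the image tag is made up):
```python
import docker

client = docker.from_env()

# Dockerfile (build instructions) -> image (read-only template)
image, build_logs = client.images.build(path=".", tag="demo-app:latest")

# image -> container (a running instance of the image)
container = client.containers.run("demo-app:latest", detach=True)
print(container.short_id)

container.stop()
container.remove()
```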
Cloud Infrastructure Management
Question 9
Question
Briefly explain how Infrastructure-as-Code addresses snowflake servers [2]
Answer
IaC captures server configs in versioned code [1], removing undocumented manual changes [1].
Question 10
Question
Explain the difference between continuous delivery and continuous deployment. [2]
Answer
Continuous delivery auto-tests/prepares releases requiring manual approval [1]. Continuous deployment auto-deploys to production if tests pass [1].
Question 11
Question
What is the primary purpose of live VM migration, and what components must migrate? [3]
Answer
Live VM migration moves a running VM between physical hosts while minimizing downtime (e.g., for maintenance or load balancing) [1]. Components that must migrate: memory pages [1], network connections, and storage resources [1].
Question 12
Question
List and briefly explain stages of Xen live migration. [4]
Answer
Stage 0: Pre-migration, VM active on source host [1]
Stages 1-2: Reservation of resources on the destination, then iterative memory pre-copy [1]
Stage 3: Brief stop-and-copy phase transferring remaining dirty pages and CPU state [1]
Stages 4-5: Commitment and activation on the destination host [1]
Question 13
Question
Name/explain an issue with PUE metric. [1]
Answer
For example: PUE is measured with inconsistent methodologies, or it ignores whole-system trade-offs and the efficiency of the IT equipment itself [1].
Question 14
Question
Explain energy-proportional computing and its importance for cloud data centers. [3]
Answer
Energy use proportional to utilization [1]; important as servers often underutilized (10-50%) [1] but consume significant idle power [1].
Cloud Sustainability
Question 15
Question
What is the difference between embodied and operational carbon emissions in cloud computing? [2]
Answer
Embodied emissions result from manufacturing, transporting, and disposing of hardware [1]; operational emissions come from electricity used during operation [1].
Question 16
Question
What is carbon-aware computing and how does it differ from energy efficiency? [3]
Answer
Carbon-aware computing schedules tasks based on electricity carbon intensity [1]. It differs from energy efficiency by focusing on when/where energy is used, rather than just reducing total consumption [1].
Cloud System Design
Question 17
Question
You are designing a cloud-based ML system with training and inference components. Why deploy the inference service at the edge rather than the cloud? [2]
Answer
Deploying at the edge reduces latency by processing data closer to users [1], avoiding round-trip delays to cloud servers [1].
Question 18
Question
Match scenarios to failover strategies (Active-Active/Active-Passive), justify.
a) Financial trading platform
b) Content management system
Answer
a) Active-Active, as it cannot tolerate downtime [2].
b) Active-Passive, acceptable downtime, cost-effective [2].
Question 19
Question
Explain the difference between availability and reliability in cloud systems. [2]
Answer
Availability refers to service readiness (uptime) [1]; reliability refers to system correctness and stability over time (MTBF) [1].
Question 20
Question
What does “five nines” availability mean, and how much downtime does it represent annually? [2]
Answer
“Five nines” (99.999%) represents approximately 5.26 minutes of downtime per year [2].
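A quick sanity check of that figure:
```python
# Minutes in a year times the allowed unavailability for "five nines".
minutes_per_year = 365.25 * 24 * 60          # ~525,960
downtime = minutes_per_year * (1 - 0.99999)  # ~5.26 minutes per year
print(round(downtime, 2))
```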
Modern Cloud Architectures
Question 21
Question
A financial tech company processes high daily transaction volumes. Choose and explain the best architecture:
a) Single high-memory server
b) Load-balanced servers, DB partitioning
c) Serverless functions, single DB
d) Monolithic app, local caching
Answer
b) Load-balanced servers with DB partitioning, offering scalability, performance, and reliability [2].
Question 22
Question
Differentiate horizontal and vertical scaling, give examples. [4]
Answer
Horizontal scaling adds machines (stateless web apps) [2]; vertical scaling upgrades resources on existing machines (databases benefiting from CPU/memory) [2].
Question 23
Question
What are microservices and two advantages over monoliths? [3]
Answer
Microservices are small, independently deployable services that communicate via APIs [1]. Advantages: independent deployment [1], improved fault isolation [1].
Question 24
Question
What is a service mesh, and what microservices problem does it solve? [2]
Answer
A service mesh is an infrastructure layer that manages service-to-service communication [1], addressing microservices concerns such as monitoring, security, and reliability outside the application code [1].
Question 25
Question
Select appropriate service model (IaaS, PaaS, SaaS, FaaS) for scenarios:
a) Startup without infrastructure management
b) Company collaboration tools
c) Research simulations computing power
d) Web app developer avoiding server runtime
Answer
a) FaaS [1]
b) SaaS [1]
c) IaaS [1]
d) PaaS [1]
Extra
Question 26
What are the key disadvantages of the microservices architecture?
Answer
The key disadvantages of microservices include:
- Increased complexity (operational overhead, distributed debugging)
- Higher latency due to communication between services
- Microservice sprawl (potentially ballooning into hundreds or thousands of services)
- Operational overhead managing multiple CI/CD pipelines
- Interdependency chains can cause cascading failures, death spirals, and retry storms
- Failures in one service can trigger failures in dependent services
- Failure recovery could take longer than with monoliths
- Increased glueware requirements for monitoring, consistency, and coordination
Question 27
Explain the concept of "trap and emulate" in virtualization and when it can be used.
Answer
“Trap and emulate” is a virtualization technique where:
- When a guest OS executes a privileged instruction, it causes a trap (exception)
- Control is transferred to the VMM/hypervisor
- The hypervisor emulates the behavior of the instruction
- Execution returns to the guest OS
According to Popek and Goldberg’s theorem, this technique only works efficiently when all sensitive instructions are also privileged instructions. For x86 architectures, this doesn’t hold true as they contain critical instructions (sensitive but not privileged), which is why binary translation or hardware extensions are needed for efficient virtualization.
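A toy Python sketch of the dispatch logic (instruction names and state are invented purely for illustration):
```python
# Privileged instructions trap to the hypervisor, which emulates them against
# the guest's virtual hardware state; everything else runs directly.
PRIVILEGED = {"load_cr3", "halt", "out"}

def hypervisor_emulate(instr, vcpu_state):
    # Emulate the instruction's effect on the guest's *virtual* state.
    vcpu_state[instr] = vcpu_state.get(instr, 0) + 1
    return vcpu_state

def run_guest(instructions, vcpu_state):
    for instr in instructions:
        if instr in PRIVILEGED:
            vcpu_state = hypervisor_emulate(instr, vcpu_state)  # trap
        else:
            pass  # non-sensitive instruction: executes directly on the CPU
    return vcpu_state

print(run_guest(["add", "load_cr3", "mul", "halt"], {}))
```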
Question 28
What are the five essential characteristics of cloud computing according to the NIST definition?
Answer
The five essential characteristics of cloud computing according to NIST are:
- On-demand self-service: Resources can be provisioned without human interaction
- Broad network access: Services accessible via standard network mechanisms
- Resource pooling: Provider resources are pooled and dynamically assigned to consumers
- Rapid elasticity: Resources can be quickly provisioned and released to scale with demand
- Measured service: Resource usage is monitored, controlled, and reported transparently
Question 29
Compare and contrast the three most common approaches to virtualization on x86 architectures.
Answer
The three approaches to x86 virtualization are:
- Full Virtualization with Binary Translation:
- No modified guest OS needed
- No hardware support required
- Uses binary translation to handle critical instructions
- Uses shadow page tables for memory management
- Less efficient for I/O-intensive applications
- OS-Assisted Virtualization (Paravirtualization):
- Requires modified guest OS
- No hardware support required
- Better performance through guest OS cooperation
- Example: Xen
- Limited compatibility with proprietary OSes
- Hardware-Assisted Virtualization:
- No modified guest OS needed
- Requires hardware support (Intel VT-x, AMD-V)
- Uses new CPU modes and extended page tables
- Good performance for unmodified guests
- Specialized hardware required
Question 30
Explain the concept of energy proportionality in data centers and why it's important for cloud sustainability.
Answer
Energy proportionality refers to the goal that a computing system’s energy consumption should be proportional to its workload - ideally, a system’s energy consumption per operation would be independent of utilization level. In a perfectly energy-proportional system, a server at 50% utilization would consume exactly 50% of the power it consumes at 100% utilization.
This is important for cloud sustainability because:
- Data center servers are often not fully utilized (average utilization is typically 30-50%)
- Non-proportional systems waste energy when underutilized
- Energy-proportional systems can significantly reduce overall power consumption and carbon emissions
- Achieving energy proportionality requires optimizations at multiple levels: hardware design, system software, workload scheduling, and data center architecture
Question 31
What is autoscaling in Kubernetes and how does the Horizontal Pod Autoscaler calculate the desired number of replicas?
Answer
Autoscaling in Kubernetes automatically adjusts the number of pod replicas based on observed metrics. The Horizontal Pod Autoscaler (HPA) is a Kubernetes component that automatically scales the number of pods in a deployment or replica set.
The HPA calculates desired replicas using this formula:
desiredReplicas = ⌈currentReplicas * (currentMetricValue / desiredMetricValue)⌉
This assumes linear scaling between resource usage and replica count. The autoscaler also:
- Only scales if metrics are outside a tolerance (typically 0.1 or 10%)
- Scales to the highest number of desired replicas observed in a sliding window (5 minutes)
- Ignores pods being shut down in calculations
- Handles missing metrics by assuming 0 for scale-out and 1 for scale-in
- Assumes metric value of 0 for pods not yet ready
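A small Python sketch of the replica calculation above, including the tolerance check (the example numbers are illustrative):
```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric, tolerance=0.1):
    """HPA-style replica calculation (simplified sketch of the formula above)."""
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas          # within tolerance: no scaling action
    return math.ceil(current_replicas * ratio)

# e.g. 4 replicas at 90% average CPU with a 50% target -> 8 replicas
print(desired_replicas(4, 90, 50))   # 8
```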
Question 32
Explain the carbon intensity concept and how it's used in carbon-aware computing.
Answer
Carbon intensity refers to the amount of equivalent CO₂ emissions released per unit of generated power (measured in gCO₂/kWh). It varies based on:
- Energy source: renewables have lower CI, fossil fuels have high CI
- Geographic region: different regions have different energy mixes
- Time of day/year: CI varies with changes in supply and demand
In carbon-aware computing, carbon intensity is used to:
- Make time-shifting decisions (scheduling workloads during low-carbon periods)
- Make location-shifting decisions (running workloads in regions with cleaner energy)
- Implement carbon-aware load balancing between data centers
- Evaluate the environmental impact of computing operations
There are two types of carbon intensity signals:
- Average CI: the overall emissions of the electricity mix (useful for reporting)
- Marginal CI: emissions from the next unit of electricity (better for real-time decisions)
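A minimal sketch of the time-shifting idea: given an hourly carbon-intensity forecast, pick the lowest-carbon window for a deferrable job (the forecast values are made up):
```python
# Hourly forecast in gCO2/kWh; choose the start hour that minimises the
# average carbon intensity over a 3-hour deferrable job.
forecast = [450, 420, 380, 300, 210, 180, 190, 260, 340, 410]
job_hours = 3

best_start = min(
    range(len(forecast) - job_hours + 1),
    key=lambda start: sum(forecast[start:start + job_hours]) / job_hours,
)
print(f"Schedule job to start at hour {best_start}")  # hour 4 (210, 180, 190)
```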
Question 33
What are shadow page tables in virtualization and why are they important?
Answer
Shadow page tables are a memory management technique used in full virtualization to handle virtual to physical address translation efficiently. They work as follows:
- The guest OS maintains its own page tables mapping virtual to “physical” addresses (which are actually still virtual from the host perspective)
- The VMM/hypervisor maintains shadow page tables that map guest virtual addresses directly to host physical addresses
- The shadow page tables are what the hardware MMU actually uses
- When the guest modifies its page tables, these changes trap to the hypervisor which updates the shadow page tables accordingly
Shadow page tables are important because they:
- Avoid the performance penalty of nested address translation
- Allow the TLB to cache translations effectively
- Enable the guest OS to believe it has direct control over memory mapping
- Were essential before hardware virtualization extensions introduced nested/extended page tables
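A toy illustration of how the guest-side and hypervisor-side mappings compose into a shadow mapping (page numbers are invented; real page tables are multi-level hardware structures):
```python
guest_page_table = {0x1: 0x10, 0x2: 0x11}      # guest virtual -> guest "physical"
p2m_table        = {0x10: 0x7A, 0x11: 0x7B}    # guest "physical" -> host machine

# The shadow table maps guest virtual pages directly to machine pages,
# so the hardware MMU can use it without a second translation step.
shadow_page_table = {gv: p2m_table[gp] for gv, gp in guest_page_table.items()}
print(shadow_page_table)   # {1: 122, 2: 123}
```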
Question 34
Describe the blue/green deployment strategy and its advantages.
Answer
Blue/green deployment is a continuous deployment strategy using two identical production environments:
- At any time, one environment (blue or green) is active and receiving all production traffic
- New versions are deployed to the inactive environment
- After testing the new version in the inactive environment, traffic is switched over
- The old active environment becomes inactive, ready for the next deployment
Advantages include:
- Zero downtime during deployments
- Simple and fast rollback capability (just switch traffic back)
- Reduced risk as the new version is fully tested before receiving traffic
- Complete testing in a production-identical environment
- No in-place upgrades that could lead to subtle configuration issues
- Deployment and release are decoupled (deploy first, release later)
Question 35
What is Jevons' Paradox and how does it apply to energy efficiency in cloud computing?
Answer
Jevons’ Paradox states that as technology makes resource use more efficient, the demand for that resource tends to increase, potentially leading to higher overall resource consumption rather than savings. Originally observed by William Stanley Jevons in 1865 for coal consumption after the introduction of more efficient steam engines.
In cloud computing, this manifests as:
- More efficient servers lead to lower costs per computation
- Lower costs drive increased demand for cloud services
- The total energy consumption and carbon footprint continue to grow despite efficiency improvements
- Datacenters become more energy-efficient (better PUE) but total energy usage increases
- More efficient infrastructure enables more demanding applications (AI, ML, etc.)
This paradox highlights that efficiency alone is insufficient for sustainability - we also need absolute reductions in resource consumption and carbon-aware approaches.
Question 36
What is Infrastructure as Code (IaC) and how does it address the challenges of "configuration drift" and "snowflake servers"?
Answer
Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes. It treats infrastructure configuration like software code that can be versioned, tested, and automated.
IaC addresses:
- Configuration Drift: When configurations change over time without documentation
- IaC provides a single source of truth for infrastructure configuration
- Changes must be made to the code, creating an audit trail
- Automated deployment ensures consistency between environments
- Snowflake Servers: Servers with unique, unreproducible configurations
- IaC enables reproducible environment creation from code
- Instead of modifying existing servers, new ones are created from definitions
- “Immutable infrastructure” approach: rebuild rather than update
- Easy to mirror production environments for testing
- Clear separation between deliberate configuration and defaults
Question 37
Explain the concept of Continuous Integration/Continuous Delivery (CI/CD) in cloud environments and list its key practices.
Answer
Continuous Integration/Continuous Delivery (CI/CD) is a set of practices for automating software delivery processes. In cloud environments, CI/CD involves:
Continuous Integration:
- One common versioned code repository
- Build automation with short build times
- Self-testing builds
- Regular code commits
- Building every commit (often on a CI server)
- Making test results visible to everyone
- Maintaining similar test and production environments
Continuous Delivery/Deployment:
- Automated release process
- Fast and reproducible software releases
- Short cycle time (time from code change to production)
- Automated testing (unit, integration, acceptance)
- Deployment to staging environments
- Monitoring and smoke tests
The key difference between Continuous Delivery and Continuous Deployment is that Delivery requires manual approval before production deployment, while Deployment automatically pushes changes to production after passing tests.
Question 38
What are the differences between Average Carbon Intensity and Marginal Carbon Intensity when measuring the environmental impact of computing?
Answer
Average Carbon Intensity:
- Measures the overall carbon emissions of the electricity mix
- Calculated as total emissions divided by total energy produced
- Useful for general emissions reporting and long-term analysis
- Doesn’t reflect real-time fluctuations in energy sources
- Good for annual sustainability reporting and policy planning
- Can obscure the impact of specific energy consumption changes
Marginal Carbon Intensity:
- Measures the emissions from the next unit of electricity consumed
- Represents what happens if demand increases by one unit
- Better for real-time decision making and load-shifting strategies
- More accurately reflects the immediate impact of energy decisions
- Data availability can vary across regions and time periods
- Can be complex to predict accurately
- Not ideal for long-term infrastructure analysis
For carbon-aware computing, marginal intensity provides better guidance for immediate operational decisions like when to run workloads.
Question 39
What are the three main approaches to achieving fault tolerance in distributed systems, and how do they differ?
Answer
Three main approaches to fault tolerance in distributed systems:
- Error Detection:
- Monitoring systems collect metrics like CPU, memory, and network usage
- Heartbeats provide basic indication of system availability
- Telemetry analyzes metrics across servers to identify issues
- Circuit breaker pattern detects and prevents cascading failures
- Error-correcting code (ECC) memory detects and corrects bit errors
- Redundancy/Replication:
- Hardware redundancy (servers, storage, network equipment)
- Geographic redundancy (distributing across regions)
- Data replication to ensure availability despite failures
- Component replication (power supplies, cooling systems)
- Affects availability according to 1 - pⁿ for n independent replicas, where p is the probability of an individual failure (see the worked example after this list)
- Failover Strategies:
- Active-Passive: Primary system handles all workload with idle standby
- Active-Active: Multiple systems simultaneously handle workload
- Cold/Warm/Hot standby with different recovery time objectives
- State management and consistency mechanisms during failover
- Load balancers to redirect traffic during failures
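A worked example of the redundancy formula mentioned above, assuming n independent replicas that each fail with probability p:
```python
p = 0.01   # each replica is unavailable 1% of the time
for n in (1, 2, 3):
    print(n, 1 - p**n)   # 0.99, 0.9999, 0.999999
```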
Question 40
Compare and contrast the scaling approaches for stateless vs. stateful components in cloud architectures.
Answer
Scaling Stateless Components:
- Maintains no internal state beyond a request
- Easy to horizontally scale by adding more instances
- Load balancing methods:
- DNS-level balancing (simple but slow to react to failures)
- IP-level balancing (faster response, basic health checks)
- Application-level balancing (granular control, content-based routing)
- Instances can be added/removed without concern for state
- Examples: web servers with static content, DNS servers
Scaling Stateful Components:
- Maintains state beyond a request
- More complex to scale horizontally
- Requires data partitioning strategies:
- Per tenant (isolating different clients)
- Horizontal/Sharding (splitting table by rows across servers)
- Vertical (splitting table by columns)
- Partitioning distribution methods:
- Range partitioning (efficient for range queries, poor load balancing)
- Hash partitioning (good load balancing, inefficient for range queries)
- Requires handling cross-partition queries and maintaining consistency
- Examples: databases, stateful web servers, mail servers
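A small sketch contrasting hash and range partitioning (shard counts, boundaries, and keys are illustrative):
```python
NUM_SHARDS = 4
RANGE_BOUNDARIES = [1000, 2000, 3000]   # shard 0: <1000, shard 1: <2000, ...

def hash_shard(key):
    # Good load balancing, but range queries must fan out to every shard.
    return hash(key) % NUM_SHARDS

def range_shard(key):
    # Range queries touch few shards, but hot key ranges skew the load.
    for shard, upper in enumerate(RANGE_BOUNDARIES):
        if key < upper:
            return shard
    return len(RANGE_BOUNDARIES)

print(hash_shard("user-42"))   # varies per run (Python string hashing is seeded)
print(range_shard(2500))       # 2
```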
Question 41
What are the key components of Kubernetes and how do they work together to manage containerized applications?
Answer
Key Kubernetes components:
- Control Plane Components (Master):
- API Server: Entry point for all REST commands
- etcd: Reliable distributed key-value store for cluster state
- Scheduler: Places pods based on resource requirements and constraints
- Controllers: Manage state (replication, endpoints, nodes, service accounts)
- Node Components:
- kubelet: Agent that communicates with the Master
- kube-proxy: Makes services available on each node
- Container Runtime: Software executing containers (Docker, containerd)
- Logical Resources:
- Pods: Groups of containers scheduled together on the same node
- Deployments: Manage stateless applications
- StatefulSets: Manage stateful applications
- Services: Expose functionality of pods to the cluster or externally
- Horizontal Pod Autoscaler: Automatically scales pods based on metrics
These components work together to:
- Deploy and manage containerized applications
- Schedule containers based on resource requirements
- Monitor container health and restart failed containers
- Scale applications based on demand
- Provide service discovery and load balancing
- Update applications without downtime
Question 42
Describe the lifecycle emissions of datacenter hardware and explain why operational emissions might not be the only environmental concern.
Answer
Datacenter hardware lifecycle emissions:
- Embodied Emissions (23% of total):
- Raw material extraction and processing
- Manufacturing of components (CPUs, memory, storage)
- Assembly of hardware
- Transportation to datacenter
- Can be up to 50% of emissions for consumer devices
- Operational Emissions (70% of total):
- Electricity for powering computing equipment
- Cooling systems energy use
- Network infrastructure power consumption
- Lighting and auxiliary systems
- End-of-life Emissions (5-7% of total):
- Recycling processes
- E-waste management
- Disposal of non-recyclable components
Beyond operational emissions, other environmental concerns include:
- Water usage for cooling (particularly in water-scarce regions)
- Land use for datacenter construction
- Resource depletion of rare earth minerals and metals
- Hazardous materials in electronic components
- Short hardware lifecycles increasing embodied carbon impact
A holistic approach to datacenter sustainability requires addressing the full lifecycle impact, not just operational energy efficiency.
Question 43
Explain the CAP theorem and its implications for distributed database design in cloud environments.
Answer
The CAP theorem (Brewer’s theorem) states that a distributed database system can only provide two of the following three guarantees simultaneously:
- Consistency: All nodes see the same data at the same time
- Availability: Every request receives a response (success or failure)
- Partition tolerance: System continues to operate despite network partitions
In cloud environments, network partitions are unavoidable, so systems must choose between:
- CP systems: Sacrifice availability during partitions to maintain consistency
- Examples: Google Spanner, HBase, MongoDB (default config)
- Good for banking, financial systems where consistency is critical
- AP systems: Sacrifice consistency during partitions to maintain availability
- Examples: Amazon DynamoDB, Cassandra, CouchDB
- Good for social media, content delivery where availability matters most
Modern distributed databases often implement various consistency models beyond strict consistency:
- Strong consistency: All reads reflect the latest write
- Eventual consistency: All updates propagate eventually
- Causal consistency: Related operations appear in the same order to all observers
- Read-your-writes consistency: A user always sees their own updates
Cloud architects must understand these tradeoffs to select appropriate storage solutions based on application requirements.
Question 44
What are the primary differences between private, public, community, and hybrid cloud deployment models?
Answer
Private Cloud:
- Used by a single organization
- Owned by that organization or a third party
- Located on-premise or off-premise
- Advantages: Control, security, compliance, customization
- Disadvantages: Higher costs, requires IT expertise
Public Cloud:
- Available to the general public
- Owned by cloud service providers (AWS, Azure, GCP)
- Located in provider’s datacenters
- Advantages: Cost-effective, scalable, minimal management
- Disadvantages: Less control, potential security concerns
Community Cloud:
- Shared by organizations with common concerns (e.g., compliance, security)
- Owned by participating organizations or third party
- Located on-premise or off-premise
- Advantages: Cost sharing, compliance, common requirements
- Disadvantages: Limited resources compared to public cloud
Hybrid Cloud:
- Composition of two or more cloud models (private, community, public)
- Bound together by standardized technology for portability
- Advantages: Flexibility, workload optimization, cost balancing
- Disadvantages: Complexity, integration challenges, skill requirements
Multi-cloud (a related concept):
- Using services from multiple public cloud providers
- Advantages: Avoiding vendor lock-in, leveraging best services
- Disadvantages: Management complexity, potential data transfer costs
Question 45
Describe the Power Usage Effectiveness (PUE) metric, its limitations, and alternative metrics for measuring datacenter efficiency.
Answer
Power Usage Effectiveness (PUE):
- Definition: Total facility energy / IT equipment energy
- Industry average: ~1.58 (2022)
- Best practice target: 1.2 or less
- Perfect PUE would be 1.0 (all energy used by computing equipment)
Limitations of PUE:
- Inconsistent measurement methodologies
- Doesn’t account for energy sources (renewable vs fossil fuels)
- Doesn’t measure IT equipment efficiency
- Doesn’t capture whole system tradeoffs (e.g., heat reuse)
- Can be manipulated by changing measurement timing or boundaries
- Doesn’t account for climate differences
Alternative metrics:
- Carbon Usage Effectiveness (CUE): Total CO₂ emissions / IT equipment energy
- Water Usage Effectiveness (WUE): Datacenter water consumption / IT equipment energy
- Energy Reuse Effectiveness (ERE): Measures how much energy is reused outside the datacenter
- Green Energy Coefficient (GEC): Percentage of renewable energy used
- Performance per Watt: Computing output relative to power consumption
- Total Cost of Ownership (TCO): Financial metric incorporating efficiency
Comprehensive assessment requires multiple metrics to capture overall sustainability impact.
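A small numeric example of PUE and CUE from the definitions above (figures are illustrative, not real facility data):
```python
it_energy_kwh    = 1_000_000    # annual IT equipment energy
total_energy_kwh = 1_400_000    # annual total facility energy
co2_kg           = 350_000      # annual CO2 attributable to the facility

pue = total_energy_kwh / it_energy_kwh   # 1.4
cue = co2_kg / it_energy_kwh             # 0.35 kgCO2 per IT kWh
print(pue, cue)
```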
Question 46
Explain the concept of cross-cloud computing and the different approaches to implementing it.
Answer
Cross-cloud computing refers to operating seamlessly across multiple cloud environments. It’s implemented through several approaches:
- Hybrid Clouds:
- Combination of private and public clouds
- Connected via dedicated links or VPNs
- Allows workload mobility between environments
- Often involves some hardwiring between environments
- Multi-clouds:
- Using services from multiple public cloud providers
- Implemented with translation libraries or common programming models
- Examples: Terraform, Ansible Cloud Modules, OpenTofu, Pulumi
- May lose some provider-specific features due to abstraction
- Meta-clouds:
- Using a broker layer to abstract multiple clouds
- Broker makes decisions about resource allocation
- Reduces control but simplifies management
- Many commercial and academic proposals exist
- Federated Clouds:
- Establishing common APIs between cloud providers
- Requires standardization of interfaces
- Most challenging to implement but potentially most seamless
Motivations for cross-cloud computing:
- Avoiding vendor lock-in
- Increasing resilience against provider outages
- Leveraging different providers’ strengths
- Meeting regulatory requirements (data sovereignty)
- Geographic coverage optimization
Each approach involves tradeoffs between flexibility, complexity, and provider-specific functionality.
Question 47
How do serverless/Function-as-a-Service (FaaS) platforms work, and what are their advantages and limitations?
Answer
Serverless/FaaS Platforms:
Working mechanism:
- Functions are deployed as standalone code units
- Execution triggered by events (HTTP requests, database changes, etc.)
- Provider dynamically manages resources and scaling
- Environment is ephemeral; no persistent local storage
- Cold starts occur when new container instances are initialized
- Execution is time-limited (typically 5-15 minutes maximum)
Advantages:
- Lower costs through precise usage-based billing
- No servers to manage, reducing operational complexity
- Automatic scaling without configuration
- Fast deployment and time-to-market
- Focus on business logic rather than infrastructure
- “No idle resources” model
Limitations:
- Cold start latency impacts response times
- Vendor lock-in due to platform-specific services and APIs
- Complex state management (stateless execution model)
- Memory and execution time constraints
- Debugging and monitoring challenges
- Limited local testing capabilities
- Potential higher costs for constant-load applications
Examples include AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers, and IBM Cloud Functions.
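A minimal AWS Lambda-style handler in Python, as an illustration of the FaaS model (the event fields are hypothetical):
```python
import json

def lambda_handler(event, context):
    # "event" carries the trigger payload (e.g. an HTTP request body);
    # "context" exposes runtime metadata such as remaining execution time.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```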
Cloud Resource Management
Question 48
Question
Compare how memory is managed under full virtualization with binary translation versus hardware-assisted virtualization, and explain why the hardware-assisted approach is more efficient.
Answer
Binary translation uses shadow page tables where the hypervisor maintains duplicate tables that map guest virtual addresses directly to host physical addresses [1]. The hypervisor must trap and emulate all page table operations, creating significant overhead [1].
Hardware-assisted virtualization uses Extended/Nested Page Tables (EPT/NPT) that add a hardware layer for two-level address translation (guest virtual → guest physical → host physical) [1]. This eliminates the need for shadow page tables and reduces VMM interventions [1].
Hardware-assisted virtualization is more efficient because it reduces VMM traps for memory operations, provides hardware TLB support for nested translations, and eliminates the memory overhead of maintaining shadow page tables [1].
Question 49
Question
Explain how Docker uses namespaces and cgroups, giving examples of each.
Answer
Docker uses namespaces to provide process isolation by creating separate views of system resources [1]. Examples include:
- PID namespace: Isolates process IDs (container processes can’t see host processes)
- Network namespace: Provides separate network interfaces, routing tables, and firewall rules
- Mount namespace: Isolates filesystem mount points
- UTS namespace: Isolates hostname and domain name [1]
Docker uses cgroups to control resource allocation and impose limits [1]. Examples include:
- CPU cgroups: Limit CPU usage percentage
- Memory cgroups: Restrict memory consumption and swap usage
- Block I/O cgroups: Manage disk I/O priorities and limits
- Device cgroups: Control access to specific devices [1]
Question 50
Question
What is Software-Defined Networking (SDN) and how does it benefit cloud infrastructure?
Answer
Software-Defined Networking separates the control plane (network intelligence) from the data plane (packet forwarding), centralizing network configuration and management through software controllers [1]. This separation enables programmable network behavior through standardized interfaces like OpenFlow [1].
SDN benefits cloud infrastructure by:
- Enabling dynamic network configuration to support rapid VM/container provisioning and migration
- Providing network virtualization for multi-tenant isolation
- Allowing policy-based routing and traffic engineering for optimal resource utilization [1]
Question 51
Question
Compare declarative and imperative approaches to Infrastructure-as-Code, giving an example of each and discussing their trade-offs.
Answer
Declarative IaC defines the desired end state of infrastructure without specifying how to achieve it [1]. Example: Terraform/CloudFormation configurations that specify resources and their properties. The tool determines the required actions to reach that state [1].
Imperative IaC uses scripts that explicitly define the sequence of commands needed to create infrastructure [1]. Example: Shell scripts with explicit AWS CLI/Azure CLI commands that create resources in a specific order [1].
Trade-offs:
- Declarative: Better idempotency and state management, more self-documenting, handles dependencies automatically, but less flexibility for complex workflows [1]
- Imperative: More control over execution sequence, familiar to developers, easier debugging, but more error-prone and harder to maintain as infrastructure grows [1]
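A brief sketch of the contrast (assumes boto3 with AWS credentials for the imperative part and pulumi_aws for the declarative part; the AMI ID is a placeholder):
```python
# Imperative: explicit commands, run top to bottom. Re-running this creates a
# second instance unless you add your own existence checks.
import boto3

ec2 = boto3.resource("ec2")
ec2.create_instances(
    ImageId="ami-00000000",      # placeholder
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)

# Declarative (Pulumi, executed via `pulumi up`): you declare the desired
# resource and the tool computes the actions needed to reach that state.
# import pulumi_aws as aws
# web = aws.ec2.Instance("web-server",
#                        ami="ami-00000000",          # placeholder
#                        instance_type="t3.micro")
```
The declarative variant is idempotent by design: running it again converges on the same single instance rather than creating duplicates.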
Question 52
Question
Explain canary deployments, their advantages over blue/green deployments, and the scenarios where they are most beneficial.
Answer
Canary deployments gradually roll out changes to a small subset of users before full deployment [1]. The new version is deployed to a small percentage of production servers/users, allowing monitoring of its behavior and performance in real production conditions with limited impact [1].
Advantages over blue/green:
- Reduced risk by limiting exposure of new version to a small percentage of users [1]
- More granular rollout control (can be increased incrementally)
- Lower resource requirements (don’t need full duplicate environment)
- Better for detecting performance issues that only appear under real load patterns [1]
Most beneficial for:
- High-traffic applications where full-scale errors would affect many users
- Applications with unpredictable user behavior patterns
- Services with complex dependencies that are difficult to fully test in staging
- Applications where performance metrics are critical acceptance criteria [1]
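A minimal sketch of the weighted traffic split behind a canary rollout (in practice this lives in the load balancer or service mesh; the weight and version names are illustrative):
```python
import random

CANARY_WEIGHT = 0.05   # start by sending 5% of traffic to the new version

def pick_backend():
    return "v2-canary" if random.random() < CANARY_WEIGHT else "v1-stable"

# The weight is raised step by step while error rates and latency are monitored.
sample = [pick_backend() for _ in range(1000)]
print(sample.count("v2-canary"), "of 1000 requests hit the canary")
```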
Scalable and Sustainable Architectures
Question 53
You are designing a cloud architecture for a financial application that processes transactions. The application needs to:
- Handle high volume of transactions
- Maintain strict data consistency
- Scale dynamically based on load
- Maintain high availability
Choose an appropriate architecture pattern and justify your choice. Discuss any potential limitations and how you might address them. [6]
Answer
A microservices architecture with CQRS (Command Query Responsibility Segregation) pattern would be appropriate [1]. This separates read and write operations, allowing each to be optimized and scaled independently [1].
Architecture components:
- API Gateway for request routing and authentication
- Command service for write operations with synchronous processing
- Event sourcing to maintain transaction history and audit trail
- Read replicas optimized for query performance
- Distributed caching for frequent queries [1]
Justification:
- High transaction volume: Horizontal scaling of microservices
- Strict consistency: Synchronous processing for critical write operations
- Dynamic scaling: Independent scaling of read/write components
- High availability: Regional replication and stateless services [1]
Limitations and mitigations:
- Complexity: Implement robust monitoring and service mesh for observability
- Eventual consistency for reads: Use versioning or timestamps to detect stale data
- Distributed transaction management: Implement saga pattern for transactions spanning multiple services [1]
Database approach:
- Primary database (e.g., PostgreSQL) for write operations with ACID guarantees
- Read replicas with appropriate isolation levels for query performance [1]
Question 54
Question
What is data gravity, and how does it influence architectural decisions in multi-cloud and edge computing scenarios?
Answer
Data gravity refers to the tendency of applications, services, and computing resources to be attracted to and cluster around large data repositories [1]. As data accumulates, it becomes increasingly difficult to move due to transfer costs, bandwidth limitations, and latency considerations [1].
In multi-cloud scenarios, data gravity influences:
- Cloud provider selection based on where critical data already resides
- Data synchronization strategies between clouds
- Application placement to minimize data transfer costs [1]
In edge computing, data gravity drives:
- Local processing of data-intensive workloads to avoid transferring raw data
- Intelligent data filtering and aggregation before transmission to the cloud
- Distributed database designs that keep data close to where it’s generated and consumed [1]
Question 55
Question
Compare DNS-level, IP-level (L4), and application-level (L7) load balancing, giving a scenario where each is the best choice.
Answer
DNS-level load balancing:
- Works by returning different IP addresses when clients resolve domain names
- Advantages: Simple implementation, works across regions, no additional hardware
- Disadvantages: Slow propagation due to DNS caching, limited health checking
- Best scenario: Global distribution of static content across multiple regions where rapid failover isn’t critical [2]
IP-level (L4) load balancing:
- Works at transport layer (TCP/UDP), routing traffic based on IP address and port
- Advantages: High performance, handles millions of connections, simple failure detection
- Disadvantages: Limited routing intelligence, no content-based decisions
- Best scenario: High-throughput applications like video streaming or large file transfers where connection volume is high but routing logic is simple [2]
Application-level (L7) load balancing:
- Works at application layer, routing based on HTTP headers, cookies, URL paths
- Advantages: Content-based routing, SSL termination, advanced session persistence
- Disadvantages: Higher computational overhead, more complex configuration
- Best scenario: Microservices architecture where requests need routing to specific services based on URL paths or API endpoints [2]
Question 56
Question
Explain the shared responsibility model for cloud security and how responsibilities are divided under IaaS, PaaS, and SaaS.
Answer
The shared responsibility model defines the division of security responsibilities between the cloud provider and the customer [1]. The general principle is that providers are responsible for security “of” the cloud (infrastructure) while customers are responsible for security “in” the cloud (data, applications) [1].
Responsibility division by service model:
IaaS:
- Provider: Physical security, host virtualization, network infrastructure
- Customer: Operating system, applications, data, identity management, access controls [1]
PaaS:
- Provider: Everything in IaaS plus operating system, middleware, runtime
- Customer: Applications, data, identity management, access policies [1]
SaaS:
- Provider: Nearly everything (infrastructure through application)
- Customer: Data classification, user access management, compliance requirements [1]
As you move from IaaS to SaaS, the provider assumes more responsibility, but the customer always retains responsibility for their data and user access [1].
Question 57
Question
Outline a step-by-step strategy for migrating a monolithic application to a cloud-based microservices architecture.
Answer
Step 1: Assessment and planning
- Analyze the monolith to identify bounded contexts and potential service boundaries
- Map dependencies between components
- Prioritize services for migration based on business value and complexity [1]
Step 2: Create a cloud foundation
- Establish cloud infrastructure using Infrastructure-as-Code
- Implement CI/CD pipelines
- Set up monitoring and observability tools [1]
Step 3: Implement the strangler pattern (sketched after this answer)
- Create an API gateway/facade in front of the monolith
- Redirect specific functionality to new microservices
- Gradually replace monolith components while maintaining functionality [1]
Step 4: Extract and migrate services incrementally
- Begin with stateless, non-critical services
- Refactor one bounded context at a time
- Use feature flags to control functionality exposure
- Run old and new implementations in parallel with A/B testing [1]
Step 5: Data migration and management
- Implement data access patterns (CQRS, event sourcing)
- Use change data capture for synchronization during transition
- Gradually shift to service-specific databases [1]
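A minimal sketch of the strangler facade from Step 3: a routing table sends migrated path prefixes to new services and lets everything else fall through to the monolith (URLs and prefixes are hypothetical):
```python
MIGRATED_ROUTES = {
    "/payments": "http://payments-service.internal",
    "/invoices": "http://billing-service.internal",
}
MONOLITH = "http://legacy-monolith.internal"

def route(path):
    for prefix, backend in MIGRATED_ROUTES.items():
        if path.startswith(prefix):
            return backend
    return MONOLITH   # not yet migrated: still served by the monolith

print(route("/payments/123"))   # new microservice
print(route("/reports/2024"))   # monolith
```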
Cloud Sustainability
Question 58
Question
What challenges arise when measuring the carbon footprint of applications running in the cloud, and what approaches are emerging to address them?
Answer
Key challenges in measuring application-level carbon footprints:
- Limited visibility into physical infrastructure and its energy consumption
- Multi-tenancy obscures resource attribution between workloads
- Varying carbon intensity across regions and time
- Complex supply chain emissions for cloud services [2]
Emerging approaches:
- Power API and energy estimation models:
- Correlate application metrics (CPU, memory, I/O) with energy consumption
- Create mathematical models to estimate energy use from observable metrics
- Examples: Cloud Carbon Footprint, Green Algorithms [1]
- FinOps-integrated carbon accounting:
- Leverage billing data as a proxy for resource utilization
- Apply emission factors based on region and service type
- Incorporate embodied carbon allocation over hardware lifecycle
- Examples: Cloud Carbon Footprint, Microsoft Sustainability Calculator [1]
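A rough sketch of the estimation-model approach: infer energy from observable utilisation, add facility overhead via PUE, then apply a regional carbon-intensity factor (all constants are illustrative):
```python
def estimate_emissions(cpu_util, hours, tdp_watts=200, pue=1.4, ci_g_per_kwh=300):
    it_energy_kwh = (tdp_watts * cpu_util / 1000) * hours   # IT equipment energy
    facility_kwh = it_energy_kwh * pue                      # add facility overhead
    return facility_kwh * ci_g_per_kwh                      # grams of CO2e

# e.g. a server at 60% CPU utilisation for 24 hours:
print(round(estimate_emissions(0.6, 24)), "gCO2e")
```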
Question 59
Question
Explain the Jevons paradox in the context of cloud computing and describe strategies to counteract it.
Answer
Jevons paradox states that technological improvements in resource efficiency can lead to increased consumption of that resource rather than decreased use [1]. In cloud computing, more efficient servers and data centers lower the cost of computing, which increases demand and can result in greater total energy consumption despite per-unit efficiency gains [1].
Examples in cloud computing:
- More efficient servers enable more demanding applications (AI/ML)
- Lower costs increase cloud adoption and workload migration
- Higher efficiency leads to larger data centers [1]
Strategies to counteract:
- Absolute carbon caps and internal carbon pricing
- Direct renewable energy investments tied to expansion
- Focus on workload optimization and eliminating idle resources
- Education about total consumption impact rather than just efficiency metrics [1]
Question 60
Question
Compare carbon offsetting and carbon reduction strategies, including their advantages and limitations.
Answer
Carbon offsetting:
- Purchasing credits to compensate for emissions (renewable energy certificates, carbon removal projects)
- Advantages: Immediate impact, addresses emissions that cannot be eliminated
- Limitations: Doesn’t reduce actual emissions, offset quality varies widely, vulnerable to greenwashing, doesn’t drive system-level change [2]
Carbon reduction:
- Direct strategies to lower emissions (renewable energy, efficiency improvements, hardware lifecycle extension)
- Advantages: Real emissions reduction, drives innovation, provides competitive advantage
- Limitations: Higher initial costs, longer implementation timeline, technical challenges, may hit diminishing returns [2]
Key differences:
- Offsetting maintains status quo with compensation; reduction changes operational practices
- Offsetting is often cheaper short-term; reduction typically more cost-effective long-term
- Reduction addresses root causes while offsetting addresses symptoms
- Comprehensive strategy requires both approaches, with reduction prioritized [1]
Question 61
A global company operates cloud workloads across multiple regions. Outline a carbon-aware scheduling strategy that would optimize for:
- Lowest carbon emissions
- Lowest latency for users
- Regulatory compliance for data sovereignty
Explain the trade-offs involved and how you would prioritize these requirements. [6]
Answer
Carbon-aware scheduling strategy:
- Workload classification:
- Time-critical (real-time user interaction)
- Time-flexible (batch processing, analytics)
- Data-sensitive (contains regulated information) [1]
- Region mapping:
- Create a matrix of regions with their carbon intensity, latency to user bases, and regulatory compliance status
- Use both average and marginal carbon intensity metrics
- Update this map regularly with real-time grid data [1]
- Decision framework:
- For data-sensitive workloads: First filter by compliant regions, then optimize for carbon within latency constraints
- For time-critical workloads: Ensure latency requirements are met, then choose lowest-carbon region among candidates
- For time-flexible workloads: Implement temporal and spatial shifting based on carbon intensity forecasts [2]
- Implementation mechanism:
- Use container orchestration (Kubernetes) with custom schedulers
- Implement carbon-aware autoscaling policies
- Create carbon budgets per service/application [1]
Trade-offs and prioritization:
- Data sovereignty is a non-negotiable legal requirement and must be prioritized first
- Latency vs. carbon involves business decisions - critical user-facing services prioritize latency
- For non-critical workloads, carbon can be prioritized over perfect latency
- Consider using carbon budgets to make trade-offs explicit and measurable [1]
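A compact sketch of the decision framework above for a single workload: filter by data sovereignty, then by a latency budget, then pick the lowest-carbon region (region data is invented):
```python
regions = [
    {"name": "eu-west",  "ci": 220, "latency_ms": 40,  "compliant": True},
    {"name": "eu-north", "ci": 30,  "latency_ms": 70,  "compliant": True},
    {"name": "us-east",  "ci": 380, "latency_ms": 120, "compliant": False},
]

def pick_region(regions, latency_budget_ms):
    # Sovereignty first, then latency constraint, then lowest carbon intensity.
    candidates = [r for r in regions
                  if r["compliant"] and r["latency_ms"] <= latency_budget_ms]
    return min(candidates, key=lambda r: r["ci"]) if candidates else None

print(pick_region(regions, latency_budget_ms=100)["name"])   # eu-north
```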
Question 62
Question
How does specialized hardware (e.g., Arm processors, TPUs, FPGAs, domain-specific accelerators) affect cloud sustainability, and which workloads benefit from it?
Answer
Specialized hardware impacts cloud sustainability through:
- Arm processors:
- Lower power consumption per computational unit
- Better performance-per-watt for web servers and containerized applications
- Example workload benefit: Microservices with consistent moderate loads [1]
- TPUs (Tensor Processing Units):
- Optimized for machine learning matrix operations
- 30-80% better energy efficiency than GPUs for ML workloads
- Example workload benefit: Large language model inference and training [1]
- FPGAs (Field-Programmable Gate Arrays):
- Custom hardware acceleration for specific algorithms
- Significant efficiency gains for specialized repetitive tasks
- Example workload benefit: Video transcoding, cryptography, and genomic sequencing [1]
- Domain-specific accelerators:
- Hardware designed for specific functions (networking, storage, security)
- Offloads processing from general-purpose CPUs
- Example workload benefit: Network packet processing, encryption/decryption [1]
Environmental impacts:
- Reduced energy consumption through workload-specific optimization
- Potentially smaller data center footprints through higher compute density
- Challenge: Specialized hardware may have higher embodied carbon, requiring longer use to achieve net benefits [1]