Exam Preparation
Cloud Systems Exam Topics
This note provides an overview of the key topics covered in the Cloud Systems course (COMPSCI4106/5118) that are relevant for the exam. The exam accounts for 50% of the overall grade and covers both parts of the course evenly.
Exam Format
- Weight: 50% of the overall grade
- Structure: Four questions, 10 marks each
- Two questions on Part 1 (Cloud Resource Management)
- Two questions on Part 2 (Scalable and Sustainable Architectures)
- Question Types: Mix of long-form answers and MCQ-style questions
- Duration: 60 minutes for 40 marks
Part 1: Cloud Resource Management
Chapter 1: Introduction to Cloud Computing
- NIST Definition and its Dimensions
- Five essential characteristics
- Three service models
- Four deployment models
- Virtualization Terminology and Categories
- Process virtualization
- OS-level virtualization
- System virtualization
- Differences between categories
Chapter 2: Virtual Machines
- Categories of Instructions and Virtualizability
- Privileged vs. sensitive instructions
- Popek and Goldberg’s theorem
- Critical instructions in x86
- Full Virtualization (Binary Translation)
- How binary translation works
- Shadow page tables
- Memory management in virtualization
- OS-Assisted Virtualization
- Xen architecture (domains)
- CPU virtualization in Xen
- Memory management in Xen
- Hardware-Assisted Virtualization
- CPU virtualization extensions
- Memory virtualization extensions
- Tagged Translation Lookaside Buffer
Chapter 3: Containers and Container Management
- Linux Kernel Containment Features
- chroot system call
- namespaces (PID, network, mount, etc.)
- cgroups (Control Groups)
- capabilities
- Docker
- Images and Dockerfiles
- Container instances
- Docker architecture and components
- Containers vs. VMs
- Performance differences (CPU, memory, network, I/O)
- Image size and boot time
- Isolation and security considerations
- Container Orchestration
- Kubernetes architecture
- Pods, deployments, services
- Horizontal Pod Autoscaler
Chapter 4: Cloud Infrastructure Management
- Challenges in Cloud Infrastructure
- Server sprawl
- Configuration drift
- Snowflake servers
- Cloud Operating Systems
- OpenStack components
- Virtual networking
- Software-Defined Networking (SDN)
- VM Management
- VM snapshots
- VM migration (cold, warm, live)
- Live migration process in Xen
- Infrastructure-as-Code and CI/CD
- Infrastructure definition files
- Ansible architecture and concepts
- Continuous delivery vs. continuous deployment
- Deployment strategies (blue/green, canary)
Chapter 5: Cloud Sustainability
- Emissions Lifecycle
- Embodied emissions
- Operational emissions
- End-of-life emissions
- Energy Efficiency and Proportionality
- Static vs. dynamic power consumption
- Energy-proportional computing
- Koomey’s Law and trends
- Power Usage Effectiveness (PUE)
- Definition and calculation
- Industry trends and benchmarks
- Limitations of PUE
- Carbon Footprint Measurement
- Greenhouse Gas Protocol scopes
- Estimation methodologies
- Cloud Carbon Footprint (CCF)
- Carbon-Aware Computing
- Time-shifting workloads
- Location-shifting workloads
- Carbon intensity signals
Part 2: Scalable and Sustainable Architectures
Chapter 6: Cloud System Design
- Distributed Systems Concepts
- Fallacies of distributed computing
- Key aspects of distributed systems
- Failures, errors, faults, and QoS
- Quality Attributes
- Dependability
- Availability
- Reliability
- High Availability
- Fault tolerance
- Error detection
- Failover strategies (active-active, active-passive)
Chapter 7: Modern Cloud Architectures
- Architectural Approaches
- Layering and tiering
- Redundancy by replication
- Cloud Scaling
- Stateless scaling (load balancing)
- Stateful scaling (partitioning)
- Horizontal vs. vertical scaling
- Advanced Architectures
- Microservices
- Service Mesh Technologies (SMTs)
- Cloud-native technologies
- API Gateways
Chapter 8: Flavours of Cloud
- Provisioning Levels
- Infrastructure as a Service (IaaS)
- Platform as a Service (PaaS)
- Software as a Service (SaaS)
- Function as a Service (FaaS)
- Deployment Models
- Public, private, community, hybrid clouds
- Considerations for choosing models
- Cross-Cloud Computing
- Hybrid clouds
- Federated clouds
- Multi-clouds
- Meta-clouds
- Computing Continuum
- Edge computing vs. fog computing
- Support roles in the continuum
- Comparison of fog/edge with cloud
Chapter 9: A Wider Lens on Sustainability
- Designing Dependable Data Centres
- Hardware redundancy
- Network redundancy
- Power redundancy
- Cooling redundancy
- Carbon Footprint Measurement Frameworks
- GHG Protocol
- Real-time vs. historical approaches
- Life Cycle Assessment (LCA)
- Measurement Granularities
- Software-level
- Server-level
- Rack-level
- Data center-level
- Network-level
- Carbon Intensity
- Definition and regional variations
- Average vs. marginal intensity
- Carbon intensity signals
- Carbon-Aware Decision Making
- Vendor decisions
- User decisions
- Instance type selection
Exam Preparation Tips
Focus on Understanding Concepts:
- Rather than memorizing details, ensure you understand core concepts and can explain them
- Be prepared to apply concepts to different scenarios
Practice Explanation and Justification:
- Many questions will ask you to explain your reasoning
- Practice articulating clear, concise explanations
Review Example Questions:
- Look at the example questions provided in lecture materials
- Practice answering similar questions within the word limits
Connect Related Concepts:
- Understand how different topics relate to each other
- Be prepared to discuss trade-offs between different approaches
Terminology:
- Ensure you're familiar with the correct terminology
- Be able to define key terms and concepts
MCQ Strategy:
- For MCQ questions, you'll need to both select the right answer and explain your choice
- Practice justifying answers in one concise sentence
Cloud Systems Practice Questions
Virtual Machines and Virtualization
Question 1
Question
Briefly describe what critical instructions are and why they presented a challenge for x86 system virtualization [2]
Answer
Critical instructions are sensitive instructions that are not privileged [1]. Because they do not trap, they do not pass control to the hypervisor, so they do not behave as expected when guest OS code runs deprivileged [1].
Question 2
Question
Briefly summarize why the physical main memory can simply be partitioned for Xen guests [3]
Answer
Guests are aware of running on a hypervisor and use only the part of physical memory allocated to them [1]. Large partitions of memory (on the order of GBs) are typically allocated to each of a few VMs [1]. Memory addresses for guest processes remain logical/virtual, preserving virtual-memory benefits like paging [1].
Question 3
Question
Explain the key difference between shadow page tables used in full virtualization and the memory management approach used in Xen. [3]
Answer
Shadow page tables require hypervisor-maintained duplicate tables combining guest virtual-to-physical and physical-to-machine mappings [1]. Xen lets guests maintain their own page tables [1], but the hypervisor validates that mappings refer only to the guest's allocated memory [1].
Question 4
Question
Which statement about hardware-assisted virtualization is correct?
a) It requires modifying the guest OS
b) It uses binary translation for critical instructions
c) It introduces CPU modes specifically for virtualization
d) It is incompatible with legacy OSes
Answer
c) It introduces new CPU modes specifically for virtualization [1]. Hardware-assisted virtualization (Intel VT-x, AMD-V) introduces root/non-root modes for guest OS operation and hypervisor control [1].
Containers and Container Management
Question 5
Question
Briefly explain what the chroot system call on Linux does and how it is useful for containerization [2]
Answer
Chroot sets a directory as the new root for processes [1], enabling containerization by isolating binaries, libraries, configurations, etc. [1].
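A minimal Python sketch of the idea (assuming a hypothetical jail directory at /srv/jail, pre-populated with the needed binaries/libraries, and root privileges; os.chroot is Unix-only):

```python
import os

JAIL = "/srv/jail"  # hypothetical jail directory, prepared in advance

def enter_jail(path: str) -> None:
    """Confine this process's filesystem view to `path` (requires root)."""
    os.chroot(path)  # make `path` the new root directory for this process
    os.chdir("/")    # step inside the new root so relative paths cannot escape

if __name__ == "__main__":
    enter_jail(JAIL)
    # From here on, "/etc/passwd" resolves to /srv/jail/etc/passwd
    print(os.listdir("/"))
```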
Question 6
Question
Compare and contrast namespaces and cgroups in Linux containment. [4]
Answer
Namespaces isolate process views (PID, network, mount points) [1]; cgroups manage resource use (CPU, memory, I/O) [1]. Namespaces provide separate environments [1]; cgroups enforce resource limits/accounting [1].
Question 7
Question
Why are container images typically smaller than VM images? Give two reasons. [2]
Answer
Container images exclude the OS kernel, packaging only applications and their dependencies [1], because containers share the host OS kernel at runtime [1].
Question 8
Question
Explain the relationship between Dockerfile, image, and container. [3]
Answer
Dockerfile contains build instructions [1]. An image is a read-only template [1]. A container is a running instance of an image [1].
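The relationship can be traced with the Docker SDK for Python (a sketch assuming the `docker` package is installed, the Docker daemon is running, and a hypothetical ./app directory contains a Dockerfile):

```python
import docker

client = docker.from_env()

# Dockerfile (build instructions) -> image (read-only template)
image, build_log = client.images.build(path="./app", tag="demo:latest")

# Image -> container (a running instance of the image)
output = client.containers.run("demo:latest", remove=True)
print(image.id, output)
```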
Cloud Infrastructure Management
Question 9
Question
Briefly explain how Infrastructure-as-Code addresses snowflake servers [2]
Answer
IaC captures server configs in versioned code [1], removing undocumented manual changes [1].
Question 10
Question
Explain the difference between continuous delivery and continuous deployment. [2]
Answer
Continuous delivery auto-tests/prepares releases requiring manual approval [1]. Continuous deployment auto-deploys to production if tests pass [1].
Question 11
Question
What is the primary purpose of live VM migration, and what components must migrate? [3]
Answer
Live VM migration moves a running VM between hosts while minimizing downtime [1]. The components that must migrate are memory pages [1], network connections, and storage resources [1].
Question 12
Question
List and briefly explain stages of Xen live migration. [4]
Answer
- Stage 0: VM active on source [1]
- Stages 1-2: Reservation, iterative memory pre-copy [1]
- Stage 3: Brief stop-copy phase [1]
- Stages 4-5: Commitment/activation on destination [1]
Question 13
Question
Name/explain an issue with PUE metric. [1]
Answer
PUE is measured with inconsistent methodologies across operators, and it ignores whole-system trade-offs such as the energy efficiency of the IT equipment itself [1].
Question 14
Question
Explain energy-proportional computing and its importance for cloud data centers. [3]
Answer
Energy consumption should be proportional to utilization [1]; this matters because servers are often underutilized (10-50%) [1] yet still draw significant power when idle [1].
Cloud Sustainability
Question 15
Question
What is the difference between embodied and operational carbon emissions in cloud computing? [2]
Answer
Embodied emissions result from manufacturing, transporting, and disposing of hardware [1]; operational emissions come from electricity used during operation [1].
Question 16
Question
What is carbon-aware computing and how does it differ from energy efficiency? [3]
Answer
Carbon-aware computing schedules tasks based on electricity carbon intensity [1]. It differs from energy efficiency by focusing on when/where energy is used, rather than just reducing total consumption [1].
Cloud System Design
Question 17
Question
You are designing a cloud-based ML system with training and inference components. Why deploy the inference service at the edge rather than the cloud? [2]
Answer
Deploying at the edge reduces latency by processing data closer to users [1], avoiding round-trip delays to cloud servers [1].
Question 18
Question
Match each scenario to a failover strategy (Active-Active/Active-Passive) and justify your choice.
a) Financial trading platform
b) Content management system
Answer
a) Active-Active, as it cannot tolerate downtime [2].
b) Active-Passive, as brief downtime is acceptable and a passive standby is more cost-effective [2].
Question 19
Question
Explain the difference between availability and reliability in cloud systems. [2]
Answer
Availability refers to service readiness (uptime) [1]; reliability refers to system correctness and stability over time (MTBF) [1].
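One common way to make the distinction concrete combines the two: availability depends on how often a system fails (MTBF) and how quickly it recovers (MTTR). A sketch, with illustrative figures:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A server failing every 1000 h on average but recovering in 1 h is highly
# available (~99.9%) even though it is only moderately reliable.
print(f"{availability(1000, 1):.4%}")  # 99.9001%
```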
Question 20
Question
What does “five nines” availability mean, and how much downtime does it represent annually? [2]
Answer
“Five nines” (99.999%) represents approximately 5.26 minutes of downtime per year [2].
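A quick check of the arithmetic:

```python
def annual_downtime_minutes(availability: float) -> float:
    """Downtime allowed per year by an availability target."""
    return (1 - availability) * 365.25 * 24 * 60  # minutes in an average year

print(f"{annual_downtime_minutes(0.99999):.2f} minutes/year")  # ~5.26
```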
Modern Cloud Architectures
Question 21
Question
A financial tech company processes high daily transaction volumes. Choose and explain the best architecture:
a) Single high-memory server
b) Load-balanced servers, DB partitioning
c) Serverless functions, single DB
d) Monolithic app, local caching
Answer
b) Load-balanced servers with DB partitioning, offering scalability, performance, and reliability [2].
Question 22
Question
Differentiate horizontal and vertical scaling, give examples. [4]
Answer
Horizontal scaling adds machines (stateless web apps) [2]; vertical scaling upgrades resources on existing machines (databases benefiting from CPU/memory) [2].
Question 23
Question
What are microservices and two advantages over monoliths? [3]
Answer
Microservices: small independent services via APIs [1]. Advantages: independent deployment [1], improved fault isolation [1].
Question 24
Question
What is a service mesh, and what microservices problem does it solve? [2]
Answer
A service mesh manages service-to-service communication [1], addressing the monitoring, security, and reliability concerns of microservices outside the application code [1].
Question 25
Question
Select appropriate service model (IaaS, PaaS, SaaS, FaaS) for scenarios:
a) Startup without infrastructure management
b) Company collaboration tools
c) Research simulations computing power
d) Web app developer avoiding managing the server runtime
Answer
a) FaaS [1]
b) SaaS [1]
c) IaaS [1]
d) PaaS [1]
Extra
Question 26
What are the key disadvantages of the microservices architecture?
Answer
The key disadvantages of microservices include:
- Increased complexity (operational overhead, distributed debugging)
- Higher latency due to communication between services
- Microservice sprawl (potentially ballooning into hundreds or thousands of services)
- Operational overhead managing multiple CI/CD pipelines
- Interdependency chains can cause cascading failures, death spirals, and retry storms
- Failures in one service can trigger failures in dependent services
- Failure recovery could take longer than with monoliths
- Increased glueware requirements for monitoring, consistency, and coordination
Question 27
Explain the concept of "trap and emulate" in virtualization and when it can be used.
Answer
“Trap and emulate” is a virtualization technique where:
- When a guest OS executes a privileged instruction, it causes a trap (exception)
- Control is transferred to the VMM/hypervisor
- The hypervisor emulates the behavior of the instruction
- Execution returns to the guest OS
According to Popek and Goldberg’s theorem, this technique only works efficiently when all sensitive instructions are also privileged instructions. For x86 architectures, this doesn’t hold true as they contain critical instructions (sensitive but not privileged), which is why binary translation or hardware extensions are needed for efficient virtualization.
Question 28
What are the five essential characteristics of cloud computing according to the NIST definition?
Answer
The five essential characteristics of cloud computing according to NIST are:
- On-demand self-service: Resources can be provisioned without human interaction
- Broad network access: Services accessible via standard network mechanisms
- Resource pooling: Provider resources are pooled and dynamically assigned to consumers
- Rapid elasticity: Resources can be quickly provisioned and released to scale with demand
- Measured service: Resource usage is monitored, controlled, and reported transparently
Question 29
Compare and contrast the three most common approaches to virtualization on x86 architectures.
Answer
The three approaches to x86 virtualization are:
- Full Virtualization with Binary Translation:
- No modified guest OS needed
- No hardware support required
- Uses binary translation to handle critical instructions
- Uses shadow page tables for memory management
- Less efficient for I/O-intensive applications
- OS-Assisted Virtualization (Paravirtualization):
- Requires modified guest OS
- No hardware support required
- Better performance through guest OS cooperation
- Example: Xen
- Limited compatibility with proprietary OSes
- Hardware-Assisted Virtualization:
- No modified guest OS needed
- Requires hardware support (Intel VT-x, AMD-V)
- Uses new CPU modes and extended page tables
- Good performance for unmodified guests
- Specialized hardware required
Question 30
Explain the concept of energy proportionality in data centers and why it's important for cloud sustainability.
Answer
Energy proportionality refers to the goal that a computing system’s energy consumption should be proportional to its workload - ideally, a system’s energy consumption per operation would be independent of utilization level. In a perfectly energy-proportional system, a server at 50% utilization would consume exactly 50% of the power it consumes at 100% utilization.
This is important for cloud sustainability because:
- Data center servers are often not fully utilized (average utilization is typically 30-50%)
- Non-proportional systems waste energy when underutilized
- Energy-proportional systems can significantly reduce overall power consumption and carbon emissions
- Achieving energy proportionality requires optimizations at multiple levels: hardware design, system software, workload scheduling, and data center architecture
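A minimal linear power model illustrates the proportionality gap (the 150 W static / 300 W peak figures are assumptions for illustration only):

```python
def power_watts(util: float, p_static: float = 150.0, p_max: float = 300.0) -> float:
    """Linear model: P(u) = P_static + (P_max - P_static) * u."""
    return p_static + (p_max - p_static) * util

print(f"idle: {power_watts(0.0):.0f} W")  # 150 W drawn while doing no work
for u in (0.1, 0.3, 0.5, 1.0):
    per_unit = power_watts(u) / u  # throughput assumed proportional to utilization
    print(f"util={u:.0%}: {power_watts(u):.0f} W total, {per_unit:.0f} W per unit of work")
# At 30% utilization, each unit of work costs ~2.2x what it costs at full load.
```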
Question 31
What is autoscaling in Kubernetes and how does the Horizontal Pod Autoscaler calculate the desired number of replicas?
Answer
Autoscaling in Kubernetes automatically adjusts the number of pod replicas based on observed metrics. The Horizontal Pod Autoscaler (HPA) is a Kubernetes component that automatically scales the number of pods in a deployment or replica set.
The HPA calculates desired replicas using this formula:
desiredReplicas = ⌈currentReplicas * (currentMetricValue / desiredMetricValue)⌉
This assumes linear scaling between resource usage and replica count. The autoscaler also:
- Only scales if metrics are outside a tolerance (typically 0.1 or 10%)
- Scales to the highest number of desired replicas observed in a sliding window (5 minutes)
- Ignores pods being shut down in calculations
- Handles missing metrics by assuming 0 for scale-out and 1 for scale-in
- Assumes metric value of 0 for pods not yet ready
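A sketch of the core calculation with the tolerance rule (simplified; the real controller adds the sliding-window and pod-readiness rules listed above):

```python
import math

def desired_replicas(current: int, current_metric: float,
                     desired_metric: float, tolerance: float = 0.1) -> int:
    """ceil(current * currentMetric / desiredMetric), skipping changes
    when the ratio is within the tolerance band."""
    ratio = current_metric / desired_metric
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to target: no scaling
    return math.ceil(current * ratio)

print(desired_replicas(4, 90, 60))  # ratio 1.5 -> 6 replicas
print(desired_replicas(4, 63, 60))  # ratio 1.05, inside 10% tolerance -> stays 4
```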
Question 32
Explain the carbon intensity concept and how it's used in carbon-aware computing.
Answer
Carbon intensity refers to the amount of equivalent CO₂ emissions released per unit of generated power (measured in gCO₂/kWh). It varies based on:
- Energy source: renewables have lower CI, fossil fuels have high CI
- Geographic region: different regions have different energy mixes
- Time of day/year: CI varies with changes in supply and demand
In carbon-aware computing, carbon intensity is used to:
- Make time-shifting decisions (scheduling workloads during low-carbon periods)
- Make location-shifting decisions (running workloads in regions with cleaner energy)
- Implement carbon-aware load balancing between data centers
- Evaluate the environmental impact of computing operations
There are two types of carbon intensity signals:
- Average CI: the overall emissions of the electricity mix (useful for reporting)
- Marginal CI: emissions from the next unit of electricity (better for real-time decisions)
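A toy sketch combining time- and location-shifting over a carbon-intensity forecast (region names and gCO₂/kWh values are illustrative only):

```python
# (region, hour) -> forecast marginal carbon intensity in gCO2/kWh (illustrative)
forecast = {
    ("eu-north", 2): 35, ("eu-north", 14): 90,
    ("us-east", 2): 420, ("us-east", 14): 310,
}

def best_slot(slots: dict) -> tuple:
    """Pick the (region, hour) slot with the lowest carbon intensity."""
    return min(slots, key=slots.get)

region, hour = best_slot(forecast)
print(f"Run the batch job in {region} at {hour:02d}:00 "
      f"({forecast[(region, hour)]} gCO2/kWh)")
```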
Question 33
What are shadow page tables in virtualization and why are they important?
Answer
Shadow page tables are a memory management technique used in full virtualization to handle virtual to physical address translation efficiently. They work as follows:
- The guest OS maintains its own page tables mapping virtual to “physical” addresses (which are actually still virtual from the host perspective)
- The VMM/hypervisor maintains shadow page tables that map guest virtual addresses directly to host physical addresses
- The shadow page tables are what the hardware MMU actually uses
- When the guest modifies its page tables, these changes trap to the hypervisor which updates the shadow page tables accordingly
Shadow page tables are important because they:
- Avoid the performance penalty of nested address translation
- Allow the TLB to cache translations effectively
- Enable the guest OS to believe it has direct control over memory mapping
- Were essential before hardware virtualization extensions introduced nested/extended page tables
Question 34
Describe the blue/green deployment strategy and its advantages.
Answer
Blue/green deployment is a continuous deployment strategy using two identical production environments:
- At any time, one environment (blue or green) is active and receiving all production traffic
- New versions are deployed to the inactive environment
- After testing the new version in the inactive environment, traffic is switched over
- The old active environment becomes inactive, ready for the next deployment
Advantages include:
- Zero downtime during deployments
- Simple and fast rollback capability (just switch traffic back)
- Reduced risk as the new version is fully tested before receiving traffic
- Complete testing in a production-identical environment
- No in-place upgrades that could lead to subtle configuration issues
- Deployment and release are decoupled (deploy first, release later)
Question 35
What is Jevons' Paradox and how does it apply to energy efficiency in cloud computing?
Answer
Jevons’ Paradox states that as technology makes resource use more efficient, the demand for that resource tends to increase, potentially leading to higher overall resource consumption rather than savings. Originally observed by William Stanley Jevons in 1865 for coal consumption after the introduction of more efficient steam engines.
In cloud computing, this manifests as:
- More efficient servers lead to lower costs per computation
- Lower costs drive increased demand for cloud services
- The total energy consumption and carbon footprint continue to grow despite efficiency improvements
- Datacenters become more energy-efficient (better PUE) but total energy usage increases
- More efficient infrastructure enables more demanding applications (AI, ML, etc.)
This paradox highlights that efficiency alone is insufficient for sustainability - we also need absolute reductions in resource consumption and carbon-aware approaches.
Question 36
What is Infrastructure as Code (IaC) and how does it address the challenges of "configuration drift" and "snowflake servers"?
Answer
Infrastructure as Code (IaC) is the practice of managing and provisioning infrastructure through machine-readable definition files rather than manual processes. It treats infrastructure configuration like software code that can be versioned, tested, and automated.
IaC addresses:
- Configuration Drift: When configurations change over time without documentation
- IaC provides a single source of truth for infrastructure configuration
- Changes must be made to the code, creating an audit trail
- Automated deployment ensures consistency between environments
- Snowflake Servers: Servers with unique, unreproducible configurations
- IaC enables reproducible environment creation from code
- Instead of modifying existing servers, new ones are created from definitions
- “Immutable infrastructure” approach: rebuild rather than update
- Easy to mirror production environments for testing
- Clear separation between deliberate configuration and defaults
Question 37
Explain the concept of Continuous Integration/Continuous Delivery (CI/CD) in cloud environments and list its key practices.
Answer
Continuous Integration/Continuous Delivery (CI/CD) is a set of practices for automating software delivery processes. In cloud environments, CI/CD involves:
Continuous Integration:
- One common versioned code repository
- Build automation with short build times
- Self-testing builds
- Regular code commits
- Building every commit (often on a CI server)
- Making test results visible to everyone
- Maintaining similar test and production environments
Continuous Delivery/Deployment:
- Automated release process
- Fast and reproducible software releases
- Short cycle time (time from code change to production)
- Automated testing (unit, integration, acceptance)
- Deployment to staging environments
- Monitoring and smoke tests
The key difference between Continuous Delivery and Continuous Deployment is that Delivery requires manual approval before production deployment, while Deployment automatically pushes changes to production after passing tests.
Question 38
What are the differences between Average Carbon Intensity and Marginal Carbon Intensity when measuring the environmental impact of computing?
Answer
Average Carbon Intensity:
- Measures the overall carbon emissions of the electricity mix
- Calculated as total emissions divided by total energy produced
- Useful for general emissions reporting and long-term analysis
- Doesn’t reflect real-time fluctuations in energy sources
- Good for annual sustainability reporting and policy planning
- Can obscure the impact of specific energy consumption changes
Marginal Carbon Intensity:
- Measures the emissions from the next unit of electricity consumed
- Represents what happens if demand increases by one unit
- Better for real-time decision making and load-shifting strategies
- More accurately reflects the immediate impact of energy decisions
- Data availability can vary across regions and time periods
- Can be complex to predict accurately
- Not ideal for long-term infrastructure analysis
For carbon-aware computing, marginal intensity provides better guidance for immediate operational decisions like when to run workloads.
Question 39
What are the three main approaches to achieving fault tolerance in distributed systems, and how do they differ?
Answer
Three main approaches to fault tolerance in distributed systems:
- Error Detection:
- Monitoring systems collect metrics like CPU, memory, and network usage
- Heartbeats provide basic indication of system availability
- Telemetry analyzes metrics across servers to identify issues
- Circuit breaker pattern detects and prevents cascading failures
- Error-correcting code (ECC) memory detects and corrects bit errors
- Redundancy/Replication:
- Hardware redundancy (servers, storage, network equipment)
- Geographic redundancy (distributing across regions)
- Data replication to ensure availability despite failures
- Component replication (power supplies, cooling systems)
- Affects availability according to 1 − pⁿ, where p is the probability of an individual replica failing (see the sketch after this answer)
- Failover Strategies:
- Active-Passive: Primary system handles all workload with idle standby
- Active-Active: Multiple systems simultaneously handle workload
- Cold/Warm/Hot standby with different recovery time objectives
- State management and consistency mechanisms during failover
- Load balancers to redirect traffic during failures
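A quick check of the 1 − pⁿ replication rule (assuming independent failures):

```python
def availability_with_replicas(p_fail: float, n: int) -> float:
    """Probability that at least one of n independent replicas is up: 1 - p^n."""
    return 1 - p_fail ** n

for n in (1, 2, 3):
    print(f"n={n}: {availability_with_replicas(0.01, n):.6f}")
# n=1: 0.990000, n=2: 0.999900, n=3: 0.999999 - each replica adds roughly two nines
```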
Question 40
Compare and contrast the scaling approaches for stateless vs. stateful components in cloud architectures.
Answer
Scaling Stateless Components:
- Maintains no internal state beyond a request
- Easy to horizontally scale by adding more instances
- Load balancing methods:
- DNS-level balancing (simple but slow to react to failures)
- IP-level balancing (faster response, basic health checks)
- Application-level balancing (granular control, content-based routing)
- Instances can be added/removed without concern for state
- Examples: web servers with static content, DNS servers
Scaling Stateful Components:
- Maintains state beyond a request
- More complex to scale horizontally
- Requires data partitioning strategies:
- Per tenant (isolating different clients)
- Horizontal/Sharding (splitting table by rows across servers)
- Vertical (splitting table by columns)
- Partitioning distribution methods (both sketched in code after this answer):
- Range partitioning (efficient for range queries, poor load balancing)
- Hash partitioning (good load balancing, inefficient for range queries)
- Requires handling cross-partition queries and maintaining consistency
- Examples: databases, stateful web servers, mail servers
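A minimal sketch of the two distribution methods (server names and letter boundaries are hypothetical):

```python
import hashlib

SERVERS = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical partition servers

def hash_partition(key: str) -> str:
    """Hash partitioning: even load spread, but range queries touch every server."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SERVERS[digest % len(SERVERS)]

def range_partition(key: str) -> str:
    """Range partitioning over a-f, g-m, n-s, t-z: range scans stay local,
    but a popular key range can hotspot a single server."""
    for server, upper in zip(SERVERS, "fmsz"):
        if key[0].lower() <= upper:
            return server
    return SERVERS[-1]

# range_partition("alice") -> db-0; the hash target depends on the digest
print(hash_partition("alice"), range_partition("alice"))
```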
Question 41
What are the key components of Kubernetes and how do they work together to manage containerized applications?
Answer
Key Kubernetes components:
- Control Plane Components (Master):
- API Server: Entry point for all REST commands
- etcd: Reliable distributed key-value store for cluster state
- Scheduler: Places pods based on resource requirements and constraints
- Controllers: Manage state (replication, endpoints, nodes, service accounts)
- Node Components:
- kubelet: Agent that communicates with the Master
- kube-proxy: Makes services available on each node
- Container Runtime: Software executing containers (Docker, containerd)
- Logical Resources:
- Pods: Groups of containers scheduled together on the same node
- Deployments: Manage stateless applications
- StatefulSets: Manage stateful applications
- Services: Expose functionality of pods to the cluster or externally
- Horizontal Pod Autoscaler: Automatically scales pods based on metrics
These components work together to:
- Deploy and manage containerized applications
- Schedule containers based on resource requirements
- Monitor container health and restart failed containers
- Scale applications based on demand
- Provide service discovery and load balancing
- Update applications without downtime
Question 42
Describe the lifecycle emissions of datacenter hardware and explain why operational emissions might not be the only environmental concern.
Answer
Datacenter hardware lifecycle emissions:
- Embodied Emissions (23% of total):
- Raw material extraction and processing
- Manufacturing of components (CPUs, memory, storage)
- Assembly of hardware
- Transportation to datacenter
- Can be up to 50% of emissions for consumer devices
- Operational Emissions (70% of total):
- Electricity for powering computing equipment
- Cooling systems energy use
- Network infrastructure power consumption
- Lighting and auxiliary systems
- End-of-life Emissions (5-7% of total):
- Recycling processes
- E-waste management
- Disposal of non-recyclable components
Beyond operational emissions, other environmental concerns include:
- Water usage for cooling (particularly in water-scarce regions)
- Land use for datacenter construction
- Resource depletion of rare earth minerals and metals
- Hazardous materials in electronic components
- Short hardware lifecycles increasing embodied carbon impact
A holistic approach to datacenter sustainability requires addressing the full lifecycle impact, not just operational energy efficiency.
Question 43
Explain the CAP theorem and its implications for distributed database design in cloud environments.
Answer
The CAP theorem (Brewer’s theorem) states that a distributed database system can only provide two of the following three guarantees simultaneously:
- Consistency: All nodes see the same data at the same time
- Availability: Every request receives a response (success or failure)
- Partition tolerance: System continues to operate despite network partitions
In cloud environments, network partitions are unavoidable, so systems must choose between:
- CP systems: Sacrifice availability during partitions to maintain consistency
- Examples: Google Spanner, HBase, MongoDB (default config)
- Good for banking, financial systems where consistency is critical
- AP systems: Sacrifice consistency during partitions to maintain availability
- Examples: Amazon DynamoDB, Cassandra, CouchDB
- Good for social media, content delivery where availability matters most
Modern distributed databases often implement various consistency models beyond strict consistency:
- Strong consistency: All reads reflect the latest write
- Eventual consistency: All updates propagate eventually
- Causal consistency: Related operations appear in the same order to all observers
- Read-your-writes consistency: A user always sees their own updates
Cloud architects must understand these tradeoffs to select appropriate storage solutions based on application requirements.
Question 44
What are the primary differences between private, public, community, and hybrid cloud deployment models?
Answer
Private Cloud:
- Used by a single organization
- Owned by that organization or a third party
- Located on-premise or off-premise
- Advantages: Control, security, compliance, customization
- Disadvantages: Higher costs, requires IT expertise
Public Cloud:
- Available to the general public
- Owned by cloud service providers (AWS, Azure, GCP)
- Located in provider’s datacenters
- Advantages: Cost-effective, scalable, minimal management
- Disadvantages: Less control, potential security concerns
Community Cloud:
- Shared by organizations with common concerns (e.g., compliance, security)
- Owned by participating organizations or third party
- Located on-premise or off-premise
- Advantages: Cost sharing, compliance, common requirements
- Disadvantages: Limited resources compared to public cloud
Hybrid Cloud:
- Composition of two or more cloud models (private, community, public)
- Bound together by standardized technology for portability
- Advantages: Flexibility, workload optimization, cost balancing
- Disadvantages: Complexity, integration challenges, skill requirements
Multi-cloud (a related concept):
- Using services from multiple public cloud providers
- Advantages: Avoiding vendor lock-in, leveraging best services
- Disadvantages: Management complexity, potential data transfer costs
Question 45
Describe the Power Usage Effectiveness (PUE) metric, its limitations, and alternative metrics for measuring datacenter efficiency.
Answer
Power Usage Effectiveness (PUE):
- Definition: Total facility energy / IT equipment energy
- Industry average: ~1.58 (2022)
- Best practice target: 1.2 or less
- Perfect PUE would be 1.0 (all energy used by computing equipment)
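The calculation itself, as a sketch (the energy figures are illustrative):

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """PUE = total facility energy / IT equipment energy (1.0 is the ideal)."""
    return total_facility_kwh / it_equipment_kwh

# A facility drawing 790 MWh while its IT load consumes 500 MWh:
print(f"PUE = {pue(790_000, 500_000):.2f}")  # 1.58, matching the cited average
```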
Limitations of PUE:
- Inconsistent measurement methodologies
- Doesn’t account for energy sources (renewable vs fossil fuels)
- Doesn’t measure IT equipment efficiency
- Doesn’t capture whole system tradeoffs (e.g., heat reuse)
- Can be manipulated by changing measurement timing or boundaries
- Doesn’t account for climate differences
Alternative metrics:
- Carbon Usage Effectiveness (CUE): Total CO₂ emissions / IT equipment energy
- Water Usage Effectiveness (WUE): Datacenter water consumption / IT equipment energy
- Energy Reuse Effectiveness (ERE): Measures how much energy is reused outside the datacenter
- Green Energy Coefficient (GEC): Percentage of renewable energy used
- Performance per Watt: Computing output relative to power consumption
- Total Cost of Ownership (TCO): Financial metric incorporating efficiency
Comprehensive assessment requires multiple metrics to capture overall sustainability impact.
Question 46
Explain the concept of cross-cloud computing and the different approaches to implementing it.
Answer
Cross-cloud computing refers to operating seamlessly across multiple cloud environments. It’s implemented through several approaches:
- Hybrid Clouds:
- Combination of private and public clouds
- Connected via dedicated links or VPNs
- Allows workload mobility between environments
- Often involves some hardwiring between environments
- Multi-clouds:
- Using services from multiple public cloud providers
- Implemented with translation libraries or common programming models
- Examples: Terraform, Ansible Cloud Modules, OpenTofu, Pulumi
- May lose some provider-specific features due to abstraction
- Meta-clouds:
- Using a broker layer to abstract multiple clouds
- Broker makes decisions about resource allocation
- Reduces control but simplifies management
- Many commercial and academic proposals exist
- Federated Clouds:
- Establishing common APIs between cloud providers
- Requires standardization of interfaces
- Most challenging to implement but potentially most seamless
Motivations for cross-cloud computing:
- Avoiding vendor lock-in
- Increasing resilience against provider outages
- Leveraging different providers’ strengths
- Meeting regulatory requirements (data sovereignty)
- Geographic coverage optimization
Each approach involves tradeoffs between flexibility, complexity, and provider-specific functionality.
Question 47
How do serverless/Function-as-a-Service (FaaS) platforms work, and what are their advantages and limitations?
Answer
Serverless/FaaS Platforms:
Working mechanism:
- Functions are deployed as standalone code units
- Execution triggered by events (HTTP requests, database changes, etc.)
- Provider dynamically manages resources and scaling
- Environment is ephemeral; no persistent local storage
- Cold starts occur when new container instances are initialized
- Execution is time-limited (typically 5-15 minutes maximum)
Advantages:
- Lower costs through precise usage-based billing
- No servers to manage, reducing operational complexity
- Automatic scaling without configuration
- Fast deployment and time-to-market
- Focus on business logic rather than infrastructure
- “No idle resources” model
Limitations:
- Cold start latency impacts response times
- Vendor lock-in due to platform-specific services and APIs
- Complex state management (stateless execution model)
- Memory and execution time constraints
- Debugging and monitoring challenges
- Limited local testing capabilities
- Potential higher costs for constant-load applications
Examples include AWS Lambda, Azure Functions, Google Cloud Functions, Cloudflare Workers, and IBM Cloud Functions.
Cloud Resource Management
Question 48
Question
Answer
Binary translation uses shadow page tables where the hypervisor maintains duplicate tables that map guest virtual addresses directly to host physical addresses [1]. The hypervisor must trap and emulate all page table operations, creating significant overhead [1].
Hardware-assisted virtualization uses Extended/Nested Page Tables (EPT/NPT) that add a hardware layer for two-level address translation (guest virtual → guest physical → host physical) [1]. This eliminates the need for shadow page tables and reduces VMM interventions [1].
Hardware-assisted virtualization is more efficient because it reduces VMM traps for memory operations, provides hardware TLB support for nested translations, and eliminates the memory overhead of maintaining shadow page tables [1].
Question 49
Question
Answer
Docker uses namespaces to provide process isolation by creating separate views of system resources [1]. Examples include:
- PID namespace: Isolates process IDs (container processes can’t see host processes)
- Network namespace: Provides separate network interfaces, routing tables, and firewall rules
- Mount namespace: Isolates filesystem mount points
- UTS namespace: Isolates hostname and domain name [1]
Docker uses cgroups to control resource allocation and impose limits [1]. Examples include:
- CPU cgroups: Limit CPU usage percentage
- Memory cgroups: Restrict memory consumption and swap usage
- Block I/O cgroups: Manage disk I/O priorities and limits
- Device cgroups: Control access to specific devices [1]
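With the Docker SDK for Python, these cgroup limits map directly to run options (a sketch assuming the `docker` package and a running daemon; the image and limit values are illustrative):

```python
import docker

client = docker.from_env()

# Docker translates these options into cgroup limits for the container:
container = client.containers.run(
    "alpine", "sleep 60",
    detach=True,
    mem_limit="256m",        # memory cgroup: cap at 256 MiB
    nano_cpus=500_000_000,   # CPU cgroup: at most 0.5 CPUs
)
# The container also gets its own PID/network/mount namespaces by default,
# so `ps` inside it sees only its own processes.
print(container.id)
container.remove(force=True)
```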
Question 50
Question
Answer
Software-Defined Networking separates the control plane (network intelligence) from the data plane (packet forwarding), centralizing network configuration and management through software controllers [1]. This separation enables programmable network behavior through standardized interfaces like OpenFlow [1].
SDN benefits cloud infrastructure by:
- Enabling dynamic network configuration to support rapid VM/container provisioning and migration
- Providing network virtualization for multi-tenant isolation
- Allowing policy-based routing and traffic engineering for optimal resource utilization [1]
Question 51
Question
Answer
Declarative IaC defines the desired end state of infrastructure without specifying how to achieve it [1]. Example: Terraform/CloudFormation configurations that specify resources and their properties. The tool determines the required actions to reach that state [1].
Imperative IaC uses scripts that explicitly define the sequence of commands needed to create infrastructure [1]. Example: Shell scripts with explicit AWS CLI/Azure CLI commands that create resources in a specific order [1].
Trade-offs:
- Declarative: Better idempotency and state management, more self-documenting, handles dependencies automatically, but less flexibility for complex workflows [1]
- Imperative: More control over execution sequence, familiar to developers, easier debugging, but more error-prone and harder to maintain as infrastructure grows [1]
Question 52
Question
Answer
Canary deployments gradually roll out changes to a small subset of users before full deployment [1]. The new version is deployed to a small percentage of production servers/users, allowing monitoring of its behavior and performance in real production conditions with limited impact [1].
Advantages over blue/green:
- Reduced risk by limiting exposure of new version to a small percentage of users [1]
- More granular rollout control (can be increased incrementally)
- Lower resource requirements (don’t need full duplicate environment)
- Better for detecting performance issues that only appear under real load patterns [1]
Most beneficial for:
- High-traffic applications where full-scale errors would affect many users
- Applications with unpredictable user behavior patterns
- Services with complex dependencies that are difficult to fully test in staging
- Applications where performance metrics are critical acceptance criteria [1]
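The essence of a canary rollout is weighted traffic splitting, as in this sketch (the 5% weight and version names are illustrative):

```python
import random

CANARY_WEIGHT = 0.05  # fraction of requests routed to the new version

def pick_backend() -> str:
    """Weighted routing: raise CANARY_WEIGHT stepwise as confidence grows."""
    return "v2-canary" if random.random() < CANARY_WEIGHT else "v1-stable"

sample = [pick_backend() for _ in range(10_000)]
print(sample.count("v2-canary"))  # roughly 500 of 10,000 requests
```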
Scalable and Sustainable Architectures
Question 53
You are designing a cloud architecture for a financial application that processes transactions. The application needs to:
- Handle high volume of transactions
- Maintain strict data consistency
- Scale dynamically based on load
- Maintain high availability
Choose an appropriate architecture pattern and justify your choice. Discuss any potential limitations and how you might address them. [6]
Answer
A microservices architecture with CQRS (Command Query Responsibility Segregation) pattern would be appropriate [1]. This separates read and write operations, allowing each to be optimized and scaled independently [1].
Architecture components:
- API Gateway for request routing and authentication
- Command service for write operations with synchronous processing
- Event sourcing to maintain transaction history and audit trail
- Read replicas optimized for query performance
- Distributed caching for frequent queries [1]
Justification:
- High transaction volume: Horizontal scaling of microservices
- Strict consistency: Synchronous processing for critical write operations
- Dynamic scaling: Independent scaling of read/write components
- High availability: Regional replication and stateless services [1]
Limitations and mitigations:
- Complexity: Implement robust monitoring and service mesh for observability
- Eventual consistency for reads: Use versioning or timestamps to detect stale data
- Distributed transaction management: Implement saga pattern for transactions spanning multiple services [1]
Database approach:
- Primary database (e.g., PostgreSQL) for write operations with ACID guarantees
- Read replicas with appropriate isolation levels for query performance [1]
Question 54
Question
Answer
Data gravity refers to the tendency of applications, services, and computing resources to be attracted to and cluster around large data repositories [1]. As data accumulates, it becomes increasingly difficult to move due to transfer costs, bandwidth limitations, and latency considerations [1].
In multi-cloud scenarios, data gravity influences:
- Cloud provider selection based on where critical data already resides
- Data synchronization strategies between clouds
- Application placement to minimize data transfer costs [1]
In edge computing, data gravity drives:
- Local processing of data-intensive workloads to avoid transferring raw data
- Intelligent data filtering and aggregation before transmission to the cloud
- Distributed database designs that keep data close to where it’s generated and consumed [1]
Question 55
Question
Answer
DNS-level load balancing:
- Works by returning different IP addresses when clients resolve domain names
- Advantages: Simple implementation, works across regions, no additional hardware
- Disadvantages: Slow propagation due to DNS caching, limited health checking
- Best scenario: Global distribution of static content across multiple regions where rapid failover isn’t critical [2]
IP-level (L4) load balancing:
- Works at transport layer (TCP/UDP), routing traffic based on IP address and port
- Advantages: High performance, handles millions of connections, simple failure detection
- Disadvantages: Limited routing intelligence, no content-based decisions
- Best scenario: High-throughput applications like video streaming or large file transfers where connection volume is high but routing logic is simple [2]
Application-level (L7) load balancing:
- Works at application layer, routing based on HTTP headers, cookies, URL paths
- Advantages: Content-based routing, SSL termination, advanced session persistence
- Disadvantages: Higher computational overhead, more complex configuration
- Best scenario: Microservices architecture where requests need routing to specific services based on URL paths or API endpoints [2]
Question 56
Question
Answer
The shared responsibility model defines the division of security responsibilities between the cloud provider and the customer [1]. The general principle is that providers are responsible for security “of” the cloud (infrastructure) while customers are responsible for security “in” the cloud (data, applications) [1].
Responsibility division by service model:
IaaS:
- Provider: Physical security, host virtualization, network infrastructure
- Customer: Operating system, applications, data, identity management, access controls [1]
PaaS:
- Provider: Everything in IaaS plus operating system, middleware, runtime
- Customer: Applications, data, identity management, access policies [1]
SaaS:
- Provider: Nearly everything (infrastructure through application)
- Customer: Data classification, user access management, compliance requirements [1]
As you move from IaaS to SaaS, the provider assumes more responsibility, but the customer always retains responsibility for their data and user access [1].
Question 57
Question
Answer
Step 1: Assessment and planning
- Analyze the monolith to identify bounded contexts and potential service boundaries
- Map dependencies between components
- Prioritize services for migration based on business value and complexity [1]
Step 2: Create a cloud foundation
- Establish cloud infrastructure using Infrastructure-as-Code
- Implement CI/CD pipelines
- Set up monitoring and observability tools [1]
Step 3: Implement the strangler pattern
- Create an API gateway/facade in front of the monolith
- Redirect specific functionality to new microservices
- Gradually replace monolith components while maintaining functionality [1]
Step 4: Extract and migrate services incrementally
- Begin with stateless, non-critical services
- Refactor one bounded context at a time
- Use feature flags to control functionality exposure
- Run old and new implementations in parallel with A/B testing [1]
Step 5: Data migration and management
- Implement data access patterns (CQRS, event sourcing)
- Use change data capture for synchronization during transition
- Gradually shift to service-specific databases [1]
Cloud Sustainability
Question 58
Question
Answer
Key challenges in measuring application-level carbon footprints:
- Limited visibility into physical infrastructure and its energy consumption
- Multi-tenancy obscures resource attribution between workloads
- Varying carbon intensity across regions and time
- Complex supply chain emissions for cloud services [2]
Emerging approaches:
- Power API and energy estimation models:
- Correlate application metrics (CPU, memory, I/O) with energy consumption
- Create mathematical models to estimate energy use from observable metrics
- Examples: Cloud Carbon Footprint, Green Algorithms [1]
- FinOps-integrated carbon accounting:
- Leverage billing data as a proxy for resource utilization
- Apply emission factors based on region and service type
- Incorporate embodied carbon allocation over hardware lifecycle
- Examples: Cloud Carbon Footprint, Microsoft Sustainability Calculator [1]
Question 59
Question
Answer
Jevons paradox states that technological improvements in resource efficiency can lead to increased consumption of that resource rather than decreased use [1]. In cloud computing, more efficient servers and data centers lower the cost of computing, which increases demand and can result in greater total energy consumption despite per-unit efficiency gains [1].
Examples in cloud computing:
- More efficient servers enable more demanding applications (AI/ML)
- Lower costs increase cloud adoption and workload migration
- Higher efficiency leads to larger data centers [1]
Strategies to counteract:
- Absolute carbon caps and internal carbon pricing
- Direct renewable energy investments tied to expansion
- Focus on workload optimization and eliminating idle resources
- Education about total consumption impact rather than just efficiency metrics [1]
Question 60
Question
Answer
Carbon offsetting:
- Purchasing credits to compensate for emissions (renewable energy certificates, carbon removal projects)
- Advantages: Immediate impact, addresses emissions that cannot be eliminated
- Limitations: Doesn’t reduce actual emissions, offset quality varies widely, vulnerable to greenwashing, doesn’t drive system-level change [2]
Carbon reduction:
- Direct strategies to lower emissions (renewable energy, efficiency improvements, hardware lifecycle extension)
- Advantages: Real emissions reduction, drives innovation, provides competitive advantage
- Limitations: Higher initial costs, longer implementation timeline, technical challenges, may hit diminishing returns [2]
Key differences:
- Offsetting maintains status quo with compensation; reduction changes operational practices
- Offsetting is often cheaper short-term; reduction typically more cost-effective long-term
- Reduction addresses root causes while offsetting addresses symptoms
- Comprehensive strategy requires both approaches, with reduction prioritized [1]
Question 61
A global company operates cloud workloads across multiple regions. Outline a carbon-aware scheduling strategy that would optimize for:
- Lowest carbon emissions
- Lowest latency for users
- Regulatory compliance for data sovereignty
Explain the trade-offs involved and how you would prioritize these requirements. [6]
Answer
Carbon-aware scheduling strategy:
- Workload classification:
- Time-critical (real-time user interaction)
- Time-flexible (batch processing, analytics)
- Data-sensitive (contains regulated information) [1]
- Region mapping:
- Create a matrix of regions with their carbon intensity, latency to user bases, and regulatory compliance status
- Use both average and marginal carbon intensity metrics
- Update this map regularly with real-time grid data [1]
- Decision framework:
- For data-sensitive workloads: First filter by compliant regions, then optimize for carbon within latency constraints
- For time-critical workloads: Ensure latency requirements are met, then choose lowest-carbon region among candidates
- For time-flexible workloads: Implement temporal and spatial shifting based on carbon intensity forecasts [2]
- Implementation mechanism:
- Use container orchestration (Kubernetes) with custom schedulers
- Implement carbon-aware autoscaling policies
- Create carbon budgets per service/application [1]
Trade-offs and prioritization:
- Data sovereignty is a non-negotiable legal requirement and must be prioritized first
- Latency vs. carbon involves business decisions - critical user-facing services prioritize latency
- For non-critical workloads, carbon can be prioritized over perfect latency
- Consider using carbon budgets to make trade-offs explicit and measurable [1]
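The prioritization above can be expressed as a filter chain, as in this sketch (the region data is illustrative; a real deployment would pull live grid and latency measurements):

```python
# region -> illustrative carbon intensity (gCO2/kWh), latency, compliance flag
REGIONS = {
    "eu-west":  {"ci": 120, "latency_ms": 40,  "compliant": True},
    "eu-north": {"ci": 35,  "latency_ms": 70,  "compliant": True},
    "us-east":  {"ci": 380, "latency_ms": 110, "compliant": False},
}

def place(latency_budget_ms: int, data_sensitive: bool) -> str:
    candidates = REGIONS
    if data_sensitive:  # sovereignty first: a hard filter, never traded off
        candidates = {r: v for r, v in candidates.items() if v["compliant"]}
    # Latency second: keep only regions inside the budget
    candidates = {r: v for r, v in candidates.items()
                  if v["latency_ms"] <= latency_budget_ms}
    # Carbon last: pick the lowest intensity among what remains
    return min(candidates, key=lambda r: candidates[r]["ci"])

print(place(80, data_sensitive=True))  # eu-north: compliant, in budget, cleanest
print(place(50, data_sensitive=True))  # eu-west: the latency budget forces it
```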
Question 62
Question
Answer
Specialized hardware impacts cloud sustainability through:
- Arm processors:
- Lower power consumption per computational unit
- Better performance-per-watt for web servers and containerized applications
- Example workload benefit: Microservices with consistent moderate loads [1]
- TPUs (Tensor Processing Units):
- Optimized for machine learning matrix operations
- 30-80% better energy efficiency than GPUs for ML workloads
- Example workload benefit: Large language model inference and training [1]
- FPGAs (Field-Programmable Gate Arrays):
- Custom hardware acceleration for specific algorithms
- Significant efficiency gains for specialized repetitive tasks
- Example workload benefit: Video transcoding, cryptography, and genomic sequencing [1]
- Domain-specific accelerators:
- Hardware designed for specific functions (networking, storage, security)
- Offloads processing from general-purpose CPUs
- Example workload benefit: Network packet processing, encryption/decryption [1]
Environmental impacts:
- Reduced energy consumption through workload-specific optimization
- Potentially smaller data center footprints through higher compute density
- Challenge: Specialized hardware may have higher embodied carbon, requiring longer use to achieve net benefits [1]
Cloud System Quizzes
This note contains a collection of weekly quizzes from the Cloud Systems course, organized by topic. These self-assessment questions are useful for checking your understanding and preparing for the exam.
Week 1: Cloud Computing Introduction
Which statement correctly differentiates between clusters, clouds, and grids?
- ✓ Clusters locally connect computers to form a single system, grids integrate widely distributed systems for common tasks, and clouds offer scalable computing resources as a service.
- ✗ Clusters use loosely connected computers that are used together to solve large problems, while grids consist of computers connected in a high-performance local network, and clouds provide on-demand resources over the Internet.
- ✗ Clouds provide centralized resources on top of grid computing infrastructure, with better scalability than clusters.
Which correctly matches the cloud service model with its primary function?
- ✗ IaaS: Provides a framework for application components and development tools for coding.
- ✗ PaaS: Offers virtual machines, storage, and networking resources for full user control.
- ✓ SaaS: Delivers complete software solutions accessible via the Internet (e.g., as Web applications).
Which is NOT one of the five essential characteristics of cloud computing as defined by NIST?
- ✗ On-Demand Self-Service
- ✗ Broad Network Access
- ✓ High Availability (guaranteed uptime and limited service interruptions)
- ✗ Resource Pooling
- ✗ Rapid Elasticity
Which best describes virtualization?
- ✗ Virtualization is the creation of simulated environments that fully replace physical hardware.
- ✓ Virtualization allows multiple virtual instances to run on a single physical hardware resource, abstracting and sharing resources.
- ✗ Virtualization refers to using cloud-based services to dynamically allocate physical servers to customers.
Which statement correctly describes the use cases of VMs versus containers?
- ✓ VMs allow running applications requiring a different operating system, while containers are more suitable for shipping software components with their dependencies.
- ✗ VMs provide faster startup times and are ideal for high-performance applications, while containers are ideal only for stateless applications.
- ✗ It is typically possible to both run more VM instances and to store more VM images than container instances and images on the same physical host machine.
Week 2: System Virtualization
Which statement correctly differentiates between Type 1 and Type 2 hypervisors?
- ✓ Type 1 hypervisors run on bare-metal hardware, minimizing overhead (ideal for clouds), whereas Type 2 hypervisors run on top of an operating system, enabling virtualization on personal computing environments.
- ✗ Both Type 1 and Type 2 hypervisors run directly on hardware, but Type 2 hypervisors have more complex architectures, making them less efficient.
- ✗ Type 2 hypervisors run on top of a host operating system, ideal for running containerized applications, whereas Type 1 hypervisors run directly on bare-metal hardware, ideal for running virtual machines.
Which computing resources are typically virtualized to allow multiple VMs to operate on a single physical host?
- ✗ Only CPUs and memory are virtualized, as network and storage devices cannot be shared.
- ✓ Access to CPUs, memory, I/O devices, and storage devices all need to be managed to create isolated environments for VMs on a shared host system.
- ✗ Typically, only specific CPU instructions need to be virtualized, as memory, I/O devices, and storage devices are accessed through instructions.
Which statement accurately describes Popek and Goldberg’s theorem for efficient virtualization and its implications for x86 processors?
- ✗ The theorem states that efficient virtualization can only be achieved if there are no sensitive instructions.
- ✓ The theorem requires that all privileged and sensitive instructions trap to the hypervisor when executed in user mode. x86 processors did not meet this requirement, necessitating binary rewriting.
- ✗ The theorem specifies that all privileged instructions can only execute in kernel mode, and x86 processors implement this by using two of four rings.
- In the context of full virtualization, what is true for shadow page tables?
- ✓ Shadow page tables, which map virtual memory addresses as used by the guest OS directly to the actual physical memory of the host, are maintained by the hypervisor and used by hardware.
- ✗ Shadow page tables are maintained by the guest operating system, so it can manage and use physical memory without hypervisor involvement.
- ✗ Shadow page tables copy the memory address translation entries of guest operating systems to make VMs fault-tolerant and easy-to-migrate.
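To make the two-level translation concrete, here is a minimal Python sketch of the composition a hypervisor maintains; the page numbers and table names are invented for illustration:

```python
# Guest page table: guest-virtual page -> guest-"physical" page
# (all page numbers below are made up for illustration).
guest_page_table = {0x1: 0x10, 0x2: 0x11}

# Hypervisor's mapping: guest-"physical" page -> host machine page.
p2m_table = {0x10: 0x7A, 0x11: 0x7B}

# Shadow page table: guest-virtual page -> host machine page.
# The hypervisor maintains this composed table and installs it in the
# MMU, so translations on the fast path need no hypervisor involvement;
# the hypervisor only intervenes when the guest updates its page tables.
shadow_page_table = {
    gva: p2m_table[gpa] for gva, gpa in guest_page_table.items()
}

print(shadow_page_table)  # {1: 122, 2: 123}
```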
- In Xen, how is I/O virtualized to allow guest VMs isolated access to physical devices?
- ✗ Each guest VM includes its own device drivers, which communicate directly with the physical hardware.
- ✗ In Xen, all device drivers are installed in the hypervisor, which provides virtual devices.
- ✓ Xen uses a split-driver model where lightweight virtual device drivers in guest VMs communicate with Xen through hypercalls and events, and Xen, in turn, uses the actual physical device drivers in the privileged domain (Dom0).
Week 3: Containers
- Which Linux kernel feature is NOT used for OS-level virtualization?
- ✗ chroot, which restricts a process’s file system access to a specific directory.
- ✗ namespaces, which limit access to system resources such as network devices, mount points, and processes.
- ✓ cron, which schedules recurring tasks at specific times or intervals.
- ✗ cgroups, which limit access to compute resources such as CPU and memory.
- Which is NOT a feature that Docker provides on top of a container library?
- ✗ Image distribution – Docker provides tools to pull and push container images from and to public/private registries using a hierarchical image format.
- ✗ Build tools – Docker includes tooling to create images from textual descriptions that modify a base image.
- ✗ Orchestration – Docker facilitates basic orchestration (e.g., via docker-compose) of multi-container applications.
- ✓ Fault tolerance and auto-scaling – Docker automatically restarts and replicates containers as required.
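The image-distribution workflow from the first option can be sketched with the Docker CLI; the registry host and tags below are placeholders:

```python
import subprocess

# Pull a public base image, retag it for a private registry, and push
# it there (layers already present in the registry are skipped on push,
# thanks to the hierarchical image format).
subprocess.run(["docker", "pull", "python:3.12-slim"], check=True)
subprocess.run(["docker", "tag", "python:3.12-slim",
                "registry.example.com/team/python:3.12-slim"], check=True)
subprocess.run(["docker", "push",
                "registry.example.com/team/python:3.12-slim"], check=True)
```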
- When comparing the performance overhead of VMs and containers, which statement is true?
- ✗ Since containers share the same operating system kernel, while VMs include their own guest operating system, containers have a higher CPU overhead.
- ✓ Using hardware devices through virtual devices and virtual device drivers can lead to higher latencies and latency variations in VMs.
- ✗ Containers typically experience lower throughput for sequential memory operations than VMs due to the lack of direct access to hardware resources.
- ✗ A set of similar VMs on one host has a smaller disk footprint than several similar container images because VM images are hierarchical, and layers can be shared among similar VMs.
- Which is NOT an advantage of microservice architectures?
- ✗ Microservices enable independent deployment and scaling of services.
- ✗ Microservices can improve resilience by isolating failures within individual services.
- ✗ Microservices allow flexibility in choosing different technologies for different components.
- ✓ Microservices increase cluster resource utilization by running more services simultaneously.
- Which statement is true for features of Kubernetes?
- ✗ Using a load balancer in Kubernetes spreads the containers of a pod over different nodes.
- ✓ Kubernetes probes nodes and pods at a configurable interval, so a failure may be noticed with a delay as large as that interval.
- ✗ Kubernetes’ horizontal pod autoscaler will optimize the CPU limit of containers in a pod towards a user-provided CPU utilization target.
- ✗ The Kubernetes scheduler prioritizes CPU utilization over memory and disk utilization.
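On the autoscaling point: the horizontal pod autoscaler adjusts the number of replicas (not container CPU limits) toward the utilization target, using the documented rule desired = ceil(current × observed / target). A minimal Python sketch:

```python
import math

def desired_replicas(current: int, observed_util: float,
                     target_util: float) -> int:
    """Kubernetes HPA scaling rule: ceil(current * observed / target)."""
    return math.ceil(current * observed_util / target_util)

# Example: 4 pods averaging 90% CPU against a 60% target -> 6 replicas.
print(desired_replicas(4, 0.90, 0.60))  # 6
```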
Week 4: Cloud Infrastructure Management
- Which is NOT a reason why cloud operating systems like OpenStack typically use different physical hosts for different host roles?
- ✗ To ensure fault tolerance by isolating critical services so that failure in one component does not affect other components.
- ✓ To reduce costs by spreading out storage, compute, and networking across clusters.
- ✗ To prevent interference between guest VMs on compute and the systems controllers on the controller hosts.
- ✗ To enable the use of specialized hardware for different functions.
- Which statement on virtual networking is NOT correct?
- ✗ Virtual switches allow VMs on the same host to communicate, functioning similarly to physical network switches.
- ✓ Virtual Network Functions (VNFs) refer to software-based network appliances like firewalls, load balancers, and routers running on physical infrastructure to perform traditional networking tasks.
- ✗ Virtual networks are logically isolated network environments created in virtualized environments.
- ✗ Software-Defined Networks (SDNs) revolve around the programmability of network configurations.
- Which is NOT a correct step or feature of live VM migration in Xen?
- ✗ Xen uses an iterative pre-copy strategy to migrate memory pages, with the last dirty pages being transferred after the VM is paused.
- ✗ Xen sends unsolicited ARP requests to invalidate IP-to-MAC mappings, allowing the destination VM to respond to new ARP requests.
- ✗ Xen utilizes network migration and remote virtual storage to ensure continuous access to volumes after migration.
- ✓ Xen synchronizes the source and destination VMs by executing the same CPU instructions in real time during live migration.
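The iterative pre-copy strategy from the first option can be sketched as a loop: copy all pages while the VM keeps running, re-copy whatever was dirtied in the meantime, and pause only for the small residual set. A Python sketch under invented thresholds; `send`, `get_dirty_pages`, and `pause_vm` are hypothetical callbacks:

```python
def live_migrate(all_pages, get_dirty_pages, send, pause_vm,
                 max_rounds=5, stop_threshold=50):
    """Iterative pre-copy (sketch): round 1 sends every page; later
    rounds resend only pages dirtied while the VM kept running."""
    to_send = set(all_pages)
    for _ in range(max_rounds):
        send(to_send)                # VM is still running here
        to_send = get_dirty_pages()  # pages written during that copy
        if len(to_send) <= stop_threshold:
            break                    # residual set is small enough
    pause_vm()                       # brief stop-and-copy phase
    send(to_send)                    # last dirty pages (plus CPU state)
```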
- Which issue is NOT effectively addressed by adopting the Infrastructure-as-Code paradigm?
- ✗ Configuration drift: IaC can help avoid undocumented inconsistencies in configuration.
- ✗ Server sprawl: IaC can effectively address the uncontrolled creation and proliferation of redundant servers.
- ✓ Resource underprovisioning: IaC can mitigate resource bottlenecks by adequately scaling the infrastructure for the code.
- ✗ Snowflake servers: IaC can help eliminate unique, difficult-to-replicate servers.
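To see why a declarative definition counters configuration drift, consider this hypothetical Python sketch that reconciles a server's actual settings against the declared desired state; all keys and values are invented:

```python
# Declared desired state (the "infrastructure as code").
desired = {"nginx_version": "1.24", "max_connections": 1024, "tls": True}
# Actual state observed on the server, which has drifted.
actual = {"nginx_version": "1.24", "max_connections": 512}

def reconcile(desired: dict, actual: dict) -> dict:
    """Return the settings that must change for the server to match
    its declared definition (the essence of IaC convergence)."""
    return {k: v for k, v in desired.items() if actual.get(k) != v}

print(reconcile(desired, actual))
# {'max_connections': 1024, 'tls': True}
```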
- Which type of testing is NOT commonly performed during canary deployments?
- ✗ Traffic Analysis: Monitoring user behavior, latency, and error rates of the new deployment under real-world traffic.
- ✗ Performance Monitoring: Measuring system responsiveness, resource utilization, and throughput during the partial release.
- ✗ A/B Testing: Comparing the canary version’s performance and user engagement metrics side-by-side against the previous version’s metrics.
- ✓ Chaos Testing: Randomly introducing controlled failures to the system to evaluate the new deployment’s resiliency.
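A canary deployment reduces to a weighted traffic split plus the monitoring listed above. A minimal Python sketch; the 5% weight and version names are illustrative:

```python
import random
from collections import Counter

def route(canary_fraction: float = 0.05) -> str:
    """Send a small, fixed fraction of real traffic to the canary;
    everything else continues to hit the stable version."""
    return "v2-canary" if random.random() < canary_fraction else "v1-stable"

# Error rates and latencies per version would then be compared
# before widening the rollout.
print(Counter(route() for _ in range(10_000)))
```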
Week 5: Cloud Sustainability
- Which is NOT a correct category of computer system lifecycle emissions?
- ✗ Embodied emissions: emissions from the production and manufacturing of hardware.
- ✗ Operational emissions: emissions during the use of computer systems.
- ✗ End-of-life emissions: emissions from the disposal and recycling of hardware.
- ✓ Development emissions: emissions from design, development, and testing of software.
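A common way to combine these categories in an assessment is to amortize embodied emissions over the hardware's expected lifetime:

$$C_{\text{total}}(t) = C_{\text{operational}}(t) + \frac{C_{\text{embodied}}}{T_{\text{lifetime}}} \cdot t$$

For example, a server with 1,200 kgCO2e of embodied emissions and a four-year lifetime contributes 300 kgCO2e per year on top of whatever its operation emits (the figures are illustrative).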
- Which statement is NOT true for the power consumption of computing hardware?
- ✗ CPUs typically have a larger dynamic range than RAM, disks, and network interfaces.
- ✗ A low server utilization usually correlates with low energy efficiency.
- ✓ As the peak performance increases from one server generation to the next, it becomes more and more important to utilize server hardware well for energy-proportional computing.
- ✗ The dynamic power consumption of wired networks is so limited that this is often neglected in carbon footprint assessment.
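A simplified model underlies several of these statements: at utilization $u \in [0, 1]$, server power is roughly

$$P(u) = P_{\text{static}} + (P_{\text{peak}} - P_{\text{static}}) \cdot u$$

At low utilization the static share dominates, so energy efficiency (useful work per joule) drops; a perfectly energy-proportional server would have $P_{\text{static}} = 0$.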
- Which statement about Power Usage Effectiveness (PUE) is NOT true?
- ✗ Most of the facility power taken into account for PUE in large data centers goes to cooling, followed by power distribution and conversion.
- ✓ PUE is often better for small, specialized data centers, as they need less energy overall.
- ✗ PUE can be misleading when facility energy is reused beyond data centers.
- ✗ PUE measurement methodologies vary drastically to the point that results are hard to compare.
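As a quick worked example of the definition behind these statements:

$$\mathrm{PUE} = \frac{\text{total facility energy}}{\text{IT equipment energy}}$$

A facility drawing 1.5 MW in total while its IT equipment draws 1.2 MW therefore has a PUE of 1.5 / 1.2 = 1.25; an ideal facility would approach 1.0.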
- What is NOT a reason why operational cloud carbon footprint assessments are difficult?
- ✓ Grid carbon intensity varies between regions, seasons, weekdays, and times of day.
- ✗ Cloud provider reports are coarse-grained, published late, and their methodologies are not detailed.
- ✗ Location independence means you do not know what physical server CPU is used by your VMs.
- ✗ You do not have access to any physical or software power meter readings from within a VM.
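Estimation methodologies such as Cloud Carbon Footprint work around these gaps by modeling rather than measuring. A simplified Python sketch in that spirit; the wattage coefficient, PUE, and grid intensity are illustrative placeholders:

```python
def estimate_emissions_g(vcpu_hours: float,
                         avg_watts_per_vcpu: float = 3.5,
                         pue: float = 1.2,
                         grid_g_per_kwh: float = 400.0) -> float:
    """CCF-style operational estimate: usage -> energy -> emissions."""
    energy_kwh = vcpu_hours * avg_watts_per_vcpu / 1000.0  # Wh -> kWh
    return energy_kwh * pue * grid_g_per_kwh

print(estimate_emissions_g(1000))  # 1680.0 g CO2e for 1,000 vCPU-hours
```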
- What is NOT a reason why carbon-aware computing is not used more in practice?
- ✗ Missing runtime information: To time shift large-scale delay-tolerant batch processing applications, you need to know application runtimes before running the applications.
- ✓ Limited applicability: The majority of cloud applications are not delay-tolerant but latency-critical, user-facing Web applications, and those cannot be managed based on varying grid carbon intensity.
- ✗ Missing financial incentive: There is no financial benefit to aligning computational loads with low-carbon energy availability so far.
- ✗ Limited support on public clouds: No public cloud provider has made carbon-aware computing mechanisms available to their customers.
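Time-shifting, the core mechanism behind these questions, can be sketched in a few lines: given a carbon-intensity forecast and a job duration, pick the start hour whose window has the lowest average intensity (the forecast values below are invented):

```python
# Hypothetical 12-hour grid carbon-intensity forecast in gCO2e/kWh.
forecast = [300, 280, 250, 180, 120, 110, 130, 200, 260, 310, 330, 320]

def best_start(forecast: list[int], duration_h: int) -> int:
    """Return the start hour minimizing mean carbon intensity
    over a delay-tolerant job of the given duration."""
    windows = range(len(forecast) - duration_h + 1)
    return min(windows,
               key=lambda s: sum(forecast[s:s + duration_h]) / duration_h)

print(best_start(forecast, 3))  # 4 -> run during hours 4-6 (120, 110, 130)
```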
References:
- COMPSCI4106/5118 Cloud Systems course materials