Exam Preparation

  • Cloud Systems Exam Topics

    This note provides an overview of the key topics covered in the Cloud Systems course (COMPSCI4106/5118) that are relevant for the exam. The exam counts for 50% of the overall grade and covers both parts of the course evenly.

    Exam Format

    • Weight: 50% of the overall grade
    • Structure: Four questions, 10 marks each
      • Two questions on Part 1 (Cloud Resource Management)
      • Two questions on Part 2 (Scalable and Sustainable Architectures)
    • Question Types: Mix of long-form answers and MCQ-style questions
    • Duration: 60 minutes for 40 marks

    Part 1: Cloud Resource Management

    Chapter 1: Introduction to Cloud Computing

    • NIST Definition and its Dimensions
      • Five essential characteristics
      • Three service models
      • Four deployment models
    • Virtualization Terminology and Categories
      • Process virtualization
      • OS-level virtualization
      • System virtualization
      • Differences between categories

    Chapter 2: Virtual Machines

    • Categories of Instructions and Virtualizability
      • Privileged vs. sensitive instructions
      • Popek and Goldberg’s theorem
      • Critical instructions in x86
    • Full Virtualization (Binary Translation)
      • How binary translation works
      • Shadow page tables
      • Memory management in virtualization
    • OS-Assisted Virtualization
      • Xen architecture (domains)
      • CPU virtualization in Xen
      • Memory management in Xen
    • Hardware-Assisted Virtualization
      • CPU virtualization extensions
      • Memory virtualization extensions
      • Tagged Translation Lookaside Buffer

    Chapter 3: Containers and Container Management

    • Linux Kernel Containment Features
      • chroot system call
      • namespaces (PID, network, mount, etc.)
      • cgroups (Control Groups)
      • capabilities
    • Docker
      • Images and Dockerfiles
      • Container instances
      • Docker architecture and components
    • Containers vs. VMs
      • Performance differences (CPU, memory, network, I/O)
      • Image size and boot time
      • Isolation and security considerations
    • Container Orchestration
      • Kubernetes architecture
      • Pods, deployments, services
      • Horizontal Pod Autoscaler

    Chapter 4: Cloud Infrastructure Management

    • Challenges in Cloud Infrastructure
      • Server sprawl
      • Configuration drift
      • Snowflake servers
    • Cloud Operating Systems
      • OpenStack components
      • Virtual networking
      • Software-Defined Networking (SDN)
    • VM Management
      • VM snapshots
      • VM migration (cold, warm, live)
      • Live migration process in Xen
    • Infrastructure-as-Code and CI/CD
      • Infrastructure definition files
      • Ansible architecture and concepts
      • Continuous delivery vs. continuous deployment
      • Deployment strategies (blue/green, canary)

    Chapter 5: Cloud Sustainability

    • Emissions Lifecycle
      • Embodied emissions
      • Operational emissions
      • End-of-life emissions
    • Energy Efficiency and Proportionality
      • Static vs. dynamic power consumption
      • Energy-proportional computing
      • Koomey’s Law and trends
    • Power Usage Effectiveness (PUE)
      • Definition and calculation
      • Industry trends and benchmarks
      • Limitations of PUE
    • Carbon Footprint Measurement
      • Greenhouse Gas Protocol scopes
      • Estimation methodologies
      • Cloud Carbon Footprint (CCF)
    • Carbon-Aware Computing
      • Time-shifting workloads
      • Location-shifting workloads
      • Carbon intensity signals
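
    Time-shifting can be illustrated with a minimal sketch, assuming an hourly grid carbon intensity forecast (in gCO2/kWh) and a delay-tolerant job whose duration is known in advance. The function name `best_start_hour` and the forecast values are hypothetical and chosen only for illustration; they do not come from the course materials.

```python
def best_start_hour(forecast, duration_hours):
    """Return (start_index, mean_intensity) of the lowest-carbon window.

    forecast: hourly grid carbon intensity values in gCO2/kWh (illustrative).
    duration_hours: number of consecutive hours the job needs.
    """
    if duration_hours <= 0 or duration_hours > len(forecast):
        raise ValueError("duration must fit inside the forecast horizon")
    best_start, best_mean = 0, float("inf")
    # Slide a window over the forecast and keep the one with the lowest mean.
    for start in range(len(forecast) - duration_hours + 1):
        window = forecast[start:start + duration_hours]
        mean = sum(window) / duration_hours
        if mean < best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Hypothetical forecast for the next 8 hours (gCO2/kWh).
forecast = [300, 280, 200, 120, 110, 150, 260, 310]
start, mean = best_start_hour(forecast, duration_hours=3)
print(f"start in {start} h, mean intensity {mean:.1f} gCO2/kWh")
```

    Location-shifting follows the same idea, except that the candidates are regions rather than time windows.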

    Part 2: Scalable and Sustainable Architectures

    Chapter 6: Cloud System Design

    • Distributed Systems Concepts
      • Fallacies of distributed computing
      • Key aspects of distributed systems
      • Failures, errors, faults, and QoS
    • Quality Attributes
      • Dependability
      • Availability
      • Reliability
    • High Availability
      • Fault tolerance
      • Error detection
      • Failover strategies (active-active, active-passive)
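
    The effect of redundancy on availability can be sketched with the standard textbook formulas, assuming that component failures are independent (an assumption that rarely holds exactly in practice). The function names and the example values below are hypothetical.

```python
def parallel_availability(a, n):
    """Availability of n redundant replicas, each with availability a.

    The group is down only if all replicas are down at the same time.
    Assumes independent failures.
    """
    return 1 - (1 - a) ** n


def serial_availability(*components):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in components:
        result *= a
    return result


single = 0.99                              # one server: "two nines"
pair = parallel_availability(single, 2)    # two redundant servers
chain = serial_availability(pair, 0.999)   # the pair behind one load balancer
print(round(pair, 6), round(chain, 6))
```

    The sketch also shows why a single non-redundant component in the chain (here the load balancer) can cap the availability of the whole system.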

    Chapter 7: Modern Cloud Architectures

    • Architectural Approaches
      • Layering and tiering
      • Redundancy by replication
    • Cloud Scaling
      • Stateless scaling (load balancing)
      • Stateful scaling (partitioning)
      • Horizontal vs. vertical scaling
    • Advanced Architectures
      • Microservices
      • Service Mesh Technologies (SMTs)
      • Cloud-native technologies
      • API Gateways

    Chapter 8: Flavours of Cloud

    • Provisioning Levels
      • Infrastructure as a Service (IaaS)
      • Platform as a Service (PaaS)
      • Software as a Service (SaaS)
      • Function as a Service (FaaS)
    • Deployment Models
      • Public, private, community, hybrid clouds
      • Considerations for choosing models
    • Cross-Cloud Computing
      • Hybrid clouds
      • Federated clouds
      • Multi-clouds
      • Meta-clouds
    • Computing Continuum
      • Edge computing vs. fog computing
      • Support roles in the continuum
      • Comparison of fog/edge with cloud

    Chapter 9: A Wider Lens on Sustainability

    • Designing Dependable Data Centres
      • Hardware redundancy
      • Network redundancy
      • Power redundancy
      • Cooling redundancy
    • Carbon Footprint Measurement Frameworks
      • GHG Protocol
      • Real-time vs. historical approaches
      • Life Cycle Assessment (LCA)
    • Measurement Granularities
      • Software-level
      • Server-level
      • Rack-level
      • Data center-level
      • Network-level
    • Carbon Intensity
      • Definition and regional variations
      • Average vs. marginal intensity
      • Carbon intensity signals
    • Carbon-Aware Decision Making
      • Vendor decisions
      • User decisions
      • Instance type selection

    Exam Preparation Tips

    1. Focus on Understanding Concepts:

      • Rather than memorizing details, ensure you understand core concepts and can explain them
      • Be prepared to apply concepts to different scenarios
    2. Practice Explanation and Justification:

      • Many questions will ask you to explain your reasoning
      • Practice articulating clear, concise explanations
    3. Review Example Questions:

      • Look at the example questions provided in lecture materials
      • Practice answering similar questions within the word limits
    4. Connect Related Concepts:

      • Understand how different topics relate to each other
      • Be prepared to discuss trade-offs between different approaches
    5. Terminology:

      • Ensure you’re familiar with the correct terminology
      • Be able to define key terms and concepts
    6. MCQ Strategy:

      • For MCQ questions, you’ll need to both select the right answer and explain your choice
      • Practice justifying answers in one concise sentence
  • Cloud Systems Practice Questions

    Virtual Machines and Virtualization

    Question 1

    Question

    Briefly describe what critical instructions are and why they presented a challenge for x86 system virtualization [2]

    Question 2

    Question

    Briefly summarize why the physical main memory can simply be partitioned for Xen guests [3]

    Question 3

    Question

    Explain the key difference between shadow page tables used in full virtualization and the memory management approach used in Xen. [3]

    Question 4

    Question

    Which statement about hardware-assisted virtualization is correct?
    a) It requires modifying the guest OS
    b) It relies on binary translation for critical instructions
    c) It introduces CPU modes specifically for virtualization
    d) It is incompatible with legacy operating systems

    Containers and Container Management

    Question 5

    Question

    Briefly explain what the chroot system call on Linux does and how it is useful for containerization [2]

    Question 6

    Question

    Compare and contrast namespaces and cgroups in Linux containment. [4]

    Question 7

    Question

    Why are container images typically smaller than VM images? Give two reasons. [2]

    Question 8

    Question

    Explain the relationship between Dockerfile, image, and container. [3]

    Cloud Infrastructure Management

    Question 9

    Question

    Briefly explain how Infrastructure-as-Code addresses snowflake servers [2]

    Question 10

    Question

    Explain the difference between continuous delivery and continuous deployment. [2]

    Question 11

    Question

    What is the primary purpose of live VM migration, and what components must migrate? [3]

    Question 12

    Question

    List and briefly explain stages of Xen live migration. [4]
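
    The iterative pre-copy phase can be sketched as a small simulation. This is a simplified model, not Xen's actual implementation: it assumes a constant transfer rate and a constant page-dirtying rate, and all names and numbers are hypothetical.

```python
def simulate_precopy(total_pages, pages_per_sec, dirty_pages_per_sec,
                     stop_threshold, max_rounds=30):
    """Simulate iterative pre-copy of VM memory.

    Returns (pages sent in each pre-copy round, pages left for stop-and-copy).
    """
    rounds = []
    to_send = total_pages  # the first round copies all memory pages
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break  # few enough dirty pages: pause the VM and copy the rest
        rounds.append(to_send)
        # Pages dirtied while this round was in flight must be resent.
        # Integer arithmetic: dirtied = dirty_rate * (to_send / transfer_rate).
        dirtied = to_send * dirty_pages_per_sec // pages_per_sec
        to_send = min(total_pages, dirtied)
    return rounds, to_send


rounds, final_pages = simulate_precopy(
    total_pages=100_000, pages_per_sec=10_000,
    dirty_pages_per_sec=2_000, stop_threshold=1_000)
print(rounds, final_pages)
```

    In this model the downtime is roughly the time needed to send the final set of pages while the VM is paused, which is why the rounds only converge when pages are dirtied more slowly than they are transferred.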

    Cloud Sustainability

    Question 13

    Question

    Name and explain one issue with the PUE metric. [1]

    Question 14

    Question

    Explain energy-proportional computing and its importance for cloud data centers. [3]

    Question 15

    Question

    What is the difference between embodied and operational carbon emissions in cloud computing? [2]

    Question 16

    Question

    What is carbon-aware computing and how does it differ from energy efficiency? [3]

    Cloud System Design

    Question 17

    Question

    You are designing a cloud-based ML system with training and inference components. Why deploy the inference service at the edge rather than the cloud? [2]

    Question 18

    Question

    Match each scenario to a failover strategy (Active-Active or Active-Passive) and justify your choice.
    a) Financial trading platform
    b) Content management system

    a) Active-Active, as it cannot tolerate downtime [2].
    b) Active-Passive, as some downtime is acceptable and a passive standby is cost-effective [2].

    Question 19

    Question

    Explain the difference between availability and reliability in cloud systems. [2]

    Question 20

    Question

    What does “five nines” availability mean, and how much downtime does it represent annually? [2]
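
    The downtime implied by an availability target is simple arithmetic, sketched below for a 365-day year. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def annual_downtime_minutes(availability):
    """Maximum downtime per year for a given availability (0..1)."""
    return (1 - availability) * MINUTES_PER_YEAR


for label, a in [("three nines", 0.999),
                 ("four nines", 0.9999),
                 ("five nines", 0.99999)]:
    print(f"{label}: {annual_downtime_minutes(a):.2f} min/year")
```

    Each additional nine cuts the permitted downtime by a factor of ten, from about 525.6 minutes per year at three nines to about 5.26 minutes per year at five nines.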

    Modern Cloud Architectures

    Question 21

    Question

    A financial tech company processes high daily transaction volumes. Choose and explain the best architecture:
    a) Single high-memory server
    b) Load-balanced servers, DB partitioning
    c) Serverless functions, single DB
    d) Monolithic app, local caching

    b) Load-balanced servers with DB partitioning, offering scalability, performance, and reliability [2].

    Question 22

    Question

    Differentiate between horizontal and vertical scaling and give an example of each. [4]

    Question 23

    Question

    What are microservices, and what are two advantages they have over monoliths? [3]

    Question 24

    Question

    What is a service mesh, and what microservices problem does it solve? [2]

    Question 25

    Question

    Select the appropriate service model (IaaS, PaaS, SaaS, FaaS) for each scenario:
    a) Startup without infrastructure management
    b) Company collaboration tools
    c) Research simulations computing power
    d) Web app developer avoiding server runtime

    a) FaaS [1]
    b) SaaS [1]
    c) IaaS [1]
    d) PaaS [1]

    Extra

    Question 26

    What are the key disadvantages of the microservices architecture?

    Question 27

    Explain the concept of "trap and emulate" in virtualization and when it can be used.
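
    The condition under which trap-and-emulate works can be expressed as a set check, following Popek and Goldberg's theorem: every sensitive instruction must also be privileged, so that it traps when executed in user mode. The sketch below uses a small, incomplete, illustrative subset of classic (pre-hardware-assist) x86 instructions; the function names are hypothetical.

```python
def is_trap_and_emulate_virtualizable(sensitive, privileged):
    """True if every sensitive instruction is also privileged (i.e. traps)."""
    return set(sensitive) <= set(privileged)


def critical_instructions(sensitive, privileged):
    """Sensitive instructions that do NOT trap in user mode."""
    return set(sensitive) - set(privileged)


# Illustrative, incomplete instruction subsets for classic x86.
privileged = {"LGDT", "LIDT", "HLT", "MOV_TO_CR3"}
sensitive = {"LGDT", "LIDT", "MOV_TO_CR3", "POPF", "SGDT", "SMSW"}

print(is_trap_and_emulate_virtualizable(sensitive, privileged))
print(sorted(critical_instructions(sensitive, privileged)))
```

    Because the second set is not empty, pure trap-and-emulate was not sufficient on classic x86, which motivated binary translation, paravirtualization, and later hardware extensions.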

    Question 28

    What are the five essential characteristics of cloud computing according to the NIST definition?

    Question 29

    Compare and contrast the three most common approaches to virtualization on x86 architectures.

    Question 30

    Explain the concept of energy proportionality in data centers and why it's important for cloud sustainability.
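
    A common way to reason about energy proportionality is a linear power model, in which a server draws a fixed idle power plus a share that grows with utilization. The sketch below assumes this linear model; the wattages are hypothetical and the function names are illustrative.

```python
def power_watts(utilization, idle_watts, peak_watts):
    """Linear power model: static (idle) part plus dynamic part."""
    return idle_watts + (peak_watts - idle_watts) * utilization


def relative_efficiency(utilization, idle_watts, peak_watts):
    """Work per watt at this utilization, relative to work per watt at 100%."""
    if utilization == 0:
        return 0.0  # power is drawn but no work is done
    work_per_watt = utilization / power_watts(utilization, idle_watts, peak_watts)
    work_per_watt_at_peak = 1.0 / peak_watts
    return work_per_watt / work_per_watt_at_peak


# Hypothetical server: 100 W idle, 200 W at full load, running at 20% utilization.
typical = relative_efficiency(0.2, idle_watts=100, peak_watts=200)
# Ideal energy-proportional server: no power drawn when idle.
ideal = relative_efficiency(0.2, idle_watts=0, peak_watts=200)
print(round(typical, 3), round(ideal, 3))
```

    Under these assumptions the typical server delivers only about a third of its peak efficiency at 20% utilization, whereas the ideal proportional server is equally efficient at every load level.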

    Question 31

    What is autoscaling in Kubernetes and how does the Horizontal Pod Autoscaler calculate the desired number of replicas?
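
    The core of the Horizontal Pod Autoscaler calculation, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces only that formula, with a tolerance band and minimum and maximum bounds; the real controller additionally handles missing metrics, pods that are not ready, and stabilization windows. The Python function and its parameter names are illustrative and not part of any Kubernetes API.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA calculation for a single metric."""
    ratio = current_metric / target_metric
    # Skip scaling if the ratio is close enough to 1.0 (0.1 is the documented default).
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))  # scale out
print(desired_replicas(4, current_metric=30, target_metric=60))  # scale in
print(desired_replicas(4, current_metric=63, target_metric=60))  # within tolerance
```

    Note that the autoscaler changes the number of pod replicas; it does not change the CPU limits of the containers inside a pod.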

    Question 32

    Explain the carbon intensity concept and how it's used in carbon-aware computing.

    Question 33

    What are shadow page tables in virtualization and why are they important?
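
    Conceptually, a shadow page table is the composition of two mappings: the guest's page table (guest virtual to guest "physical") and the hypervisor's mapping (guest "physical" to host physical). The sketch below models both as dictionaries of page numbers; the names and page numbers are hypothetical, and a real hypervisor must also keep the shadow table in sync whenever the guest modifies its own page table.

```python
def build_shadow_table(guest_page_table, hypervisor_p2m):
    """Compose guest-virtual -> guest-physical with guest-physical -> host-physical.

    The result maps guest-virtual pages directly to host-physical pages,
    which is the form the hardware MMU can use.
    """
    shadow = {}
    for gva_page, gpa_page in guest_page_table.items():
        # Guest-physical pages without a host backing are left unmapped,
        # so an access would fault into the hypervisor.
        if gpa_page in hypervisor_p2m:
            shadow[gva_page] = hypervisor_p2m[gpa_page]
    return shadow


guest_page_table = {0x10: 0x1, 0x11: 0x2, 0x12: 0x7}  # maintained by the guest OS
hypervisor_p2m = {0x1: 0xA0, 0x2: 0xB3}                # maintained by the hypervisor
shadow = build_shadow_table(guest_page_table, hypervisor_p2m)
print({hex(k): hex(v) for k, v in shadow.items()})
```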

    Question 34

    Describe the blue/green deployment strategy and its advantages.

    Question 35

    What is Jevons' Paradox and how does it apply to energy efficiency in cloud computing?

    Question 36

    What is Infrastructure as Code (IaC) and how does it address the challenges of "configuration drift" and "snowflake servers"?

    Question 37

    Explain the concept of Continuous Integration/Continuous Delivery (CI/CD) in cloud environments and list its key practices.

    Question 38

    What are the differences between Average Carbon Intensity and Marginal Carbon Intensity when measuring the environmental impact of computing?

    Question 39

    What are the three main approaches to achieving fault tolerance in distributed systems, and how do they differ?

    Question 40

    Compare and contrast the scaling approaches for stateless vs. stateful components in cloud architectures.
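
    The contrast can be sketched in a few lines: a stateless request can go to any replica (here via round-robin load balancing), whereas a request for stateful data must go to the partition that holds the key (here via simple hash partitioning). The server names, keys, and function name are hypothetical; production systems often use consistent hashing or range partitioning instead.

```python
import hashlib
from itertools import cycle


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Stateless scaling: any replica can serve any request, so rotate through them.
replicas = cycle(["web-1", "web-2", "web-3"])
stateless_targets = [next(replicas) for _ in range(4)]

# Stateful scaling: the key determines which partition owns the data.
stateful_targets = {key: partition_for(key, 4) for key in ["user-1", "user-2", "user-42"]}

print(stateless_targets)
print(stateful_targets)
```

    Adding a stateless replica only requires registering it with the load balancer, while changing the number of partitions in this simple scheme remaps most keys and therefore requires moving data.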

    Question 41

    What are the key components of Kubernetes and how do they work together to manage containerized applications?

    Question 42

    Describe the lifecycle emissions of datacenter hardware and explain why operational emissions might not be the only environmental concern.

    Question 43

    Explain the CAP theorem and its implications for distributed database design in cloud environments.

    Question 44

    What are the primary differences between private, public, community, and hybrid cloud deployment models?

    Question 45

    Describe the Power Usage Effectiveness (PUE) metric, its limitations, and alternative metrics for measuring datacenter efficiency.
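
    PUE is defined as total facility energy divided by the energy delivered to IT equipment, so a value of 1.0 would mean no overhead at all. The sketch below computes it for hypothetical meter readings; the function names and numbers are illustrative.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue_value):
    """Fraction of facility energy that does not reach IT equipment."""
    return 1 - 1 / pue_value


# Hypothetical readings over the same measurement period.
value = pue(total_facility_kwh=1500, it_equipment_kwh=1000)
print(value, round(overhead_share(value), 3))
```

    The metric says nothing about how efficiently the IT equipment itself uses its energy, which is one reason it is considered limited as a sustainability indicator.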

    Question 46

    Explain the concept of cross-cloud computing and the different approaches to implementing it.

    Question 47

    How do serverless/Function-as-a-Service (FaaS) platforms work, and what are their advantages and limitations?

    Cloud Resource Management

    Question 48

    Question

    Question 49

    Question

    Question 50

    Question

    Question 51

    Question

    Question 52

    Question

    Scalable and Sustainable Architectures

    Question 53

    You are designing a cloud architecture for a financial application that processes transactions. The application needs to:

    • Handle high volume of transactions
    • Maintain strict data consistency
    • Scale dynamically based on load
    • Maintain high availability

    Choose an appropriate architecture pattern and justify your choice. Discuss any potential limitations and how you might address them. [6]

    Question 54

    Question

    Question 55

    Question

    Question 56

    Question

    Question 57

    Question

    Cloud Sustainability

    Question 58

    Question

    Question 59

    Question

    Question 60

    Question

    Question 61

    A global company operates cloud workloads across multiple regions. Outline a carbon-aware scheduling strategy that would optimize for:

    1. Lowest carbon emissions
    2. Lowest latency for users
    3. Regulatory compliance for data sovereignty

    Explain the trade-offs involved and how you would prioritize these requirements. [6]

    Question 62

    Question

  • Cloud System Quizzes

    This note contains a collection of weekly quizzes from the Cloud Systems course, organized by topic. These self-assessment questions are useful for checking your understanding and preparing for the exam.

    Week 1: Cloud Computing Introduction

    1. Which statement correctly differentiates between clusters, clouds, and grids?

      • ✓ Clusters locally connect computers to form a single system, grids integrate widely distributed systems for common tasks, and clouds offer scalable computing resources as a service.
      • ✗ Clusters use loosely connected computers that are used together to solve large problems, while grids consist of computers connected in a high-performance local network, and clouds provide on-demand resources over the Internet.
      • ✗ Clouds provide centralized resources on top of grid computing infrastructure, with better scalability than clusters.
    2. Which correctly matches the cloud service model with its primary function?

      • ✗ IaaS: Provides a framework for application components and development tools for coding.
      • ✗ PaaS: Offers virtual machines, storage, and networking resources for full user control.
      • ✓ SaaS: Delivers complete software solutions accessible via the Internet (e.g., as Web applications).
    3. Which is NOT one of the five essential characteristics of cloud computing as defined by NIST?

      • ✗ On-Demand Self-Service
      • ✗ Broad Network Access
      • ✓ High Availability (guaranteed uptime and limited service interruptions)
      • ✗ Resource Pooling
      • ✗ Rapid Elasticity
    4. Which best describes virtualization?

      • ✗ Virtualization is the creation of simulated environments that fully replace physical hardware.
      • ✓ Virtualization allows multiple virtual instances to run on a single physical hardware resource, abstracting and sharing resources.
      • ✗ Virtualization refers to using cloud-based services to dynamically allocate physical servers to customers.
    5. Which statement correctly describes the use cases of VMs versus containers?

      • ✓ VMs allow running applications requiring a different operating system, while containers are more suitable for shipping software components with their dependencies.
      • ✗ VMs provide faster startup times and are ideal for high-performance applications, while containers are ideal only for stateless applications.
      • ✗ It is typically possible to both run more VM instances and to store more VM images than container instances and images on the same physical host machine.

    Week 2: System Virtualization

    1. Which statement correctly differentiates between Type 1 and Type 2 hypervisors?

      • ✓ Type 1 hypervisors run on bare-metal hardware, minimizing overhead (ideal for clouds), whereas Type 2 hypervisors run on top of an operating system, enabling virtualization on personal computing environments.
      • ✗ Both Type 1 and Type 2 hypervisors run directly on hardware, but Type 2 hypervisors have more complex architectures, making them less efficient.
      • ✗ Type 2 hypervisors run on top of a host operating system, ideal for running containerized applications, whereas Type 1 hypervisors run directly on bare-metal hardware, ideal for running virtual machines.
    2. Which computing resources are typically virtualized to allow multiple VMs to operate on a single physical host?

      • ✗ Only CPUs and memory are virtualized, as network and storage devices cannot be shared.
      • ✓ Access to CPUs, memory, I/O devices, and storage devices all need to be managed to create isolated environments for VMs on a shared host system.
      • ✗ Typically, only specific CPU instructions need to be virtualized, as memory, I/O devices, and storage devices are accessed through instructions.
    3. Which statement accurately describes Popek and Goldberg’s theorem for efficient virtualization and its implications for x86 processors?

      • ✗ The theorem states that efficient virtualization can only be achieved if there are no sensitive instructions.
      • ✓ The theorem requires that all privileged and sensitive instructions trap to the hypervisor when executed in user mode. x86 processors did not meet this requirement, requiring binary re-writing.
      • ✗ The theorem specifies that all privileged instructions can only execute in kernel mode, and x86 processors implement this by using two of four rings.
    4. In the context of full virtualization, what is true for shadow page tables?

      • ✓ Shadow page tables, which map virtual memory addresses as used by the guest OS directly to the actual physical memory of the host, are maintained by the hypervisor and used by hardware.
      • ✗ Shadow page tables are maintained by the guest operating system, so it can manage and use physical memory without hypervisor involvement.
      • ✗ Shadow page tables copy the memory address translation entries of guest operating systems to make VMs fault-tolerant and easy-to-migrate.
    5. In Xen, how is I/O virtualized to allow guest VMs isolated access to physical devices?

      • ✗ Each guest VM includes its own device drivers, which communicate directly with the physical hardware.
      • ✗ In Xen, all device drivers are installed in the hypervisor, which provides virtual devices.
      • ✓ Xen uses a split-driver model where lightweight virtual device drivers in guest VMs communicate with Xen through hypercalls and events, and Xen, in turn, uses the actual physical device drivers in the privileged domain (Dom0).

    Week 3: Containers

    1. Which Linux kernel feature is NOT used for OS-level virtualization?

      • ✗ chroot, which restricts a process’s file system access to a specific directory.
      • ✗ namespaces, which limit access to system resources such as network devices, mount points, and processes.
      • ✓ cron, which schedules recurring tasks at specific times or intervals.
      • ✗ cgroups, which limit access to compute resources such as CPU and memory.
    2. Which is NOT a feature that Docker provides on top of a container library?

      • ✗ Image distribution – Docker provides tools to pull and push container images from and to public/private registries using a hierarchical image format.
      • ✗ Build tools – Docker includes tooling to create images from textual descriptions that modify a base image.
      • ✗ Orchestration – Docker facilitates basic orchestration (e.g., via docker-compose) of multi-container applications.
      • ✓ Fault tolerance and auto-scaling – Docker automatically restarts and replicates containers as required.
    3. When comparing the performance overhead of VMs and containers, which statement is true?

      • ✗ Since containers share the same operating system kernel, while VMs include their own guest operating system, containers have a higher CPU overhead.
      • ✓ Using hardware devices through virtual devices and virtual device drivers can lead to higher latencies and latency variations in VMs.
      • ✗ Containers typically experience lower throughput for sequential memory operations than VMs due to the lack of direct access to hardware resources.
      • ✗ A set of similar VMs on one host has a smaller disk footprint than several similar container images because VM images are hierarchical, and layers can be shared among similar VMs.
    4. Which is NOT an advantage of microservice architectures?

      • ✗ Microservices enable independent deployment and scaling of services.
      • ✗ Microservices can improve resilience by isolating failures within individual services.
      • ✗ Microservices allow flexibility in choosing different technologies for different components.
      • ✓ Microservices increase cluster resource utilization by running more services simultaneously.
    5. Which statement is true for features of Kubernetes?

      • ✗ Using a load balancer in Kubernetes spreads the containers of a pod over different nodes.
      • ✓ Kubernetes probes nodes and pods in a configurable interval, noticing failures with a delay as large as the interval.
      • ✗ Kubernetes’ horizontal pod autoscaler will optimize the CPU limit of containers in a pod towards a user-provided CPU utilization target.
      • ✗ The Kubernetes scheduler prioritizes CPU utilization over memory and disk utilization.

    Week 4: Cloud Infrastructure Management

    1. Which is NOT a reason why cloud operating systems like OpenStack typically use different physical hosts for different host roles?

      • ✗ To ensure fault tolerance by isolating critical services so that failure in one component does not affect other components.
      • ✓ To reduce costs by spreading out storage, compute, and networking across clusters.
      • ✗ To prevent interference between guest VMs on compute and the systems controllers on the controller hosts.
      • ✗ To enable the use of specialized hardware for different functions.
    2. Which statement on virtual networking is NOT correct?

      • ✗ Virtual switches allow VMs on the same host to communicate, functioning similarly to physical network switches.
      • ✓ Virtual Network Functions (VNFs) refer to software-based network appliances like firewalls, load balancers, and routers running on physical infrastructure to perform traditional networking tasks.
      • ✗ Virtual networks are logically isolated network environments created in virtualized environments.
      • ✗ Software-Defined Networks (SDNs) revolve around the programmability of network configurations.
    3. Which is NOT a correct step or feature of live VM migration in Xen?

      • ✗ Xen uses an iterative pre-copy strategy to migrate memory pages, with the last dirty pages being transferred after the VM is paused.
      • ✗ Xen sends unsolicited ARP requests to invalidate IP-to-MAC mappings, allowing the destination VM to respond to new ARP requests.
      • ✗ Xen utilizes network migration and remote virtual storage to ensure continuous access to volumes after migration.
      • ✓ Xen synchronizes the source and destination VMs by executing the same CPU instructions in real time during live migration.
    4. Which issue is NOT effectively addressed by adopting the Infrastructure-as-Code paradigm?

      • ✗ Configuration drift: IaC can help avoid undocumented inconsistencies in configuration.
      • ✗ Server sprawl: IaC can effectively address the uncontrolled creation and proliferation of redundant servers.
      • ✓ Resource underprovisioning: IaC can mitigate resource bottlenecks by adequately scaling the infrastructure for the code.
      • ✗ Snowflake servers: IaC can help eliminate unique, difficult-to-replicate servers.
    5. Which type of testing is NOT commonly performed during canary deployments?

      • ✗ Traffic Analysis: Monitoring user behavior, latency, and error rates of the new deployment under real-world traffic.
      • ✗ Performance Monitoring: Measuring system responsiveness, resource utilization, and throughput during the partial release.
      • ✗ A/B Testing: Comparing the canary version’s performance and user engagement metrics side-by-side against the previous version’s metrics.
      • ✓ Chaos Testing: Randomly introducing controlled failures to the system to evaluate the new deployment’s resiliency.

    Week 5: Cloud Sustainability

    1. Which is NOT a correct category of computer system lifecycle emissions?

      • ✗ Embodied emissions: emissions from the production and manufacturing of hardware.
      • ✗ Operational emissions: emissions during the use of computer systems.
      • ✗ End-of-life emissions: emissions from the disposal and recycling of hardware.
      • ✓ Development emissions: emissions from design, development, and testing of software.
    2. Which statement is NOT true for the power consumption of computing hardware?

      • ✗ CPUs typically have a larger dynamic range than RAM, disks, and network interfaces.
      • ✗ A low server utilization usually correlates with low energy efficiency.
      • ✓ As the peak performance increases from one server generation to the next, it becomes more and more important to utilize server hardware well for energy-proportional computing.
      • ✗ The dynamic power consumption of wired networks is so limited that this is often neglected in carbon footprint assessment.
    3. Which statement about Power Usage Effectiveness (PUE) is NOT true?

      • ✗ Most of the facility power taken into account for PUE in large data centers goes to cooling, followed by power distribution and conversion.
      • ✓ PUE is often better for small, specialized data centers, as they need less energy overall.
      • ✗ PUE can be misleading when facility energy is reused beyond data centers.
      • ✗ PUE measurement methodologies vary drastically to the point that results are hard to compare.
    4. What is NOT a reason why operational cloud carbon footprint assessments are difficult?

      • ✓ Grid carbon intensity varies between regions, seasons, weekdays, and times of day.
      • ✗ Cloud provider reports are coarse-grained, published late, and methodologies are not detailed.
      • ✗ Location independence means you do not know what physical server CPU is used by your VMs.
      • ✗ You do not have access to any physical or software power meter readings from within a VM.
    5. What is NOT a reason why carbon-aware computing is not used more in practice?

      • ✗ Missing runtime information: To time shift large-scale delay-tolerant batch processing applications, you need to know application runtimes before running the applications.
      • ✓ Limited applicability: The majority of cloud applications are not delay-tolerant but latency-critical, user-facing Web applications, and those cannot be managed based on varying grid carbon intensity.
      • ✗ Missing financial incentive: There is no financial benefit to aligning computational loads with low-carbon energy availability so far.
      • ✗ Limited support on public clouds: No public cloud provider has made carbon-aware computing mechanisms available to their customers.

    References:

    • COMPSCI4106/5118 Cloud Systems course materials