Cloud Carbon Footprint

The carbon footprint of cloud computing refers to the greenhouse gas emissions associated with the deployment, operation, and use of cloud services. As cloud computing continues to grow, understanding and mitigating its environmental impact becomes increasingly important for sustainable IT practices.

Understanding ICT and Cloud Emissions

The Growing Footprint of ICT

Information and Communication Technologies (ICT) are estimated to contribute significantly to global carbon emissions:

ICT was estimated to produce between 1.0 and 1.7 gigatons of CO₂e (carbon dioxide equivalent) in 2020
This represents approximately 1.8% to 2.8% of global greenhouse gas emissions
For comparison, commercial aviation accounts for around 2% of global emissions
If overall global emissions decrease while ICT emissions remain constant, ICT’s relative share could increase significantly
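
The relative-share effect in the last point is simple arithmetic. The sketch below assumes a global total of 55 Gt CO₂e (roughly what the ranges above imply, since 1.0 Gt / 1.8% ≈ 55.6 Gt) and a hypothetical scenario in which global emissions halve while ICT stays flat. Both figures are illustrative assumptions, not projections.

```python
def ict_share(ict_gt: float, global_gt: float) -> float:
    """Return ICT's share of global emissions as a fraction (0..1)."""
    if global_gt <= 0:
        raise ValueError("global emissions must be positive")
    return ict_gt / global_gt


# Illustrative assumptions only
ICT_GT = 1.4       # a value inside the 1.0-1.7 Gt range
GLOBAL_GT = 55.0   # assumed global total in Gt CO2e

share_today = ict_share(ICT_GT, GLOBAL_GT)
# Hypothetical scenario: global emissions halve, ICT emissions stay constant
share_after = ict_share(ICT_GT, GLOBAL_GT * 0.5)

print(f"today: {share_today:.1%}, after global halving: {share_after:.1%}")
```

Under these assumptions, halving the global total doubles ICT's share even though ICT's absolute emissions did not change.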

Cloud Computing’s Contribution

Within the ICT sector, data centers (including cloud infrastructure) are major contributors to emissions:

Data centers account for approximately one-third of ICT’s carbon footprint
Cloud computing has both positive and negative effects on overall emissions:
- Positive: Consolidation, higher utilization, economies of scale
- Negative: Increased demand, rebound effects, energy-intensive applications

Drivers of Growth

Several technology trends are driving increased emissions from cloud computing:

Artificial Intelligence and Machine Learning: Training large models requires significant computational resources
Big Data and Analytics: Processing and storing vast amounts of data
Internet of Things (IoT): Generating and processing data from billions of connected devices
High-Definition Media: Streaming and storing increasingly high-resolution content
Blockchain and Cryptocurrencies: Energy-intensive consensus mechanisms

Lifecycle Emissions in Cloud Computing

Cloud carbon emissions can be categorized based on their source in the lifecycle:

Embodied Emissions (Scope 3)

Emissions from raw material sourcing, manufacturing, and transportation of hardware:

Represents approximately 20-25% of cloud infrastructure’s total emissions
Includes emissions from producing servers, networking equipment, cooling systems
Also includes emissions from constructing data centers
Example: For an individual server the embodied share can be much higher than this fleet-wide average; manufacturing a server like the Dell PowerEdge R740 can account for nearly 50% of its lifetime carbon footprint

Operational Emissions (Scope 2)

Emissions from using electricity for powering computing and networking hardware:

Represents approximately 70-75% of cloud infrastructure’s total emissions
Primary source is electricity consumption for:
- Server operation
- Cooling systems
- Network equipment
- Power distribution and conversion losses

End-of-Life Emissions (Scope 3)

Emissions from recycling and disposal of e-waste:

Represents approximately 5% of total emissions
Includes emissions from transportation, processing, and disposal
Can be reduced through equipment refurbishment and proper recycling

Measuring Cloud Carbon Footprint

Challenges in Measurement

Accurately measuring cloud carbon footprint faces several challenges:

Lack of Transparency: Limited visibility into actual hardware and datacenter operations
Methodological Differences: Varying approaches to calculation and reporting
Data Availability: Limited access to real-time energy consumption data
Shared Infrastructure: Difficulty in attribution for multi-tenant resources
Complex Supply Chains: Tracking emissions across global supply chains

Greenhouse Gas Protocol Scopes

The Greenhouse Gas (GHG) Protocol defines three scopes for emissions reporting:

Scope 1: Direct emissions from owned or controlled sources
- For cloud providers: Emissions from backup generators, refrigerants
Scope 2: Indirect emissions from purchased electricity
- For cloud providers: Emissions from electricity powering data centers
- For cloud users: Considered part of their Scope 3 emissions
Scope 3: All other indirect emissions in the value chain
- For cloud providers: Equipment manufacturing, employee travel, etc.
- For cloud users: Emissions from using cloud services

Estimation Methodologies

Cloud Provider Reporting

Major cloud providers (AWS, Google Cloud, Microsoft Azure) provide carbon emissions data:

Usually reported quarterly or annually
Often aggregated at the service level (e.g., EC2, S3, etc.)
May use market-based measures including renewable energy credits (RECs)
Typically not granular enough for detailed optimization

Third-Party Estimation

Tools and methodologies developed to estimate cloud carbon footprint:

Cloud Carbon Footprint (CCF) Methodology:
- Converts resource usage to energy consumption and then to carbon emissions
- Uses energy conversion factors for different resource types
- Accounts for PUE (Power Usage Effectiveness)
- Applies regional grid emissions factors
Formula:
```
Operational emissions = cloud resource usage × energy conversion factor × PUE × grid emissions factor
```
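
The formula above translates directly into code. This is a minimal sketch, not the CCF tool's implementation: the `UsageRecord` type, its field names, and every coefficient in the example are hypothetical placeholders rather than published CCF values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    usage_hours: float           # cloud resource usage, e.g. vCPU-hours
    kwh_per_usage_hour: float    # energy conversion factor for the resource type
    pue: float                   # Power Usage Effectiveness of the data center
    grid_kg_co2e_per_kwh: float  # regional grid emissions factor


def operational_emissions_kg(record: UsageRecord) -> float:
    """usage x energy conversion factor x PUE x grid emissions factor."""
    if record.pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    energy_kwh = record.usage_hours * record.kwh_per_usage_hour
    facility_kwh = energy_kwh * record.pue  # add cooling and power overhead
    return facility_kwh * record.grid_kg_co2e_per_kwh


# Hypothetical example: 1000 vCPU-hours in a region with a 0.4 kg/kWh grid
example = UsageRecord(
    usage_hours=1000.0,
    kwh_per_usage_hour=0.002,
    pue=1.2,
    grid_kg_co2e_per_kwh=0.4,
)
print(f"{operational_emissions_kg(example):.2f} kg CO2e")
```

Keeping the four factors as separate fields makes it easy to see which one dominates when comparing regions or instance types.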

Measurement Granularity Levels

Cloud computing systems can be measured at multiple levels, from individual components to entire data centers. Each level provides different insights and presents unique measurement challenges.

Software-level Measurement

Software-level measurements focus on the energy and resource consumption of specific applications, processes, or code components.

Tools and Approaches

Intel RAPL (Running Average Power Limit)
- Historically surfaced to end users through Intel Power Gadget and PowerLog
- Reports energy consumption of CPU packages, cores, integrated graphics, and memory (DRAM)
- Supported on modern Intel CPUs; recent AMD CPUs expose a compatible interface
- Exposed on Linux through perf and the powercap sysfs interface
NVIDIA SMI and NVML
- SMI: Command-line tool for monitoring NVIDIA GPUs
- NVML: C-based library for programmatic monitoring
- Provides power, utilization, temperature, and memory metrics
Linux Power Monitoring Tools
- PowerTOP: Detailed power consumption analysis
- powerstat: Command-line tool that samples power consumption and reports summary statistics
Application-Specific Measurement Libraries
- CodeCarbon: Estimates carbon emissions of compute
- PowerAPI: API for building software-defined power meters
- Scaphandre: Power consumption metrics collector focused on observability
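
As an illustration of how such tools obtain their numbers, the sketch below reads RAPL energy counters from the Linux powercap sysfs tree and converts two readings into average power. The path `/sys/class/powercap/intel-rapl:0` is an assumption (package 0 on a machine with the `intel_rapl` driver loaded), and reading `energy_uj` usually requires root on recent kernels. The function names are hypothetical.

```python
import time
from pathlib import Path

# Assumed location of the package-0 RAPL domain; varies by machine
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name: str) -> int:
    """Read an integer microjoule value from a powercap sysfs file."""
    return int((RAPL_DIR / name).read_text().strip())


def average_power_watts(start_uj: int, end_uj: int,
                        elapsed_s: float, max_range_uj: int) -> float:
    """Convert two cumulative energy readings into average power."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        # The counter wrapped around between the two readings
        delta_uj += max_range_uj
    return delta_uj / 1e6 / elapsed_s  # microjoules -> joules -> watts


def sample_package_power(interval_s: float = 1.0) -> float:
    """Measure average package power over one interval (needs RAPL access)."""
    max_range = read_uj("max_energy_range_uj")
    start, t0 = read_uj("energy_uj"), time.monotonic()
    time.sleep(interval_s)
    end, t1 = read_uj("energy_uj"), time.monotonic()
    return average_power_watts(start, end, t1 - t0, max_range)
```

On a supported machine, `sample_package_power()` returns the average package power in watts over one second. The wraparound handling assumes at most one counter wrap per interval, so short sampling intervals are safer.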

Measurement Methodology

These tools typically use a combination of:

Hardware performance counters
Statistical models based on component utilization
Direct measurements from hardware sensors (where available)
Correlation with known power consumption patterns
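
A common form of the utilization-based statistical model is a linear fit between CPU utilization and measured power. The sketch below is one assumed way to build such a model with ordinary least squares; the calibration samples are hypothetical and real tools typically use richer models with more inputs.

```python
def fit_linear_power_model(samples):
    """Fit watts = idle + slope * utilization by ordinary least squares.

    samples: list of (utilization in 0..1, measured watts) pairs.
    Returns (idle_watts, watts_per_unit_utilization).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_u = sum(u for u, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    var_u = sum((u - mean_u) ** 2 for u, _ in samples)
    if var_u == 0:
        raise ValueError("samples must cover more than one utilization level")
    cov = sum((u - mean_u) * (p - mean_p) for u, p in samples)
    slope = cov / var_u
    return mean_p - slope * mean_u, slope


def estimate_power_watts(utilization, idle_watts, slope):
    """Estimate power from utilization using the fitted model."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_watts + slope * utilization


# Hypothetical calibration data: (utilization, watts from a power meter)
calibration = [(0.0, 100.0), (0.5, 150.0), (1.0, 200.0)]
idle, slope = fit_linear_power_model(calibration)
print(estimate_power_watts(0.25, idle, slope))
```

The fitted intercept corresponds to idle power, which is why such models still attribute energy to a machine that is doing no useful work.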

Limitations

Accuracy varies based on hardware support
Estimations rather than exact measurements in many cases
Overhead of measurement process itself
Limited visibility into hardware-level details

Server-level Measurement

Server-level measurements provide a more comprehensive view of resource consumption for entire physical or virtual machines.

Component-level Monitoring

CPU power consumption: Per-socket and per-core measurements
Memory usage: Capacity and bandwidth utilization
Storage activity: Read/write operations, throughput
Network traffic: Packets, bandwidth, protocols

Intelligent Platform Management Interface (IPMI)

Standardized hardware interface for “out-of-band” management
Functions independent of the server’s operating system
Uses a dedicated microcontroller called Baseboard Management Controller (BMC)
Capabilities:
- Remote administration regardless of OS or power state
- Monitoring of temperature, voltage, fan speed, power supply status
- Control functions: power cycling, server restart, BIOS configuration
- Logging system events and errors for troubleshooting
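
The monitoring capability can be sketched with the `ipmitool` command-line client, whose `sensor` subcommand prints pipe-separated rows. The parser below assumes that layout (name, value, unit in the first three columns). The sample text is hypothetical, and sensor names differ between vendors.

```python
import subprocess


def parse_ipmitool_sensor(output: str) -> dict:
    """Map sensor name -> (numeric value, unit), skipping non-numeric rows."""
    readings = {}
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 3:
            continue
        name, value, unit = fields[:3]
        try:
            readings[name] = (float(value), unit)
        except ValueError:
            continue  # "na" or discrete values such as 0x1
    return readings


def read_sensors() -> dict:
    """Query the local BMC; requires ipmitool and suitable privileges."""
    result = subprocess.run(
        ["ipmitool", "sensor"], capture_output=True, text=True, check=True
    )
    return parse_ipmitool_sensor(result.stdout)


# Hypothetical sample output for illustration
SAMPLE = """\
CPU Temp         | 45.000     | degrees C  | ok    | na  | na
Fan1             | 5400.000   | RPM        | ok    | na  | na
PS1 Status       | 0x1        | discrete   | 0x0100| na  | na
Inlet Temp       | na         | degrees C  | na    | na  | na
"""
print(parse_ipmitool_sensor(SAMPLE))
```

Separating parsing from the subprocess call keeps the parser testable on machines without a BMC.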

Power Measurement Accuracy

Direct measurement via built-in sensors is generally more accurate than model-based estimation
Some servers provide power data at subsystem level
Modern servers can report power consumption per component
Historical data can be logged for trend analysis

Rack-level Measurement

Rack-level measurements focus on the collective consumption of multiple servers and supporting infrastructure within a rack.

Key Measurement Components

Intelligent Power Distribution Units (PDUs)
- Provide per-outlet power metering
- Real-time monitoring of current, voltage, power factor
- Historical logging capabilities
- Sometimes include environmental sensors
Rack Inlet/Outlet Temperature Monitoring
- Temperature sensors at air intake and exhaust points
- Used to calculate cooling efficiency
- Helps identify hotspots and airflow issues
Per-rack Cooling Efficiency
- Ratio of cooling power to computing power
- Identification of over-cooled or under-cooled racks
- Optimization of airflow and temperature setpoints
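
The rack-level quantities above reduce to a few small calculations. In this sketch the outlet names, sensor positions, and readings are hypothetical, and the 27 °C inlet limit is an assumed threshold that each site would set to match its own operating envelope.

```python
def rack_power_watts(outlet_watts: dict) -> float:
    """Total rack power from per-outlet PDU readings."""
    return sum(outlet_watts.values())


def cooling_to_it_ratio(cooling_watts: float, it_watts: float) -> float:
    """Ratio of cooling power to computing power for one rack."""
    if it_watts <= 0:
        raise ValueError("IT power must be positive")
    return cooling_watts / it_watts


def hot_inlets(inlet_temps_c: dict, limit_c: float = 27.0) -> list:
    """Names of inlet sensors reading above the assumed limit."""
    return sorted(name for name, t in inlet_temps_c.items() if t > limit_c)


# Hypothetical readings
outlets = {"outlet-1": 350.0, "outlet-2": 410.0}
inlets = {"top": 29.5, "middle": 24.0, "bottom": 22.0}

it_power = rack_power_watts(outlets)
print(it_power, cooling_to_it_ratio(190.0, it_power), hot_inlets(inlets))
```

A hot inlet near the top of a rack alongside cool lower inlets is a typical sign of recirculating exhaust air rather than insufficient cooling capacity.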

Benefits of Rack-level Measurement

More granular than data center-wide metrics
Enables identification of inefficient racks
Supports targeted optimization efforts
Provides insights for rack placement and design

Data Center-level Measurement

Data center-level measurements provide a holistic view of facility-wide consumption and efficiency.

Total Facility Power Measurement

IT Equipment Power
- Servers, storage, and networking equipment
- The productive power that delivers computing services
Infrastructure Power
- HVAC Systems: Cooling, humidity control, air handling
- Power Distribution: PDUs, UPSs, batteries, transformers
- Auxiliary Systems: Lighting, security, fire suppression
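
Splitting facility power into IT and infrastructure power is what makes Power Usage Effectiveness (PUE) computable: PUE is total facility power divided by IT equipment power. The sketch below uses hypothetical kW readings for the categories listed above.

```python
def pue(it_kw: float, infrastructure_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    if it_kw <= 0:
        raise ValueError("IT power must be positive")
    if infrastructure_kw < 0:
        raise ValueError("infrastructure power cannot be negative")
    return (it_kw + infrastructure_kw) / it_kw


# Hypothetical facility readings in kW
it_equipment = 1000.0
infrastructure = {
    "hvac": 300.0,
    "power_distribution": 80.0,
    "auxiliary": 20.0,
}

print(f"PUE = {pue(it_equipment, sum(infrastructure.values())):.2f}")
```

A PUE of 1.0 would mean every watt reaches IT equipment; anything above that is overhead from cooling, power conversion, and auxiliary systems.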

Environmental Monitoring

Temperature and humidity throughout the facility
Airflow patterns and pressure differentials
Particulate levels and air quality
Leak detection systems

Data Center Manageability Interface (DCMI)

Standard built upon IPMI to address data center-wide manageability
Extended capabilities for large-scale deployments
Power management features:
- Monitoring across multiple systems
- Power capping to limit consumption during peak demand
- Aggregated reporting for facility management
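
One way to collect DCMI power data is `ipmitool dcmi power reading`, run against each host's BMC. The sketch below assumes the output contains a line of the form `Instantaneous power reading: <N> Watts` and that the password is supplied through the `IPMI_PASSWORD` environment variable (ipmitool's `-E` option). Host names, the cap value, and the helper names are hypothetical.

```python
import re
import subprocess

# Assumed format of the relevant ipmitool output line
_POWER_RE = re.compile(r"Instantaneous power reading:\s*(\d+)\s*Watts")


def parse_dcmi_power_watts(output: str) -> int:
    """Extract the instantaneous power reading in watts."""
    match = _POWER_RE.search(output)
    if match is None:
        raise ValueError("no instantaneous power reading found")
    return int(match.group(1))


def read_host_power_watts(host: str, user: str) -> int:
    """Query one BMC over the network; password comes from IPMI_PASSWORD."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
           "dcmi", "power", "reading"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_dcmi_power_watts(result.stdout)


def hosts_exceeding_cap(readings: dict, cap_watts: int) -> list:
    """Hosts whose reading is above a per-host power cap."""
    return sorted(h for h, watts in readings.items() if watts > cap_watts)


# Hypothetical aggregated readings for a facility report
readings = {"node-a": 220, "node-b": 410, "node-c": 180}
print(sum(readings.values()), hosts_exceeding_cap(readings, cap_watts=400))
```

This only reports which hosts exceed a cap. Enforcing a cap is a separate DCMI function and is not shown here.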

Network-level Measurement

Network infrastructure power consumption is often overlooked but forms a significant portion of IT energy use.

Challenges in Network Measurement

Diverse equipment spanning multiple domains and locations
Different device models with varying efficiency characteristics
Dynamic routing and traffic patterns
Large aggregate footprint: networks are estimated to consume ~1% of global electricity

Measurement Approaches

Device-level Monitoring: Power consumption per switch, router, firewall
Traffic-based Estimation: Models relating network traffic to energy use
Infrastructure Utilization: Correlation between link utilization and power
End-to-end Analysis: Energy consumed to transfer data between endpoints
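
Traffic-based and end-to-end estimates are often built from a per-bit energy intensity for each hop on the path. The sketch below assumes such a model; the nanojoule-per-bit values are hypothetical placeholders, not measured figures. Per-bit models also ignore the largely load-independent idle power of network devices, so the result is an attribution rather than a marginal cost.

```python
def transfer_energy_joules(num_bytes: float, hops_nj_per_bit: list) -> float:
    """Estimate energy to move data across a path of network hops.

    hops_nj_per_bit: assumed energy intensity of each hop, in nJ per bit.
    """
    if num_bytes < 0:
        raise ValueError("byte count cannot be negative")
    bits = num_bytes * 8
    path_nj_per_bit = sum(hops_nj_per_bit)
    return bits * path_nj_per_bit * 1e-9  # nanojoules -> joules


# Hypothetical path: access switch, core router, access switch
path = [10.0, 20.0, 10.0]
energy_j = transfer_energy_joules(1e9, path)  # 1 GB transfer
print(f"{energy_j:.1f} J ({energy_j / 3.6e6:.6f} kWh)")
```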

Factors Affecting Network Power Consumption

Hardware specifications and age
Utilization levels
Traffic patterns
Protocol efficiency
Network topology
Ambient conditions

Practical Implementation Considerations

Measurement Frequency

Real-time: Continuous monitoring for immediate action
Interval-based: Regular sampling (seconds, minutes, hours)
On-demand: Triggered measurements for specific analysis
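
With interval-based sampling, energy has to be reconstructed from discrete power readings. A minimal sketch using trapezoidal integration is shown below. It assumes timestamps in seconds in strictly increasing order, and the sample values are hypothetical.

```python
def energy_kwh(samples: list) -> float:
    """Integrate (timestamp_s, watts) samples into energy in kWh."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("timestamps must be strictly increasing")
        # Trapezoid: average power over the interval times its duration
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


# Hypothetical one-hour trace sampled every 30 minutes
trace = [(0, 100.0), (1800, 140.0), (3600, 100.0)]
print(f"{energy_kwh(trace):.3f} kWh")
```

The longer the sampling interval, the more short power spikes are missed, which is the accuracy cost of lower measurement overhead.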

Data Storage and Analysis

Time-series databases for efficient storage of measurement data
Analytics platforms for trend analysis and anomaly detection
Visualization tools for dashboard creation and reporting
Machine learning for pattern recognition and prediction
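
Anomaly detection on power data does not have to start with machine learning. The sketch below flags readings that deviate strongly from a trailing window using a z-score. The window size, threshold, and sample series are assumptions chosen for illustration.

```python
import statistics


def find_anomalies(values: list, window: int = 5,
                   z_threshold: float = 3.0) -> list:
    """Return indices whose value is far from the trailing-window mean."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            continue  # flat history: z-score is undefined, skip the point
        if abs(values[i] - mean) / stdev > z_threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical per-minute power readings in watts with one spike
power = [100, 101, 99, 100, 102, 180, 100]
print(find_anomalies(power))
```

Because the spike itself enters the trailing window afterwards, it inflates the standard deviation and can mask follow-on anomalies; production systems often use robust statistics for that reason.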

Integration with Management Systems

DCIM (Data Center Infrastructure Management) integration
Correlation with application performance metrics
Automated actions based on measurement thresholds
Capacity planning and forecasting

Cost-Benefit Considerations

Instrumentation costs vs. potential savings
Additional power overhead of measurement systems
Staffing requirements for monitoring and analysis
ROI calculation for measurement initiatives
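
A first-pass ROI check can be a simple payback calculation that nets the running cost of measurement against the savings it enables. All monetary figures below are hypothetical.

```python
def payback_years(instrumentation_cost: float, annual_savings: float,
                  annual_overhead_cost: float) -> float:
    """Years until net savings cover the up-front instrumentation cost."""
    net_annual = annual_savings - annual_overhead_cost
    if net_annual <= 0:
        return float("inf")  # the initiative never pays for itself
    return instrumentation_cost / net_annual


# Hypothetical figures: sensors and PDUs, energy savings, staffing and power
print(payback_years(50_000.0, 30_000.0, 5_000.0))
```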

Case Studies in Measurement Granularity

Google’s Data Center Measurement Approach

Comprehensive instrumentation from component to facility level
Custom power monitoring devices for servers
Machine learning for predictive analytics
Integration with cooling control systems
Public reporting of fleet-wide PUE metrics

Financial Services Sector Example

High-frequency measurements for trading platforms
Correlation of energy use with transaction volume
Workload-aware power management
Regulatory compliance reporting
Emissions allocation to business units
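
Allocating emissions to business units is usually done in proportion to some usage metric. The sketch below assumes allocation by a single metric, such as CPU-hours or transaction volume. The unit names and figures are hypothetical, and the choice of metric is itself a policy decision.

```python
def allocate_emissions(total_kg: float, usage_by_unit: dict) -> dict:
    """Split total emissions across units in proportion to their usage."""
    total_usage = sum(usage_by_unit.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        unit: total_kg * usage / total_usage
        for unit, usage in usage_by_unit.items()
    }


# Hypothetical monthly figures: 1000 kg CO2e split by CPU-hours
usage = {"trading": 6000.0, "risk": 3000.0, "reporting": 1000.0}
for unit, kg in allocate_emissions(1000.0, usage).items():
    print(f"{unit}: {kg:.1f} kg CO2e")
```

Proportional allocation ignores idle capacity held in reserve for a unit, so some organizations allocate by reserved rather than consumed resources.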

Challenges and Future Directions

Current Limitations

Gaps in measurement capability across the stack
Inconsistent methodologies between organizations
Limited standardization of metrics and reporting
Balancing measurement detail with system overhead

Emerging Capabilities

Non-intrusive load monitoring techniques
Improved sensor technology with lower overhead
AI-driven analysis and optimization
Standardized reporting frameworks
Carbon-aware application development
