tools Module
This subpackage provides tools for tasks such as calculating the carbon intensity of energy production and calculating the emissions produced by a computation.
Each tool is implemented in a separate module and must be imported individually (see below).
Carbon Intensity of Energy
Carbon intensity refers to the amount of greenhouse gases emitted per unit of electricity generated. It is typically measured in grams of CO₂ equivalents per kilowatt-hour (gCO2e/kWh).
Different types of energy production, such as fossil fuels, renewables, and nuclear power, have different carbon intensity values. The carbon intensity of an energy mix is the sum of the base carbon intensity values of the individual energy sources, each weighted by that source's share of total generation. The carbon intensity of the energy powering a system therefore has a significant impact on the overall carbon emissions of computational tasks.
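Written as a formula (the symbols are chosen here for illustration and are not taken from the library), with \(p_s\) the share of electricity generated by source \(s\), expressed as a fraction so that \(\sum_s p_s = 1\), and \(\text{CI}_s\) the base carbon intensity of that source:

\[ \text{CI}_{\text{mix}} = \sum_{s} p_s \times \text{CI}_s \]

If the shares are given as percentages, as in the Coal_per, Wind_per, ... columns used by compute_ci_from_energy, each \(p_s\) is the percentage divided by 100.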
The table below shows the base carbon intensity values of various electricity production sources. These values are adapted from [5].
| Type | Average of | Mean (gCO2e/kWh) |
| --- | --- | --- |
| coal | Coal—PC | 820 |
| natural gas | Gas—Combined Cycle | 490 |
| biogas | Biomass—co-firing, Biomass—dedicated | 485 |
| geothermal | Geothermal | 38 |
| hydropower | Hydropower | 24 |
| nuclear | Nuclear | 12 |
| solar | Concentrated Solar Power, Solar PV—rooftop, Solar PV—utility | 38.6 |
| wind | Wind onshore, Wind offshore | 11.5 |
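As a worked illustration of the weighted sum, take a hypothetical mix of 50 % wind, 30 % natural gas and 20 % coal. Using the mean values from the table:

\[ \text{CI} = 0.5 \times 11.5 + 0.3 \times 490 + 0.2 \times 820 = 5.75 + 147 + 164 = 316.75\ \text{gCO}_2\text{e/kWh} \]

Even though wind supplies half of the electricity in this example, the result is dominated by the coal and natural gas shares.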
One challenge with calculating carbon intensity is that the base values vary depending on the methodology used to derive them. We therefore provide CI values calculated with several approaches (that is, with different sets of base values), each included as a separate column in the DataFrame. You can also use your own base values. By default, the IPCC lifecycle mean values are used.
When energy generation data is not available for a country, an average carbon intensity value for that country is used instead. These average values are sourced from Carbon Footprint Ltd [8].
- codegreen_core.tools.carbon_intensity.compute_ci(country: str, start_time: datetime, end_time: datetime) → pandas.DataFrame
Computes carbon intensity data for a given country and time period.
If energy data is available, the carbon intensity is calculated from actual energy data for the specified time range. If energy data is not available for the country, a default carbon intensity value is used instead. The default CI values for all countries are stored in utilities/ci_default_values.csv.
- codegreen_core.tools.carbon_intensity.compute_ci_from_energy(energy_data: pandas.DataFrame, default_method='ci_ipcc_lifecycle_mean', base_values: dict = None) → pandas.DataFrame
Computes the carbon intensity for each row of the given energy time series. You can choose the base values from several available sources or supply your own.
- Parameters:
energy_data – A pandas DataFrame that must include the following columns, representing the percentage of energy generated from each source:
Coal_per (float): Percentage of energy generated from coal.
Petroleum_per (float): Percentage of energy generated from petroleum.
Biomass_per (float): Percentage of energy generated from biomass.
Natural Gas_per (float): Percentage of energy generated from natural gas.
Geothermal_per (float): Percentage of energy generated from geothermal sources.
Hydroelectricity_per (float): Percentage of energy generated from hydroelectric sources.
Nuclear_per (float): Percentage of energy generated from nuclear sources.
Solar_per (float): Percentage of energy generated from solar sources.
Wind_per (float): Percentage of energy generated from wind sources.
default_method – Selects the set of base carbon intensity values used for each energy source. By default, the IPCC lifecycle mean values are used. Available options include:
codecarbon (Ref [6])
ipcc_lifecycle_min (Ref [5])
ipcc_lifecycle_mean (default)
ipcc_lifecycle_max
eu_comm (Ref [4])
base_values (optional) – A dictionary of custom base carbon intensity values for the energy sources (in gCO2e/kWh, the same unit as the built-in values). It must include the following keys:
Coal (float): Base carbon intensity value for coal.
Petroleum (float): Base carbon intensity value for petroleum.
Biomass (float): Base carbon intensity value for biomass.
Natural Gas (float): Base carbon intensity value for natural gas.
Geothermal (float): Base carbon intensity value for geothermal energy.
Hydroelectricity (float): Base carbon intensity value for hydroelectricity.
Nuclear (float): Base carbon intensity value for nuclear energy.
Solar (float): Base carbon intensity value for solar energy.
Wind (float): Base carbon intensity value for wind energy.
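The weighted sum behind compute_ci_from_energy can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: rows are plain dictionaries rather than DataFrame rows, the function name ci_from_energy is hypothetical, and the custom base values in the example are made-up numbers chosen only to show the arithmetic.

```python
# Energy sources whose shares ("<Source>_per") and base values ("<Source>") are combined.
SOURCES = [
    "Coal", "Petroleum", "Biomass", "Natural Gas", "Geothermal",
    "Hydroelectricity", "Nuclear", "Solar", "Wind",
]


def ci_from_energy(rows, base_values):
    """Hypothetical sketch: one carbon intensity per row, as a share-weighted sum.

    Each row maps "<Source>_per" to the percentage (0 to 100) of energy from
    that source; base_values maps "<Source>" to its base carbon intensity.
    """
    missing = [s for s in SOURCES if s not in base_values]
    if missing:
        raise ValueError(f"base_values is missing keys: {missing}")
    # Convert each percentage to a fraction and weight the source's base value by it.
    return [
        sum(row[f"{s}_per"] / 100 * base_values[s] for s in SOURCES)
        for row in rows
    ]


# Made-up custom base values in gCO2e/kWh, for illustration only.
custom_base_values = {
    "Coal": 800.0, "Petroleum": 650.0, "Biomass": 230.0,
    "Natural Gas": 450.0, "Geothermal": 38.0, "Hydroelectricity": 24.0,
    "Nuclear": 12.0, "Solar": 40.0, "Wind": 11.0,
}
zero = {f"{s}_per": 0.0 for s in SOURCES}
rows = [
    {**zero, "Wind_per": 50.0, "Natural Gas_per": 30.0, "Coal_per": 20.0},
    {**zero, "Nuclear_per": 100.0},
]
ci_values = ci_from_energy(rows, custom_base_values)
print(ci_values)
```

With a DataFrame, the same computation can be done column-wise: multiply each ..._per column by its base value divided by 100 and sum the results across sources.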
Carbon emission of a job
The methodology for calculating carbon emissions is based on [7].
The carbon emissions of a job depend on two factors: the energy consumed by the hardware to run the computation, and the emissions generated to produce this energy. Emissions are expressed in CO2e (carbon dioxide equivalent).
Carbon emissions: \(\text{CE} = E \times \text{CI}\) (in \(\text{gCO}_{2}\text{e}\) when \(E\) is in kWh and \(\text{CI}\) is in gCO2e/kWh)
Energy consumption: \(E = t \times \left( n_{c} \times P_{c} \times u_{c} + n_{m} \times P_{m} \right) \times PUE \times 0.001\) (in kWh; the factor 0.001 converts Wh to kWh)
\(t\) : running time in hours
\(n_c\) : number of cores
\(n_m\) : size of memory available (in gigabytes)
\(u_c\) : core usage factor (between 0 and 1)
\(P_c\) : power draw of a computing core (in watts)
\(P_m\) : power draw of memory per gigabyte (in watts per gigabyte, so that \(n_m \times P_m\) is in watts)
\(PUE\) : power usage effectiveness of the data center (the ratio of total facility energy to IT equipment energy, at least 1)
The emissions related to producing the energy are represented by the carbon intensity (CI) of the energy mix during the period in which the job runs, computed as described in the Carbon Intensity of Energy section above.
The result is the carbon emission of the job in CO2e.
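A worked example of the two formulas above, as a minimal Python sketch. The function names and all input numbers are hypothetical; \(P_m\) is taken per gigabyte of memory so that \(n_m \times P_m\) is a power in watts, and CI is assumed to be in gCO2e/kWh, which makes CE come out in grams.

```python
def energy_kwh(t_hours, n_cores, p_core_w, usage_core, mem_gb, p_mem_w_per_gb, pue):
    """E = t * (n_c * P_c * u_c + n_m * P_m) * PUE * 0.001, in kWh."""
    if not 0 <= usage_core <= 1:
        raise ValueError("usage_core must be between 0 and 1")
    power_w = n_cores * p_core_w * usage_core + mem_gb * p_mem_w_per_gb
    return t_hours * power_w * pue * 0.001  # Wh -> kWh


def carbon_emissions(e_kwh, ci_g_per_kwh):
    """CE = E * CI; with CI in gCO2e/kWh the result is in gCO2e."""
    return e_kwh * ci_g_per_kwh


# Hypothetical job: 2 h on 4 cores at 10 W each with 50 % core usage,
# 16 GB of memory at 0.5 W/GB, PUE of 1.5, grid at 250 gCO2e/kWh.
e = energy_kwh(2, 4, 10, 0.5, 16, 0.5, 1.5)
ce = carbon_emissions(e, 250)
print(f"{e:.3f} kWh, {ce:.1f} gCO2e")  # 0.084 kWh, 21.0 gCO2e
```

Dividing the result by 1000 gives kilograms of CO2e, the unit used in the Returns sections of the functions below.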
- codegreen_core.tools.carbon_emission.compare_carbon_emissions(server1, server2, start_time1, start_time2, runtime_minutes)
Compares the carbon emissions of running a job with the same duration on two different servers.
- Parameters:
server1 –
A dictionary containing the details of the first server’s hardware and location specifications. Required keys include:
country (str): The country code for the server’s location (used for energy data).
number_core (int): The number of CPU cores.
memory_gb (float): The memory available in Gigabytes.
power_draw_core (float): Power draw of each computing core in Watts.
usage_factor_core (float): The core usage factor, a value between 0 and 1.
power_draw_mem (float): Power draw of memory in Watts.
power_usage_efficiency (float): Efficiency coefficient of the data center.
server2 –
A dictionary containing the details of the second server’s hardware and location specifications. Required keys are identical to those in server1:
country (str): The country code for the server’s location.
number_core (int): The number of CPU cores.
memory_gb (float): The memory available in Gigabytes.
power_draw_core (float): Power draw of each computing core in Watts.
usage_factor_core (float): The core usage factor, a value between 0 and 1.
power_draw_mem (float): Power draw of memory in Watts.
power_usage_efficiency (float): Efficiency coefficient of the data center.
start_time1 – The start time of the job on server1 (datetime).
start_time2 – The start time of the job on server2 (datetime).
runtime_minutes – The total running time of the job in minutes (int).
- Returns:
A dictionary with the carbon emissions of each server and the difference between them, structured as follows:
emissions_server1 (float): Total carbon emissions for server1 in kilograms of CO2 equivalent.
emissions_server2 (float): Total carbon emissions for server2 in kilograms of CO2 equivalent.
absolute_difference (float): The absolute difference in emissions between the two servers.
higher_emission_server (str): Indicates which server has higher emissions (“server1” or “server2”).
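Both servers are described by dictionaries with the same keys. The sketch below shows one such dictionary and a small hypothetical helper, missing_server_keys, that reports which required keys are absent before calling compare_carbon_emissions. All values are made up for illustration, and the two-letter country code format is an assumption.

```python
# Keys listed as required for server1 and server2 above.
REQUIRED_SERVER_KEYS = {
    "country",
    "number_core",
    "memory_gb",
    "power_draw_core",
    "usage_factor_core",
    "power_draw_mem",
    "power_usage_efficiency",
}


def missing_server_keys(server: dict) -> list:
    """Hypothetical helper: return the required keys absent from server, sorted."""
    return sorted(REQUIRED_SERVER_KEYS - set(server))


# Illustrative values only; the two-letter country code is an assumption.
server1 = {
    "country": "DE",
    "number_core": 8,
    "memory_gb": 32,
    "power_draw_core": 15.0,
    "usage_factor_core": 1.0,
    "power_draw_mem": 0.37,
    "power_usage_efficiency": 1.5,
}
server2 = {**server1, "country": "FR"}
incomplete = {"country": "DE", "number_core": 8}

print(missing_server_keys(server1))  # []
# Prints ['memory_gb', 'power_draw_core', 'power_draw_mem', 'power_usage_efficiency', 'usage_factor_core']
print(missing_server_keys(incomplete))

# With complete dictionaries and two datetime start times, the documented call is:
# result = compare_carbon_emissions(server1, server2, start_time1, start_time2, 120)
```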
- codegreen_core.tools.carbon_emission.compute_ce(server: dict, start_time: datetime, runtime_minutes: int) → tuple[float, pandas.DataFrame]
Calculates the carbon footprint of a job, given its hardware configuration, time, and location. This method returns an hourly time series of the carbon emissions.
The methodology is described in the "Carbon emission of a job" section above.
- Parameters:
server –
A dictionary containing the details about the server, including its hardware specifications. The dictionary should include the following keys:
country (str): The country code where the job was performed (required to fetch energy data).
number_core (int): The number of CPU cores.
memory_gb (float): The size of memory available in Gigabytes.
power_draw_core (float): Power draw of a computing core in Watts.
usage_factor_core (float): The core usage factor, a value between 0 and 1.
power_draw_mem (float): Power draw of memory in Watts.
power_usage_efficiency (float): Efficiency coefficient of the data center.
start_time – The start time of the job (datetime).
runtime_minutes – Total running time of the job in minutes (int).
- Returns:
A tuple containing:
(float): The total carbon footprint of the job in kilograms of CO2 equivalent.
(pandas.DataFrame): A DataFrame containing the hourly time series of carbon emissions.
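The hourly breakdown returned by compute_ce can be illustrated with the following sketch. It is an assumption about how such a breakdown may be built, not the library's code: the job's runtime is split at hour boundaries, the energy of each slice follows the formula in the "Carbon emission of a job" section (with power_draw_mem taken as watts per gigabyte), and each slice is multiplied by the carbon intensity of its hour, assumed to be in gCO2e/kWh and converted to kilograms. The function name hourly_emissions and all input values are hypothetical.

```python
from datetime import datetime, timedelta


def hourly_emissions(server, ci_by_hour, start_time, runtime_minutes):
    """Hypothetical sketch of an hourly emissions breakdown.

    ci_by_hour maps each hour start (naive datetime) to a carbon intensity in
    gCO2e/kWh. Returns (total_kg, rows), each row (hour_start, minutes, kg).
    """
    power_w = (
        server["number_core"] * server["power_draw_core"] * server["usage_factor_core"]
        + server["memory_gb"] * server["power_draw_mem"]  # assumes W per GB
    )
    rows = []
    current, remaining = start_time, float(runtime_minutes)
    while remaining > 0:
        hour_start = current.replace(minute=0, second=0, microsecond=0)
        next_hour = hour_start + timedelta(hours=1)
        # Minutes of the job that fall inside this hour.
        minutes = min(remaining, (next_hour - current).total_seconds() / 60)
        e_kwh = (minutes / 60) * power_w * server["power_usage_efficiency"] * 0.001
        kg = e_kwh * ci_by_hour[hour_start] / 1000  # gCO2e -> kgCO2e
        rows.append((hour_start, minutes, kg))
        remaining -= minutes
        current = next_hour
    return sum(kg for _, _, kg in rows), rows


# Made-up server drawing 1000 W in total, starting at 10:30 for 90 minutes.
server = {
    "country": "DE",
    "number_core": 8,
    "power_draw_core": 120.0,
    "usage_factor_core": 1.0,
    "memory_gb": 40,
    "power_draw_mem": 1.0,
    "power_usage_efficiency": 1.0,
}
ci = {datetime(2024, 1, 1, 10): 200.0, datetime(2024, 1, 1, 11): 400.0}
total_kg, rows = hourly_emissions(server, ci, datetime(2024, 1, 1, 10, 30), 90)
print(total_kg, rows)
```

The first, partial hour contributes 30 minutes at 200 gCO2e/kWh and the second a full hour at 400 gCO2e/kWh, so the later hour dominates even though both run at the same power.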
- codegreen_core.tools.carbon_emission.compute_ce_from_energy(server, ci_data: pandas.DataFrame)
Calculates the carbon footprint for energy consumption over a time series. This method returns an hourly time series of the carbon emissions.
The methodology is described in the "Carbon emission of a job" section above. Note that the start and end times of the computation are derived from the first and last rows of the ci_data DataFrame.
- Parameters:
server –
A dictionary containing details about the server, including its hardware specifications. The dictionary should include:
number_core (int): The number of CPU cores.
memory_gb (float): The size of memory available in Gigabytes.
power_draw_core (float): Power draw of a computing core in Watts.
usage_factor_core (float): The core usage factor, a value between 0 and 1.
power_draw_mem (float): Power draw of memory in Watts.
power_usage_efficiency (float): Efficiency coefficient of the data center.
ci_data –
A pandas DataFrame containing a carbon intensity time series. The DataFrame should include the following columns:
startTimeUTC (datetime): The start time of each time step in UTC.
ci_default (float): The carbon intensity for each time step.
- Returns:
A tuple containing:
(float): The total carbon footprint of the job in kilograms of CO2 equivalent.
(pandas.DataFrame): A DataFrame containing the hourly time series of carbon emissions.
Optimal time shifting
- codegreen_core.tools.loadshift_time.predict_now(country: str, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, criteria: str = 'percent_renewable') → tuple
Predicts the optimal time to run a computation in the given location, starting from now.
- Parameters:
country (str) – The country code
estimated_runtime_hours (int) – The estimated runtime in hours
estimated_runtime_minutes (int) – The estimated runtime in minutes
hard_finish_date (datetime) – The latest possible finish time for the task, as a datetime object in the local time zone.
criteria (str) – Criteria based on which the optimal time is calculated. Valid values are “percent_renewable” and “optimal_percent_renewable”.
- Returns:
Tuple[timestamp, message, average_percent_renewable]
- Return type:
tuple
- codegreen_core.tools.loadshift_time.predict_optimal_time(energy_data: pandas.DataFrame, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, request_time: datetime = None) → tuple
Predicts the optimal time window to run a task based on energy data, run-time estimates, and a renewable energy target.
- Parameters:
energy_data – A DataFrame containing the energy data, including the columns startTimeUTC, totalRenewable, total, percent_renewable, and posix_timestamp
estimated_runtime_hours – The estimated runtime in hours
estimated_runtime_minutes – The estimated runtime in minutes
hard_finish_date – The latest possible finish time for the task.
request_time – The time at which the prediction is requested. Defaults to None, in which case the current time is used. Assumed to be in the local time zone.
- Returns:
Tuple[timestamp, message, average_percent_renewable]
- Return type:
tuple
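One plausible way to search for such a window is to slide it over hourly forecast data and keep the start time with the highest average percent_renewable that still finishes before hard_finish_date. The sketch below illustrates that approach under stated assumptions: rows are hourly, contiguous, sorted (start_time, percent_renewable) pairs; datetimes are naive; and a job always occupies whole hourly slots. It is not necessarily the algorithm predict_optimal_time uses, and the function name best_window and its messages are hypothetical.

```python
import math
from datetime import datetime, timedelta


def best_window(rows, runtime_hours, runtime_minutes, hard_finish, request_time):
    """Hypothetical sketch of a renewable-maximizing start-time search.

    rows: hourly (start_time, percent_renewable) pairs, sorted and contiguous.
    Returns (start_time or None, message, average_percent_renewable).
    """
    total_minutes = runtime_hours * 60 + runtime_minutes
    slots = max(1, math.ceil(total_minutes / 60))  # hourly slots the job touches
    runtime = timedelta(minutes=total_minutes)
    best_start, best_avg = None, -1.0
    for i in range(len(rows) - slots + 1):
        start = rows[i][0]
        # Skip slots that begin before the request or would finish after the deadline.
        if start < request_time or start + runtime > hard_finish:
            continue
        avg = sum(p for _, p in rows[i:i + slots]) / slots
        if avg > best_avg:  # strict ">" keeps the earliest start on ties
            best_start, best_avg = start, avg
    if best_start is None:
        return None, "no window fits before the deadline", 0.0
    return best_start, "ok", best_avg


# Made-up hourly forecast of percent_renewable starting at midnight.
day = datetime(2024, 1, 1)
forecast = [(day + timedelta(hours=h), p) for h, p in enumerate([20, 35, 60, 80, 70, 30])]
result = best_window(forecast, 2, 0, hard_finish=day + timedelta(hours=5), request_time=day)
print(result)  # (datetime.datetime(2024, 1, 1, 3, 0), 'ok', 75.0)
```

The 04:00 start would also average 75 % but is rejected because the job would end at 06:00, after the deadline.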
Optimal location shifting
- codegreen_core.tools.loadshift_location.predict_optimal_location(forecast_data, estimated_runtime_hours, estimated_runtime_minutes, percent_renewable, hard_finish_date, request_date=None)
Determines the optimal location and time to run a computation using the energy data of the selected locations.
- codegreen_core.tools.loadshift_location.predict_optimal_location_now(country_list: list, estimated_runtime_hours: int, estimated_runtime_minutes: int, percent_renewable: int, hard_finish_date: datetime) → tuple
Given a list of countries, returns the best location in which to run a computation, based on the input criteria.
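One way to combine per-location results is to run a time-shifting search for each candidate country and keep the location with the highest average percentage of renewable energy. The sketch below assumes each country's result is a (timestamp, message, average_percent_renewable) tuple, matching the return shape documented for predict_now, with timestamp set to None when no window fits. The helper pick_location and the country codes are hypothetical, and this is not necessarily how predict_optimal_location_now decides.

```python
from datetime import datetime


def pick_location(results):
    """Hypothetical helper: choose the country with the highest average renewable share.

    results maps a country code to a (timestamp or None, message,
    average_percent_renewable) tuple. Returns (country, timestamp, average),
    or None if no country has a feasible window.
    """
    feasible = [
        (country, ts, avg)
        for country, (ts, _message, avg) in results.items()
        if ts is not None
    ]
    if not feasible:
        return None
    # max() returns the first maximal item, so ties keep dictionary order.
    return max(feasible, key=lambda item: item[2])


# Placeholder country codes and made-up per-country results.
results = {
    "AA": (datetime(2024, 1, 1, 3), "ok", 75.0),
    "BB": (datetime(2024, 1, 1, 1), "ok", 82.5),
    "CC": (None, "no window fits before the deadline", 0.0),
}
choice = pick_location(results)
print(choice)  # ('BB', datetime.datetime(2024, 1, 1, 1, 0), 82.5)
```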