tools Module

This subpackage provides tools and methods for tasks like calculating the carbon intensity of energy production and calculating the emissions produced due to a computation.

Each tool is implemented in a separate module and must be imported individually (see below).

Carbon Intensity of Energy

Carbon intensity refers to the amount of greenhouse gases emitted per unit of electricity generated. It is typically measured in grams of CO₂ equivalents per kilowatt-hour (gCO2e/kWh).

Different types of energy production, such as fossil fuels, renewables, and nuclear power, have different carbon intensity values. The carbon intensity of an energy mix is the weighted sum of the base carbon intensity values of its sources, where each weight is the share of total generation supplied by that source. The carbon intensity of the energy powering a system therefore directly affects the carbon emissions of the computational tasks run on it.

The table below shows the base carbon intensity values of various electricity production sources. These values are adapted from [5].

Type          Average of                                                     Mean (gCO2e/kWh)
-----------   ------------------------------------------------------------   ----------------
coal          Coal—PC                                                        820
natural gas   Gas—Combined Cycle                                             490
biogas        Biomass—co-firing, Biomass—dedicated                           485
geothermal    Geothermal                                                     38
hydropower    Hydropower                                                     24
nuclear       Nuclear                                                        12
solar         Concentrated Solar Power, Solar PV—rooftop, Solar PV—utility   38.6
wind          Wind onshore, Wind offshore                                    11.5
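As a worked illustration of the weighted sum described above, consider a hypothetical mix of 50% coal, 30% wind, and 20% nuclear. The shares are made up for this example; the base values are the means from the table. Here \(p_i\) is the share of source \(i\) in the mix and \(\text{CI}_i\) is its base value.

```latex
\[
\text{CI}_{\text{mix}} = \sum_{i} p_i \cdot \text{CI}_i
  = 0.50 \times 820 + 0.30 \times 11.5 + 0.20 \times 12
  = 410 + 3.45 + 2.4
  = 415.85 \ \text{gCO2e/kWh}
\]
```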

One challenge in calculating carbon intensity is that the result depends on the methodology used to derive the base values. We therefore provide CI values calculated with several sets of base values; they are included in the DataFrame as separate columns. You can also supply your own base values. By default, the IPCC values are used.

When energy generation data is not available for a country, an average carbon intensity value is used instead. The source of this data is Carbon Footprint Ltd [8].

codegreen_core.tools.carbon_intensity.compute_ci(country: str, start_time: datetime, end_time: datetime) → pandas.DataFrame

Computes carbon intensity data for a given country and time period.

If energy data is available, the carbon intensity is calculated from actual energy data for the specified time range. If energy data is not available for the country, a default carbon intensity value is used instead. The default CI values for all countries are stored in utilities/ci_default_values.csv.
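A minimal usage sketch, assuming codegreen_core is installed and configured. The country code "DE", the dates, and the helper name fetch_ci are illustrative choices, not part of the library. The import is deferred into the helper so the snippet loads even where the package is absent.

```python
from datetime import datetime, timedelta

# Hypothetical inputs: a country code and a 24-hour window.
country = "DE"
start_time = datetime(2024, 1, 1, 0, 0)
end_time = start_time + timedelta(hours=24)


def fetch_ci():
    """Call compute_ci with the inputs above (hypothetical helper)."""
    # Deferred import: only needed once the function is actually called.
    from codegreen_core.tools.carbon_intensity import compute_ci

    return compute_ci(country, start_time, end_time)


# ci = fetch_ci()    # uncomment once codegreen_core is available
# print(ci.head())   # the result is documented as a pandas DataFrame
```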

codegreen_core.tools.carbon_intensity.compute_ci_from_energy(energy_data: pandas.DataFrame, default_method='ci_ipcc_lifecycle_mean', base_values: dict = None) → pandas.DataFrame

Given an energy time series, computes the carbon intensity for each row. You can choose the base values from one of several available sources or supply your own.

Parameters:
  • energy_data

    A pandas DataFrame that must include the following columns, representing the percentage of energy generated from each source:

    • Coal_per (float): Percentage of energy generated from coal.

    • Petroleum_per (float): Percentage of energy generated from petroleum.

    • Biomass_per (float): Percentage of energy generated from biomass.

    • Natural Gas_per (float): Percentage of energy generated from natural gas.

    • Geothermal_per (float): Percentage of energy generated from geothermal sources.

    • Hydroelectricity_per (float): Percentage of energy generated from hydroelectric sources.

    • Nuclear_per (float): Percentage of energy generated from nuclear sources.

    • Solar_per (float): Percentage of energy generated from solar sources.

    • Wind_per (float): Percentage of energy generated from wind sources.

  • default_method

    This parameter allows you to choose the base values for each energy source.

    By default, the IPCC lifecycle mean values are used. Available options include:

    • codecarbon (Ref [6])

    • ipcc_lifecycle_min (Ref [5])

    • ipcc_lifecycle_mean (default)

    • ipcc_lifecycle_max

    • eu_comm (Ref [4])

  • base_values (optional)

    A dictionary of custom base carbon intensity values for energy sources.

    Must include the following keys:

    • Coal (float): Base carbon intensity value for coal.

    • Petroleum (float): Base carbon intensity value for petroleum.

    • Biomass (float): Base carbon intensity value for biomass.

    • Natural Gas (float): Base carbon intensity value for natural gas.

    • Geothermal (float): Base carbon intensity value for geothermal energy.

    • Hydroelectricity (float): Base carbon intensity value for hydroelectricity.

    • Nuclear (float): Base carbon intensity value for nuclear energy.

    • Solar (float): Base carbon intensity value for solar energy.

    • Wind (float): Base carbon intensity value for wind energy.
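The relationship between the *_per columns and the base_values keys can be sketched in plain Python as follows. This is not the library's implementation: row_carbon_intensity is a hypothetical helper, the percentages are assumed to be on a 0 to 100 scale, and the petroleum figure is a placeholder rather than a value from this documentation. The remaining numbers reuse the table means, assuming biogas maps to Biomass and hydropower to Hydroelectricity.

```python
SOURCES = ["Coal", "Petroleum", "Biomass", "Natural Gas", "Geothermal",
           "Hydroelectricity", "Nuclear", "Solar", "Wind"]

# Illustrative custom base values in gCO2e/kWh (Petroleum is a placeholder).
custom_base_values = {
    "Coal": 820.0, "Petroleum": 700.0, "Biomass": 485.0,
    "Natural Gas": 490.0, "Geothermal": 38.0, "Hydroelectricity": 24.0,
    "Nuclear": 12.0, "Solar": 38.6, "Wind": 11.5,
}


def row_carbon_intensity(row, base_values):
    """Weighted sum of base values for one row of '<Source>_per' percentages."""
    missing = [s for s in SOURCES if s not in base_values]
    if missing:
        raise KeyError(f"base_values is missing keys: {missing}")
    # Each percentage is divided by 100 to obtain the share of the mix.
    return sum(row.get(f"{s}_per", 0.0) / 100.0 * base_values[s]
               for s in SOURCES)


# One hypothetical row: 50% coal, 30% wind, 20% nuclear.
row = {"Coal_per": 50.0, "Wind_per": 30.0, "Nuclear_per": 20.0}
ci = row_carbon_intensity(row, custom_base_values)
print(round(ci, 2))
```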

Carbon emission of a job

Methodology for calculating carbon emissions (based on [7])

The carbon emission of a job depends on two factors: the energy consumed by the hardware to run the computation, and the emissions generated to produce that energy. The unit used is CO2e (carbon dioxide equivalent).

  • Carbon Emissions : \(\text{CE} = E \times \text{CI}\) (in \(CO_{2}e\) )

  • Energy consumption : \(E = t \times \left( n_{c} \times P_{c} \times u_{c} + n_{m} \times P_{m} \right) \times PUE \times 0.001\) (in kWh)

    • \(t\) : running time in hours

    • \(n_c\) : the number of cores

    • \(n_m\) : the size of memory available (in Gigabytes)

    • \(u_c\) : the core usage factor (between 0 and 1)

    • \(P_c\) : power draw of a computing core (Watt)

    • \(P_m\) : power draw of memory (Watt)

    • \(PUE\) : efficiency coefficient of the data center

  • Emissions related to the production of the energy : represented by the carbon intensity (CI) of the energy mix during that period, computed as described in the Carbon Intensity of Energy section above

  • The result is Carbon emission in CO2e
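The two formulas above can be written out directly. The sketch below is a plain transcription rather than the library's code; the function names and the hardware figures are hypothetical. It assumes \(P_m\) is the power draw per gigabyte of memory and that CI is given in gCO2e/kWh, so the result is in grams of CO2e.

```python
def energy_kwh(t_hours, n_cores, p_core_w, u_core, n_mem_gb, p_mem_w, pue):
    """E = t * (n_c * P_c * u_c + n_m * P_m) * PUE * 0.001, in kWh."""
    return (t_hours
            * (n_cores * p_core_w * u_core + n_mem_gb * p_mem_w)
            * pue * 0.001)


def carbon_emission(energy, carbon_intensity):
    """CE = E * CI. With CI in gCO2e/kWh the result is in gCO2e."""
    return energy * carbon_intensity


# Hypothetical job: 2 hours on 8 cores (15 W each, 50% usage) with
# 32 GB of memory (0.4 W per GB) in a data center with PUE 1.5.
e = energy_kwh(2.0, 8, 15.0, 0.5, 32, 0.4, 1.5)
ce = carbon_emission(e, 400.0)  # assumed CI of 400 gCO2e/kWh
print(round(e, 4), "kWh,", round(ce, 2), "gCO2e")
```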

codegreen_core.tools.carbon_emission.compare_carbon_emissions(server1, server2, start_time1, start_time2, runtime_minutes)

Compares the carbon emissions of running a job with the same duration on two different servers.

Parameters:
  • server1

    A dictionary containing the details of the first server’s hardware and location specifications. Required keys include:

    • country (str): The country code for the server’s location (used for energy data).

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The memory available in Gigabytes.

    • power_draw_core (float): Power draw of each computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • server2

    A dictionary containing the details of the second server’s hardware and location specifications. Required keys are identical to those in server1:

    • country (str): The country code for the server’s location.

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The memory available in Gigabytes.

    • power_draw_core (float): Power draw of each computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • start_time1 – The start time of the job on server1 (datetime).

  • start_time2 – The start time of the job on server2 (datetime).

  • runtime_minutes – The total running time of the job in minutes (int).

Returns:

A dictionary with the carbon emissions for each server and a comparison of the two, structured as follows:

  • emissions_server1 (float): Total carbon emissions for server1 in kilograms of CO2 equivalent.

  • emissions_server2 (float): Total carbon emissions for server2 in kilograms of CO2 equivalent.

  • absolute_difference (float): The absolute difference in emissions between the two servers.

  • higher_emission_server (str): Indicates which server has higher emissions (“server1” or “server2”).
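A sketch of how the two server dictionaries might be prepared and passed in, assuming codegreen_core is installed and configured. The hardware figures, the country codes, and the helpers missing_server_keys and run_comparison are hypothetical. Only the key names and the function signature come from the description above.

```python
from datetime import datetime

REQUIRED_SERVER_KEYS = {
    "country", "number_core", "memory_gb", "power_draw_core",
    "usage_factor_core", "power_draw_mem", "power_usage_efficiency",
}


def missing_server_keys(server):
    """Return the required keys absent from a server dict (hypothetical check)."""
    return sorted(REQUIRED_SERVER_KEYS - set(server))


server1 = {
    "country": "DE",
    "number_core": 8,
    "memory_gb": 32.0,
    "power_draw_core": 15.0,
    "usage_factor_core": 0.5,
    "power_draw_mem": 0.4,
    "power_usage_efficiency": 1.5,
}
# Same hardware in a different location.
server2 = dict(server1, country="FR")


def run_comparison():
    """Compare a 90-minute job started at the same time on both servers."""
    # Deferred import: only needed once the function is actually called.
    from codegreen_core.tools.carbon_emission import compare_carbon_emissions

    start = datetime(2024, 1, 1, 9, 0)
    return compare_carbon_emissions(server1, server2, start, start, 90)


# result = run_comparison()
# print(result["higher_emission_server"])
```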

codegreen_core.tools.carbon_emission.compute_ce(server: dict, start_time: datetime, runtime_minutes: int) → tuple[float, pandas.DataFrame]

Calculates the carbon footprint of a job, given its hardware configuration, time, and location. This method returns an hourly time series of the carbon emissions.

The methodology is defined in the documentation.

Parameters:
  • server

    A dictionary containing the details about the server, including its hardware specifications. The dictionary should include the following keys:

    • country (str): The country code where the job was performed (required to fetch energy data).

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The size of memory available in Gigabytes.

    • power_draw_core (float): Power draw of a computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • start_time – The start time of the job (datetime).

  • runtime_minutes – Total running time of the job in minutes (int).

Returns:

A tuple containing:

  • (float): The total carbon footprint of the job in kilograms of CO2 equivalent.

  • (pandas.DataFrame): A DataFrame containing the hourly time series of carbon emissions.

codegreen_core.tools.carbon_emission.compute_ce_from_energy(server, ci_data: pandas.DataFrame)

Calculates the carbon footprint for energy consumption over a time series. This method returns an hourly time series of the carbon emissions.

The methodology is defined in the documentation. Note that the start and end times for the computation are derived from the first and last rows of the ci_data DataFrame.

Parameters:
  • server

    A dictionary containing details about the server, including its hardware specifications. The dictionary should include:

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The size of memory available in Gigabytes.

    • power_draw_core (float): Power draw of a computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • ci_data

    A pandas DataFrame of carbon intensity values over time. The DataFrame should include the following columns:

    • startTimeUTC (datetime): The start time of each energy measurement in UTC.

    • ci_default (float): Carbon intensity values for the energy consumption.

Returns:

A tuple containing:

  • (float): The total carbon footprint of the job in kilograms of CO2 equivalent.

  • (pandas.DataFrame): A DataFrame containing the hourly time series of carbon emissions.
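To illustrate how an hourly carbon intensity series turns into an hourly emission series and a total, here is a plain-Python sketch. It is not the library's implementation: hourly_emissions is a hypothetical helper, each CI value is assumed to cover one full hour at constant load, and CI is assumed to be in gCO2e/kWh, so grams are divided by 1000 to report kilograms.

```python
def hourly_emissions(server, hourly_ci):
    """Return (total_kg, per_hour_kg) for one CI value per hour of runtime."""
    power_w = (server["number_core"] * server["power_draw_core"]
               * server["usage_factor_core"]
               + server["memory_gb"] * server["power_draw_mem"])
    # Energy used in one hour, in kWh, including data center overhead (PUE).
    energy_per_hour_kwh = power_w * server["power_usage_efficiency"] * 0.001
    # gCO2e per hour, converted to kilograms.
    per_hour_kg = [energy_per_hour_kwh * ci / 1000.0 for ci in hourly_ci]
    return sum(per_hour_kg), per_hour_kg


# Hypothetical server and a three-hour CI series (gCO2e/kWh).
server = {
    "number_core": 8,
    "memory_gb": 32.0,
    "power_draw_core": 15.0,
    "usage_factor_core": 0.5,
    "power_draw_mem": 0.4,
    "power_usage_efficiency": 1.5,
}
total_kg, series_kg = hourly_emissions(server, [400.0, 300.0, 200.0])
print(round(total_kg, 5), [round(v, 5) for v in series_kg])
```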

Optimal time shifting

codegreen_core.tools.loadshift_time.predict_now(country: str, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, criteria: str = 'percent_renewable') → tuple

Predicts the optimal time to start a computation in the given location, counting from now.

Parameters:
  • country (str) – The country code

  • estimated_runtime_hours (int) – The estimated runtime in hours

  • estimated_runtime_minutes (int) – The estimated runtime in minutes

  • hard_finish_date (datetime) – The latest possible finish time for the task. A datetime object in the local time zone

  • criteria (str) – Criteria on which the optimal time is calculated. Valid values are “percent_renewable” and “optimal_percent_renewable”

Returns:

Tuple[timestamp, message, average_percent_renewable]

Return type:

tuple
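A minimal usage sketch, assuming codegreen_core is installed and configured. The country code, the runtime, the 24-hour deadline, and the helper name suggest_start are illustrative; the unpacking of the result follows the tuple layout given under Returns.

```python
from datetime import datetime, timedelta

# Hypothetical job: 2 h 30 min of runtime, to be finished within 24 hours.
estimated_runtime_hours = 2
estimated_runtime_minutes = 30
hard_finish_date = datetime.now() + timedelta(hours=24)  # local time


def suggest_start(country="DE"):
    """Ask predict_now for a start time (hypothetical wrapper)."""
    # Deferred import: only needed once the function is actually called.
    from codegreen_core.tools.loadshift_time import predict_now

    return predict_now(
        country,
        estimated_runtime_hours,
        estimated_runtime_minutes,
        hard_finish_date,
        criteria="percent_renewable",
    )


# timestamp, message, average_percent_renewable = suggest_start()
```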

codegreen_core.tools.loadshift_time.predict_optimal_time(energy_data: pandas.DataFrame, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, request_time: datetime = None) → tuple

Predicts the optimal time window to run a task based on energy data, runtime estimates, and a renewable energy target.

Parameters:
  • energy_data – A DataFrame containing the energy data, including the columns startTimeUTC, totalRenewable, total, percent_renewable, and posix_timestamp

  • estimated_runtime_hours – The estimated runtime in hours

  • estimated_runtime_minutes – The estimated runtime in minutes

  • hard_finish_date – The latest possible finish time for the task.

  • request_time – The time at which the prediction is requested. Defaults to None, in which case the current time is used. Assumed to be in the local time zone

Returns:

Tuple[timestamp, message, average_percent_renewable]

Return type:

tuple
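One simple way to think about the time-shifting problem is as a sliding-window search over hourly percent_renewable values. The sketch below illustrates that idea only and is not claimed to be the algorithm the library uses. The helper best_start_hour is hypothetical, the runtime is rounded up to whole hours, and the deadline is expressed as a number of hours from the first data point.

```python
import math


def best_start_hour(percent_renewable, runtime_hours, runtime_minutes,
                    hours_until_deadline):
    """Return (start_index, average_percent) of the greenest feasible window.

    Returns (None, None) when the job cannot finish before the deadline.
    """
    # Round the runtime up to a whole number of hourly slots.
    window = math.ceil((runtime_hours * 60 + runtime_minutes) / 60)
    # Only slots that end before the deadline may be used.
    usable = percent_renewable[:hours_until_deadline]
    if window <= 0 or window > len(usable):
        return None, None
    best_start, best_avg = None, None
    for start in range(len(usable) - window + 1):
        avg = sum(usable[start:start + window]) / window
        # Strict comparison keeps the earliest window on ties.
        if best_avg is None or avg > best_avg:
            best_start, best_avg = start, avg
    return best_start, best_avg


# Hypothetical hourly forecast of the renewable share (in percent).
forecast = [30, 35, 60, 70, 65, 40, 20, 25]
start, average = best_start_hour(forecast, 1, 30, 6)
print(start, average)  # → 3 67.5
```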

Optimal location shifting

codegreen_core.tools.loadshift_location.predict_optimal_location(forecast_data, estimated_runtime_hours, estimated_runtime_minutes, percent_renewable, hard_finish_date, request_date=None)

Determines the optimal location and time to run a computation, using energy data for the selected locations.

codegreen_core.tools.loadshift_location.predict_optimal_location_now(country_list: list, estimated_runtime_hours: int, estimated_runtime_minutes: int, percent_renewable: int, hard_finish_date: datetime) → tuple

Given a list of countries, returns the best location in which to run a computation, based on the input criteria.
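A minimal usage sketch, assuming codegreen_core is installed and configured. The country codes, the 12-hour deadline, and the helper name suggest_location are illustrative. The value 40 for percent_renewable is a guess at a renewable-share target, since the parameter's meaning is not spelled out here, and the layout of the returned tuple is likewise not documented in this section.

```python
from datetime import datetime, timedelta

# Hypothetical candidate locations and deadline (local time).
country_list = ["DE", "FR", "SE"]
hard_finish_date = datetime.now() + timedelta(hours=12)


def suggest_location():
    """Ask for the best of the candidate locations (hypothetical wrapper)."""
    # Deferred import: only needed once the function is actually called.
    from codegreen_core.tools.loadshift_location import (
        predict_optimal_location_now,
    )

    return predict_optimal_location_now(
        country_list,
        1,    # estimated_runtime_hours
        0,    # estimated_runtime_minutes
        40,   # percent_renewable (assumed to be a target share in percent)
        hard_finish_date,
    )


# result = suggest_location()
# print(result)
```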