tools Module

This subpackage provides tools for tasks such as calculating the carbon intensity of energy production and the emissions produced by a computation.

Each tool is implemented in a separate module and must be imported individually (see below).

Carbon Intensity of Energy

Carbon intensity refers to the amount of greenhouse gases emitted per unit of electricity generated. It is typically measured in grams of CO₂ equivalents per kilowatt-hour (gCO2e/kWh).

Different types of energy production, such as fossil fuels, renewables, and nuclear power, have different carbon intensity values. The carbon intensity of an energy mix is the weighted sum of the base carbon intensity values of its energy sources, where each weight is the share of that source in the mix. The carbon intensity of the energy powering a system significantly affects the overall carbon emissions of computational tasks.

The table below shows the base carbon intensity values of various electricity production sources. These values are adapted from [5].

Type         Average of                                                    Mean (gCO2e/kWh)
-----------  ------------------------------------------------------------  ----------------
coal         Coal—PC                                                       820
natural gas  Gas—Combined Cycle                                            490
biogas       Biomass—co-firing, Biomass—dedicated                          485
geothermal   Geothermal                                                    38
hydropower   Hydropower                                                    24
nuclear      Nuclear                                                       12
solar        Concentrated Solar Power, Solar PV—rooftop, Solar PV—utility  38.6
wind         Wind onshore, Wind offshore                                   11.5
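The weighted-sum definition can be illustrated with a minimal sketch that uses the mean values from the table. The function name `mix_carbon_intensity` and the example mix are assumptions made for illustration only; they are not part of codegreen_core.

```python
# Mean base carbon intensity per source in gCO2e/kWh (values from the table above).
BASE_CI = {
    "coal": 820,
    "natural gas": 490,
    "biogas": 485,
    "geothermal": 38,
    "hydropower": 24,
    "nuclear": 12,
    "solar": 38.6,
    "wind": 11.5,
}


def mix_carbon_intensity(shares, base_ci=BASE_CI):
    """Return the weighted-sum carbon intensity (gCO2e/kWh) of an energy mix.

    `shares` maps a source name to its fraction of total generation;
    the fractions are expected to sum to 1.
    """
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"shares must sum to 1, got {total}")
    return sum(base_ci[source] * share for source, share in shares.items())


# Hypothetical mix, chosen only to illustrate the calculation.
example_mix = {
    "coal": 0.30,
    "natural gas": 0.20,
    "wind": 0.25,
    "solar": 0.15,
    "nuclear": 0.10,
}
mix_ci = mix_carbon_intensity(example_mix)
print(f"{mix_ci:.2f} gCO2e/kWh")
```

Because coal and natural gas dominate the weighted sum, even a mix with 50% low-carbon sources ends up far above the values of those sources alone.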

One challenge with calculating carbon intensity is that the values vary with the methodology used. We therefore provide CI values calculated using multiple approaches (that is, different sets of base values). These values are included in the DataFrame as separate columns. You can also supply your own base values. By default, the IPCC values are used.

When energy generation data is not available for a country, an average carbon intensity value is used instead. The source of this data is Carbon Footprint Ltd [8].

codegreen_core.tools.carbon_intensity.compute_ci(country: str, start_time: datetime, end_time: datetime) → pandas.DataFrame

Computes carbon intensity data for a given country and time period.

If energy data is available, the carbon intensity is calculated from actual energy data for the specified time range. If energy data is not available for the country, a default carbon intensity value is used instead. The default CI values for all countries are stored in utilities/ci_default_values.csv.

codegreen_core.tools.carbon_intensity.compute_ci_from_energy(energy_data: pandas.DataFrame, default_method='ci_ipcc_lifecycle_mean', base_values: dict = None) → pandas.DataFrame

Given an energy time series, computes the carbon intensity for each row. You can choose the base values from several available sources or supply your own.

Parameters:
  • energy_data – The DataFrame must include the following columns: Coal_per, Petroleum_per, Biomass_per, Natural Gas_per, Geothermal_per, Hydroelectricity_per, Nuclear_per, Solar_per, Wind_per

  • default_method

    Selects the set of base values used for each energy source. By default, the ipcc_lifecycle_mean values are used. Available options:

    • codecarbon (Ref [6])

    • ipcc_lifecycle_min (Ref [5])

    • ipcc_lifecycle_mean (default)

    • ipcc_lifecycle_max

    • eu_comm (Ref [4])

  • base_values – Custom base carbon intensity values of the energy sources. Must include the following keys: Coal, Petroleum, Biomass, Natural Gas, Geothermal, Hydroelectricity, Nuclear, Solar, Wind
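To make the relationship between the `*_per` columns and the `base_values` keys concrete, here is a rough sketch of the per-row calculation. It does not call codegreen_core and is not its implementation: rows are plain dicts rather than DataFrame rows, the `*_per` entries are assumed to be percentages (0 to 100), the helper name `ci_for_row` is hypothetical, and the Petroleum value is a placeholder rather than a published figure.

```python
SOURCES = [
    "Coal", "Petroleum", "Biomass", "Natural Gas", "Geothermal",
    "Hydroelectricity", "Nuclear", "Solar", "Wind",
]

# Custom base values in gCO2e/kWh. Most are the table means listed earlier;
# the Petroleum entry is a placeholder chosen only for illustration.
custom_base_values = {
    "Coal": 820, "Petroleum": 700, "Biomass": 485, "Natural Gas": 490,
    "Geothermal": 38, "Hydroelectricity": 24, "Nuclear": 12,
    "Solar": 38.6, "Wind": 11.5,
}


def ci_for_row(row, base_values):
    """Weighted carbon intensity (gCO2e/kWh) for one time step.

    Assumes every `<source>_per` entry is a percentage of total generation.
    """
    missing = [s for s in SOURCES if s not in base_values]
    if missing:
        raise KeyError(f"base_values is missing keys: {missing}")
    return sum(base_values[s] * row[f"{s}_per"] / 100.0 for s in SOURCES)


# Two hypothetical hourly rows; each row's percentages sum to 100.
energy_rows = [
    {"Coal_per": 40, "Petroleum_per": 0, "Biomass_per": 5, "Natural Gas_per": 15,
     "Geothermal_per": 0, "Hydroelectricity_per": 10, "Nuclear_per": 10,
     "Solar_per": 5, "Wind_per": 15},
    {"Coal_per": 10, "Petroleum_per": 0, "Biomass_per": 5, "Natural Gas_per": 10,
     "Geothermal_per": 0, "Hydroelectricity_per": 15, "Nuclear_per": 20,
     "Solar_per": 10, "Wind_per": 30},
]
ci_values = [ci_for_row(row, custom_base_values) for row in energy_rows]
print(ci_values)
```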

Carbon emission of a job

Methodology for calculating carbon emissions (based on [7])

The carbon emission of a job depends on two factors: the energy consumed by the hardware to run the computation and the emissions generated to produce that energy. The unit used is CO2e (carbon dioxide equivalent).

  • Carbon emissions: \(\text{CE} = E \times \text{CI}\) (in grams of \(CO_{2}e\) when \(\text{CI}\) is given in gCO2e/kWh)

  • Energy consumption: \(E = t \times \left( n_{c} \times P_{c} \times u_{c} + n_{m} \times P_{m} \right) \times PUE \times 0.001\) (in kWh; the factor 0.001 converts watt-hours to kilowatt-hours)

    • \(t\) : running time in hours

    • \(n_c\) : the number of cores

    • \(n_m\) : the size of memory available (in Gigabytes)

    • \(u_c\) : the core usage factor (between 0 and 1)

    • \(P_c\) : power draw of a computing core (Watt)

    • \(P_m\) : power draw of memory per gigabyte (Watt)

    • \(PUE\) : power usage effectiveness, the efficiency coefficient of the data center

  • Emissions related to the production of the energy: represented by the carbon intensity of the energy mix during that period, which is computed with the carbon intensity tools described above.
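As a worked example of the two formulas, the sketch below evaluates them for a hypothetical job. The default power draw, usage factor, and PUE values are taken from the `compute_ce` signature in this document; the function names, the job size, and the carbon intensity of 400 gCO2e/kWh are assumptions made for illustration.

```python
def energy_kwh(runtime_hours, n_cores, memory_gb,
               power_draw_core=15.8, usage_factor_core=1.0,
               power_draw_mem=0.3725, pue=1.6):
    """E = t * (n_c * P_c * u_c + n_m * P_m) * PUE * 0.001, in kWh."""
    core_power_w = n_cores * power_draw_core * usage_factor_core
    memory_power_w = memory_gb * power_draw_mem
    return runtime_hours * (core_power_w + memory_power_w) * pue * 0.001


def carbon_emission_g(energy, carbon_intensity):
    """CE = E * CI, in gCO2e when CI is given in gCO2e/kWh."""
    return energy * carbon_intensity


# Hypothetical job: 2 hours on 8 cores with 32 GB of memory.
job_energy = energy_kwh(runtime_hours=2, n_cores=8, memory_gb=32)
# Hypothetical carbon intensity of 400 gCO2e/kWh.
job_emission = carbon_emission_g(job_energy, 400)
print(f"E = {job_energy:.6f} kWh, CE = {job_emission:.4f} gCO2e")
```

With these inputs the cores draw 126.4 W and the memory 11.92 W, so the job consumes about 0.44 kWh and emits about 177 gCO2e.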

codegreen_core.tools.carbon_emission.compute_ce(country: str, start_time: datetime, runtime_minutes: int, number_core: int, memory_gb: int, power_draw_core: float = 15.8, usage_factor_core: int = 1, power_draw_mem: float = 0.3725, power_usage_efficiency: float = 1.6)

Calculates the carbon footprint of a job, given its hardware configuration, start time, runtime, and location. This method returns an hourly time series of the carbon emissions. The methodology is described in the methodology section above.

Parameters:
  • country – The country code where the job was performed (required to fetch energy data)

  • start_time – The starting time of the computation as a datetime object in the local time zone

  • runtime_minutes – running time in minutes

  • number_core – the number of cores

  • memory_gb – the size of memory available (in Gigabytes)

  • power_draw_core – power draw of a computing core (Watt)

  • usage_factor_core – the core usage factor (between 0 and 1)

  • power_draw_mem – power draw of memory per gigabyte (Watt)

  • power_usage_efficiency – efficiency coefficient of the data center
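Since the result is an hourly time series while the runtime is given in minutes, a job generally covers some clock hours only partially. The following sketch shows one possible way to split a job into hourly slots. It is an assumption about how such a split could work, not a description of what `compute_ce` does internally, and `hourly_slots` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def hourly_slots(start_time, runtime_minutes):
    """Split a job into (hour_start, hours_used_in_that_hour) pairs.

    Slots are aligned to clock hours, so the first and last slot may
    cover only a fraction of an hour.
    """
    slots = []
    end_time = start_time + timedelta(minutes=runtime_minutes)
    cursor = start_time
    while cursor < end_time:
        hour_start = cursor.replace(minute=0, second=0, microsecond=0)
        next_hour = hour_start + timedelta(hours=1)
        segment_end = min(next_hour, end_time)
        hours_used = (segment_end - cursor).total_seconds() / 3600
        slots.append((hour_start, hours_used))
        cursor = segment_end
    return slots


# Hypothetical job: starts at 09:30 and runs for 150 minutes.
slots = hourly_slots(datetime(2024, 1, 1, 9, 30), 150)
for hour_start, hours_used in slots:
    print(hour_start.isoformat(), hours_used)
```

Each slot's duration can then be used as \(t\) in the energy formula, paired with the carbon intensity of the matching hour.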

codegreen_core.tools.carbon_emission.compute_ce_from_energy(ci_data: pandas.DataFrame, number_core: int, memory_gb: int, power_draw_core: float = 15.8, usage_factor_core: int = 1, power_draw_mem: float = 0.3725, power_usage_efficiency: float = 1.6)

Calculates the carbon footprint for an energy consumption time series. This method returns an hourly time series of the carbon emissions. The methodology is described in the methodology section above.

Parameters:
  • ci_data – DataFrame of energy consumption. Required columns: startTimeUTC, ci_default

  • number_core – the number of cores

  • memory_gb – the size of memory available (in Gigabytes)

  • power_draw_core – power draw of a computing core (Watt)

  • usage_factor_core – the core usage factor (between 0 and 1)

  • power_draw_mem – power draw of memory per gigabyte (Watt)

  • power_usage_efficiency – efficiency coefficient of the data center
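A minimal sketch of the per-row calculation follows. It assumes each row of `ci_data` covers exactly one hour and that `ci_default` is in gCO2e/kWh. Rows are plain dicts here instead of DataFrame rows, and the function name and output keys (`energy_kwh`, `emission_g`) are hypothetical, not the column names produced by the library.

```python
from datetime import datetime


def emissions_per_hour(ci_rows, number_core, memory_gb,
                       power_draw_core=15.8, usage_factor_core=1.0,
                       power_draw_mem=0.3725, power_usage_efficiency=1.6):
    """Return one record per hourly row with energy (kWh) and emission (gCO2e)."""
    power_w = (number_core * power_draw_core * usage_factor_core
               + memory_gb * power_draw_mem)
    # Energy for one full hour of runtime (t = 1), converted from Wh to kWh.
    energy_kwh = 1.0 * power_w * power_usage_efficiency * 0.001
    return [
        {
            "startTimeUTC": row["startTimeUTC"],
            "energy_kwh": energy_kwh,
            "emission_g": energy_kwh * row["ci_default"],
        }
        for row in ci_rows
    ]


# Hypothetical carbon intensity values for two consecutive hours.
ci_rows = [
    {"startTimeUTC": datetime(2024, 1, 1, 9), "ci_default": 400},
    {"startTimeUTC": datetime(2024, 1, 1, 10), "ci_default": 350},
]
hourly_emissions = emissions_per_hour(ci_rows, number_core=8, memory_gb=32)
total_emission_g = sum(r["emission_g"] for r in hourly_emissions)
print(hourly_emissions, total_emission_g)
```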

Optimal time shifting

codegreen_core.tools.loadshift_time.predict_now(country: str, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, criteria: str = 'percent_renewable', percent_renewable: int = 50) → tuple

Predicts the optimal time to run a computation in the given location, starting from now.

Parameters:
  • country (str) – The country code

  • estimated_runtime_hours (int) – The estimated runtime in hours

  • estimated_runtime_minutes (int) – The estimated runtime in minutes

  • hard_finish_date (datetime) – The latest possible finish time for the task, as a datetime object in the local time zone

  • criteria (str) – Criteria based on which the optimal time is calculated. Valid values: “percent_renewable” or “optimal_percent_renewable”

  • percent_renewable (int) – The minimum percentage of renewable energy desired during the runtime

Returns:

Tuple[timestamp, message, average_percent_renewable]

Return type:

tuple

codegreen_core.tools.loadshift_time.predict_optimal_time(energy_data: pandas.DataFrame, estimated_runtime_hours: int, estimated_runtime_minutes: int, percent_renewable: int, hard_finish_date: datetime, request_time: datetime = None) → tuple

Predicts the optimal time window to run a task based on energy data, runtime estimates, and a renewable energy target.

Parameters:
  • energy_data – A DataFrame containing the energy data, including the columns startTimeUTC, totalRenewable, total, percent_renewable, posix_timestamp

  • estimated_runtime_hours – The estimated runtime in hours

  • estimated_runtime_minutes – The estimated runtime in minutes

  • percent_renewable – The minimum percentage of renewable energy desired during the runtime

  • hard_finish_date – The latest possible finish time for the task.

  • request_time – The time at which the prediction is requested. Defaults to None, in which case the current time is used. Assumed to be in the local time zone

Returns:

Tuple[timestamp, message, average_percent_renewable]

Return type:

tuple
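One way to picture this kind of prediction is a sliding-window search over hourly forecast rows. The sketch below is only an illustration under several assumptions: rows are plain dicts that are sorted and hourly, the first window whose average `percent_renewable` meets the target is chosen, and the returned timestamp is a datetime. The function name and the message strings are hypothetical, and the library's actual selection logic may differ.

```python
import math
from datetime import datetime, timedelta


def earliest_green_window(rows, runtime_hours, runtime_minutes,
                          min_percent_renewable, hard_finish, request_time):
    """Return (start, message, average_percent_renewable) for the first
    window that meets the renewable target and finishes before `hard_finish`."""
    runtime = timedelta(hours=runtime_hours, minutes=runtime_minutes)
    # Number of hourly rows the job touches (at least one).
    window = max(1, math.ceil(runtime_hours + runtime_minutes / 60))
    candidates = [r for r in rows if r["startTimeUTC"] >= request_time]
    for i in range(len(candidates) - window + 1):
        start = candidates[i]["startTimeUTC"]
        if start + runtime > hard_finish:
            break  # later windows would also finish too late
        chunk = candidates[i:i + window]
        average = sum(r["percent_renewable"] for r in chunk) / window
        if average >= min_percent_renewable:
            return start, "window found", average
    return request_time, "no window meets the target", None


# Hypothetical hourly forecast for one morning.
percentages = [30, 40, 55, 65, 45, 20]
forecast = [
    {"startTimeUTC": datetime(2024, 1, 1, hour), "percent_renewable": p}
    for hour, p in enumerate(percentages)
]
prediction = earliest_green_window(
    forecast, runtime_hours=2, runtime_minutes=0, min_percent_renewable=50,
    hard_finish=datetime(2024, 1, 1, 6), request_time=datetime(2024, 1, 1, 0),
)
print(prediction)
```

In this example the windows starting at 00:00 and 01:00 average 35% and 47.5%, so the first window that reaches the 50% target starts at 02:00.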

Optimal location shifting

codegreen_core.tools.loadshift_location.predict_optimal_location(forecast_data, estimated_runtime_hours, estimated_runtime_minutes, percent_renewable, hard_finish_date, request_date=None)

Determines the optimal location and time to run a computation using the energy data of the selected locations.

codegreen_core.tools.loadshift_location.predict_optimal_location_now(country_list: list, estimated_runtime_hours: int, estimated_runtime_minutes: int, percent_renewable: int, hard_finish_date: datetime) → tuple

Given a list of countries, returns the best location where a computation can be run based on the input criteria.
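Conceptually, choosing a location can be seen as running a time-window search per country and comparing the results. The sketch below illustrates that idea with hypothetical hourly `percent_renewable` forecasts. The helper names, the country codes, the forecast numbers, and the rule of picking the highest window average are all assumptions; the library's actual criteria may differ.

```python
def best_window(values, window):
    """Return (start_index, average) of the window with the highest mean,
    or None if the series is shorter than the window."""
    best = None
    for i in range(len(values) - window + 1):
        average = sum(values[i:i + window]) / window
        if best is None or average > best[1]:
            best = (i, average)
    return best


def best_location(forecasts, window):
    """Return (country, start_index, average) for the country whose best
    window has the highest average renewable share, or None if no country
    has enough data."""
    results = {}
    for country, values in forecasts.items():
        candidate = best_window(values, window)
        if candidate is not None:
            results[country] = candidate
    if not results:
        return None
    country = max(results, key=lambda c: results[c][1])
    start_index, average = results[country]
    return country, start_index, average


# Hypothetical hourly renewable percentages per country.
forecasts = {
    "DE": [30, 40, 55, 65, 45],
    "FR": [20, 25, 30, 35, 40],
    "ES": [50, 70, 72, 40, 30],
}
choice = best_location(forecasts, window=2)
print(choice)
```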