tools Module¶
This subpackage provides tools and methods for tasks like calculating the carbon intensity of energy production and calculating the emissions produced due to a computation.
Each tool is implemented in a separate module and must be imported individually (see below).
Carbon Intensity of Energy¶
Carbon intensity refers to the amount of greenhouse gases emitted per unit of electricity generated. It is typically measured in grams of CO₂ equivalents per kilowatt-hour (gCO2e/kWh).
Different types of energy production, such as fossil fuels, renewables, and nuclear power, have different carbon intensity values. The carbon intensity of an energy mix is the weighted sum of the base carbon intensity values of its energy sources, where each source is weighted by its share of the mix. The carbon intensity of the energy powering a system significantly affects the overall carbon emissions of computational tasks.
The table below shows the base carbon intensity values of various electricity production sources. These values are adapted from [5]
| Type | Average of | Mean (gCO2e/kWh) |
|---|---|---|
| coal | Coal—PC | 820 |
| natural gas | Gas—Combined Cycle | 490 |
| biogas | Biomass—co-firing, Biomass—dedicated | 485 |
| geothermal | Geothermal | 38 |
| hydropower | Hydropower | 24 |
| nuclear | Nuclear | 12 |
| solar | Concentrated Solar Power, Solar PV—rooftop, Solar PV—utility | 38.6 |
| wind | Wind onshore, Wind offshore | 11.5 |
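The weighted sum described above can be illustrated with a minimal sketch that uses the base values from the table. The function name `mix_carbon_intensity` and the example mix are hypothetical and are not part of `codegreen_core`; shares are assumed to be fractions that sum to 1.

```python
# Base carbon intensity values (gCO2e/kWh), copied from the table above.
BASE_CI = {
    "coal": 820,
    "natural gas": 490,
    "biogas": 485,
    "geothermal": 38,
    "hydropower": 24,
    "nuclear": 12,
    "solar": 38.6,
    "wind": 11.5,
}


def mix_carbon_intensity(shares):
    """Return the carbon intensity of an energy mix in gCO2e/kWh.

    `shares` maps a source name to its fraction of total generation.
    The fractions are expected to sum to 1 (hypothetical helper).
    """
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"shares must sum to 1, got {total}")
    # Weighted sum: base value of each source times its share of the mix.
    return sum(BASE_CI[source] * share for source, share in shares.items())


# Hypothetical mix: 40% coal, 30% wind, 20% nuclear, 10% solar.
example_mix = {"coal": 0.4, "wind": 0.3, "nuclear": 0.2, "solar": 0.1}
example_ci = mix_carbon_intensity(example_mix)
print(round(example_ci, 2))
```

For this mix the sum is 820 × 0.4 + 11.5 × 0.3 + 12 × 0.2 + 38.6 × 0.1 = 337.71 gCO2e/kWh, which shows how strongly a single high-intensity source dominates the result.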
One challenge with calculating carbon intensity is that the result depends on the methodology used. We therefore provide CI values calculated with several approaches, that is, with different sets of base values. These values are included in the DataFrame as separate columns. You can also supply your own base values. By default, the IPCC values are used.
When energy generation data is not available for a country, the country's average carbon intensity value is used. The source of this data is Carbon Footprint Ltd [8].
- codegreen_core.tools.carbon_intensity.compute_ci(country: str, start_time: datetime, end_time: datetime) → pandas.DataFrame¶
Computes carbon intensity data for a given country and time period.
If energy data is available, the carbon intensity is calculated from actual energy data for the specified time range. If energy data is not available for the country, a default carbon intensity value is used instead. The default CI values for all countries are stored in utilities/ci_default_values.csv.
- codegreen_core.tools.carbon_intensity.compute_ci_from_energy(energy_data: pandas.DataFrame, default_method='ci_ipcc_lifecycle_mean', base_values: dict = None) → pandas.DataFrame¶
Given an energy time series, computes the carbon intensity for each row. You can choose the base values from one of several available sources or supply your own.
- Parameters:
energy_data – The DataFrame must include the following columns: Coal_per, Petroleum_per, Biomass_per, Natural Gas_per, Geothermal_per, Hydroelectricity_per, Nuclear_per, Solar_per, Wind_per
default_method –
Selects the set of base values used for each energy source. By default, the IPCC lifecycle mean values are used. Available options:
codecarbon (Ref [6])
ipcc_lifecycle_min (Ref [5])
ipcc_lifecycle_mean (default)
ipcc_lifecycle_max
eu_comm (Ref [4])
base_values – Custom base carbon intensity values of the energy sources. Must include the following keys: Coal, Petroleum, Biomass, Natural Gas, Geothermal, Hydroelectricity, Nuclear, Solar, Wind
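The relationship between the `<source>_per` columns and the `base_values` keys can be sketched as follows. This is only an illustration of the naming convention, not the library's implementation: plain dictionaries stand in for DataFrame rows, `ci_for_row` is a hypothetical helper, the `_per` values are assumed to be percentages between 0 and 100, and the Petroleum base value is a made-up placeholder because the table of base values on this page does not list one.

```python
# Keys required in `base_values`, as listed in the parameter description.
SOURCES = [
    "Coal", "Petroleum", "Biomass", "Natural Gas", "Geothermal",
    "Hydroelectricity", "Nuclear", "Solar", "Wind",
]

# Illustrative custom base values in gCO2e/kWh. The Petroleum value is a
# placeholder chosen for this example only; the others follow the table
# of base values earlier on this page.
custom_base_values = {
    "Coal": 820, "Petroleum": 700, "Biomass": 485, "Natural Gas": 490,
    "Geothermal": 38, "Hydroelectricity": 24, "Nuclear": 12,
    "Solar": 38.6, "Wind": 11.5,
}


def ci_for_row(row, base_values):
    """Weighted sum over the `<source>_per` entries of one row.

    Assumes each `_per` value is a percentage between 0 and 100.
    """
    missing = [s for s in SOURCES if s not in base_values]
    if missing:
        raise KeyError(f"base_values is missing keys: {missing}")
    return sum(base_values[s] * row[f"{s}_per"] / 100 for s in SOURCES)


# Two hypothetical hourly rows; the percentages in each row sum to 100.
rows = [
    {"Coal_per": 50, "Petroleum_per": 0, "Biomass_per": 0,
     "Natural Gas_per": 20, "Geothermal_per": 0, "Hydroelectricity_per": 10,
     "Nuclear_per": 10, "Solar_per": 0, "Wind_per": 10},
    {"Coal_per": 20, "Petroleum_per": 0, "Biomass_per": 0,
     "Natural Gas_per": 20, "Geothermal_per": 0, "Hydroelectricity_per": 10,
     "Nuclear_per": 20, "Solar_per": 10, "Wind_per": 20},
]
ci_series = [ci_for_row(row, custom_base_values) for row in rows]
print([round(ci, 2) for ci in ci_series])
```

Checking for missing keys up front gives a clearer error than a failure halfway through a long time series.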
Carbon emission of a job¶
The methodology for calculating carbon emissions is based on [7].
The carbon emission of a job depends on two factors: the energy consumed by the hardware to run the computation, and the emissions generated to produce this energy. The unit used is CO2e (carbon dioxide equivalent).
Carbon emissions: \(\text{CE} = E \times \text{CI}\) (in g \(CO_{2}e\), with \(E\) in kWh and \(\text{CI}\) in gCO2e/kWh)
Energy consumption: \(E = t \times \left( n_{c} \times P_{c} \times u_{c} + n_{m} \times P_{m} \right) \times PUE \times 0.001\) (in kWh; the factor 0.001 converts Wh to kWh)
\(t\) : running time in hours
\(n_c\) : the number of cores
\(n_m\) : the size of memory available (in Gigabytes)
\(u_c\) : the core usage factor (between 0 and 1)
\(P_c\) : power draw of a computing core (Watt)
\(P_m\) : power draw of memory (Watt)
\(PUE\) : power usage effectiveness (efficiency coefficient) of the data center
Emissions related to the production of the energy: represented by the carbon intensity of the energy mix during that period, which is computed by the carbon intensity tools described above.
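As a worked example of the two formulas above, the following sketch evaluates them directly. The function names `energy_kwh` and `carbon_emission` are hypothetical; the default values mirror those shown in the `compute_ce` signature, and the carbon intensity of 400 gCO2e/kWh is an assumed value chosen only for illustration.

```python
def energy_kwh(runtime_hours, number_core, memory_gb,
               power_draw_core=15.8, usage_factor_core=1.0,
               power_draw_mem=0.3725, power_usage_efficiency=1.6):
    """E = t * (n_c * P_c * u_c + n_m * P_m) * PUE * 0.001, in kWh."""
    # Combined power draw of cores and memory, in watts.
    power_watt = (number_core * power_draw_core * usage_factor_core
                  + memory_gb * power_draw_mem)
    # Multiply by runtime and PUE, then convert Wh to kWh.
    return runtime_hours * power_watt * power_usage_efficiency * 0.001


def carbon_emission(energy, carbon_intensity):
    """CE = E * CI; grams of CO2e when CI is given in gCO2e/kWh."""
    return energy * carbon_intensity


# One hour on 8 fully used cores with 32 GB of memory.
e = energy_kwh(runtime_hours=1, number_core=8, memory_gb=32)
ce = carbon_emission(e, carbon_intensity=400)  # assumed CI value
print(f"{e:.6f} kWh, {ce:.4f} gCO2e")
```

With these inputs, E = 1 × (8 × 15.8 × 1 + 32 × 0.3725) × 1.6 × 0.001 = 0.221312 kWh, so the job emits about 88.52 g CO2e at the assumed carbon intensity.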
- codegreen_core.tools.carbon_emission.compute_ce(country: str, start_time: datetime, runtime_minutes: int, number_core: int, memory_gb: int, power_draw_core: float = 15.8, usage_factor_core: int = 1, power_draw_mem: float = 0.3725, power_usage_efficiency: float = 1.6)¶
Calculates the carbon footprint of a job from its hardware configuration, start time, runtime, and location. This method returns an hourly time series of the carbon emission. The methodology is described in the section above.
- Parameters:
country – The country code where the job was performed (required to fetch energy data)
start_time – The starting time of the computation, as a datetime object in the local time zone
runtime_minutes – running time in minutes
number_core – the number of cores
memory_gb – the size of memory available (in Gigabytes)
power_draw_core – power draw of a computing core (Watt)
usage_factor_core – the core usage factor (between 0 and 1)
power_draw_mem – power draw of memory (Watt)
power_usage_efficiency – efficiency coefficient of the data center
- codegreen_core.tools.carbon_emission.compute_ce_from_energy(ci_data: pandas.DataFrame, number_core: int, memory_gb: int, power_draw_core: float = 15.8, usage_factor_core: int = 1, power_draw_mem: float = 0.3725, power_usage_efficiency: float = 1.6)¶
Calculates the carbon footprint for an energy consumption time series. This method returns an hourly time series of the carbon emission. The methodology is described in the section above.
- Parameters:
ci_data – DataFrame of energy consumption. Required columns: startTimeUTC, ci_default
number_core – the number of cores
memory_gb – the size of memory available (in Gigabytes)
power_draw_core – power draw of a computing core (Watt)
usage_factor_core – the core usage factor (between 0 and 1)
power_draw_mem – power draw of memory (Watt)
power_usage_efficiency – efficiency coefficient of the data center
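One way to picture an hourly emission time series is the sketch below. It is not the library's implementation: `hourly_emissions` is a hypothetical helper, the job is assumed to start at the beginning of an hour and to draw constant power, and a partial final hour is assumed to be charged in proportion to the minutes used.

```python
def hourly_emissions(hourly_ci, runtime_minutes, power_kw):
    """Return one emission value (gCO2e) per started hour of the job.

    hourly_ci: carbon intensity per hour in gCO2e/kWh, beginning with
        the hour in which the job starts.
    power_kw: constant power draw of the job in kW, PUE included.
    """
    emissions = []
    remaining = runtime_minutes
    for ci in hourly_ci:
        if remaining <= 0:
            break
        minutes = min(60, remaining)
        energy = power_kw * minutes / 60  # kWh consumed in this hour
        emissions.append(energy * ci)
        remaining -= minutes
    if remaining > 0:
        raise ValueError("hourly_ci does not cover the full runtime")
    return emissions


# Hypothetical job: 150 minutes at 0.2 kW, with falling carbon intensity.
series = hourly_emissions([400, 300, 200], runtime_minutes=150, power_kw=0.2)
total = sum(series)
print(series, total)
```

Here the first two hours contribute 80 g and 60 g, and the final half hour contributes 20 g, for a total of 160 g CO2e.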
Optimal time shifting¶
- codegreen_core.tools.loadshift_time.predict_now(country: str, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, criteria: str = 'percent_renewable', percent_renewable: int = 50) → tuple¶
Predicts the optimal time to start a computation in the given location, searching from the current time onward.
- Parameters:
country (str) – The country code
estimated_runtime_hours (int) – The estimated runtime in hours
estimated_runtime_minutes (int) – The estimated runtime in minutes
hard_finish_date (datetime) – The latest possible finish time for the task, as a datetime object in the local time zone
criteria (str) – The criterion used to calculate the optimal time. Valid values: "percent_renewable" or "optimal_percent_renewable"
percent_renewable (int) – The minimum percentage of renewable energy desired during the runtime
- Returns:
Tuple[timestamp, message, average_percent_renewable]
- Return type:
tuple
- codegreen_core.tools.loadshift_time.predict_optimal_time(energy_data: pandas.DataFrame, estimated_runtime_hours: int, estimated_runtime_minutes: int, percent_renewable: int, hard_finish_date: datetime, request_time: datetime = None) → tuple¶
Predicts the optimal time window to run a task based on energy data, runtime estimates, and a renewable energy target.
- Parameters:
energy_data – A DataFrame containing the energy data, including the columns startTimeUTC, totalRenewable, total, percent_renewable, posix_timestamp
estimated_runtime_hours – The estimated runtime in hours
estimated_runtime_minutes – The estimated runtime in minutes
percent_renewable – The minimum percentage of renewable energy desired during the runtime
hard_finish_date – The latest possible finish time for the task.
request_time – The time at which the prediction is requested. If None (the default), the current time is used. Assumed to be in the local time zone
- Returns:
Tuple[timestamp, message, average_percent_renewable]
- Return type:
tuple
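The idea of searching for a start time that meets a renewable energy target can be sketched with a simple sliding window. This is an assumption about the general approach rather than the library's algorithm: `earliest_green_start` is a hypothetical helper, the forecast is a plain list with one `percent_renewable` value per hour starting at the request time, and the deadline is expressed as an hour offset instead of a datetime.

```python
def earliest_green_start(percent_renewable_by_hour, runtime_hours,
                         min_percent, latest_finish_hour):
    """Return (start_hour, average_percent) of the earliest window whose
    average renewable share meets the target, or (None, None).

    latest_finish_hour: offset (in hours from the request time) by which
        the job must have finished.
    """
    horizon = min(len(percent_renewable_by_hour), latest_finish_hour)
    last_start = horizon - runtime_hours
    for start in range(last_start + 1):
        window = percent_renewable_by_hour[start:start + runtime_hours]
        average = sum(window) / runtime_hours
        if average >= min_percent:
            return start, average
    return None, None


# Hypothetical hourly forecast of the renewable share, in percent.
forecast = [30, 35, 55, 60, 65, 40]
start, average = earliest_green_start(
    forecast, runtime_hours=2, min_percent=50, latest_finish_hour=6
)
print(start, average)
```

In this example the windows starting at hours 0 and 1 average 32.5% and 45%, so the first window that reaches the 50% target starts at hour 2 with an average of 57.5%. If the deadline is too tight for any window to qualify, the sketch returns `(None, None)`.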
Optimal location shifting¶
- codegreen_core.tools.loadshift_location.predict_optimal_location(forecast_data, estimated_runtime_hours, estimated_runtime_minutes, percent_renewable, hard_finish_date, request_date=None)¶
Determines the optimal location and time to run a computation, using energy data for the selected locations.
- codegreen_core.tools.loadshift_location.predict_optimal_location_now(country_list: list, estimated_runtime_hours: int, estimated_runtime_minutes: int, percent_renewable: int, hard_finish_date: datetime) → tuple¶
Given a list of countries, returns the best location in which to run a computation, based on the input criteria.
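Choosing among several locations can be sketched by comparing the best window in each country. The selection rule used here (highest average renewable share over any feasible window) is an assumption made for illustration and may differ from the criteria the library applies. `best_window` and `best_location` are hypothetical helpers, and the forecasts are made-up hourly values assumed to end at the hard finish date.

```python
def best_window(forecast, runtime_hours):
    """Return (start_hour, average) of the window with the highest
    average renewable share, or None if the forecast is too short."""
    best = None
    for start in range(len(forecast) - runtime_hours + 1):
        average = sum(forecast[start:start + runtime_hours]) / runtime_hours
        if best is None or average > best[1]:
            best = (start, average)
    return best


def best_location(forecasts, runtime_hours):
    """Return (country, start_hour, average) for the country whose best
    window has the highest average renewable share, or None."""
    candidates = []
    for country, forecast in forecasts.items():
        window = best_window(forecast, runtime_hours)
        if window is not None:
            candidates.append((country, window[0], window[1]))
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[2])


# Hypothetical hourly renewable shares (percent) per country code.
forecasts = {
    "DE": [40, 45, 50, 55],
    "FR": [30, 30, 35, 35],
    "SE": [70, 65, 60, 80],
}
choice = best_location(forecasts, runtime_hours=2)
print(choice)
```

For these values the best two-hour windows average 52.5% (DE), 35% (FR), and 70% (SE), so the sketch selects SE with a start at hour 2.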