codegreen_core API

The core package contains two main modules:

  • data : To fetch energy data for a country.

  • tools : To calculate various quantities, such as optimal computation time and carbon intensity.

data module

codegreen_core.data.energy(country, start_time, end_time, type='generation') → dict

Returns an hourly time series of the energy production mix for a specified country and time range, if a valid energy data source is available.

The data is returned as a pandas DataFrame along with additional metadata. The columns vary depending on the data source. For example, if the source is ENTSOE, the data includes fields such as “Biomass”, “Geothermal”, “Hydro Pumped Storage”, “Hydro Run-of-river and Poundage”, “Hydro Water Reservoir”, etc.

However, some fields remain consistent across data sources:

| Column | Type | Description |
|---|---|---|
| startTimeUTC | object | Start time in UTC (format: YYYYMMDDhhmm) |
| startTime | datetime | Start time in local timezone |
| renewableTotal | float64 | The total production from all renewable sources |
| renewableTotalWS | float64 | Total production using only Wind and Solar energy sources |
| nonRenewableTotal | float64 | Total production from non-renewable sources |
| total | float64 | Total energy production from all sources |
| percentRenewable | int64 | Percentage of total energy from renewable sources |
| percentRenewableWS | int64 | Percentage of energy from Wind and Solar only |
| Wind_per | int64 | Percentage contribution from Wind energy |
| Solar_per | int64 | Percentage contribution from Solar energy |
| Nuclear_per | int64 | Percentage contribution from Nuclear energy |
| Hydroelectricity_per | int64 | Percentage contribution from Hydroelectricity |
| Geothermal_per | int64 | Percentage contribution from Geothermal energy |
| Natural Gas_per | int64 | Percentage contribution from Natural Gas |
| Petroleum_per | int64 | Percentage contribution from Petroleum |
| Coal_per | int64 | Percentage contribution from Coal |
| Biomass_per | int64 | Percentage contribution from Biomass |
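
The relationship between the total and percentage columns can be illustrated with a small sketch. It assumes, which the table does not state explicitly, that each percentage is the share of `total` rounded to an integer. The helper name `add_totals` and the sample values are made up for illustration.

```python
# Hypothetical hourly row using the consistent column names listed above.
row = {
    "renewableTotal": 30000.0,     # production from all renewable sources
    "renewableTotalWS": 22000.0,   # production from wind and solar only
    "nonRenewableTotal": 20000.0,  # production from non-renewable sources
}

def add_totals(row):
    """Return a copy of the row with total and percentage columns filled in.

    Assumption: percentages are shares of `total`, rounded to integers.
    """
    out = dict(row)
    out["total"] = out["renewableTotal"] + out["nonRenewableTotal"]
    if out["total"] > 0:
        out["percentRenewable"] = round(100 * out["renewableTotal"] / out["total"])
        out["percentRenewableWS"] = round(100 * out["renewableTotalWS"] / out["total"])
    else:
        # Avoid dividing by zero when no production is reported.
        out["percentRenewable"] = 0
        out["percentRenewableWS"] = 0
    return out

summary = add_totals(row)
print(summary["total"], summary["percentRenewable"], summary["percentRenewableWS"])
```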

Parameters:
  • country (str) – The 2-letter country code (e.g., “DE” for Germany, “FR” for France, etc.).

  • start_time (datetime) – The start date for data retrieval (rounded to the nearest hour).

  • end_time (datetime) – The end date for data retrieval (rounded to the nearest hour).

  • type (str) – The type of data to retrieve; either ‘generation’ or ‘forecast’. Defaults to ‘generation’.

Returns:

A dictionary containing the following keys:

  • error (str): An error message, empty if no errors occurred.

  • data_available (bool): Indicates whether data was successfully retrieved.

  • data (pandas.DataFrame): The retrieved energy data if available; an empty DataFrame otherwise.

  • time_interval (int): The time interval of the DataFrame (constant value: 60).

  • source (str): Specifies the origin of the retrieved data. Defaults to 'public_data', indicating it was fetched from an external source. If the offline storage feature is enabled, this value may change if the data is available locally.

  • columns (dict): A dictionary listing the columns for renewable and non-renewable energy sources in the data.

Return type:

dict
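
Since errors are reported through the returned dictionary, callers will usually want to check `error` and `data_available` before touching `data`. The following is a minimal sketch of such a check. The helper `describe_energy_result` is hypothetical, and the two dictionaries are hand-written stand-ins in which `data` is a plain list rather than a DataFrame, so that the snippet runs without the package.

```python
def describe_energy_result(result):
    """Summarise a result dictionary shaped like the one documented above."""
    if result["error"] or not result["data_available"]:
        return "no data: " + (result["error"] or "unknown reason")
    # len() works for a list here and would also work for a pandas DataFrame.
    return "{} rows at {}-minute resolution from {}".format(
        len(result["data"]), result["time_interval"], result["source"]
    )

# Stand-ins for real return values (illustrative only).
ok = {"error": "", "data_available": True, "data": [{}] * 24,
      "time_interval": 60, "source": "public_data", "columns": {}}
failed = {"error": "invalid country", "data_available": False, "data": [],
          "time_interval": 60, "source": "public_data", "columns": {}}

print(describe_energy_result(ok))
print(describe_energy_result(failed))
```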

Example Usage:

Get generation data for Germany

from datetime import datetime
from codegreen_core.data import energy
result = energy(country="DE", start_time=datetime(2025, 1, 1), end_time=datetime(2025, 1, 2), type="generation")

Get forecast data for Norway

from datetime import datetime
from codegreen_core.data import energy
result = energy(country="NO", start_time=datetime(2025, 1, 1), end_time=datetime(2025, 1, 2), type="forecast")

codegreen_core.data.get_offline_data(country, start, end, sync_first=False) → dict

This method returns locally stored energy data.

Data is stored in two sources:

  1. Redis cache: Contains data for a limited number of hours from the last sync.

  2. CSV files: Contain data for longer durations.

Both storage options can be configured in the configuration file.

Note: Unless you set the sync_first flag, the method assumes that syncing of the data sources is handled separately. If sync_first is set to True and the data files have not been initialized in advance, the method may take longer to complete.

Returns:

A dictionary with the following keys:

  • available (bool): Indicates if the data is available.

  • data (pandas.DataFrame): The energy data, if available. Otherwise, an empty DataFrame.

Return type:

dict
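
One way to combine offline and online retrieval is to try the local store first and fall back to the public source. The sketch below shows that pattern under stated assumptions: `load_energy` is a hypothetical helper, and the two fetch functions are injected as callables (with fake stand-ins here) so the snippet runs without the package. In practice they would presumably be `get_offline_data` and `energy`.

```python
from datetime import datetime

def load_energy(country, start, end, fetch_offline, fetch_online):
    """Prefer locally stored data and fall back to the online source."""
    offline = fetch_offline(country, start, end)
    if offline["available"]:
        return offline["data"], "offline"
    online = fetch_online(country, start, end)
    if online["data_available"]:
        return online["data"], "online"
    raise RuntimeError(online["error"] or "no energy data available")

# Fake stand-ins that mimic the documented return dictionaries.
def fake_offline(country, start, end):
    return {"available": False, "data": []}

def fake_online(country, start, end):
    return {"error": "", "data_available": True, "data": ["row"]}

data, origin = load_energy("DE", datetime(2025, 1, 1), datetime(2025, 1, 2),
                           fake_offline, fake_online)
print(origin)
```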

codegreen_core.data.info() → list

Returns a list of countries (in two-letter codes) and energy sources for which data can be fetched using the package.

Returns:

A list of dictionaries, each containing:

  • name of the country

  • energy_source : the publicly available energy data source

  • carbon_intensity_method : the methodology used to calculate carbon intensity

  • code : the 2 letter country code

Return type:

list
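
A list of this shape is easy to filter. The sketch below uses hand-written placeholder entries (not real output) and two hypothetical helpers to look up which country codes use a given energy source and whether a code is supported.

```python
# Placeholder entries shaped like the dictionaries described above.
# The values are illustrative and do not reflect the package's real output.
countries = [
    {"code": "DE", "energy_source": "ENTSOE", "carbon_intensity_method": "example_method"},
    {"code": "XX", "energy_source": "example_source", "carbon_intensity_method": "example_method"},
]

def codes_for_source(entries, source):
    """Return the sorted country codes whose data comes from `source`."""
    return sorted(e["code"] for e in entries if e["energy_source"] == source)

def is_supported(entries, code):
    """Return True if the two-letter code appears in the list."""
    return any(e["code"] == code for e in entries)

print(codes_for_source(countries, "ENTSOE"))
print(is_supported(countries, "FR"))
```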

codegreen_core.data.sync_offline_data(file=False, cache=False)

This method syncs offline data for offline sources enabled in the configuration file. The data is synced for all available countries.

You need to run this method before retrieving offline data. It is also possible to set up a CRON job to call this method at regular intervals to keep data synchronized.

The sync operation can take some time, depending on the data size and the selected sync options (file, cache, or both).

Parameters:
  • file (bool) – If True, sync data in offline files. Defaults to False.

  • cache (bool) – If True, sync data in the cache. Defaults to False.

tools module

Methods vary depending on the type of input (e.g., country name vs. energy data) and the output (e.g., single value vs. time series DataFrame). Most tools depend on the data from the data subpackage. As a convention, methods that primarily accept a DataFrame as input (along with other parameters) and return a DataFrame are prefixed with _df.

codegreen_core.tools.carbon_intensity.compute_ci(country: str, start_time: datetime, end_time: datetime) → pandas.DataFrame

Computes the carbon intensity (CI) for a given country and time period.

This function determines the energy data source for the country:

  • If energy data is available (e.g., from ENTSOE), it calculates CI using actual energy data.

  • If energy data is unavailable, it uses default CI values from ci_default_values.csv for the country.

Parameters:
  • country (str) – The 2 letter country code.

  • start_time (datetime) – The start of the time range for which CI is computed.

  • end_time (datetime) – The end of the time range for which CI is computed.

Returns:

A pandas DataFrame containing timestamps (startTimeUTC) and corresponding carbon intensity values.

Return type:

pd.DataFrame

codegreen_core.tools.carbon_intensity.compute_ci_from_energy(energy_data: pandas.DataFrame, default_method='ci_ipcc_lifecycle_mean', base_values: dict = None) → pandas.DataFrame

Given the energy time series, computes the carbon intensity for each row. You can choose the base value from several sources available or use your own base values.

Parameters:
  • energy_data

    A pandas DataFrame that must include the following columns, representing the percentage of energy generated from each source:

    • Coal_per (float): Percentage of energy generated from coal.

    • Petroleum_per (float): Percentage of energy generated from petroleum.

    • Biomass_per (float): Percentage of energy generated from biomass.

    • Natural Gas_per (float): Percentage of energy generated from natural gas.

    • Geothermal_per (float): Percentage of energy generated from geothermal sources.

    • Hydroelectricity_per (float): Percentage of energy generated from hydroelectric sources.

    • Nuclear_per (float): Percentage of energy generated from nuclear sources.

    • Solar_per (float): Percentage of energy generated from solar sources.

    • Wind_per (float): Percentage of energy generated from wind sources.

  • default_method

    This parameter allows you to choose the base values for each energy source.

    By default, the IPCC lifecycle mean values are used. Available options include:

    • codecarbon (Ref [6])

    • ipcc_lifecycle_min (Ref [5])

    • ipcc_lifecycle_mean (default)

    • ipcc_lifecycle_max

    • eu_comm (Ref [4])

  • base_values (optional)

    A dictionary of custom base carbon intensity values for energy sources.

    Must include the following keys:

    • Coal (float): Base carbon intensity value for coal.

    • Petroleum (float): Base carbon intensity value for petroleum.

    • Biomass (float): Base carbon intensity value for biomass.

    • Natural Gas (float): Base carbon intensity value for natural gas.

    • Geothermal (float): Base carbon intensity value for geothermal energy.

    • Hydroelectricity (float): Base carbon intensity value for hydroelectricity.

    • Nuclear (float): Base carbon intensity value for nuclear energy.

    • Solar (float): Base carbon intensity value for solar energy.

    • Wind (float): Base carbon intensity value for wind energy.
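
To make the role of the base values concrete, the sketch below computes a carbon intensity for one row as a weighted sum of base values, using the `*_per` columns as weights. This is an assumption about the calculation, not a confirmed description of the library's internals. The helper `weighted_ci` is hypothetical, and the base values are round illustrative numbers that do not correspond to any of the listed methods.

```python
SOURCES = ["Coal", "Petroleum", "Biomass", "Natural Gas", "Geothermal",
           "Hydroelectricity", "Nuclear", "Solar", "Wind"]

def weighted_ci(row, base_values):
    """Weighted sum of base values; weights are the percentage columns.

    Missing percentage columns are treated as 0 in this sketch.
    """
    return sum(base_values[s] * row.get(s + "_per", 0) for s in SOURCES) / 100

# Illustrative base values only (same unit as the resulting carbon intensity).
base_values = {
    "Coal": 1000.0, "Petroleum": 800.0, "Biomass": 200.0,
    "Natural Gas": 500.0, "Geothermal": 40.0, "Hydroelectricity": 20.0,
    "Nuclear": 10.0, "Solar": 50.0, "Wind": 10.0,
}

# One hypothetical hour: 20% coal, 10% gas, 10% nuclear, 20% solar, 40% wind.
row = {"Coal_per": 20, "Natural Gas_per": 10, "Nuclear_per": 10,
       "Solar_per": 20, "Wind_per": 40}

ci = weighted_ci(row, base_values)
print(ci)
```

A dictionary like `base_values` above has the shape expected by the base_values parameter.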

codegreen_core.tools.carbon_emission.compare_carbon_emissions(server1, server2, start_time1, start_time2, runtime_minutes)

Compares the carbon emissions of running a job with the same duration on two different servers.

Parameters:
  • server1

    A dictionary containing the details of the first server’s hardware and location specifications. Required keys include:

    • country (str): The country code for the server’s location (used for energy data).

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The memory available in Gigabytes.

    • power_draw_core (float): Power draw of each computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • server2

    A dictionary containing the details of the second server’s hardware and location specifications. Required keys are identical to those in server1:

    • country (str): The country code for the server’s location.

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The memory available in Gigabytes.

    • power_draw_core (float): Power draw of each computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • start_time1 – The start time of the job on server1 (datetime).

  • start_time2 – The start time of the job on server2 (datetime).

  • runtime_minutes – The total running time of the job in minutes (int).

Returns:

A dictionary with the carbon emissions for each server and the difference between them, structured as follows:

  • emissions_server1 (float): Total carbon emissions for server1 in kilograms of CO2 equivalent.

  • emissions_server2 (float): Total carbon emissions for server2 in kilograms of CO2 equivalent.

  • absolute_difference (float): The absolute difference in emissions between the two servers.

  • higher_emission_server (str): Indicates which server has higher emissions (“server1” or “server2”).
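
The structure of the comparison result can be sketched from two already computed emission totals. The helper `compare_emissions` is hypothetical, and how the library reports a tie is not stated in the source, so reporting "server2" for equal values is an arbitrary choice made here.

```python
def compare_emissions(emissions_server1, emissions_server2):
    """Build a dictionary with the keys listed above from two totals in kg CO2e."""
    return {
        "emissions_server1": emissions_server1,
        "emissions_server2": emissions_server2,
        "absolute_difference": abs(emissions_server1 - emissions_server2),
        # Assumption: ties fall through to "server2" in this sketch.
        "higher_emission_server": (
            "server1" if emissions_server1 > emissions_server2 else "server2"
        ),
    }

result = compare_emissions(1.5, 0.5)
print(result["higher_emission_server"], result["absolute_difference"])
```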

codegreen_core.tools.carbon_emission.compute_ce(server: dict, start_time: datetime, runtime_minutes: int) → tuple[float, pandas.DataFrame]

Calculates the carbon footprint of a job, given the server details, start time, and runtime. This method returns an hourly time series of the carbon emissions. The methodology is defined in the documentation.

Parameters:
  • server

    A dictionary containing the details about the server, including its hardware specifications. The dictionary should include the following keys:

    • country (str): The country code where the job was performed (required to fetch energy data).

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The size of memory available in Gigabytes.

    • power_draw_core (float): Power draw of a computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • start_time – The start time of the job (datetime).

  • runtime_minutes – Total running time of the job in minutes (int).

Returns:

A tuple containing:

  • (float): The total carbon footprint of the job in kilograms of CO2 equivalent.

  • (pandas.DataFrame): A DataFrame containing the hourly time series of carbon emissions.
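
The following sketch shows how the server parameters could combine with hourly carbon intensity values to give a total and an hourly series. It is not the library's confirmed methodology. It assumes a formula in the style of the Green Algorithms calculator, interprets power_draw_mem as Watts per GB, takes carbon intensity in gCO2e/kWh, and assumes the job starts exactly on the hour. The function names are hypothetical.

```python
def server_power_kw(server):
    """Estimated power draw of the server in kW (assumed formula)."""
    cores_w = (server["number_core"] * server["power_draw_core"]
               * server["usage_factor_core"])
    # Assumption: power_draw_mem is Watts per GB of memory.
    memory_w = server["memory_gb"] * server["power_draw_mem"]
    return (cores_w + memory_w) * server["power_usage_efficiency"] / 1000

def hourly_emissions(server, runtime_minutes, hourly_ci):
    """Return (total_kg, per_hour_kg) for hourly CI values in gCO2e/kWh."""
    power_kw = server_power_kw(server)
    per_hour = []
    remaining = runtime_minutes
    for ci in hourly_ci:
        if remaining <= 0:
            break
        minutes = min(60, remaining)             # the last hour may be partial
        energy_kwh = power_kw * minutes / 60
        per_hour.append(energy_kwh * ci / 1000)  # grams to kilograms
        remaining -= minutes
    return sum(per_hour), per_hour

server = {"country": "DE", "number_core": 8, "memory_gb": 32,
          "power_draw_core": 10.0, "usage_factor_core": 1.0,
          "power_draw_mem": 0.5, "power_usage_efficiency": 1.5}

# A 90-minute job spanning two hours with CI of 400 and 200 gCO2e/kWh.
total, series = hourly_emissions(server, 90, [400.0, 200.0])
print(round(total, 4), [round(v, 4) for v in series])
```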

codegreen_core.tools.carbon_emission.compute_ce_from_energy(server, ci_data: pandas.DataFrame)

Calculates the carbon footprint for energy consumption over a time series. This method returns an hourly time series of the carbon emissions.

The methodology is defined in the documentation. Note that the start and end times for the computation are derived from the first and last rows of the ci_data DataFrame.

Parameters:
  • server

    A dictionary containing details about the server, including its hardware specifications. The dictionary should include:

    • number_core (int): The number of CPU cores.

    • memory_gb (float): The size of memory available in Gigabytes.

    • power_draw_core (float): Power draw of a computing core in Watts.

    • usage_factor_core (float): The core usage factor, a value between 0 and 1.

    • power_draw_mem (float): Power draw of memory in Watts.

    • power_usage_efficiency (float): Efficiency coefficient of the data center.

  • ci_data

    A pandas DataFrame of energy consumption over time. The DataFrame should include the following columns:

    • startTimeUTC (datetime): The start time of each energy measurement in UTC.

    • ci_default (float): Carbon intensity values for the energy consumption.

Returns:

A tuple containing:

  • (float): The total carbon footprint of the job in kilograms of CO2 equivalent.

  • (pandas.DataFrame): A DataFrame containing the hourly time series of carbon emissions.

codegreen_core.tools.loadshift_time.predict_now(country: str, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, criteria: str = 'percent_renewable') → tuple

Predicts the optimal computation time in the given location, starting from now.

Parameters:
  • country (str) – The country code

  • estimated_runtime_hours (int) – The estimated runtime in hours

  • estimated_runtime_minutes (int) – The estimated runtime in minutes

  • hard_finish_date (datetime) – The latest possible finish time for the task, as a datetime object in the local time zone.

  • criteria (str) – Criteria based on which the optimal time is calculated. Valid values are “percent_renewable” and “optimal_percent_renewable”.

Returns:

Tuple[timestamp, message, average_percent_renewable]

Return type:

tuple

Example usage:

from datetime import datetime, timedelta
from codegreen_core.tools.loadshift_time import predict_now

country_code = "DK"
est_runtime_hour = 10
est_runtime_min = 0
# Latest acceptable finish time, in the local time zone
hard_finish_date = datetime.now() + timedelta(days=1)
criteria = "percent_renewable"

# Arguments follow the documented signature (five parameters)
result = predict_now(country_code,
                     est_runtime_hour,
                     est_runtime_min,
                     hard_finish_date,
                     criteria)
# Example output:
# (1728640800.0, <Message.OPTIMAL_TIME: 'OPTIMAL_TIME'>, 76.9090909090909)
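
The returned tuple can be unpacked and the first element converted to a readable time. The sketch below works on a hand-written tuple modelled on the example output above, and assumes the first element is a POSIX timestamp in seconds. The message is shown as a plain string here, whereas the library returns a Message enum member.

```python
from datetime import datetime, timezone

# Stand-in for the tuple returned by predict_now (illustrative values).
timestamp, message, avg_percent_renewable = (
    1728640800.0, "OPTIMAL_TIME", 76.9090909090909
)

# POSIX timestamps are timezone independent; build an aware UTC datetime.
suggested_start = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(suggested_start.isoformat())
print(f"{message}: about {avg_percent_renewable:.1f}% renewable on average")
```

Call `suggested_start.astimezone()` to display the same instant in the local time zone.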

codegreen_core.tools.loadshift_time.predict_optimal_time(energy_data: pandas.DataFrame, estimated_runtime_hours: int, estimated_runtime_minutes: int, hard_finish_date: datetime, request_time: datetime = None) → tuple

Predicts the optimal time window to run a task within the time frame covered by the given energy data, based on the runtime estimate.

Parameters:
  • energy_data – A DataFrame containing the energy data, including startTimeUTC, totalRenewable, total, percent_renewable, and posix_timestamp

  • estimated_runtime_hours – The estimated runtime in hours

  • estimated_runtime_minutes – The estimated runtime in minutes

  • hard_finish_date – The latest possible finish time for the task.

  • request_time – The time at which the prediction is requested. Defaults to None, in which case the current time is used. Assumed to be in the local time zone.

Returns:

Tuple[timestamp, message, average_percent_renewable]

Return type:

tuple
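
One plausible way to pick such a window is to slide a fixed-length window over the hourly rows and keep the start with the highest average share of renewables that still finishes before the deadline. The sketch below illustrates that idea only. It is an assumption rather than the library's confirmed algorithm, it handles whole hours, and it uses a plain list of dictionaries instead of a DataFrame. The function `best_start` is hypothetical.

```python
def best_start(rows, runtime_hours, deadline_ts):
    """Find the best start among hourly rows sorted by posix_timestamp.

    Returns (start_timestamp, average_percent_renewable), or None if no
    window of `runtime_hours` finishes by `deadline_ts`.
    """
    best = None
    for i in range(len(rows) - runtime_hours + 1):
        window = rows[i:i + runtime_hours]
        end_ts = window[-1]["posix_timestamp"] + 3600  # end of the last hour
        if end_ts > deadline_ts:
            break  # later windows would finish even later
        avg = sum(r["percent_renewable"] for r in window) / runtime_hours
        if best is None or avg > best[1]:
            best = (window[0]["posix_timestamp"], avg)
    return best

# Six hypothetical hours starting at timestamp 0.
rows = [{"posix_timestamp": 3600 * h, "percent_renewable": p}
        for h, p in enumerate([30, 40, 80, 90, 50, 20])]

print(best_start(rows, 2, deadline_ts=3600 * 6))
print(best_start(rows, 2, deadline_ts=3600 * 3))
```

With a generous deadline the window covering the 80% and 90% hours wins. Tightening the deadline forces an earlier, less renewable window.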