This document defines key terms and concepts used throughout the DEgym framework.
The main interface between RL agents and reactor simulations. Inherits from Gymnasium’s Env class and provides the standard RL interface (step(), reset(), etc.). The Environment coordinates all other components and maintains the internal simulation state.
Usage in DEgym: Acts as the central orchestrator that manages the complete simulation lifecycle, from initialization through episode execution.
A complete simulation run from initialization to termination/truncation. Each episode represents one training or evaluation trajectory for the RL agent.
Usage in DEgym: Episodes are bounded by reset() and end when terminated or truncated conditions are met.
The complete internal representation of the environment at any given time. Contains all information needed to fully describe the reactor system and continue simulation from that point.
Components:
dae_state: Variables that appear on the left-hand side of differential equations.dae_params: Parameters used in the differential equations.non_dae_params: Parameters not part of the differential equations but needed for simulation.Usage in DEgym: The State is passed between components throughout the simulation pipeline and gets updated at each timestep.
State is NOT!State is a common term used in different disciplines where it points to somewhat different concepts. The state here is not:
State this way?We split the state values into these components so that we can easily retrieve and separate the state values required for implementing a reactor. This choice is motivated by the structure of the DAE and the way we implement the dynamics function.
Variables that appear on the left-hand side of differential equations (the variables being differentiated).
Examples: Concentrations (c_A, c_B), temperature (T), pressure (P).
Usage in DEgym: Updated by numerical integration and represents the evolving state of the system.
Parameters that appear in the differential equations but are not differentiated variables.
Examples: Flow rates (F), reactor volume (V), kinetic constants (k_0_a, k_0_b), activation energies (E_a), etc.
Usage in DEgym: Used by the integrator in differential equation calculations.
Parameters needed for simulation but not part of the differential equations.
Examples: Maximum heat input (q_max), episode length (max_timestep), current timestep, constraint limits.
Usage in DEgym: Used for simulation control, constraints, and episode management.
High-level parameters that characterize a specific reactor configuration. Used to generate both DAEParameters and NonDAEParameters.
[!NOTE]
PhysicalParameteris not the config that is passed tomake_cstr_environment.
Examples: Reactor geometry, parameters of operating constraints, reaction kinetics, feed composition.
Usage in DEgym: Generated at episode start and remain constant within an episode, but can vary between episodes for domain randomization.
The direct input passed to the environment’s step() function by the RL agent. Can be scalars, numpy arrays, or other data types as defined by the action space.
Usage in DEgym: Raw actions are the starting point of the action processing pipeline.
An intermediate representation that gives semantic meaning to raw actions. Provides named fields and interpretable structure before conversion to physical control parameters.
Purpose:
Usage in DEgym: Actions are created by wrapping raw inputs and then converted to DAEActions.
Control parameters as they appear directly in the differential equation system. These are domain-specific, physically meaningful control inputs that affect system dynamics.
Purpose:
Usage in DEgym: DAEActions are passed to the integrator for numerical solving.
Transforms semantic Actions into DAEActions and vice versa. Handles scaling, unit conversion, and complex control logic transformations.
Usage in DEgym: Called during action preprocessing to bridge semantic and physical representations.
Enforces constraints on DAEActions to ensure they remain within safe and feasible bounds.
Methods:
is_legal(): Checks if an action satisfies all constraints.convert_to_legal_action(): Corrects constraint-violating actions.Usage in DEgym: Applied after action conversion to ensure safe operation.
Orchestrates the complete action processing pipeline from raw inputs to validated DAEActions.
Pipeline:
Usage in DEgym: Called by the Environment during each step() to process agent actions.
Coordinates preprocessing of both state and action before numerical integration.
Usage in DEgym: Called at the beginning of each step() to prepare inputs for the integrator.
Processes the state before numerical integration. Can apply transformations, corrections, or preparation steps.
Usage in DEgym: Ensures the state is in the correct format and range for numerical solving.
Processes the state after numerical integration to handle corrections, constraints, or transformations.
Usage in DEgym: Ensures the integrated state satisfies physical constraints and is ready for extraction.
Extractors convert internal state to the outputs required by the Gymnasium interface.
Converts environment state to observations that RL agents receive.
Components:
observation_space: Defines the structure and bounds of observations.extract_observation(): Transforms state to agent-visible format.Usage in DEgym: Called at the end of step() and reset() to provide agent observations.
Computes reward signals based on state transitions (state, action, next_state).
Purpose: Implements the optimization objective for the RL agent.
Usage in DEgym: Called during step() to provide learning signals to the agent.
Determines if an episode should terminate due to natural ending conditions of the MDP.
Examples: Achieving objectives, reaching unsafe conditions, violating constraints.
Usage in DEgym: Called during step() to check for episode termination.
Determines if an episode should truncate due to artificial stopping conditions.
Examples: Time limits, maximum timesteps, computational constraints.
Usage in DEgym: Called during step() to check for episode truncation.
Extracts additional diagnostic information useful for monitoring and analysis.
Purpose: Provides information for debugging, logging, and analysis but not used by the agent for learning.
Usage in DEgym: Called during step() and reset() to provide diagnostic data.
Numerical solver that advances the DAE system over time. Handles the core mathematical simulation of reactor dynamics.
Types:
ScipyIntegrator: Uses scipy’s ODE solvers.DiffeqpyIntegrator: Uses DifferentialEquations.jl through Python interface.Usage in DEgym: Called during step() to compute the next state given current state, parameters, and actions.
Implements the differential equations that describe the reactor dynamics.
Purpose: Defines the mathematical model f in: d/dt dae_state = f(dae_state, dae_params, dae_action).
Usage in DEgym: Used by the Integrator to evaluate derivatives during numerical solving.
Configuration parameters for the numerical integrator.
Examples: Time step size, solver method, tolerance settings, maximum iterations.
Usage in DEgym: Defines how the numerical integration should be performed.
Generates PhysicalParameters for each episode, enabling domain randomization.
Usage in DEgym: Called during environment initialization and reset to create diverse training scenarios.
Creates initial State instances at the start of episodes using PhysicalParameters.
Process:
Usage in DEgym: Called during reset() to establish starting conditions for new episodes.
Data structure that represents what the RL agent can perceive from the environment.
Purpose: Encapsulates agent-visible information with conversion to numpy arrays.
Usage in DEgym: Created by ObservationExtractor and returned to agents through step() and reset().
Raw Action → Action → DAEAction: Action processing pipeline that transforms agent outputs to physical control parameters.
PhysicalParameters → State: Configuration-to-simulation mapping that establishes episode parameters.
State → Integrator → Next State: Core simulation loop that advances system dynamics.
State → Extractors → RL Outputs: Information extraction that converts internal state to agent-facing data.
Environment: Orchestrates all components and provides the standard RL interface.
This terminology forms the foundation for understanding how DEgym separates RL-specific logic (data flow, interfaces) from use-case-specific logic (what actions mean, how states are defined).