DEgym is a framework for developing RL environments for dynamical systems. It provides a structured approach to implementing environments that model complex systems using Differential-Algebraic Equations (DAEs) or Ordinary Differential Equations (ODEs). Additionally, it offers:
To achieve the above goals, DEgym is built on the understanding that every RL environment contains components and logic that are either (i) RL-specific or (ii) use-case-specific. Both are explained below.
[!TIP] For a refresher on the basics of an RL environment, refer to Gymnasium’s Basic Usage and Env API.
action actually entails, is implemented by inheriting from the abstract classes mentioned above.

In DEgym, we implemented all RL-related logic, leaving only the use-case-specific logic for the user or AI agent to define. The fixed data flow, well-defined interfaces, and documentation provide the agent with rich context and clear output formats, guiding the development of concrete methods. Without these guides, maintaining a unified structure across use cases, and enabling agent success, would be much harder.
The main RL-related logic lives in the __init__ and step functions. The figure below shows the information flow implemented in those two functions. The concrete implementation of the data types (e.g., State) and of the abstract methods of components such as RewardExtractor requires knowledge of the use case.
[!NOTE] For easier visualization, the above diagrams do not show the data classes which are passed between the components, nor do they indicate where the information is saved.
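As a rough illustration of a fixed data flow with pluggable use-case logic, the sketch below chains plain callables in one plausible order. The `StepPipeline` class, the call order, and every signature are assumptions inferred from the component names; they are not taken from DEgym's source, and the real interfaces may differ. The toy system dx/dt = -x + u with an explicit Euler step is likewise only a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StepPipeline:
    """Hypothetical stand-in for a fixed step data flow (not DEgym's API)."""

    action_preprocessor: Callable[[Any], Any]
    state_preprocessor: Callable[[Any], Any]
    integrator: Callable[[Any, Any], Any]
    state_postprocessor: Callable[[Any], Any]
    observation_extractor: Callable[[Any], Any]
    reward_extractor: Callable[[Any, Any, Any], float]
    terminated_extractor: Callable[[Any], bool]
    truncated_extractor: Callable[[Any], bool]
    info_extractor: Callable[[Any], dict]

    def step(self, state, action):
        # RL-side inputs are converted into what the solver expects.
        solver_action = self.action_preprocessor(action)
        solver_state = self.state_preprocessor(state)
        # The integrator advances the dynamics by one time step.
        next_solver_state = self.integrator(solver_state, solver_action)
        next_state = self.state_postprocessor(next_solver_state)
        # The extractors turn the new state into the usual RL outputs.
        observation = self.observation_extractor(next_state)
        reward = self.reward_extractor(state, action, next_state)
        terminated = self.terminated_extractor(next_state)
        truncated = self.truncated_extractor(next_state)
        info = self.info_extractor(next_state)
        return next_state, (observation, reward, terminated, truncated, info)


# Toy use case (assumed): dx/dt = -x + u, explicit Euler with dt = 0.1.
pipeline = StepPipeline(
    action_preprocessor=lambda a: max(-1.0, min(1.0, a)),  # clip to [-1, 1]
    state_preprocessor=lambda s: s,
    integrator=lambda x, u: x + 0.1 * (-x + u),
    state_postprocessor=lambda x: x,
    observation_extractor=lambda s: s,
    reward_extractor=lambda s, a, s_next: -abs(s_next),
    terminated_extractor=lambda s: abs(s) < 1e-3,
    truncated_extractor=lambda s: False,
    info_extractor=lambda s: {},
)

next_state, (obs, reward, terminated, truncated, info) = pipeline.step(1.0, -2.0)
```

The pipeline itself never changes between use cases; only the callables passed to it do, which mirrors the split between RL-specific and use-case-specific logic.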
To create a new environment using DEgym, one needs to subclass the Environment class and implement all the required abstract classes. The Environment.__init__() method requires the following components:
def __init__(
    self,
    physical_parameters_generator: PhysicalParametersGenerator,
    initial_state_generator: InitialStateGenerator,
    integrator: Integrator,
    action_preprocessor: ActionPreprocessor,
    state_preprocessor: StatePreprocessor,
    state_postprocessor: StatePostprocessor,
    observation_extractor: ObservationExtractor,
    reward_extractor: RewardExtractor,
    terminated_extractor: TerminatedExtractor,
    truncated_extractor: TruncatedExtractor,
    info_extractor: InfoExtractor,
    seed: int,
) -> None:
All of the above components (except the Integrator, which is already implemented) are use-case dependent and must be implemented by subclassing the corresponding abstract classes.
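The subclassing pattern can be sketched as follows. To keep the example self-contained, `State` and `RewardExtractor` are defined locally as stand-ins; the method name `extract_reward`, its signature, and the setpoint-tracking reward are hypothetical and may not match DEgym's actual abstract classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical use-case state: a reactor temperature in kelvin."""

    temperature: float


class RewardExtractor(ABC):
    """Local stand-in for an abstract component; NOT DEgym's actual class."""

    @abstractmethod
    def extract_reward(self, state: State, action: float, next_state: State) -> float:
        """Return the scalar reward for one transition."""


class SetpointRewardExtractor(RewardExtractor):
    """Use-case-specific logic: penalize distance from a temperature setpoint."""

    def __init__(self, setpoint: float) -> None:
        self.setpoint = setpoint

    def extract_reward(self, state: State, action: float, next_state: State) -> float:
        # The closer the next temperature is to the setpoint, the higher the reward.
        return -abs(next_state.temperature - self.setpoint)


extractor = SetpointRewardExtractor(setpoint=350.0)
reward = extractor.extract_reward(State(340.0), 0.5, State(345.0))

# The abstract base cannot be instantiated until its method is implemented.
try:
    RewardExtractor()
    base_is_abstract = False
except TypeError:
    base_is_abstract = True
```

Because the abstract base refuses instantiation until every abstract method is defined, a missing piece of use-case logic surfaces at construction time rather than in the middle of an episode.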
[!TIP] For a detailed tutorial on such an implementation for a continuous stirred tank reactor (CSTR), refer to A Comprehensive Tutorial.