Golem: The DOOM LNN Project

Golem is an open-source initiative to develop autonomous, adaptive agents for DOOM using Liquid Neural Networks (LNNs).

Current AI in DOOM relies on finite state machines (FSMs) written in the 1990s. While functional, they are predictable and carry no memory beyond their current state. Golem aims to replace these static heuristics with Neural Circuit Policies (NCPs): biologically inspired neural networks that model time as a continuous flow rather than discrete ticks.

Why Liquid Networks?

Unlike Large Language Models (LLMs), which can hallucinate state, or traditional Reinforcement Learning (RL), which typically requires millions of training steps, LNNs are:

Causal: They learn cause-and-effect relationships in noisy environments.
Compact: Runnable on consumer hardware with minimal latency (<20ms).
Continuous: They handle the variable time-steps of a game engine natively via Ordinary Differential Equations (ODEs).
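
To make the "Continuous" point concrete, here is a minimal sketch of a single liquid time-constant neuron advanced with a fused semi-implicit Euler step. It is an illustration only, not the project's model: the scalar weights (`w_in`, `w_rec`, `bias`), the time constant `tau`, and the frame times in the loop are all assumed values.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ltc_step(x, inp, dt, tau=1.0, A=1.0, w_in=1.0, w_rec=0.5, bias=0.0):
    """Advance one liquid time-constant neuron by dt seconds.

    Models dx/dt = -(1/tau + f) * x + f * A, where the gate
    f = sigmoid(w_in * inp + w_rec * x + bias) depends on input and state.
    All parameter values are illustrative assumptions.
    """
    f = sigmoid(w_in * inp + w_rec * x + bias)
    # Fused (semi-implicit) Euler update: stable for any dt >= 0
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))


# Frame times vary from tick to tick; the elapsed time goes straight into the solver
x = 0.0
for dt in (0.028, 0.035, 0.020, 0.051):
    x = ltc_step(x, inp=1.0, dt=dt)
```

Because `dt` is an argument of the update rather than a fixed constant, an irregular frame rate changes how far the state moves but not how the cell is defined.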

🏗 System Architecture

The project follows a strict ETL (Extract, Transform, Load) pipeline pattern to decouple the game engine from the multi-modal inference model.

graph LR
    A[ViZDoom Engine] -->|RGB / Depth / Audio / Labels| B(Extract & Transform)
    B -->|Multi-Modal Tensors| C{The Brain}
    C -->|Logits| D[Action Vector]
    D -->|Input| A

    style C fill:#f96,stroke:#333,stroke-width:2px

Data Pipeline Phases

Extract (Perception): Interfaces with libvizdoom to capture the raw sensory buffers (RGB screen, per-pixel depth, high-frequency stereo audio, and semantic labels). Internal game variables (health, ammo, coordinates) are explicitly discarded so the agent must learn from the multi-modal sensory input alone.
Transform (Normalization & DSP):
- Visual & Depth: Downsampling via bilinear interpolation (64x64) and min-max scaling.
- Thermal: Binary thresholding of the semantic labels buffer to isolate active entities, followed by nearest-neighbor downsampling (64x64).
- Audio: Zero-mean, unit-variance normalization, followed by an STFT-based conversion into dense 2D time-frequency Mel spectrograms, logarithmically scaled to decibels.
- Channel Permutation: Matrix transposition into the PyTorch (N, C, H, W) layout.
Load (Inference/Training): Feeds dynamic sequence tensors into the parallel Convolutional Neural Networks (Visual, Auditory, Thermal cortices) before concatenating the latent features into the Neural Circuit Policy (NCP) to generate action probability distributions.
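
The visual and thermal parts of the Transform phase can be sketched with NumPy alone. This is an assumption-laden illustration rather than the project's code: the function names are hypothetical, the [0, 1] target range and the "any non-zero label is active" threshold are assumed, nearest-neighbor resizing stands in for bilinear interpolation to avoid an extra dependency, and the audio branch is omitted.

```python
import numpy as np


def min_max_scale(frame: np.ndarray) -> np.ndarray:
    """Scale values to [0, 1] (assumed range); a constant frame maps to zeros."""
    frame = frame.astype(np.float32)
    lo, hi = frame.min(), frame.max()
    if hi == lo:
        return np.zeros_like(frame)
    return (frame - lo) / (hi - lo)


def nearest_downsample(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Nearest-neighbor resize of an (H, W) or (H, W, C) array to (size, size)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]


def thermal_mask(labels: np.ndarray, size: int = 64) -> np.ndarray:
    """Binary mask; treating any non-zero label as an entity is an assumption."""
    return nearest_downsample((labels > 0).astype(np.float32), size)


def to_nchw(batch_hwc: np.ndarray) -> np.ndarray:
    """(N, H, W, C) -> (N, C, H, W), the layout PyTorch convolutions expect."""
    return np.transpose(batch_hwc, (0, 3, 1, 2))


# Synthetic stand-ins for an RGB screen buffer and a labels buffer
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
labels = np.zeros((240, 320), dtype=np.uint8)
labels[100:140, 150:200] = 7

visual = min_max_scale(nearest_downsample(rgb))  # shape (64, 64, 3)
thermal = thermal_mask(labels)                   # shape (64, 64)
batch = to_nchw(visual[None, ...])               # shape (1, 3, 64, 64)
```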

🚀 Setup

System Prerequisites

Python: 3.10+
C++ Compiler: ViZDoom requires a modern C++ compiler (clang/gcc) and libraries (SDL2, OpenAL) if building wheels from source.
Hardware Acceleration:
- Apple Silicon: Metal (MPS) is supported automatically.
- NVIDIA: Requires CUDA 11.8+.

Bash / Zsh

# 1. Create Virtual Environment
python -m venv .venv
source ./.venv/bin/activate
# 2. Install Dependencies
pip install -r requirements.txt

Fish

# 1. Create Virtual Environment
python -m venv .venv
source ./.venv/bin/activate.fish
# 2. Install Dependencies
pip install -r requirements.txt

🛠 Usage Cycle

Golem operates in a continuous iterative loop made up of the following phases:

1. Configure
2. Generate
3. Randomize
4. Record
5. Inspect
6. Train
7. Audit
8. Intervene
9. Summary
10. Run
11. Examine

1. Configure

Before running, verify your hyperparameters, architectural depths, and active profile in conf/app.yaml.

The training.config string selects the action set the agent learns:

basic: 8 dimensions (Movement, Attack, Use).
classic: 10 dimensions (Basic + explicit slot 2 and slot 3 keys).
fluid: 9 dimensions (Basic + weapnext).

The brain.sensors block enables multi-modal phenomenological fusion:

visual: Base RGB screen buffer.
depth: Per-pixel distance map.
audio: High-frequency waveforms transformed into 2D Mel Spectrograms.
thermal: Binary semantic segmentation masks to isolate dynamic entities.
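
As a rough sketch of how these two settings might be consumed, the snippet below resolves an already-parsed configuration into an action vector size and a list of enabled sensors. The dimension counts come from the profile list above; the dictionary layout, the boolean sensor flags, and the `resolve_profile` helper are assumptions, not the project's actual schema.

```python
# Dimension counts are taken from the profile list; everything else is hypothetical.
ACTION_DIMS = {"basic": 8, "classic": 10, "fluid": 9}
KNOWN_SENSORS = ("visual", "depth", "audio", "thermal")


def resolve_profile(cfg: dict) -> tuple[int, list[str]]:
    """Return (action vector size, enabled sensors) for a parsed config dict."""
    profile = cfg["training"]["config"]
    if profile not in ACTION_DIMS:
        raise ValueError(f"unknown training.config: {profile!r}")
    flags = cfg["brain"]["sensors"]
    sensors = [name for name in KNOWN_SENSORS if flags.get(name)]
    if not sensors:
        raise ValueError("brain.sensors must enable at least one sensor")
    return ACTION_DIMS[profile], sensors


# Assumed shape of conf/app.yaml after parsing (boolean flags are a guess)
cfg = {
    "training": {"config": "fluid"},
    "brain": {
        "sensors": {"visual": True, "depth": False, "audio": True, "thermal": True}
    },
}
action_dim, sensors = resolve_profile(cfg)
```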

2. Generate

Compile a randomized BSP map using the Oblige procedural generation engine. This adds geographic variance to the training corpus, which helps prevent spatial overfitting.

python main.py generate

3. Randomize

Run a continuous pipeline to procedurally generate maps, launch the engine, and record expert demonstration data across highly varied topological environments.

python main.py randomize

4. Record

Launch the engine in Spectator Mode to capture training data manually on specific modules. The engine binds keys dynamically based on the active profile in app.yaml.

# Usage: python main.py record --module <module_name>
python main.py record --module combat

5. Inspect

Before training, verify that your dataset is balanced and normalized using the reports rendered from Jinja2 templates.

python main.py inspect

Tip

Look for high idle time. If more than 50% of the recorded time is spent doing nothing, the model tends to converge to inaction because idle frames dominate the loss.
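
The idle-time check in the tip can be approximated by hand. The sketch below assumes recorded actions are available as rows of 0/1 button flags, one row per frame; the helper names are hypothetical and this is not what `inspect` itself runs.

```python
def idle_fraction(actions):
    """Share of frames whose action vector is all zeros (no button pressed)."""
    if not actions:
        return 0.0
    idle = sum(1 for row in actions if not any(row))
    return idle / len(actions)


def press_rates(actions):
    """Fraction of frames in which each button is pressed, one value per column."""
    n = len(actions)
    return [sum(col) / n for col in zip(*actions)]


# Four recorded frames, three buttons; only one frame contains an action
actions = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
]
needs_rebalancing = idle_fraction(actions) > 0.5
```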

6. Train

Run the Behavioral Cloning loop to map the multi-modal observations to action vectors via Binary Cross-Entropy. The DataLoader automatically applies dynamic spatial Mirror Augmentation.

# Trains on ALL available data modules across the active profile
python main.py train --module all
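
The two ideas named in this phase, a binary cross-entropy objective over independent action bits and mirror augmentation, can be sketched in NumPy as follows. The action index layout is hypothetical, and swapping the left/right labels when a frame is flipped is an assumption about what a correct mirror needs, not a description of the project's DataLoader.

```python
import numpy as np

# Hypothetical action layout; the real index order depends on the active profile.
TURN_LEFT, TURN_RIGHT, MOVE_LEFT, MOVE_RIGHT = 2, 3, 4, 5
SWAP_PAIRS = [(TURN_LEFT, TURN_RIGHT), (MOVE_LEFT, MOVE_RIGHT)]


def mirror(frames: np.ndarray, actions: np.ndarray):
    """Flip (T, C, H, W) frames along the width axis and swap left/right labels."""
    flipped = frames[..., ::-1].copy()
    mirrored = actions.copy()
    for a, b in SWAP_PAIRS:
        mirrored[:, [a, b]] = actions[:, [b, a]]
    return flipped, mirrored


def bce_with_logits(logits: np.ndarray, targets: np.ndarray) -> float:
    """Numerically stable mean binary cross-entropy over independent action bits."""
    loss = (
        np.maximum(logits, 0)
        - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(loss.mean())


# Two tiny frames and an 8-dimensional action vector per frame
frames = np.arange(12, dtype=np.float32).reshape(2, 1, 2, 3)
actions = np.zeros((2, 8), dtype=np.float32)
actions[0, TURN_LEFT] = 1.0

flipped, mirrored = mirror(frames, actions)
loss = bce_with_logits(np.zeros((2, 8), dtype=np.float32), mirrored)
```

Flipping the image without swapping the labels would teach the model to turn away from whatever it sees, which is why the two steps are kept in one function here.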

7. Audit

Run a diagnostic Brain Scan to check for class-imbalance failures against the test data.

# Cap the evaluation to 50 sequence batches
python main.py audit --module all

# Run a full-corpus evaluation without overlapping sequence strides
python main.py audit --module all --full
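
One way to spot a class-imbalance failure is per-action recall: an action that occurs in the test data but is never predicted has collapsed. The sketch below is an assumed diagnostic with hypothetical names and toy data, and may differ from the metrics `audit` actually reports.

```python
def per_action_recall(y_true, y_pred):
    """Recall per action column; None where the action never occurs in y_true."""
    recalls = []
    for true_col, pred_col in zip(zip(*y_true), zip(*y_pred)):
        positives = sum(true_col)
        if positives == 0:
            recalls.append(None)
            continue
        hits = sum(1 for t, p in zip(true_col, pred_col) if t and p)
        recalls.append(hits / positives)
    return recalls


# Toy labels: column 2 occurs twice in the ground truth but is never predicted
y_true = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0]]

recalls = per_action_recall(y_true, y_pred)
collapsed = [i for i, r in enumerate(recalls) if r == 0.0]
```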

8. Intervene

Launch the agent autonomously. If the agent enters an equilibrium state (e.g., staring at a corner), hold Left Shift to suspend the LNN logits and capture raw keyboard overrides.

python main.py intervene --module combat

Releasing the key automatically dumps a _recovery trace file, which is meant to mitigate covariate shift.
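
The hold-and-release behavior can be pictured as a small state machine. The sketch below is hypothetical throughout (class name, method names, in-memory storage): it only shows the logic of using the human action while the override is held and closing a recovery segment on release, where the project would write its _recovery file.

```python
def choose_action(agent_action, human_action, override_held):
    """Use the keyboard action while the override key is held, else the agent's."""
    return list(human_action) if override_held else list(agent_action)


class RecoveryRecorder:
    """Collects (observation, human action) pairs while the override is held."""

    def __init__(self):
        self.active = []    # segment currently being recorded
        self.segments = []  # completed segments, one per hold-and-release

    def step(self, obs, human_action, override_held):
        if override_held:
            self.active.append((obs, list(human_action)))
        elif self.active:
            # Key released: close the segment (a real tool would write it to disk)
            self.segments.append(self.active)
            self.active = []


recorder = RecoveryRecorder()
held_per_frame = [False, True, True, False, False, True, False]
for frame_id, held in enumerate(held_per_frame):
    recorder.step(obs=frame_id, human_action=[1, 0], override_held=held)
```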

9. Summary

Generate a layer-by-layer breakdown of the active neural architecture via torchinfo. This executes a dummy forward pass to validate that the active sensor cortices scale dynamically and concatenate properly into the liquid core.

python main.py summary

10. Run

Watch the LNN play the game live. The agent carries a continuous hidden state (hx) through the liquid ODEs from one frame to the next.

python main.py run --module combat
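
To show why the carried hidden state matters, the sketch below runs a stand-in policy over several identical observations. `stub_policy` is a hypothetical placeholder for the trained network, and thresholding the sigmoid of each logit at 0.5 is an assumed way of turning logits into button presses.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stub_policy(obs, hx):
    """Hypothetical stand-in for the trained network: returns (logits, new hx)."""
    new_hx = [0.9 * h + 0.1 * o for h, o in zip(hx, obs)]
    logits = [4.0 * h - 1.0 for h in new_hx]
    return logits, new_hx


def logits_to_buttons(logits, threshold=0.5):
    """Press a button when its sigmoid probability reaches the (assumed) threshold."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


hx = [0.0, 0.0]  # hidden state is created once and carried across frames
history = []
for obs in [[1.0, 0.0]] * 5:
    logits, hx = stub_policy(obs, hx)
    history.append(logits_to_buttons(logits))
```

The observation never changes, yet the first button switches on only from the third frame because the state accumulates evidence over time. A policy that reset `hx` every frame would repeat its first answer indefinitely.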

11. Examine

To get a sense of what activates the model, generate heat maps of the final convolution's output over a particular sequence of training data.

python main.py examine --sequence 100

📜 License

MIT License.

DOOM is a registered trademark of id Software.