Phase Archive
Phase 1: Setup
The Foundation (Completed) ✅
- [x] ETL Pipeline: Robust recording of pixel buffers and input vectors.
- [x] Engine Bridge: ViZDoom integration with custom `cfg` injection.
- [x] Data Sanitation: Automated inspection tools.
- [x] Architecture: Modular, config-driven Python application.
The Brain (Completed) ✅
- [x] Dataset Class: `IterableDataset` with sliding window segmentation.
- [x] Liquid Network: `CfC` (Closed-form Continuous-time) implementation.
- [x] Training Loop: Behavioral Cloning with `BCEWithLogitsLoss`.
- [x] Optimization: Auto-device selection (CUDA/MPS/CPU).
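The training loop above can be sketched in a few lines. This is a minimal, illustrative single step of Behavioral Cloning, with a plain MLP standing in for the CfC core (the real model, window size, and feature widths belong to the project, not this sketch); the multi-label button targets are why `BCEWithLogitsLoss` is applied to raw logits.

```python
import torch
import torch.nn as nn

# Placeholder network: maps (batch, window, features) -> per-step action logits.
# The project's actual CfC core would sit here instead.
model = nn.Sequential(nn.Linear(32, 64), nn.Tanh(), nn.Linear(64, 8))

# Multi-label targets (one bit per button), so BCE-with-logits fits directly.
criterion = nn.BCEWithLogitsLoss()

# Auto-device selection: CUDA, then MPS, then CPU.
device = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(4, 16, 32, device=device)            # (batch, window, features)
targets = torch.randint(0, 2, (4, 16, 8), device=device).float()

optimizer.zero_grad()
loss = criterion(model(features), targets)
loss.backward()
optimizer.step()
```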
The Body (Completed) ✅
- [x] Inference Engine: Real-time game loop driven by the LNN.
- [x] Action Logic: Thresholding and conflict suppression logic.
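The action logic can be illustrated as follows. This is a hedged sketch, not the project's implementation: the button names, conflict pairs, and threshold are hypothetical, but the mechanism is the one described above — sigmoid-threshold the logits, then suppress mutually exclusive buttons by keeping the more confident side.

```python
import numpy as np

# Hypothetical button layout and exclusivity pairs; names are illustrative.
ACTIONS = ["ATTACK", "MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT"]
CONFLICTS = [("TURN_LEFT", "TURN_RIGHT")]

def logits_to_action(logits, threshold=0.5):
    """Threshold sigmoid probabilities, then resolve conflicting buttons
    by releasing the weaker of each mutually exclusive pair."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))
    pressed = probs > threshold
    for a, b in CONFLICTS:
        i, j = ACTIONS.index(a), ACTIONS.index(b)
        if pressed[i] and pressed[j]:
            if probs[i] >= probs[j]:
                pressed[j] = False
            else:
                pressed[i] = False
    return pressed.astype(int).tolist()

# Both turn buttons clear the threshold; the stronger one (TURN_LEFT) wins.
print(logits_to_action([2.0, -1.0, 1.5, 0.8]))  # -> [1, 0, 1, 0]
```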
Phase 2: Distributed Architecture
Goal: Containerize the LNN agent and connect it to a dedicated multiplayer DOOM server. Benchmark Golem against the historical champions of the Visual Doom AI Competition (VDAIC).
1. The Host Server (Completed) ✅
- [x] Dedicated Host Script: Initialize a central ViZDoom instance in `-host N -deathmatch` mode.
- [x] Multiplayer Configuration: Update configs and select WADs with proper multiplayer spawn points.
- [x] Host Container: Create a lightweight `Dockerfile.host` that only installs the ViZDoom engine and the host script.
2. The Golem Client (Completed) ✅
- [x] Headless Rendering: Configure ViZDoom to render the visual buffer headlessly.
- [x] Network Synchronization: Ensure the LNN's inference tick-rate stays aligned with the network server.
- [x] Modular Dockerfile: The image should package Python 3.10+, PyTorch, and ViZDoom, but omit the model weights.
- [x] Volume Mounting: Configure the container entrypoint to load the `golem.pth` brain and `app.yaml` configuration from a mounted volume directory (e.g., `-v ./data/fluid:/app/data`).
3. The Legacy Champions (Completed) ✅
- [x] Archive Retrieval: Clone the legacy Dockerfiles and weights for the 2016/2017 VDAIC winners (e.g., Arnold by CMU, IntelAct by Intel Labs) from the official GitHub archives.
- [x] Legacy Container Builds: Build the historical images.
4. Orchestration (Completed) ✅
- [x] Docker Compose: Create a `docker-compose.yml` to network the swarm.
- [x] The Roster: Define the services in the compose file to simultaneously spin up:
  - 1x Host Arena Server
  - 2x Legacy Champion Bots (e.g., Arnold, IntelAct)
  - Nx Golem Agents (using the same image, but mounting different profile volumes, e.g. `basic` or `fluid`).
- [x] Agent Parameterization: Pass unique names and colors via environment variables.
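A compose file matching the roster above might look like the following sketch. Service names, image tags, and environment variable names are illustrative assumptions, not the project's actual file; the shape of the topology (one host, legacy bots, volume-mounted Golem profiles) follows the list above.

```yaml
# Illustrative swarm topology; names and values are placeholders.
services:
  arena:
    build:
      context: .
      dockerfile: Dockerfile.host
  arnold:
    image: legacy/arnold          # hypothetical legacy champion image tag
    depends_on: [arena]
  golem-fluid:
    image: golem:latest
    depends_on: [arena]
    volumes:
      - ./data/fluid:/app/data    # mounts golem.pth + app.yaml profile
    environment:
      - AGENT_NAME=GolemFluid     # unique name/color per agent
      - AGENT_COLOR=3
```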
Phase 3: Multi-Modal Sensor Fusion
Goal: Expand the agent's sensory perception beyond the 2D pixel array by integrating ViZDoom's raw depth and audio buffers into the Liquid Neural Network, effectively granting stereopsis and audition without exposing underlying game-state variables.
1. The Configuration Layer (Completed) ✅
- [x] Sensor Toggles: Update `app.yaml` to include a `brain.sensors` block with boolean toggles for `depth` and `audio`.
- [x] Dynamic Action Space: Update `config.py` to parse these toggles and pass them to the ETL and Model initialization layers.
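The sensors block described above might look like this (a sketch of the shape, not the project's actual `app.yaml` contents):

```yaml
brain:
  sensors:
    depth: true    # append a depth channel to the visual input
    audio: false   # skip audio capture and the auditory cortex entirely
```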
2. The ETL Pipeline (Record & Transform) (Completed) ✅
- [x] Depth Extraction: Modify `record.py` to capture `state.depth_buffer`. Normalize the per-pixel distance values to \([0, 1]\).
- [x] Audio Extraction: Modify `record.py` to capture `state.audio_buffer`. Normalize the raw stereo waveforms.
- [x] Tensor Packaging: Update the `.npz` saving mechanism to store `depth` and `audio` arrays only if they are enabled in the active profile, preventing massive file bloat for purely visual agents.
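The extraction and packaging steps above can be sketched as one helper. This is an illustrative sketch, assuming the depth buffer arrives as `uint8` distances and that `package_episode` and its argument names are hypothetical stand-ins for the project's actual recording code; the key idea from the list is that optional sensor arrays are written only when captured.

```python
import numpy as np

def package_episode(path, frames, actions, depth=None, audio=None):
    """Save one recorded episode as .npz, writing optional sensor arrays
    only when they were captured (keeps visual-only runs small)."""
    arrays = {"frames": frames, "actions": actions}
    if depth is not None:
        # Rescale raw uint8 distances to [0, 1].
        arrays["depth"] = depth.astype(np.float32) / 255.0
    if audio is not None:
        # Zero-mean / unit-variance keeps waveform magnitudes tame.
        audio = audio.astype(np.float32)
        arrays["audio"] = (audio - audio.mean()) / (audio.std() + 1e-8)
    np.savez_compressed(path, **arrays)
```

A visual-only profile simply passes `depth=None, audio=None` and produces an archive containing just `frames` and `actions`.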
3. The Brain (Architecture Redesign) (Completed) ✅
- [x] Stereopsis Integration: If `depth` is enabled, modify the Visual Cortex CNN input channels from \(C = 3\) (RGB) to \(C = 4\) (RGB + Depth).
- [x] Auditory Cortex: If `audio` is enabled, implement a parallel 1D Convolutional Neural Network (`nn.Conv1d`) to extract features from the high-frequency audio buffer.
- [x] Sensor Fusion: Concatenate the flattened visual/depth feature vector with the auditory feature vector before passing the unified tensor into the Liquid `CfC` core.
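The fusion step can be illustrated with a small module. This is a sketch under stated assumptions: the feature widths are made up, and a plain LSTM stands in for the CfC core (which comes from an external liquid-network library); the point is only the concatenation of per-cortex feature vectors before the recurrent core.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Illustrative fusion head: concatenate each cortex's flattened
    features, then feed the unified tensor into a recurrent core
    (an LSTM stands in here for the CfC)."""
    def __init__(self, visual_dim=128, audio_dim=32, hidden=64, actions=8):
        super().__init__()
        self.core = nn.LSTM(visual_dim + audio_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, actions)

    def forward(self, visual_feats, audio_feats):
        # visual_feats: (batch, seq, visual_dim); audio_feats: (batch, seq, audio_dim)
        fused = torch.cat([visual_feats, audio_feats], dim=-1)
        out, _ = self.core(fused)
        return self.head(out)

model = FusionSketch()
logits = model(torch.randn(2, 16, 128), torch.randn(2, 16, 32))
print(logits.shape)  # torch.Size([2, 16, 8])
```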
Phase 4: Auditory Phenomenology Refactoring
Goal: Transition from processing raw 1D audio waveforms to 2D Mel Spectrograms. This improves LNN stability by leveraging spatial locality in convolutional networks, allowing the model to recognize the "visual" shape of audio cues (like a fireball or monster growl) while naturally compressing high-frequency acoustic noise.
1. The ETL Pipeline (Completed) ✅
- [x] Audio Normalization: Enforce strict zero-mean, unit-variance normalization on the raw audio buffer at extraction to prevent gradient explosion.
- [x] Spectrogram Generation: Integrate `torchaudio.transforms.MelSpectrogram` followed by `torchaudio.transforms.AmplitudeToDB` into the data transformation layer (`dataset.py`). This converts the raw 1D audio arrays into dense 2D time-frequency tensors (scaled to decibels) on the fly during the `__getitem__` call.
2. The Configuration Layer (Completed) ✅
- [x] DSP Hyperparameters: Expand `app.yaml` to include a `brain.dsp` block containing parameter tunings for the Mel Spectrogram generation. Required parameters include the engine's `sample_rate`, `n_fft` (e.g., 1024), `hop_length` (e.g., 256), and `n_mels` (e.g., 64).
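Laid out in the config, the block might look like this (values are the illustrative examples above; the actual `sample_rate` must match what the engine reports):

```yaml
brain:
  dsp:
    sample_rate: 22050   # illustrative; use the engine's actual audio rate
    n_fft: 1024
    hop_length: 256
    n_mels: 64
```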
3. The Brain (Completed) ✅
- [x] 2D Auditory Cortex: Replace the `nn.Conv1d` auditory cortex in `brain.py` with a standard `nn.Conv2d` architecture, aligning sound classification with the existing spatial visual processing hierarchy.
- [x] Sensor Fusion Re-Alignment: Ensure the concatenation logic dynamically calculates the flattened feature size of the newly generated 2D auditory feature map before routing the unified tensor into the CfC liquid core.
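One common way to calculate the flattened feature size dynamically is to probe the conv stack with a dummy input at build time. This sketch uses a made-up two-layer 2D auditory cortex over a 1 x 64 x 18 mel input (channels x mel bins x time frames); the helper and layer sizes are illustrative, not the project's actual `brain.py`.

```python
import torch
import torch.nn as nn

def flattened_size(module, input_shape):
    """Run a dummy zero batch through a conv stack to discover its
    flattened output width, so downstream layers never hard-code it."""
    with torch.no_grad():
        out = module(torch.zeros(1, *input_shape))
    return out.flatten(1).shape[1]

# Illustrative 2D auditory cortex over a (1, 64, 18) dB mel spectrogram.
auditory = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2), nn.ReLU(),
)
width = flattened_size(auditory, (1, 64, 18))
print(width)  # 720
```

Because the width is probed rather than assumed, changing `n_mels` or the clip length re-sizes the fusion layer automatically.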
Phase 5: Thermal Phenomenology
Goal: Decouple spatial navigation from enemy detection by utilizing ViZDoom's semantic segmentation `labels_buffer`. This isolates active entities from the background into a clean, binary "thermal" mask, sharply reducing the visual noise the model must parse during combat.
1. The Configuration Layer (Completed) ✅
- [x] Sensor Toggle: Expand `app.yaml` to include a `thermal: true` flag in the `brain.sensors` configuration block.
- [x] State Validation: Update `config.py` to parse the boolean into the initialization pipelines.
2. The ETL Pipeline (Completed) ✅
- [x] Engine Initialization: Update `utils.py` to call `game.set_labels_buffer_enabled(True)` when the thermal sensor is flagged.
- [x] Thermal Mask Extraction: In `record.py`, capture `state.labels_buffer`, apply a binary threshold (pixels > 0 → 1) to drop environmental geometry, and resize the mask to 64x64.
- [x] Tensor Packaging: Save the resulting thermal arrays to the generated `.npz` archive.
- [x] Dataset Streaming: Update `dataset.py` to load the thermal arrays and feed them into the model alongside the visual input.
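The mask extraction step can be sketched without any image library. A minimal, dependency-free version assuming a nearest-neighbor resize is acceptable for a binary mask (the project's actual `record.py` may resize differently); the threshold follows the rule above: any labeled pixel becomes 1, geometry becomes 0.

```python
import numpy as np

def thermal_mask(labels_buffer, size=64):
    """Binarize the labels buffer (any labeled entity -> 1) and
    nearest-neighbor resize it to a square size x size mask."""
    mask = (np.asarray(labels_buffer) > 0).astype(np.float32)
    h, w = mask.shape
    rows = np.arange(size) * h // size   # nearest-neighbor row indices
    cols = np.arange(size) * w // size   # nearest-neighbor column indices
    return mask[rows][:, cols]

# A fake 120x160 buffer with one labeled entity blob.
buf = np.zeros((120, 160), np.uint8)
buf[40:80, 60:100] = 255
out = thermal_mask(buf)
print(out.shape, out.max())  # (64, 64) 1.0
```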
3. The Brain (Completed) ✅
- [x] Parallel Thermal Cortex: Update `brain.py` to instantiate an isolated `nn.Conv2d` branch dedicated to processing the thermal mask, allowing the network to learn independent dynamic entity-tracking filters.
- [x] Sensor Fusion: Concatenate the flattened thermal feature map with the visual/depth and auditory representations before routing the unified tensor into the CfC liquid core.