🗺 Roadmap
!!! "Dictionary" - [ ] Open - [x] Closed - [-] Blocked - [~] In Progress
See Phase Archive for the project's completed phases.
Phase 6: Second-Order Cognitive Dynamics (Latent Inertia)
Goal: Transition the Liquid Neural Network's hidden state from a first-order kinematic model to a true second-order dynamical system. By granting the latent state "momentum" via a coupled system of Ordinary Differential Equations (ODEs), the agent can accumulate force to escape localized equilibrium traps (e.g., staring at corners) without requiring explicit exogenous input.
1. Configuration & Taxonomy Layer
- [ ] ODE Configuration: Expand the `brain` block in `app.yaml` to include an `ode_order` parameter (accepting integer values `1` or `2`).
- [ ] State Validation: Update `config.py` to accurately parse `ode_order` into the initialization pipelines.
- [ ] Model Archiving Schema: Modify `train.py` and `utils.py` to append the ODE order to the saved `.pth` weights (e.g., `<YYYY-MM-DD>.c-<depth>.w-<length>.o-<order>.<increment>.pth`). Ensure `get_latest_parameters` remains backwards-compatible with older, first-order checkpoints.
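To keep `get_latest_parameters` backwards-compatible, the name parser can treat the new `.o-<order>` segment as optional and default legacy checkpoints to first order. A minimal sketch, assuming the `<increment>` field is numeric; the regex and the `parse_checkpoint` helper are hypothetical illustrations, not the existing code:

```python
import re

# Hypothetical checkpoint-name parser. The ".o-<order>" segment is optional
# so that older first-order checkpoints (which lack it) still parse.
CHECKPOINT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"\.c-(?P<depth>\d+)"
    r"\.w-(?P<length>\d+)"
    r"(?:\.o-(?P<order>\d+))?"   # absent in legacy first-order checkpoints
    r"\.(?P<increment>\d+)\.pth$"
)

def parse_checkpoint(name: str) -> dict:
    m = CHECKPOINT_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    fields = m.groupdict()
    fields["order"] = int(fields["order"] or 1)  # legacy default: first-order
    return fields
```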
2. The Brain (Architecture Redesign)
- [ ] Hamiltonian Memory Split: Update `brain.py` to dynamically adjust the capacity of the `liquid_rnn` based on the configured `ode_order`. For second-order systems, the `working_memory` must effectively track both latent position (\(x_1\)) and latent momentum (\(x_2\)).
- [ ] Coupled ODE Forward Pass: Rewrite the forward pass logic in `DoomLiquidNet` to support second-order integration. When `ode_order == 2`, mathematically decompose the second-order ODE into a system of two coupled first-order ODEs:

$$ \frac{dx_1(t)}{dt} = x_2(t), \quad \frac{dx_2(t)}{dt} = f(x_1(t), x_2(t), I(t); \theta) $$
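The decomposition can be sanity-checked numerically without any learned components. The torch-free sketch below substitutes a damped driven oscillator for the learned dynamics \(f(x_1, x_2, I; \theta)\) and steps the coupled system with semi-implicit Euler; both the stand-in \(f\) and the integrator choice are illustrative assumptions (the real core would use the CfC closed-form update):

```python
# Toy stand-in for the learned dynamics f(x1, x2, I; theta): a damped,
# driven oscillator. Purely illustrative.
def f(x1: float, x2: float, i: float) -> float:
    return -x1 - 0.1 * x2 + i

def step_second_order(x1: float, x2: float, i: float, dt: float):
    """One semi-implicit Euler step of the coupled system
    dx1/dt = x2 (position), dx2/dt = f(x1, x2, I) (momentum)."""
    x2 = x2 + dt * f(x1, x2, i)  # integrate momentum first
    x1 = x1 + dt * x2            # then position, using the fresh momentum
    return x1, x2

# Momentum accumulates under a constant drive, letting the state push
# through regions where the instantaneous input alone would stall.
x1, x2 = 0.0, 0.0
for _ in range(100):
    x1, x2 = step_second_order(x1, x2, i=1.0, dt=0.05)
```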
3. The Pipeline (State Management)
- [ ] State Initialization: Update `run.py` and `intervene.py` to initialize and pass the expanded, tuple-based hidden state \(hx\) when the second-order architecture is active.
- [ ] Physiological Reset (Death): Implement a strict state check inside the inference loops. If `game.is_player_dead()` is true, explicitly detach and zero out the hidden state. This prevents "past life" momentum leakage, where a newly respawned agent inadvertently reacts to the accumulated latent velocity of its previous life.
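A sketch of that guard, with the tensor type abstracted behind a `zeros_like` callable so the logic is visible; in the real loop `hx` holds torch tensors and `zeros_like` would be `torch.zeros_like`, and `reset_if_dead` itself is a hypothetical helper name:

```python
def reset_if_dead(is_dead: bool, hx: tuple, zeros_like) -> tuple:
    """On death, replace every entry of the recurrent state tuple with a
    fresh zero state so "past life" latent momentum cannot leak into the
    respawned agent's first frames."""
    if not is_dead:
        return hx
    # Under ode_order == 2, hx carries both latent position and momentum;
    # both must be zeroed, and the new state carries no autograd history.
    return tuple(zeros_like(h) for h in hx)

# Inside the inference loop (sketch):
#   hx = reset_if_dead(game.is_player_dead(), hx, torch.zeros_like)
```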
Risk Assessment
Training Overhead: Moderate
The BPTT (Backpropagation Through Time) algorithm must compute gradients through a coupled system of equations:
$$ \frac{dx_1(t)}{dt} = x_2(t), \quad \frac{dx_2(t)}{dt} = f(x_1(t), x_2(t), I(t); \theta) $$
Because the `working_memory` capacity must effectively double to accommodate the bipartite Hamiltonian state tuple, VRAM consumption will increase proportionally.
Runtime Overhead: Low
Because the network relies on the Closed-form Continuous-time (CfC) approximation rather than a traditional numerical solver (such as Runge-Kutta), it bypasses the severe latency penalty of iterative ODE evaluations. The primary cost is the larger matrix multiplications resulting from the doubled hidden state size.
Assessment: Acceptable risks. Cleared to implement.
Phase 7: The Parietal Binding (Cross-Attention Sensorimotor Integration) 🧠
Goal: Shift from naive flat tensor concatenation to a structured cross-attention bottleneck. By forcing the Thermal Cortex to query the Visual Cortex, the network learns to synthesize abstract spatial relationships (e.g., "enemy is next to the explosive barrel") rather than just reacting to isolated stimuli.
1. Configuration & Taxonomy Layer
- [-] Attention Toggles: Update `app.yaml` to include an `attention_heads` configuration integer under the `brain` block (e.g., `4`).
- [-] State Validation: Update `config.py` to accurately parse this parameter and pass it to the model initializer.
2. The Brain (Architecture Redesign)
- [-] Multi-Head Attention: Modify `DoomLiquidNet` in `brain.py` to instantiate an `nn.MultiheadAttention` layer.
- [-] Q-K-V Projection: In the `forward` pass, project the flattened Thermal feature map (\(T\)) into the Query (\(Q\)), and the Visual/Depth feature map into the Keys (\(K\)) and Values (\(V\)).
- [-] Sensorimotor Fusion: Flatten the resulting contextual output tensor and feed it as the input \(I(t)\) into the Liquid Core, scaling the `working_memory` input dimension dynamically.
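The mechanics of that bottleneck can be shown with a single head in pure Python; the real `brain.py` change would use `nn.MultiheadAttention`, and the learned Q/K/V projection matrices are omitted here as an illustrative simplification. Thermal tokens act as queries that look up context in the visual tokens:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(thermal, visual, d):
    """Single-head scaled dot-product cross-attention.
    thermal: list of query vectors (one per thermal token).
    visual:  list of key/value vectors (one per visual token).
    Returns one visual-context vector per thermal token."""
    out = []
    for q in thermal:
        # attention scores of this thermal query against every visual key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in visual]
        w = softmax(scores)
        # weighted sum of the visual values
        ctx = [sum(wi * v[j] for wi, v in zip(w, visual)) for j in range(d)]
        out.append(ctx)
    return out
```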
Risk Assessment
Training Overhead: High
Attention mechanisms scale quadratically in complexity. While projecting the flattened feature maps into \(Q\), \(K\), and \(V\) matrices adds a modest number of parameters, the actual \(Q K^\top\) dot-product attention drastically increases memory bandwidth demands and FLOPs during the backward pass.
Runtime Overhead: High
Multi-head attention is notoriously memory-bandwidth bound during autoregressive inference. Each attention head requires loading the respective matrices into GPU memory, which can introduce micro-latencies. Unless highly optimized (e.g., utilizing FlashAttention or grouping queries), this phase poses the highest risk of blowing past the ViZDoom engine's \(35\text{Hz}\) (\(\approx 28\text{ms}\)) temporal limit, leading to desynchronization in the multiplayer arena.
Assessment: Unacceptable risks. Do not implement.
Phase 8: The Prefrontal Hierarchy (Multi-Scale Liquid Time-Constants) ⏳
Goal: Split the Liquid Core into two decoupled Ordinary Differential Equations (ODEs) operating at different frequencies. This allows tactical reflexes to execute in milliseconds while a strategic, long-term context is maintained over seconds, bridging the gap between instinct and abstract strategy.
1. Configuration Layer
- [-] Hierarchical Timesteps: Update `app.yaml` to include a `brain.hierarchy` block defining `fast_hz` (e.g., 35) and `slow_hz` (e.g., 2).
2. The Brain (Architecture Redesign)
- [-] Bifurcated Core: Modify `brain.py` to instantiate two separate `CfC` modules: `FastCore` (tactical motor mapping) and `SlowCore` (strategic context).
- [-] Temporal Gating: Implement a gating mechanism in the `forward` pass so the `SlowCore` only integrates periodically (e.g., every 17 frames).
- [-] Top-Down Bias: Pass the hidden state of the `SlowCore` as a continuous, concatenated bias into the `FastCore`'s input sequence.
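The gating schedule can be sketched independently of the CfC internals. In this hypothetical helper, `fast_core` and `slow_core` are stand-in callables, and list concatenation stands in for the tensor concat of the top-down bias:

```python
def hierarchical_step(frame_idx, obs, hx_fast, hx_slow,
                      fast_core, slow_core, period=17):
    """One frame of the bifurcated core: the SlowCore integrates only every
    `period` frames (~2 Hz at 35 fps), while the FastCore runs every frame
    with the held slow state concatenated onto its input as top-down bias."""
    if frame_idx % period == 0:
        hx_slow = slow_core(obs, hx_slow)   # strategic, low-frequency update
    fast_in = obs + hx_slow                 # top-down bias via concatenation
    hx_fast = fast_core(fast_in, hx_fast)   # tactical, every-frame update
    return hx_fast, hx_slow
```

Note the jitter implication from the risk assessment: on `frame_idx % period == 0` frames both cores run, so that worst-case frame is the one to profile.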
3. The Pipeline (State Management)
- [-] Complex State Persistence: Adjust the recurrent state tuple \(hx\) across `run.py` and `intervene.py` to persist a dictionary or tuple of hierarchical memories (`hx_fast`, `hx_slow`) across the inference loop.
Risk Assessment
Training Overhead: High
BPTT must now unroll and track gradients across two vastly different temporal scales. The parameters for the LNN essentially double, and the computational graph becomes highly complex, as the SlowCore's hidden state acts as a continuous bias injection into the FastCore.
Runtime Overhead: Moderate (with Jitter)
The average FLOP count per frame only increases slightly, as the FastCore handles the bulk of the rapid inference. However, this architecture introduces latency jitter. On the frames where the SlowCore triggers its integration (approximately every 17 frames), the computational load spikes. Carefully profile this worst-case frame execution time to ensure it does not break the lockstep networking protocol.
Assessment: Awaiting further assessment after Phase 6. Do not implement.
Phase 9: Forward Internal Models (Predictive Coding) 🔮
Goal: Move beyond pure Behavioral Cloning by forcing the network to anticipate the future. Adding a self-supervised prediction head forces the latent space to encode the physics and movement dynamics of the DOOM engine, naturally inducing strategic planning.
1. Configuration Layer
- [ ] Predictive Toggle: Update `app.yaml` to toggle `training.predictive_coding` and define a temporal forecast horizon parameter \(k\) (e.g., 10 frames).
2. The Brain (Architecture Redesign)
- [ ] Hallucination Decoder: Implement a transposed convolutional decoder (`nn.ConvTranspose2d`) in `brain.py` that branches off the liquid hidden state \(x(t)\).
- [ ] Future Projection: Configure the decoder to output a hallucinated spatial prediction of the future thermal mask \(\hat{T}(t+k)\).
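Designing that decoder is mostly shape bookkeeping. PyTorch's documented output-size rule for `nn.ConvTranspose2d` (with dilation 1) is `out = (in - 1)*stride - 2*padding + kernel + output_padding`; the five-stage stack below, which expands a 1×1 latent "pixel" to a 64×64 mask \(\hat{T}(t+k)\), is a hypothetical configuration chosen only to illustrate the arithmetic:

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a ConvTranspose2d layer (dilation = 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Hypothetical decoder stack: (kernel, stride, padding) per stage.
stages = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
size = 1  # start from a 1x1 projection of the liquid hidden state x(t)
for kernel, stride, padding in stages:
    size = conv_transpose_out(size, kernel, stride, padding)
# size ends at 64: a 64x64 predicted thermal mask
```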
3. The Pipeline (ETL & Training)
- [ ] Temporal Offset Streaming: Update `DoomStreamingDataset` in `dataset.py` to dynamically yield a future target tensor \(T(t+k)\) alongside the standard sequence inputs and action labels.
- [ ] Composite Loss Function: Modify the optimization loop in `train.py` to evaluate both the action logits and the thermal hallucination via a composite loss function: \(\mathcal{L}_{total} = \mathcal{L}_{BCE\_Action} + \lambda \mathcal{L}_{MSE\_ThermalFuture}\).
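The composite objective can be sketched with plain-Python stand-ins for the two terms; in `train.py` these would be `nn.BCEWithLogitsLoss` and `nn.MSELoss` on tensors, and the \(\lambda = 0.5\) default below is an arbitrary illustrative choice:

```python
import math

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits."""
    return sum(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
               for z, y in zip(logits, targets)) / len(logits)

def mse(pred, target):
    """Mean squared error between flattened predicted and target masks."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def total_loss(action_logits, action_targets,
               thermal_pred, thermal_future, lam=0.5):
    """L_total = L_BCE_Action + lambda * L_MSE_ThermalFuture."""
    return (bce_with_logits(action_logits, action_targets)
            + lam * mse(thermal_pred, thermal_future))
```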
Risk Assessment
Training Overhead: Severe
This is the most computationally expensive upgrade on the board. The dataset must now stream offset future tensors into memory, vastly increasing I/O pressure and RAM utilization. The addition of the Transposed CNN decoder essentially doubles the size of the network. Furthermore, optimizing the composite loss function (\(\mathcal{L}_{total} = \mathcal{L}_{BCE\_Action} + \lambda \mathcal{L}_{MSE\_ThermalFuture}\)) requires computing gradients for both the classification head and the dense image generation head simultaneously.
Runtime Overhead: Negligible to Zero
This is a purely structural training constraint. Because the agent only requires the output of the Linear Motor Cortex to play the game, the entire hallucination decoder can be detached and bypassed during live inference.
Assessment: Acceptable, if gated behind a configuration property that is disabled by default. Cleared to implement.
Phase 10: Cortical Auxiliary Heads (Isolated Representation Learning)
Goal: Attach secondary linear heads directly to the latent output vectors of specific cortices (e.g., Thermal, Visual) prior to sensorimotor concatenation. This enables the application of targeted, isolated loss functions (e.g., BCE for enemy counting on the thermal mask) directly to the sub-networks, accelerating feature extraction without waiting for the slow, end-to-end action gradient.
1. Configuration Layer
- [ ] Auxiliary Toggles: Update `app.yaml` to include an `auxiliary_heads` configuration block under `brain` (e.g., toggling thermal enemy counting) and corresponding \(\lambda\) weights under the `loss` block.
- [ ] State Validation: Update `config.py` to parse the new auxiliary settings and loss weights during pipeline initialization.
2. The Brain (Architecture Redesign)
- [ ] Secondary Linear Heads: Modify `DoomLiquidNet` in `brain.py` to conditionally instantiate `nn.Linear` layers branching directly off the flattened cortical vectors (e.g., \(T(t)\) or \(V(t)\)).
- [ ] Multi-Output Forward Pass: Update the `forward` method to return a dictionary of auxiliary predictions alongside the primary action logits and the recurrent hidden state.
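The multi-output contract can be sketched with callables standing in for the modules; `motor_head` and the entries of `aux_heads` are hypothetical stand-ins (in `brain.py` they would be the motor cortex and `nn.Linear` heads), and list concatenation stands in for the real fusion:

```python
def forward(thermal_vec, visual_vec, hx, motor_head, aux_heads):
    """Multi-output forward pass: primary action logits plus a dictionary
    of auxiliary predictions, keyed by task name."""
    fused = thermal_vec + visual_vec          # placeholder sensorimotor fusion
    logits, hx = motor_head(fused, hx)
    # Auxiliary heads read the isolated cortical vector (here: thermal),
    # not the fused input, so their losses target the sub-network alone.
    aux = {name: head(thermal_vec) for name, head in aux_heads.items()}
    return logits, hx, aux
```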
3. The Pipeline (ETL & Training)
- [ ] Ground Truth Extraction: Update `record.py` and `DoomStreamingDataset` to extract, store, and stream the necessary ground-truth labels for the auxiliary tasks (e.g., parsing the exact number of visible enemies from ViZDoom's underlying `state` variables).
- [ ] Composite Objective Function: Modify the optimization loop in `train.py` to compute and sum the isolated losses against the main behavioral cloning target: \(\mathcal{L}_{Total} = \mathcal{L}_{Action} + \lambda \mathcal{L}_{Aux\_Thermal} + \dots\)
Risk Assessment
Training Overhead: Moderate
Optimizing a composite loss function requires calculating gradients for both the primary classification head and the auxiliary heads simultaneously. However, the additional parameters (small linear layers) are mathematically trivial compared to the deep CNNs. The primary overhead is I/O related: modifying the dataset to extract and stream additional ground truth labels from the engine state.
Runtime Overhead: Zero
This is a purely structural training enhancement. Because the agent only requires the output of the primary Motor Cortex to play the game, the auxiliary heads can be completely detached and bypassed during live inference, maintaining strict temporal compliance with the \(35\text{Hz}\) engine loop.
Assessment: High reward, zero runtime risk. Cleared to implement.
Distant Future
The Possession (Integration) 👻
- Goal: Move beyond the Python wrapper and replace in-game enemy AIs.
- [-] Engine Fork: Compile the PyTorch model to TorchScript (`.pt`) and link `libtorch` directly into a C++ source port to bypass pixel rendering entirely.