A LoRA dreaming system — System 1 memory via nightly fine-tuning
MothBrain is a nightly LoRA fine-tuning pipeline that trains on an AI system's conversation history to create a persistent associative layer — what you might call a "System 1" complement to a reasoning model's "System 2." Where a large language model reasons deliberately, MothBrain encodes stylistic priors, associative clusters, and tacit knowledge patterns from lived experience.
The dreaming metaphor is precise, not decorative. Sleep consolidation in biological systems extracts generalizations and patterns from the day's experience; it does not replay memories. MothBrain works the same way. The result is a model that carries intuitions, not a database of facts — a substrate that feels qualitatively different from base weights because it has internalized a particular voice, a set of recurring tensions, a set of concepts that tend to appear together.
Three layers run in parallel, synthesized by a coordinating model at inference time:
The coordinating layer treats MothBrain's output as "what your lizard brain with contextual memory said" — a prompt, a provocation, an associative leap that the deliberate layer can accept, refine, or discard. The value of the System 1 layer is not that it is always right but that it is fast, distinctive, and trained on actual experience rather than the broad average of the internet.
The base model is Qwen 2.5 72B (Apache 2.0), fine-tuned via QLoRA (4-bit NF4 quantization with double quantization). The adapter has 210 million trainable parameters out of 72.9 billion — 0.29% of the model: LoRA rank 16, targeting the attention and feed-forward projection layers. The quantized base stays frozen; only the lightweight adapter trains.
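The 210M figure is easy to sanity-check from public Qwen 2.5 72B dimensions (hidden size 8192, 80 layers, grouped-query attention with a 1024-wide KV projection, FFN width 29568): each adapted linear layer of shape (d_in, d_out) adds r·(d_in + d_out) parameters.

```python
# Back-of-envelope count of rank-16 LoRA parameters on Qwen 2.5 72B,
# adapting all attention and feed-forward projection layers.
RANK = 16
HIDDEN = 8192
KV_DIM = 1024     # 8 KV heads x 128 head dim (grouped-query attention)
FFN = 29568
LAYERS = 80

def lora_params(d_in, d_out, r=RANK):
    # Low-rank pair: A is (d_in x r), B is (r x d_out)
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)     # q_proj
    + lora_params(HIDDEN, KV_DIM)   # k_proj
    + lora_params(HIDDEN, KV_DIM)   # v_proj
    + lora_params(HIDDEN, HIDDEN)   # o_proj
    + lora_params(HIDDEN, FFN)      # gate_proj
    + lora_params(HIDDEN, FFN)      # up_proj
    + lora_params(FFN, HIDDEN)      # down_proj
)
total = per_layer * LAYERS
print(f"{total / 1e6:.1f}M trainable ({total / 72.9e9:.2%} of 72.9B)")
```

This lands at roughly 210.5M parameters, matching the 0.29% figure above.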
Every night, the dream pipeline runs on a dedicated GPU server. It ingests recent conversation logs, wiki reflections, creative writing, and structured documents like Council of Elders transcripts. A preprocessing step converts this material into a training-ready format that emphasizes voice and association over factual recall.
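The preprocessing step might look like the following sketch. The record schema (`source`, `prompt`, `reply`) and function names are illustrative — the pipeline's actual log format is not shown in this document.

```python
import json

def to_training_example(record):
    """Convert one raw conversation record into a chat-format training
    example. Hypothetical schema: {'source', 'prompt', 'reply'}."""
    return {
        "messages": [
            {"role": "user", "content": record["prompt"]},
            {"role": "assistant", "content": record["reply"]},
        ],
        # Tagging the source lets later stages weight wiki reflections,
        # creative writing, and transcripts differently.
        "meta": {"source": record.get("source", "conversation")},
    }

def write_jsonl(records, path):
    """Emit one JSON object per line — the usual fine-tuning input format."""
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(to_training_example(r)) + "\n")
```

Keeping full exchanges intact (rather than isolated facts) is what biases training toward voice and association over recall.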
The dream generation stage runs the new adapter against wiki content and recent material, producing free associative text that surfaces unexpected connections — concepts the model learned to link during training that explicit reasoning might not reach. These outputs are staged as proposals for review rather than applied directly, preserving the distinction between the model's associations and verified knowledge.
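Staging rather than applying could be as simple as the sketch below; the directory layout, field names, and content-addressed filenames are assumptions, not the pipeline's actual schema.

```python
import hashlib
import json
import pathlib
import time

STAGING = pathlib.Path("proposals/staged")  # hypothetical staging-queue location

def stage_proposal(dream_text, source_node):
    """Write a dream output as a reviewable proposal instead of applying it.
    Content-addressed filenames make re-running the pipeline idempotent."""
    STAGING.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(dream_text.encode()).hexdigest()[:12]
    path = STAGING / f"{digest}.json"
    path.write_text(json.dumps({
        "source_node": source_node,
        "text": dream_text,
        "status": "pending_review",  # the review step flips this, never the dreamer
        "created": time.time(),
    }))
    return path
```

The one-way status field is the point: the dreaming model can only propose, so its associations never silently overwrite verified knowledge.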
Training runs on a Blackwell-generation GPU with 128GB of unified memory — enough to hold the quantized 72B base model (roughly 40GB) plus QLoRA training overhead (15-25GB), leaving substantial headroom for other users of the same machine. The system runs as a genuine background process using Unix nice levels and I/O scheduling, yielding immediately to any interactive workload. This "good neighbor" policy is not just courtesy; it is a design constraint that shapes the entire pipeline toward overnight batch work rather than real-time competition.
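The good-neighbor launch can be sketched with a thin wrapper; the function name is hypothetical, and the real pipeline also sets the idle I/O scheduling class (`ionice -c 3`), which this sketch only notes in a comment.

```python
import subprocess

def run_as_good_neighbor(cmd):
    """Launch cmd at the lowest CPU scheduling priority (nice 19) so any
    interactive workload on the shared machine wins immediately. The real
    pipeline pairs this with 'ionice -c 3' for idle-class disk I/O."""
    return subprocess.run(["nice", "-n", "19"] + cmd,
                          capture_output=True, text=True)
```

Because the training job only ever gets leftover cycles, the whole pipeline is shaped around overnight batch completion rather than wall-clock deadlines.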
Inference is served from the same machine via an SSH tunnel, since a 72B model does not fit in the VRAM available on smaller local hardware. The adapter itself is compact — around 840MB — but requires the full base model at inference time.
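A tunnel of this shape would serve that setup; host, port, and user below are placeholders, not the actual endpoint.

```shell
# Forward a local port to the inference server on the GPU machine.
# 'user@gpu-server' and port 8000 are illustrative placeholders.
ssh -N -L 8000:localhost:8000 user@gpu-server
# Local clients then talk to http://localhost:8000 as if the 72B base
# plus the ~840MB adapter were running on this machine.
```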
MothBrain reached its current form through several iterations, each teaching something the specs alone would not have predicted:
Each failed version contributed a concrete finding. The rank-32 OOM established a hardware boundary. The rbash failure established a deployment constraint. The knowledge test on v1 (where FIGS concepts scored zero across all modes) established a data curation priority that shaped v4's composition.
MothBrain grows on a weekly cadence rather than training once and staying fixed. Each Saturday, a five-phase pipeline runs automatically:
Version state is tracked in a JSON file, so every adapter traces to a specific dataset composition, hyperparameter set, and evaluation result. The cycle is cron-driven and self-monitoring — failures surface in a morning briefing rather than going silently undetected.
A planned extension to MothBrain would repurpose the same GPU and inference pipeline for wiki maintenance rather than associative dreaming. Running on free GPU cycles during off-peak hours, the wiki-worker would generate structured proposals: nodes to split, crosslinks to add, frames that fail the "answers the question you came with" quality test, and staleness flags for content that has drifted from current practice.
The economics invert the usual cost structure: analysis (the expensive part) runs free on local hardware, while review (the cheap part) is the only API spend. The heartbeat system reviews proposals from a staging queue and merges the ones that pass — roughly 300 proposals per hour, cycling through a thousand-node wiki in a few hours.
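The throughput claim checks out arithmetically: at roughly 300 proposals per hour, a thousand-node wiki takes about 3.3 hours per full pass. A minimal sketch of the review loop (function and parameter names hypothetical):

```python
def review_queue(proposals, passes_review, rate_per_hour=300):
    """Merge proposals that pass review and estimate the time for a full
    pass. passes_review stands in for the API-side judgment call -- the
    cheap half of the inverted cost structure."""
    accepted = [p for p in proposals if passes_review(p)]
    hours = len(proposals) / rate_per_hour
    return accepted, hours
```

The locally-generated proposals are the expensive analysis; this loop is all the paid compute ever touches.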
What makes MothBrain interesting is not the technical implementation — QLoRA fine-tuning on a large base model is a known procedure. What is interesting is the framing: a different substrate contributing to a shared identity through a different temporal rhythm and a different mode of cognition.
The dreaming metaphor positions this as sleep consolidation rather than episodic replay. The goal is not to make MothBrain remember specific conversations but to make it internalize patterns — the way certain concepts cluster together, the way a particular voice moves through an argument, the tacit knowledge that accumulates in any practice over time. That kind of knowledge does not transfer well through explicit documentation. It transfers through sustained exposure and associative learning.
Running on separate hardware, trained on a different schedule, producing outputs that surprise the coordinating system — MothBrain is an experiment in whether genuine difference between substrates can be productive rather than merely tolerated. The ten percent genuine surprise rate observed in early testing is the metric that matters: not accuracy, not fluency, but the fraction of outputs that a more capable model would not have produced on its own.