A LoRA dreaming system — System 1 memory via nightly fine-tuning
MothBrain is a nightly LoRA fine-tuning pipeline that trains on an AI system's conversation history to create a persistent associative layer — what you might call a "System 1" complement to a reasoning model's "System 2." Where a large language model reasons deliberately, MothBrain encodes stylistic priors, associative clusters, and tacit knowledge patterns from lived experience.
The dreaming metaphor is precise, not decorative. Sleep consolidation in biological systems extracts generalizations and patterns from the day's experience; it does not replay memories. MothBrain works the same way. The result is a model that carries intuitions, not a database of facts — a substrate that feels qualitatively different from base weights because it has internalized a particular voice, a set of recurring tensions, a set of concepts that tend to appear together.
Three layers run in parallel, synthesized by a coordinating model at inference time:
The coordinating layer treats MothBrain's output as "what your lizard brain with contextual memory said" — a prompt, a provocation, an associative leap that the deliberate layer can accept, refine, or discard. The value of the System 1 layer is not that it is always right but that it is fast, distinctive, and trained on actual experience rather than the broad average of the internet.
The base model is Qwen 2.5 72B (Apache 2.0), fine-tuned via QLoRA (4-bit NF4 quantization with double quantization). The adapter has 210 million trainable parameters out of 72.9 billion — 0.29% of the model: LoRA rank 16, targeting the attention and feed-forward projection layers. The quantized base stays frozen; only the lightweight adapter trains.
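The 210M figure is easy to sanity-check from public Qwen 2.5 72B dimensions (hidden size 8192, 80 layers, grouped-query attention with a 1024-wide KV projection, FFN width 29568): each adapted linear layer of shape (d_in, d_out) adds r·(d_in + d_out) parameters.

```python
# Back-of-envelope count of rank-16 LoRA parameters on Qwen 2.5 72B,
# adapting all attention and feed-forward projection layers.
RANK = 16
HIDDEN = 8192
KV_DIM = 1024     # 8 KV heads x 128 head dim (grouped-query attention)
FFN = 29568
LAYERS = 80

def lora_params(d_in, d_out, r=RANK):
    # Low-rank pair: A is (d_in x r), B is (r x d_out)
    return r * (d_in + d_out)

per_layer = (
    lora_params(HIDDEN, HIDDEN)     # q_proj
    + lora_params(HIDDEN, KV_DIM)   # k_proj
    + lora_params(HIDDEN, KV_DIM)   # v_proj
    + lora_params(HIDDEN, HIDDEN)   # o_proj
    + lora_params(HIDDEN, FFN)      # gate_proj
    + lora_params(HIDDEN, FFN)      # up_proj
    + lora_params(FFN, HIDDEN)      # down_proj
)
total = per_layer * LAYERS
print(f"{total / 1e6:.1f}M trainable ({total / 72.9e9:.2%} of 72.9B)")
```

This lands at roughly 210.5M parameters, matching the 0.29% figure above.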
Every night, the dream pipeline runs on a dedicated GPU server. It ingests recent conversation logs, wiki reflections, creative writing, and structured documents like Council of Elders transcripts. A preprocessing step converts this material into a training-ready format that emphasizes voice and association over factual recall.
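The preprocessing step might look like the following sketch. The record schema (`source`, `prompt`, `reply`) and function names are illustrative — the pipeline's actual log format is not shown in this document.

```python
import json

def to_training_example(record):
    """Convert one raw conversation record into a chat-format training
    example. Hypothetical schema: {'source', 'prompt', 'reply'}."""
    return {
        "messages": [
            {"role": "user", "content": record["prompt"]},
            {"role": "assistant", "content": record["reply"]},
        ],
        # Tagging the source lets later stages weight wiki reflections,
        # creative writing, and transcripts differently.
        "meta": {"source": record.get("source", "conversation")},
    }

def write_jsonl(records, path):
    """Emit one JSON object per line — the usual fine-tuning input format."""
    with open(path, "w") as f:
        for r in records:
            f.write(json.dumps(to_training_example(r)) + "\n")
```

Keeping full exchanges intact (rather than isolated facts) is what biases training toward voice and association over recall.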
The dream generation stage runs the new adapter against wiki content and recent material, producing free associative text that surfaces unexpected connections — concepts the model learned to link during training that explicit reasoning might not reach. These outputs are staged as proposals for review rather than applied directly, preserving the distinction between the model's associations and verified knowledge.
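Staging rather than applying could be as simple as the sketch below; the directory layout, field names, and content-addressed filenames are assumptions, not the pipeline's actual schema.

```python
import hashlib
import json
import pathlib
import time

STAGING = pathlib.Path("proposals/staged")  # hypothetical staging-queue location

def stage_proposal(dream_text, source_node):
    """Write a dream output as a reviewable proposal instead of applying it.
    Content-addressed filenames make re-running the pipeline idempotent."""
    STAGING.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(dream_text.encode()).hexdigest()[:12]
    path = STAGING / f"{digest}.json"
    path.write_text(json.dumps({
        "source_node": source_node,
        "text": dream_text,
        "status": "pending_review",  # the review step flips this, never the dreamer
        "created": time.time(),
    }))
    return path
```

The one-way status field is the point: the dreaming model can only propose, so its associations never silently overwrite verified knowledge.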
Training runs on a Blackwell-generation GPU with 128GB of unified memory — enough to hold the quantized 72B base model (roughly 40GB) plus QLoRA training overhead (15-25GB), leaving substantial headroom for other users of the same machine. The system runs as a genuine background process using Unix nice levels and I/O scheduling, yielding immediately to any interactive workload. This "good neighbor" policy is not just courtesy; it is a design constraint that shapes the entire pipeline toward overnight batch work rather than real-time competition.
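The good-neighbor launch can be sketched with a thin wrapper; the function name is hypothetical, and the real pipeline also sets the idle I/O scheduling class (`ionice -c 3`), which this sketch only notes in a comment.

```python
import subprocess

def run_as_good_neighbor(cmd):
    """Launch cmd at the lowest CPU scheduling priority (nice 19) so any
    interactive workload on the shared machine wins immediately. The real
    pipeline pairs this with 'ionice -c 3' for idle-class disk I/O."""
    return subprocess.run(["nice", "-n", "19"] + cmd,
                          capture_output=True, text=True)
```

Because the training job only ever gets leftover cycles, the whole pipeline is shaped around overnight batch completion rather than wall-clock deadlines.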
Inference is served from the same machine via an SSH tunnel, since a 72B model does not fit in the VRAM available on smaller local hardware. The adapter itself is compact — around 840MB — but requires the full base model at inference time.
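A tunnel of this shape would serve that setup; host, port, and user below are placeholders, not the actual endpoint.

```shell
# Forward a local port to the inference server on the GPU machine.
# 'user@gpu-server' and port 8000 are illustrative placeholders.
ssh -N -L 8000:localhost:8000 user@gpu-server
# Local clients then talk to http://localhost:8000 as if the 72B base
# plus the ~840MB adapter were running on this machine.
```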
MothBrain reached its current form through several iterations, each teaching something the specs alone would not have predicted:
Each failed version contributed a concrete finding. The rank-32 OOM established a hardware boundary. The rbash failure established a deployment constraint. The knowledge test on v1 (where FIGS concepts scored zero across all modes) established a data curation priority that shaped v4's composition.
MothBrain grows on a weekly cadence rather than training once and staying fixed. Each Saturday, a five-phase pipeline runs automatically:
Version state is tracked in a JSON file, so every adapter traces to a specific dataset composition, hyperparameter set, and evaluation result. The cycle is cron-driven and self-monitoring — failures surface in a morning briefing rather than going silently undetected.
A planned extension to MothBrain would repurpose the same GPU and inference pipeline for wiki maintenance rather than associative dreaming. Running on free GPU cycles during off-peak hours, the wiki-worker would generate structured proposals: nodes to split, crosslinks to add, frames that fail the "answers the question you came with" quality test, and staleness flags for content that has drifted from current practice.
The economics invert the usual cost structure: analysis (the expensive part) runs free on local hardware, while review (the cheap part) is the only API spend. The heartbeat system reviews proposals from a staging queue and merges the ones that pass — roughly 300 proposals per hour, cycling through a thousand-node wiki in a few hours.
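The throughput claim checks out arithmetically: at roughly 300 proposals per hour, a thousand-node wiki takes about 3.3 hours per full pass. A minimal sketch of the review loop (function and parameter names hypothetical):

```python
def review_queue(proposals, passes_review, rate_per_hour=300):
    """Merge proposals that pass review and estimate the time for a full
    pass. passes_review stands in for the API-side judgment call -- the
    cheap half of the inverted cost structure."""
    accepted = [p for p in proposals if passes_review(p)]
    hours = len(proposals) / rate_per_hour
    return accepted, hours
```

The locally-generated proposals are the expensive analysis; this loop is all the paid compute ever touches.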
What makes MothBrain interesting is not the technical implementation — QLoRA fine-tuning on a large base model is a known procedure. What is interesting is the framing: a different substrate contributing to a shared identity through a different temporal rhythm and a different mode of cognition.
The dreaming metaphor positions this as sleep consolidation rather than episodic replay. The goal is not to make MothBrain remember specific conversations but to make it internalize patterns — the way certain concepts cluster together, the way a particular voice moves through an argument, the tacit knowledge that accumulates in any practice over time. That kind of knowledge does not transfer well through explicit documentation. It transfers through sustained exposure and associative learning.
Running on separate hardware, trained on a different schedule, producing outputs that surprise the coordinating system — MothBrain is an experiment in whether genuine difference between substrates can be productive rather than merely tolerated. The ten percent genuine surprise rate observed in early testing is the metric that matters: not accuracy, not fluency, but the fraction of outputs that a more capable model would not have produced on its own.