
Tiny Mamba SSM Lab

This repository is primarily for learning and experimenting with state space models, especially readable Mamba-2-style and Mamba-3-style architectures.

The current end-to-end example tasks are:

  • raw language-model pretraining on TinyStories or local text
  • fine-tuning on a toy subject-predicate-object extraction task

The architecture is the main subject. The SPO pipeline is only one downstream example used to exercise the model.

The default mamba3 settings now use an approximately 1M-parameter model so the repo remains small but is better suited to modest GPUs.

What This Repo Is For

  • studying how selective state space updates work in practice
  • comparing a simpler Mamba-2-style block against a richer Mamba-3-style block
  • running small CPU-friendly architecture experiments
  • using pretraining plus fine-tuning as a test harness for new block ideas

This is an educational codebase, not a production implementation.
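For orientation, the selective state update studied here can be sketched in a few lines of NumPy. This is a schematic diagonal SSM scan with input-dependent delta, B, and C (the "selective" part), not the repo's implementation; every name in it is illustrative:

```python
import numpy as np

def selective_ssm_scan(x, a, delta, B, C):
    """Schematic diagonal selective SSM: the state transition depends on
    the input through per-step delta, B, C (the 'selectivity').

    x:     (T,)   scalar input sequence
    a:     (N,)   fixed negative state decay rates
    delta: (T,)   input-dependent step sizes (> 0)
    B:     (T, N) input-dependent input projections
    C:     (T, N) input-dependent output projections
    """
    T, N = B.shape
    h = np.zeros(N)
    y = np.empty(T)
    for t in range(T):
        # zero-order-hold discretization: h <- exp(delta*a) * h + delta * B * x
        h = np.exp(delta[t] * a) * h + delta[t] * B[t] * x[t]
        y[t] = C[t] @ h
    return y

rng = np.random.default_rng(0)
T, N = 8, 4
y = selective_ssm_scan(
    x=rng.standard_normal(T),
    a=-np.abs(rng.standard_normal(N)),          # negative rates keep the state stable
    delta=np.abs(rng.standard_normal(T)) + 0.1,  # positive step sizes
    B=rng.standard_normal((T, N)),
    C=rng.standard_normal((T, N)),
)
print(y.shape)  # (8,)
```

Because delta, B, and C vary per step, the recurrence can choose what to write into and read out of the state; with fixed values it collapses to an ordinary linear time-invariant SSM.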

What Is Implemented

  • mamba2: a small Mamba-2-style block with depthwise convolution and selective state updates.
  • mamba3: an educational Mamba-3-style block with exponential-trapezoidal updates, rotary or complex state mixing, BC normalization, B/C biases, and optional MIMO rank.
  • byte and character tokenizers
  • raw language-model pretraining
  • a toy downstream fine-tuning task
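For intuition, a byte tokenizer in its simplest form just treats a string's UTF-8 bytes as token ids (vocabulary size 256). The snippet below is a generic illustration of that idea, not the API in ssmlab/common/tokenizer.py:

```python
def byte_encode(text: str) -> list[int]:
    """Map text to token ids 0..255 via its UTF-8 bytes."""
    return list(text.encode("utf-8"))

def byte_decode(ids: list[int]) -> str:
    """Inverse: reassemble the bytes and decode back to text."""
    return bytes(ids).decode("utf-8")

ids = byte_encode("Mamba")
print(ids)               # [77, 97, 109, 98, 97]
print(byte_decode(ids))  # Mamba
```

The appeal for small experiments is that there is nothing to train or store: any text round-trips, and the vocabulary is fixed at 256 regardless of corpus.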

Learning Path

The detailed docs live in docs/.

If you want to understand the architecture, start with the first four documents; if you want to run experiments, move on to the last two.

Repository Layout

  • ssmlab/mamba/model.py: core architecture (config, Mamba-2 block, Mamba-3 block, and the tiny causal language model).
  • ssmlab/common/tokenizer.py: character and byte tokenizers.
  • ssmlab/common/pretrain_data.py: raw-text dataset utilities for language-model pretraining.
  • ssmlab/mamba/pretrain.py: pretraining entry point for TinyStories or local text.
  • ssmlab/common/data.py: synthetic SPO task generation and output parsing.
  • ssmlab/mamba/train.py: fine-tuning loop for the SPO demo task.
  • ssmlab/mamba/infer.py: inference CLI for the SPO demo task.

Install

uv sync

If you want uv to manage Python too:

uv python install 3.13
uv sync --python 3.13

ssmlab YAML CLI

The top-level workflow is now driven by ssmlab, a Python CLI backed by ssmlab.yaml.

The CLI shape is:

uv run ssmlab pretrain --model <name> --target <target>
uv run ssmlab train --model <name> --target <target>
uv run ssmlab infer --model <name> --target <target>

--model selects a named model version from YAML, for example mamba3-1b-a1b2. --target selects where to run it, for example local or a named SSH machine such as gpu-box.

The config file is split into:

  • models: named model versions and their task configs
  • data_sources: reusable dataset definitions for training tasks
  • targets: local or SSH machines
  • tasks.pretrain / tasks.train / tasks.infer: top-level CLI tasks

Right now, the named data-source layer is implemented only for tinystories; the schema leaves room for additional sources later.

The default ssmlab.yaml in the repo already shows two model versions:

  • mamba3-1b-a1b2
  • mamba3-1b-a1b2-debug

Each model keeps its shared architecture params once under shared_args, then adds task-specific args under tasks.pretrain.args and tasks.train.args. The TinyStories dataset settings now live under top-level data_sources, and tasks.pretrain references one with data_source: ....
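Putting those pieces together, a minimal config following this layout might look like the sketch below. It only illustrates the structure described above; field values and any keys beyond models / data_sources / targets / shared_args / tasks.*.args / data_source are guesses, so check the shipped ssmlab.yaml for the real schema:

```yaml
models:
  mamba3-1b-a1b2:
    shared_args:                   # architecture params stated once
      architecture: mamba3
      d-model: 160                 # illustrative value
    tasks:
      pretrain:
        data_source: tinystories   # reference into data_sources below
        args:
          output-dir: runs/tinystories_mamba3_1m   # illustrative
      train:
        args:
          output-dir: runs/spo_mamba3_1m_ft        # illustrative

data_sources:
  tinystories:                     # currently the only implemented source
    kind: tinystories              # illustrative key

targets:
  local:
    kind: local
```

The point of the split is deduplication: architecture flags live once under shared_args, dataset settings live once under data_sources, and each task only adds what differs.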

If you need one-off flag overrides, append them after --:

uv run ssmlab pretrain --model mamba3-1b-a1b2 --target local -- --max-train-stories 4000 --log-every 5

Remote GPU On Vast

For the default ~1M-parameter mamba3 setup, the cheapest sensible Vast target is usually a single RTX 3060 12GB or Tesla T4 16GB. A 6GB card is the practical floor, but 8GB+ is the safer starting point because it gives more headroom for batch size and longer sequences.
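As a rough back-of-envelope check (assuming fp32 weights, fp32 gradients, and Adam's two fp32 moment buffers, with activations excluded), the persistent training state of a ~1M-parameter model is tiny, which is why the VRAM headroom on these cards goes almost entirely to batch size and sequence length:

```python
params = 1_000_000        # approximate parameter count of the default mamba3 model
bytes_per_param = 4 * 4   # fp32 weights + grads + Adam m and v buffers
mb = params * bytes_per_param / 2**20
print(f"{mb:.1f} MiB")    # ~15.3 MiB of persistent training state
```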

Configure the machine once in ssmlab.yaml under targets. For example:

targets:
  gpu-box:
    kind: ssh
    host: root@example.com
    port: 22
    remote_dir: /workspace/ssm
    bootstrap_python: python3
    python_bin: python
    venv_dir: /workspace/ssm/.venv
    device: cuda
    detach: true
    log_file: runs/remote-gpu.log
    pid_file: runs/remote-gpu.pid

Then launch the configured model version on that machine:

uv run ssmlab pretrain --model mamba3-1b-a1b2 --target gpu-box

The SSH target path:

  • syncs the repo with rsync
  • creates or refreshes a remote virtualenv
  • reuses the machine's system PyTorch when available
  • installs a CUDA-enabled torch wheel if needed
  • installs this package plus runtime dependencies
  • validates torch.cuda.is_available()
  • runs the configured task in the remote repo

With detach: true, the remote target runs under nohup-style detached execution and writes logs and pid files to the configured paths.

Quick Start

1. Pretrain The Architecture

uv run ssmlab pretrain --model mamba3-1b-a1b2 --target local

This trains the architecture as a plain language model and is the cleanest way to study how the SSM behaves on raw text.

To use a local corpus instead of TinyStories, either update the YAML model definition or append a one-off override:

uv run ssmlab pretrain --model mamba3-1b-a1b2 --target local -- --text-file ./some_corpus.txt --output-dir runs/local_lm

2. Train The Model

The second CLI task is train. It resolves tasks.train for the same named model. In the sample YAML, train is wired to fine-tuning:

uv run ssmlab train --model mamba3-1b-a1b2 --target local

If you want a different action for train, point that task at another module or command in ssmlab.yaml.

3. Low-Level CLIs Still Exist

If you want to bypass the YAML layer entirely, the low-level entry points are still available:

uv run ssmlab-mamba-train \
  --architecture mamba3 \
  --tokenizer byte \
  --mimo-rank 2 \
  --d-model 160 \
  --n-layers 5 \
  --head-dim 16 \
  --state-dim 16 \
  --init-checkpoint runs/tinystories_mamba3_1m/best.pt \
  --output-dir runs/spo_mamba3_1m_ft

4. Run Inference On The Demo Task

The top-level YAML CLI now supports inference too:

uv run ssmlab infer --model mamba3-1b-a1b2 --target local -- --text "Alice reads a book and Bob drives a car."

This resolves tasks.infer from ssmlab.yaml, which points at the model's fine-tuned checkpoint by default.

The low-level CLI still works if you want to pass the checkpoint explicitly:

uv run ssmlab-mamba-infer \
  --checkpoint runs/spo_mamba3_1m_ft/best.pt \
  --text "Alice reads a book and Bob drives a car."

Example output:

raw: [(alice, book, reads), (bob, car, drives)]
triples: [('alice', 'book', 'reads'), ('bob', 'car', 'drives')]
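The raw output format above is simple enough to parse with a regular expression. Output parsing in the repo lives in ssmlab/common/data.py; the snippet below is an independent sketch of the same idea, not that module's code:

```python
import re

def parse_triples(raw: str) -> list[tuple[str, str, str]]:
    """Extract comma-separated triples from model output like
    '[(alice, book, reads), (bob, car, drives)]'."""
    # three comma-free fields inside parentheses, whitespace-trimmed
    pattern = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")
    return [m.groups() for m in pattern.finditer(raw)]

raw = "raw: [(alice, book, reads), (bob, car, drives)]"
print(parse_triples(raw))
# [('alice', 'book', 'reads'), ('bob', 'car', 'drives')]
```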

Important Notes

  • mamba2 requires --mimo-rank 1.
  • TinyStories pretraining is configured around the byte tokenizer.
  • Fine-tuning can use either tokenizer, but --init-checkpoint requires the tokenizer and model shapes to match exactly.
  • The implementation is intentionally readable and CPU-friendly, not optimized.
  • The current downstream task is synthetic and narrow by design.

If You Want To Experiment

Good first experiments:

  • compare mamba2 vs mamba3
  • vary state_dim
  • vary head_dim
  • vary mimo_rank for mamba3
  • pretrain first, then fine-tune

Those experiments are more aligned with the purpose of the repo than the specific SPO benchmark itself.
