Jupyter notebooks have been an integral part of engineering workflows since their inception. But for ML, they're broken. Reproducibility issues, flaky kernels, and no ability to scale experiments — every ML engineer knows the pain. We built nb.dev to fix this.
What is nb.dev?
nb.dev is an ML-first notebook that introduces three capabilities that change how you experiment:
- GPU Attach — Instantly attach a GPU to any notebook
- Machine Snapshots — Capture your entire machine state: everything in GPU memory, RAM, model weights, optimizer state, and data streams
- One-Click Branching — Branch your notebook onto as many new machines as you want, each resuming from the exact snapshot
Unlike traditional checkpoint tools, our snapshots are asynchronous and faster to take, and they capture the full machine state rather than just model weights, so you never lose context mid-experiment. And because branching is a single click, iteration becomes dramatically faster.
The Problem: Sequential Experimentation
Consider a typical ML workflow. You're finetuning a model and hit a loss plateau. You have three hypotheses for how to push past it — maybe adjust the learning rate, try a different schedule, or add some regularization.
Traditionally, you'd have to test these one at a time. Restart from a checkpoint, wait for training to converge (or not), then try the next idea. Each attempt takes the full training time, and you're blocked while it runs.
This sequential bottleneck is one of the biggest time sinks in ML research.
The Solution: Branch and Run in Parallel
With nb.dev, when you hit that inflection point, you snapshot the entire machine state and branch your notebook onto multiple machines — each one picking up exactly where you left off.
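As a mental model for what snapshot-and-branch gives you (this is a conceptual sketch in plain Python, not the nb.dev API; the state fields are illustrative stand-ins), a snapshot is a complete copy of the training state, and each branch resumes from its own independent copy:

```python
from copy import deepcopy

# Illustrative stand-in for the full machine state at the plateau.
state = {
    "step": 500,
    "lr": 0.02,
    "weights": [0.1, -0.3, 0.7],            # stand-in for model weights
    "optimizer": {"momentum": [0.0, 0.0, 0.0]},
}

snapshot = deepcopy(state)  # capture everything at the inflection point

# Branch twice: three identical copies, each free to diverge.
branches = {name: deepcopy(snapshot) for name in ("A", "B", "C")}
branches["B"]["lr"] = 0.002           # Branch B: much lower learning rate
branches["C"]["schedule"] = "cosine"  # Branch C: add a decay schedule

# Mutating one branch never affects the others or the snapshot.
assert branches["A"]["lr"] == 0.02 and snapshot["lr"] == 0.02
```

The key property is isolation: every branch starts from a byte-identical state, and changes in one branch cannot leak into another.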
Here's what that looks like in practice with a Qwen3 finetune:
1. Train to the Inflection Point
Set up your training configuration and run until you see the loss flatten. In our demo, we trained with an initial learning rate of 0.02 and checked in at step 500, where the loss plateaued at around 7.
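One simple way to spot a plateau like this programmatically is to compare the recent average loss against the window before it. The heuristic below is illustrative (the window size and tolerance are arbitrary choices, not part of our demo):

```python
def loss_plateaued(losses, window=50, tol=0.01):
    """Return True when the mean loss over the last `window` steps has
    improved by less than `tol` versus the window before it.
    Illustrative heuristic; `window` and `tol` are arbitrary."""
    if len(losses) < 2 * window:
        return False
    prev = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return prev - recent < tol

# A synthetic loss curve that drops and then flattens near 7,
# roughly like the demo's curve at step 500.
curve = [10 - 3 * min(step, 300) / 300 for step in range(500)]
assert loss_plateaued(curve)  # flat by step 500
```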
2. Snapshot and Branch
Instead of guessing which hypothesis to try first, we snapshot the full machine state and branch the notebook twice — giving us three identical copies of the experiment, each with the model weights, optimizer state, and data stream fully intact.
3. Test Hypotheses in Parallel
Each branch gets a different experiment:
- Branch A: Keep the learning rate the same
- Branch B: Decrease the learning rate significantly
- Branch C: Add cosine decay scheduling
All three run simultaneously on separate machines. No restarting from checkpoints. No re-running the first 500 steps. Each branch resumes instantly from the exact state where you left off.
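As a sketch of what Branch C's schedule might look like, here is standard cosine decay starting from the plateau. The base rate and start step mirror the demo; the total step count is an assumption, and none of this is nb.dev-specific:

```python
import math

def cosine_lr(step, base_lr=0.02, start=500, total=1500):
    """Cosine decay from base_lr at `start` down to 0 at `total`.
    Standard schedule; `total` is an illustrative assumption."""
    if step <= start:
        return base_lr
    progress = (step - start) / (total - start)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

# Branch A keeps base_lr throughout; Branch B might simply use
# base_lr / 10; Branch C follows the decaying curve below.
assert cosine_lr(500) == 0.02                # unchanged at the snapshot
assert abs(cosine_lr(1000) - 0.01) < 1e-9    # halfway: half the rate
assert cosine_lr(1500) < 1e-12               # fully decayed
```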
4. Compare Results
When all branches finish, you compare the training curves side by side and pick the winner. What would have taken three full runs sequentially now takes the wall-clock time of one. And you have full confidence that each experiment started from an identical state.
Why This Matters
This isn't just a faster notebook. It's a paradigm shift in how people experiment with ML.
- Faster iteration: Launch many experiments in parallel with minimal overhead
- Perfect reproducibility: Every branch starts from the exact same machine state — no checkpoint loading bugs, no environment drift
- Team collaboration: Find a version of your notebook you like? Share it with someone on your team instantly
- Agent-ready: We're building nb.dev so autonomous research agents can branch experiments and make discoveries quickly
Get Started
Try it today at nb.dev.
