millrace

Local-first LLM inference on Apple Silicon.

OpenAI and Anthropic-compatible APIs, written from scratch in Mojo — every GPU kernel custom-written, no C++, no CUDA, no Metal shaders.

The whole stack is Mojo: the inference engine and the small libraries underneath — every GPU kernel custom-written and compiled on your Mac, so your data never leaves the machine.

Get started How it works View on GitHub

the server engine

From-scratch, pure-Mojo GPU inference engine for Qwen2.5 on Apple Silicon — every kernel custom-written, no C++/CUDA/Metal deps. Speaks an OpenAI-compatible API, so you can code against it locally with opencode.

app macOS app

The Millrace macOS app — a menu-bar companion plus a millrace CLI, installable with Homebrew (brew install millrace/tap/millrace). One-click bootstrap of the whole stack: fetch, build, download weights, serve, launch opencode.

tools libraries

The small, single-purpose Mojo libraries we built along the way — chat templating, compression, PDF extraction, an on-device vector store — plus the flare networking stack underneath.

dacular example ↗

An example of what you can build on the server: a private document vault. Index your own PDFs, CSVs, and notes and ask open-ended questions — answered locally, the data never leaving the machine. Lives at dacular.app.