Small libraries, sharp edges
Building the stack from scratch in Mojo meant writing the pieces that didn't exist yet. Each is a small, single-purpose library — usable on its own, and together the substrate under the server.
A minimal implementation of Jinja2 in Mojo — used to render the Qwen2.5 chat template byte-identically to transformers.apply_chat_template.
A Mojo zlib binding (inflate/deflate) via a thin C shim to libz. Powers PDF /FlateDecode in pdftotext.mojo and general decompression.
A pure-Mojo PDF text extractor — object map, page-tree walk, content-stream text ops, and /ToUnicode CMap decoding. Pulls clean text out of PDFs for document pipelines.
A Mojo binding for the LanceDB vector store via a small Rust cdylib over a C ABI. An on-device vector index for embedding search.
A from-scratch RFC-4180 CSV parser in Mojo — quoted fields, embedded commas/newlines, escaped quotes, UTF-8-safe. General-purpose CSV parsing.
Networking: flare
The HTTP and TLS under the inference server is
flare, a full
networking stack for Mojo (HTTP/1.1 and HTTP/2, WebSocket, TLS, TCP/UDP)
on a single non-blocking reactor. We build on flare and maintain a
fork at
millrace/flare to track the nightly Mojo dialect and
the FFI fixes our shims depend on, upstreaming where we can.