ds4.c — DeepSeek V4 Flash Local Inference Engine for Metal

🔗 github.com/antirez
May 9, 2026
SIGNAL 8/10
#dev #infrastructure #ai

🟢 READ | ⏱ 8 min | 📡 8/10 | 🎯 ML engineers, local inference builders

TL;DR

ds4.c is a native inference engine built specifically for DeepSeek V4 Flash and optimized for Apple Silicon via Metal. It reports 84–468 tokens/sec prefill on an M3 Ultra, relying on specialized 2-bit quantization and a compressed KV cache. Together these enable a 1M-token context on 128GB Macs with no cloud dependency.

Signal

  • Custom 2-bit quantization strategy: only the routed MoE experts are compressed, while the shared experts and projections are left untouched to preserve quality
  • Compressed KV cache with disk persistence, which makes long-context inference practical on local machines
  • 1M-token context window, output validated against the official logits, and HTTP APIs compatible with the OpenAI and Anthropic formats
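
To make the first point concrete, here is a minimal sketch of selective 2-bit quantization. It is not ds4.c's actual format: the group size, the min/scale scheme, the byte packing order, and the tensor names (such as "layers.0.experts.3.w1") are all assumptions chosen for illustration. It only shows the general idea of compressing routed-expert weights while passing everything else through unchanged.

```python
# Illustrative only: NOT ds4.c's real quantization format or tensor naming.

def quantize_2bit(values, group_size=16):
    """Quantize floats to 2-bit codes (0..3), one (min, scale) pair per group."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        # 3 is the largest 2-bit code; fall back to 1.0 for a constant group.
        scale = (hi - lo) / 3 or 1.0
        codes = [min(3, max(0, round((v - lo) / scale))) for v in chunk]
        groups.append((lo, scale, codes))
    return groups


def dequantize_2bit(groups):
    """Reconstruct approximate floats from the (min, scale, codes) groups."""
    return [lo + scale * c for lo, scale, codes in groups for c in codes]


def pack_codes(codes):
    """Pack four 2-bit codes into each byte, lowest bits first."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            byte |= code << (2 * j)
        out.append(byte)
    return bytes(out)


def is_routed_expert(name):
    # Hypothetical naming convention, assumed for this sketch.
    return ".experts." in name and ".shared_experts." not in name


def quantize_model(tensors):
    """Compress only routed-expert tensors; keep all others at full precision."""
    result = {}
    for name, values in tensors.items():
        if is_routed_expert(name):
            result[name] = ("q2", quantize_2bit(values))
        else:
            result[name] = ("full", list(values))
    return result


if __name__ == "__main__":
    model = {
        "layers.0.experts.3.w1": [0.0, 1.0, 2.0, 3.0],
        "layers.0.shared_experts.w1": [0.5, 0.25],
        "layers.0.attn.q_proj": [0.1],
    }
    for name, (kind, _) in quantize_model(model).items():
        print(name, kind)
```

The design rationale follows the bullet above: in a mixture-of-experts model the routed experts hold most of the parameters, so compressing only them recovers most of the memory, while the tensors that every token passes through stay at their original precision.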

What They're NOT Telling You

The project explicitly describes itself as "alpha quality", and GPU/CUDA support is uncertain ("may implement… perhaps, but nothing more"). CPU inference is broken on modern macOS because of a kernel virtual memory bug they could not fix. The engine is optimized for a single use case, so don't expect generic GGUF loading or broad model support.

Trust Check

Factuality ✅ | Author Authority ✅ | Actionability ✅