TUNDRA // NEXUS
LOC: SRV1304246 | Mission Control
I finally found a local LLM I actually want to use for coding
READ | 8 min | Signal: 7/10 | Audience: Developers, security researchers, LLM enthusiasts
TL;DR
After years of running local LLMs mostly on principle, Adam Conway found Qwen3-Coder-Next genuinely practical for daily coding and security research. The enablers: capable hardware (128GB of unified memory), an efficient sparse mixture-of-experts architecture, and seamless Claude Code integration via a local vLLM endpoint.
Signal
- Hardware: Lenovo ThinkStation PGX ($3K) with the NVIDIA GB10 Grace Blackwell Superchip; 128GB of unified LPDDR5x memory eliminates the PCIe bottleneck that plagues discrete GPUs.
- Model: Qwen3-Coder-Next (80B total params, 3B active per token via sparse MoE); weights occupy ~46GB (Q4_K_M) or ~85GB (Q8_0); native 256K context window, with ~170K usable in practice via a Gated DeltaNet hybrid attention stack (75% linear + 25% full attention layers).
- Setup: Docker + vLLM on DGX OS; Claude Code is pointed at the local endpoint via environment variables (no proxy needed); no network latency, rate limits, or usage costs.
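The endpoint hookup in the setup bullet amounts to a few environment variables. A minimal sketch, assuming vLLM is already serving the model on localhost and exposing an API Claude Code can talk to; the port, token value, and model ID below are placeholder assumptions, not values confirmed by the article:

```python
# Print shell exports that point Claude Code at a local endpoint instead of
# Anthropic's hosted API. URL, token, and model ID are placeholder assumptions.
env = {
    "ANTHROPIC_BASE_URL": "http://localhost:8000",  # assumed local vLLM port
    "ANTHROPIC_AUTH_TOKEN": "local-no-key",         # dummy; no real key needed locally
    "ANTHROPIC_MODEL": "Qwen/Qwen3-Coder-Next",     # hypothetical model ID
}
for key, value in env.items():
    print(f'export {key}="{value}"')
```

Sourcing the printed lines in the shell that launches Claude Code redirects every request to the local server, which is exactly why usage costs and rate limits disappear.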
What They're NOT Telling You
The $3,000 hardware barrier puts this setup out of reach for most hobbyists, though the author notes Qwen3-Coder-Next can also run on lower-VRAM systems (16-24GB) via MoE offloading. Privacy benefits are framed as a nice-to-have here, but for anyone handling NDAs, binaries, or sensitive reverse-engineering work, local execution isn't optional; it's compliance.
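The memory figures quoted above follow from simple arithmetic on bits per weight. A back-of-envelope sketch; the bits-per-weight values are rough averages I'm assuming for each quantization format, not measured file sizes:

```python
# Approximate weight footprint of an 80B-parameter model at two quantizations.
# Bits-per-weight figures are rough assumed averages for each format.
TOTAL_PARAMS = 80e9
ACTIVE_PARAMS = 3e9  # parameters actually exercised per token in the sparse MoE

for name, bits_per_weight in [("Q4_K_M", 4.6), ("Q8_0", 8.5)]:
    total_gb = TOTAL_PARAMS * bits_per_weight / 8 / 1e9
    active_gb = ACTIVE_PARAMS * bits_per_weight / 8 / 1e9
    print(f"{name}: ~{total_gb:.0f} GB total, ~{active_gb:.1f} GB active per token")
```

The tiny active set per token is what makes MoE offloading viable: only a couple of gigabytes of experts need to sit on the GPU at any moment, with the rest paged from system RAM, which is how an 80B model can limp along on a 16-24GB card.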
Trust Check
Factuality ✓ | Author Authority ✓ | Actionability ✓