TutorialLLMDeepSeek120BFrontierReasoningApple Silicon

DeepSeek V3 on Apple Silicon: What a 671B MoE Actually Requires

March 12, 20268 min readby Macyou Team

DeepSeek V3 is a 671-billion-parameter Mixture-of-Experts model with 37B active per token — one of the strongest open models ever released. Because the router can call any expert at any moment, all 671B parameters must sit in memory: a Q4 build is roughly 400 GB. No single Mac holds that — and we'd rather tell you that plainly than sell you a machine that can't run it.

What it actually takes

The realistic Apple Silicon configuration is a Thunderbolt 5 cluster of two Mac Studio M3 Ultra 256 GB nodes pooling unified memory (~512 GB effective). That covers Q4 with headroom for the 128K context. Q8 (~713 GB weights) wants a larger cluster still. Full quantization table is in the DeepSeek V3 hardware guide.

Why MoE is fast once loaded

Only 37B parameters activate per token, so generation speed resembles a ~40B dense model rather than a 671B one — that is the entire appeal of MoE. Memory sets the entry price; compute per token stays modest. On pooled unified memory the bottleneck is inter-node bandwidth, which is exactly what Thunderbolt 5's 120 Gbps links are for.

The single-Mac alternatives

Most workloads don't need the full V3. The R1 Distill family carries DeepSeek's reasoning style into dense models that fit one machine: the 8B runs on a 16 GB M4 — we measured 20.0 tokens/sec median — and the 70B distill fits a 64 GB M4 Pro at Q4. See our R1 Distill 70B guide for that setup.

Deploy

Cluster configurations are built per request — talk to us if you need the full V3 privately. For the distills, the catalog has one-click stacks with an OpenAI-compatible API on hardware from $99/mo.

All posts