DeepSeek V3

DeepSeek V3 is a frontier-scale Mixture-of-Experts model: 671 billion total parameters with 37 billion active per token, built for deep reasoning, complex code generation, and mathematical problem-solving. Only a fraction of the experts run for any given token, but the router can select any of them, so all expert weights must sit in memory. A Q4 build therefore needs roughly 400 GB. On Apple Silicon, that means a Thunderbolt 5 cluster of M3 Ultra nodes pooling unified memory.
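The memory figure follows from simple arithmetic: parameter count times average bits per weight, divided by eight. The sketch below is a rough estimate only. It assumes an average of 4.0 to 4.8 bits per weight for a 4-bit build (an approximation, since such formats also store block scales and keep some tensors at higher precision) and ignores KV cache and runtime overhead. The function name is illustrative.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights, in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 671e9   # all experts, as stated above
ACTIVE_PARAMS = 37e9   # parameters used per token, as stated above

# Assumed range of average bits per weight for a 4-bit quantized build.
for bpw in (4.0, 4.5, 4.8):
    print(f"{bpw} bits/weight -> {weight_memory_gb(TOTAL_PARAMS, bpw):.0f} GB")

# The active subset alone is far smaller, but it changes from token to token,
# so the full set of weights has to stay resident.
print(f"active subset at 4.5 bits -> {weight_memory_gb(ACTIVE_PARAMS, 4.5):.0f} GB")
```

The gap between the two results is why a model with only 37B active parameters still needs cluster-scale memory.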

Max+ required, from $1999/mo

15 min provisioning

OpenAI-compatible API

Made by DeepSeek

License: DeepSeek License

Technical Specifications

Model size: 671B MoE (37B active) parameters

Memory required: 384 GB

Quantization: Q4_K_M

Context window: 131K tokens

Disk space: 385 GB

Runtime: Ollama + MLX

Use Cases

Advanced mathematics
Scientific reasoning
Complex code generation
Multi-step logic problems
Research and analysis

What you get

Ollama runtime with DeepSeek V3 (Q4 quantized)
MLX backend for optimized inference
OpenAI-compatible API endpoint
Prometheus metrics

Start using it

curl

curl https://dep-<id>.macyou.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mcy_live_<your-key>" \
  -d '{
    "model": "deepseek-v3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
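With "stream": true, OpenAI-compatible endpoints typically reply with server-sent events: each event line starts with "data: " and carries a JSON chunk, and the stream ends with "data: [DONE]". The sketch below parses such lines with the standard library only. The sample lines are made up for illustration and trimmed to the fields the parser reads; the exact output of this deployment may differ.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hypothetical sample of what the curl command might print.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```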

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="mcy_live_<your-key>",
    base_url="https://dep-<id>.macyou.cloud/v1"
)

response = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

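for chunk in response:
    # Some servers emit chunks with no choices (for example, usage-only chunks).
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)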
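If you need the complete reply as one string rather than printed output, the streamed pieces can be accumulated. This is a minimal sketch: collect_reply and fake_chunk are illustrative helpers, not part of the OpenAI SDK, and the stand-in objects only mimic the attributes read here (choices, delta, content) so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_reply(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # tolerate chunks that carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # content may be None or empty on some chunks
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like an SDK streaming chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hel"), SimpleNamespace(choices=[]), fake_chunk(None), fake_chunk("lo!")]
print(collect_reply(demo))
```

With a live client, call collect_reply(response) in place of the printing loop. A stream can be consumed only once, so use one or the other.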

Ready to deploy DeepSeek V3?

Up and running in 15 minutes on dedicated Apple Silicon.