Local LLM Deployments
DeepSeek V3
DeepSeek V3 is a frontier-scale model with over 120 billion active parameters, built for deep reasoning, complex code generation, and mathematical problem-solving. It is quantized to fit in 128 GB of unified memory on the Max tier.
Max+ required
from $1999/mo
15 min provisioning
OpenAI-compatible API
Made by DeepSeek
License: DeepSeek License
Technical Specifications
Model size: 120B+ parameters
Memory required: 128 GB
Speed (M4 Pro): ~5 tok/s
Quantization: Q4_K_M
Context window: 131K tokens
Disk space: 70 GB
Runtime: Ollama + MLX
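As a rough cross-check of the specifications above, the sketch below estimates the size of the quantized weights from the parameter count and the time a reply takes at the listed speed. The average of 4.8 bits per weight for Q4_K_M is an assumption made for illustration, not a figure from this page, and the function names are hypothetical.

```python
# Back-of-the-envelope sizing under stated assumptions.
PARAMS = 120e9            # "120B+ parameters" from the spec list
BITS_PER_WEIGHT = 4.8     # ASSUMED average for Q4_K_M; not stated on this page
TOKENS_PER_SECOND = 5.0   # "~5 tok/s" from the spec list


def weights_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Approximate wall-clock time to generate n_tokens at a steady rate."""
    return n_tokens / tokens_per_second


size = weights_size_gb(PARAMS, BITS_PER_WEIGHT)
print(f"Estimated weights: {size:.0f} GB")
print(f"500-token reply: ~{generation_seconds(500, TOKENS_PER_SECOND):.0f} s")
```

Under these assumptions the weights come to roughly 72 GB, in the same range as the 70 GB disk footprint listed above. The remainder of the 128 GB budget is then available for the context cache and the runtime, though the actual split depends on the workload.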
Use Cases
- Advanced mathematics
- Scientific reasoning
- Complex code generation
- Multi-step logic problems
- Research and analysis
What you get
- Ollama runtime with DeepSeek V3 (Q4 quantized)
- MLX backend for optimized inference
- OpenAI-compatible API endpoint
- Prometheus metrics
Start using it
curl
curl https://dep-<id>.macyou.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mcy_live_<your-key>" \
  -d '{
    "model": "deepseek-v3",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
Python (OpenAI SDK)
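from openai import OpenAI

# Point the SDK at your deployment instead of the default OpenAI endpoint.
client = OpenAI(
    api_key="mcy_live_<your-key>",
    base_url="https://dep-<id>.macyou.cloud/v1",
)

# Request a streamed chat completion.
response = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Print each text fragment as it arrives; content can be None on some chunks.
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
Tags
LLM, Frontier, Reasoning, Code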
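The Python example under "Start using it" prints each streamed fragment as it arrives. If the complete reply is needed as a single string, the fragments can be collected instead. The following is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical helpers, and the stand-in chunks only imitate the attribute layout (`choices[0].delta.content`) used in that example, so the snippet runs without a deployment.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:          # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:                      # content may be None or empty
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in shaped like a streaming chunk, for local testing."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


demo = [fake_chunk("Hel"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("lo!")]
print(collect_stream(demo))
```

Against a live deployment, the same helper would be called as `collect_stream(response)`, where `response` is the streamed result of `client.chat.completions.create(..., stream=True)`.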