Local LLM Deployments
Qwen 2.5 32B
Alibaba's Qwen 2.5 32B is a multilingual model with strong coding and reasoning capabilities. With 32 billion parameters, it handles complex tasks that smaller models struggle with, including code generation in 20+ programming languages and fluent communication in Chinese, English, Japanese, Korean, and other languages.
Standard+ required, from $299/mo
5 min provisioning
OpenAI-compatible API
Made by Alibaba Cloud
License: Apache 2.0
Technical Specifications
Model size: 32B parameters
Memory required: 32 GB
Speed (M4 Pro): ~28 tok/s
Quantization: Q4_K_M
Context window: 33K tokens
Disk space: 18 GB
Runtime: Ollama + MLX
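As a rough sanity check on the figures above, the sketch below estimates the size of the quantized weights and how long a response might take at the listed speed. The 4.5 bits per weight is an assumed effective value for a 4-bit mixed quantization such as Q4_K_M, not a number from this page, and the helper names are hypothetical. The estimate covers weights only; the context cache and runtime overhead come on top, which is presumably part of why the memory requirement is higher than the disk footprint.

```python
def estimate_weight_gb(n_params, bits_per_weight):
    """Approximate size of the model weights in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens, tokens_per_second):
    """Approximate time to generate n_tokens at a steady decoding speed."""
    return n_tokens / tokens_per_second


# 32B parameters at an ASSUMED ~4.5 bits per weight
print(f"{estimate_weight_gb(32e9, 4.5):.1f} GB")  # 18.0 GB

# A 500-token reply at the listed ~28 tok/s
print(f"{generation_seconds(500, 28):.1f} s")  # 17.9 s
```

Real throughput depends on prompt length and load, so treat the timing as an order-of-magnitude guide.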
Use Cases
- Multilingual customer support
- Code generation and review
- Document analysis
- Research assistance
- Translation
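For a use case such as translation, a request to the chat completions endpoint might be shaped as in the sketch below. The helper name and the system prompt wording are illustrative assumptions, not part of the service; only the model name and the message format come from the examples on this page.

```python
def build_translation_request(text, target_language, model="qwen-2.5-32b"):
    """Build a chat-completions payload asking for a translation (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                # Assumed prompt wording; adjust to taste
                "content": (
                    f"Translate the user's message into {target_language}. "
                    "Reply with the translation only."
                ),
            },
            {"role": "user", "content": text},
        ],
        "stream": False,
    }


payload = build_translation_request("Where is the train station?", "Japanese")
print(payload["messages"][0]["content"])
```

With the OpenAI SDK, a payload like this could be passed as `client.chat.completions.create(**payload)`.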
What you get
- Ollama runtime with Qwen 2.5 32B pre-loaded
- MLX backend for optimized inference
- OpenAI-compatible API endpoint
- Prometheus metrics
Start using it
curl
curl https://dep-<id>.macyou.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mcy_live_<your-key>" \
  -d '{
    "model": "qwen-2.5-32b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
Python (OpenAI SDK)
from openai import OpenAI

# Point the SDK at the deployment's OpenAI-compatible endpoint
client = OpenAI(
    api_key="mcy_live_<your-key>",
    base_url="https://dep-<id>.macyou.cloud/v1",
)

# Request a streamed chat completion
response = client.chat.completions.create(
    model="qwen-2.5-32b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Print each chunk of text as it arrives
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Tags
LLM, Multilingual, Code
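With `"stream": true`, as in the examples above, an OpenAI-compatible endpoint typically sends the reply as server-sent events: lines of the form `data: {json}` ending with `data: [DONE]`. The sketch below shows how those lines could be reassembled into text without the SDK. It assumes the deployment follows the standard OpenAI streaming format; the sample lines are hand-written for illustration, not captured from the service, and the function name is hypothetical.

```python
import json


def collect_stream_text(lines):
    """Join the text deltas from an OpenAI-style streamed chat completion."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta", {})
            # The first chunk often carries only the role, with no content
            parts.append(delta.get("content") or "")
    return "".join(parts)


# Hand-written sample of what the stream might look like on the wire
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints: Hello!
```

The OpenAI SDK does this parsing internally, so a helper like this is only relevant when consuming the raw HTTP stream.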