Multi-Model Orchestration
LiteLLM Multi-Model
LiteLLM provides a unified OpenAI-compatible API that routes requests across multiple local models. Load Llama, Qwen, and Mistral simultaneously: the proxy handles model selection, fallback, and load balancing. One endpoint, many models.
Advanced+ required (from $599/mo)
10 min provisioning
OpenAI-compatible API
Made by BerriAI
License: MIT
Technical Specifications
Memory required: 64 GB
Runtime: LiteLLM + Ollama
Use Cases
- A/B testing across models
- Model fallback and redundancy
- Cost optimization via model routing
- Multi-model API gateway
- Unified API for multiple teams
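For the A/B testing use case, one common approach is to assign each user to a model variant deterministically, so the same user always hits the same model. The sketch below is an assumption about how a client might do this; `assign_model` and the variant names are hypothetical, and whether a given deployment accepts per-model names in the `model` field should be checked against its routing configuration.

```python
import hashlib
from typing import Sequence


def assign_model(user_id: str, models: Sequence[str]) -> str:
    """Deterministically map a user ID to one model variant."""
    if not models:
        raise ValueError("models must not be empty")
    # Hash the ID so assignment is stable across processes and restarts
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(models)
    return models[bucket]


# Hypothetical variant names; substitute whatever the deployment exposes.
VARIANTS = ["model-a", "model-b"]
variant = assign_model("user-42", VARIANTS)
```

The returned name would then be passed as the `model` argument of the chat completion request, and results compared per variant.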
What you get
- LiteLLM proxy server
- Pre-configured model routing
- OpenAI-compatible unified endpoint
- Model fallback and load balancing
- Usage tracking per model
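Per-model usage can also be tallied on the client side as a cross-check. OpenAI-compatible chat completion responses carry a `model` field and a `usage` object with `total_tokens`; the sketch below sums those over responses represented as plain dictionaries. The `tokens_per_model` helper and the sample records are made up for illustration and say nothing about how the proxy's own tracking is stored or reported.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def tokens_per_model(responses: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Sum usage.total_tokens for each value of the response's model field."""
    totals: Dict[str, int] = defaultdict(int)
    for resp in responses:
        usage = resp.get("usage") or {}  # tolerate a missing usage block
        totals[resp.get("model", "unknown")] += usage.get("total_tokens", 0)
    return dict(totals)


# Made-up sample records shaped like OpenAI-compatible responses.
sample = [
    {"model": "model-a", "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12}},
    {"model": "model-b", "usage": {"total_tokens": 30}},
    {"model": "model-a", "usage": {"total_tokens": 8}},
    {"model": "model-b"},  # no usage block; counted as zero tokens
]
summary = tokens_per_model(sample)
```

With the OpenAI SDK, a non-streaming response object can be converted to this shape with its `model_dump()` method before being passed in.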
Start using it
curl
curl https://dep-<id>.macyou.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer mcy_live_<your-key>" \
  -d '{
    "model": "litellm-proxy",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="mcy_live_<your-key>",
    base_url="https://dep-<id>.macyou.cloud/v1",
)
response = client.chat.completions.create(
    model="litellm-proxy",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
# Print each streamed piece of the reply as it arrives.
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Tags
Multi-Model, LiteLLM, Orchestration