Quick Start
Get started with the OpenAI-compatible Deployment API in under 5 minutes.
Every Macyou deployment exposes an OpenAI-compatible API: the same endpoints and the same request/response format, served by your own local model (Llama, Mistral, Qwen, etc.) running on dedicated Apple Silicon. No data is sent to OpenAI or any third party. To use an OpenAI SDK, LangChain, or LlamaIndex client with your deployment, change only the base URL and API key and keep the rest of your code as it is.
1. Create a Deployment
From the Deployments dashboard, choose a template (e.g. Llama 4 Scout, Mistral Small) and click Deploy. You'll get an endpoint URL and an API key. The key is shown only once, so copy it before you leave the page.
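Because the key is shown only once, one option is to keep it in environment variables rather than in source code. The following is a minimal sketch, not part of the Macyou API: the variable names `MACYOU_BASE_URL` and `MACYOU_API_KEY` and the helper `load_deployment_config` are hypothetical names chosen for illustration.

```python
import os


def load_deployment_config(env=None):
    """Read the endpoint URL and API key from environment variables.

    MACYOU_BASE_URL and MACYOU_API_KEY are hypothetical names used only
    in this sketch; pick whatever names fit your setup.
    """
    env = os.environ if env is None else env
    # Drop a trailing slash so the URL has one consistent form.
    base_url = env.get("MACYOU_BASE_URL", "").rstrip("/")
    api_key = env.get("MACYOU_API_KEY", "")
    if not base_url or not api_key:
        raise RuntimeError("Set MACYOU_BASE_URL and MACYOU_API_KEY first")
    return {"base_url": base_url, "api_key": api_key}
```

The returned dictionary uses the same keyword names as the OpenAI client constructor, so it can be passed directly as `OpenAI(**load_deployment_config())`.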
2. Make Your First Request
curl https://your-deployment.macyou.co/api/v1/chat/completions \
  -H "Authorization: Bearer mcy_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ]
  }'
3. Use with the OpenAI SDK
from openai import OpenAI

client = OpenAI(
    base_url="https://your-deployment.macyou.co/api/v1",
    api_key="mcy_live_YOUR_KEY",
)

response = client.chat.completions.create(
    model="llama-4-scout",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
4. Streaming
# Reuses the client created in step 3.
stream = client.chat.completions.create(
    model="llama-4-scout",
    messages=[{"role": "user", "content": "Write a haiku about servers."}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no text (content is None), so skip them.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
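If you need the full reply as a single string instead of printing pieces as they arrive, the deltas can be accumulated. The sketch below assumes chunks shaped like the ones in the loop above. The helper `collect_text` and the stand-in `fake_chunk` objects are illustrative names, not part of any SDK, and the guard for an empty `choices` list is a precaution, since some OpenAI-compatible servers may emit such chunks.

```python
from types import SimpleNamespace


def collect_text(chunks):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # precaution: skip chunks with no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None or empty on some chunks
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk so the example runs without a deployment."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Racks "), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk("hum")]
print(collect_text(demo))  # prints "Racks hum"
```

With a real deployment, pass the `stream` object from the request above in place of `demo`.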