Quick Start

Get started with the OpenAI-compatible Deployment API in under 5 minutes.

Every Macyou deployment exposes an OpenAI-compatible API: the same endpoints and the same request/response format, served by your own local model (Llama, Mistral, Qwen, etc.) running on dedicated Apple Silicon. No data is sent to OpenAI or any third party. To use an existing OpenAI SDK, LangChain, or LlamaIndex client, change the base URL and API key to those of your deployment and keep everything else.

1. Create a Deployment

From the Deployments dashboard, choose a template (e.g. Llama 4 Scout, Mistral Small) and click Deploy. You'll get an endpoint URL and an API key. The key is shown only once, so copy it right away.
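Since the key is displayed only once, one option is to store it outside your source code and read it at startup. The following is a minimal sketch, not part of the Macyou API: the environment variable names `MACYOU_BASE_URL` and `MACYOU_API_KEY` and the helper `load_deployment_config` are hypothetical names chosen for illustration.

```python
import os


def load_deployment_config(env=None):
    """Read the endpoint URL and API key from environment variables.

    MACYOU_BASE_URL and MACYOU_API_KEY are hypothetical names used only
    in this sketch; pick whatever names fit your own setup.
    """
    env = os.environ if env is None else env
    # Strip a trailing slash so paths can be appended consistently later.
    base_url = env.get("MACYOU_BASE_URL", "").rstrip("/")
    api_key = env.get("MACYOU_API_KEY", "")
    if not base_url or not api_key:
        raise RuntimeError("Set MACYOU_BASE_URL and MACYOU_API_KEY first")
    return {"base_url": base_url, "api_key": api_key}
```

The returned dictionary has the same two keyword names that the OpenAI SDK client takes, so it could be passed as `OpenAI(**load_deployment_config())`.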

2. Make Your First Request

curl https://your-deployment.macyou.co/api/v1/chat/completions \
  -H "Authorization: Bearer mcy_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-4-scout",
    "messages": [
      {"role": "user", "content": "Explain quantum computing in one paragraph."}
    ]
  }'
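For readers without curl, the same request can be sketched with only the Python standard library. The helper names `build_chat_request` and `send_chat_request` are illustrative, and the response parsing assumes the deployment returns the usual OpenAI-style body with `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build (but do not send) a POST request to /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style JSON response body.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Building and sending are kept separate so the request can be inspected or logged (with the key redacted) before any network call is made.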

3. Use with the OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://your-deployment.macyou.co/api/v1",
    api_key="mcy_live_YOUR_KEY",
)

response = client.chat.completions.create(
    model="llama-4-scout",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
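The `messages` list can carry more than a single user turn. As a rough sketch, the helper below (a hypothetical `build_messages`, not part of any SDK) assembles an optional system prompt and prior turns using the `system`, `user`, and `assistant` roles of the OpenAI chat format. How a particular model template treats system prompts is not covered here, so treat that part as an assumption to verify against your deployment.

```python
def build_messages(user_prompt, system_prompt=None, history=None):
    """Assemble a chat `messages` list in the OpenAI chat format.

    history is a list of (user_text, assistant_text) pairs from earlier turns.
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for user_text, assistant_text in history or []:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    # The newest user message always goes last.
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

The result can be passed directly as the `messages` argument of `client.chat.completions.create`.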

4. Streaming

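# Reuses the `client` created in step 3.
stream = client.chat.completions.create(
    model="llama-4-scout",
    messages=[{"role": "user", "content": "Write a haiku about servers."}],
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # finish the output with a newline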
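If you need the full reply as a single string rather than printing pieces as they arrive, the chunks can be joined. The sketch below uses a hypothetical helper, `collect_stream`, which accepts any iterable of objects shaped like the SDK's streaming chunks. The `fake_chunk` function exists only to demonstrate the helper without a live deployment.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object with the same attribute path as a real chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(None), fake_chunk(" world")]
print(collect_stream(demo))
```

With a real deployment, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `demo`.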