Chat Completions

Create chat completions with streaming support. POST /api/v1/chat/completions.

POST /api/v1/chat/completions

Creates a model response for the given conversation. Compatible with the OpenAI Chat Completions API.

Request Body

Parameter          Type            Required  Description
model              string          Yes       Model ID (use /v1/models to list available models)
messages           array           Yes       List of messages in the conversation
stream             boolean         No        If true, returns a stream of server-sent events
temperature        number          No        Sampling temperature (0-2). Default: 1
max_tokens         integer         No        Maximum tokens to generate
top_p              number          No        Nucleus sampling probability (0-1)
stop               string | array  No        Stop sequences
presence_penalty   number          No        Presence penalty (-2 to 2)
frequency_penalty  number          No        Frequency penalty (-2 to 2)
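
As a minimal sketch of how these parameters fit together, the Python snippet below builds a request body and checks the documented ranges before anything is sent. The helper names build_request and send, the base_url argument, and the Bearer-token Authorization header are assumptions made for illustration; this page does not specify a host or an authentication scheme.

```python
import json
import urllib.request

ENDPOINT_PATH = "/api/v1/chat/completions"  # path documented on this page


def build_request(model, messages, **options):
    """Build a request body, checking the ranges listed in the parameter table."""
    ranges = {
        "temperature": (0, 2),
        "top_p": (0, 1),
        "presence_penalty": (-2, 2),
        "frequency_penalty": (-2, 2),
    }
    allowed = set(ranges) | {"stream", "max_tokens", "stop"}
    body = {"model": model, "messages": messages}
    for name, value in options.items():
        if name not in allowed:
            raise ValueError(f"unknown parameter: {name}")
        if name in ranges:
            low, high = ranges[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        body[name] = value
    return body


def send(base_url, api_key, body):
    """POST the body and return the decoded JSON response (non-streaming).

    base_url and the Bearer-token header are assumptions; adjust to your deployment.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_request(
    "llama-4-scout",
    [{"role": "user", "content": "Explain quantum computing briefly."}],
    temperature=0.7,
    max_tokens=100,
)
```

Validating on the client is optional; it only surfaces out-of-range values before a round trip to the server.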

Message Object

Field    Type           Description
role     string         system, user, assistant, or tool
content  string | null  The message content
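
A conversation is an ordered list of these objects. The sketch below shows one way to assemble such a list in Python; the make_message helper is hypothetical and only enforces the four roles and the string-or-null content type listed above.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def make_message(role, content):
    """Return a message object, rejecting roles not listed in the table above."""
    if role not in VALID_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    if content is not None and not isinstance(content, str):
        raise TypeError("content must be a string or None")
    return {"role": role, "content": content}


# A two-turn conversation: a system instruction followed by a user question.
messages = [
    make_message("system", "You are a concise assistant."),
    make_message("user", "What is quantum computing?"),
]
```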

Non-Streaming Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-4-scout",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}
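
To illustrate how a client might read this response, the following sketch parses the sample above and pulls out the generated text, the finish reason, and the token usage. The summarize helper is a hypothetical name, and the code assumes a single choice, as in the sample.

```python
import json

# Sample copied from the non-streaming response shown above.
RAW_RESPONSE = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-4-scout",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Quantum computing uses..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 45, "total_tokens": 57}
}
"""


def summarize(response):
    """Return (text, finish_reason, total_tokens) for the first choice."""
    choice = response["choices"][0]
    text = choice["message"]["content"] or ""  # content may be null
    return text, choice["finish_reason"], response["usage"]["total_tokens"]


text, finish_reason, total_tokens = summarize(json.loads(RAW_RESPONSE))
print(text)           # Quantum computing uses...
print(finish_reason)  # stop
print(total_tokens)   # 57
```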

Streaming Response

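When stream is true, the response is delivered as a stream of server-sent events (SSE). Each event is a line with a data: prefix followed by a JSON chunk; in place of a full message, each choice carries a delta holding the next piece of content. The stream ends with data: [DONE].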

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
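
As a rough sketch of client-side handling, the Python snippet below reassembles the text from the sample events above. It works on an in-memory list of lines so that it runs without a server; in practice the lines would come from the HTTP response body. The collect_stream helper is a hypothetical name, and the code assumes a single choice per chunk, as in the sample.

```python
import json

# The sample events shown above, one list entry per line of the stream.
SAMPLE_STREAM = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]


def collect_stream(lines):
    """Accumulate delta content from SSE lines until the [DONE] sentinel."""
    parts = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker, not JSON
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


text, finish_reason = collect_stream(SAMPLE_STREAM)
print(text, finish_reason)  # Hello stop
```

The [DONE] sentinel is checked before JSON decoding because it is not a valid JSON document.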