
Chat Completions

Create chat completions with streaming support. POST /api/v1/chat/completions.

POST /api/v1/chat/completions

Creates a model response for the given conversation. Compatible with the OpenAI Chat Completions API.

Request Body

Parameter	Type	Required	Description
`model`	string	Yes	Model ID (use `/v1/models` to list available models)
`messages`	array	Yes	List of messages in the conversation
`stream`	boolean	No	If true, returns a stream of server-sent events
`temperature`	number	No	Sampling temperature (0-2). Default: 1
`max_tokens`	integer	No	Maximum tokens to generate
`top_p`	number	No	Nucleus sampling probability (0-1)
`stop`	string | array	No	Stop sequences
`presence_penalty`	number	No	Presence penalty (-2 to 2)
`frequency_penalty`	number	No	Frequency penalty (-2 to 2)
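
As a minimal sketch of how the parameters above fit together, the following Python builds a request body and wraps it in an HTTP request object without sending it. `BASE_URL`, `API_KEY`, the `Bearer` authorization scheme, and the helper names `build_chat_request` and `make_http_request` are assumptions for illustration, not part of this reference. The range checks are a client-side convenience that mirrors the documented ranges; how the server handles out-of-range values is not specified here.

```python
import json
import urllib.request

# Hypothetical values: replace with your deployment's host and credentials.
BASE_URL = "https://example.invalid"
API_KEY = "YOUR_API_KEY"

# Ranges copied from the parameter table above.
RANGES = {
    "temperature": (0, 2),
    "top_p": (0, 1),
    "presence_penalty": (-2, 2),
    "frequency_penalty": (-2, 2),
}


def build_chat_request(model, messages, **options):
    """Return a JSON-serializable body for POST /api/v1/chat/completions."""
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return {"model": model, "messages": messages, **options}


def make_http_request(body):
    """Wrap the body in a POST request object (nothing is sent here)."""
    return urllib.request.Request(
        BASE_URL + "/api/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check the Authentication page.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


body = build_chat_request(
    "llama-4-scout",
    [{"role": "user", "content": "Explain quantum computing briefly."}],
    temperature=0.7,
    max_tokens=200,
)
request = make_http_request(body)
# To actually send it: urllib.request.urlopen(request)
```

Only `model` and `messages` are required, so calling `build_chat_request` with no keyword arguments yields the smallest valid body.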

Message Object

Field	Type	Description
`role`	string	`system`, `user`, `assistant`, or `tool`
`content`	string | null	The message content
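
To illustrate the message shape, here is a small sketch that builds a `messages` array and rejects roles outside the four listed above. The helper `make_message` is hypothetical, and the check only covers the two fields documented here.

```python
VALID_ROLES = {"system", "user", "assistant", "tool"}


def make_message(role, content):
    """Build one message object with the documented role and content fields."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if content is not None and not isinstance(content, str):
        raise TypeError("content must be a string or None")
    return {"role": role, "content": content}


messages = [
    make_message("system", "You are a concise assistant."),
    make_message("user", "Explain quantum computing briefly."),
]
```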

Non-Streaming Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-4-scout",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 45,
    "total_tokens": 57
  }
}
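
A short sketch of reading the fields a client typically needs from this response: the assistant text, the finish reason, and the token usage. The function name `extract_reply` is an assumption, and the sample string simply repeats the example response above.

```python
import json


def extract_reply(response_text):
    """Return (content, finish_reason, total_tokens) from a non-streaming response."""
    data = json.loads(response_text)
    choice = data["choices"][0]  # first (and here only) choice
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        data["usage"]["total_tokens"],
    )


sample = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "llama-4-scout",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing uses..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 45, "total_tokens": 57}
}
"""

content, finish_reason, total_tokens = extract_reply(sample)
```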

Streaming Response

When `stream` is `true`, the response is a stream of server-sent events. Each event has a `data:` prefix followed by a JSON chunk. The stream ends with `data: [DONE]`.

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
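
The stream format above can be consumed with a simple line-based loop. The sketch below assumes the response body is already available as an iterable of text lines (for example, decoded lines from an HTTP response); `collect_stream` is a hypothetical helper. It concatenates each `delta.content` fragment and stops at `[DONE]`.

```python
import json


def collect_stream(lines):
    """Accumulate streamed content; return (full_text, finish_reason)."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


events = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text, reason = collect_stream(events)
```

The final content chunk carries an empty `delta` and a non-null `finish_reason`, so the loop records the reason separately from the text.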
