Chat Completions
Create chat completions with streaming support. POST /api/v1/chat/completions.
POST /api/v1/chat/completions

Creates a model response for the given conversation. Compatible with the OpenAI Chat Completions API.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (use /v1/models to list available models) |
| messages | array | Yes | List of messages in the conversation |
| stream | boolean | No | If true, returns a stream of server-sent events |
| temperature | number | No | Sampling temperature (0-2). Default: 1 |
| max_tokens | integer | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling probability (0-1) |
| stop | string or array | No | Stop sequences |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
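
The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption, not part of the API: `build_payload` and `RANGES` are hypothetical names introduced here, and the server may apply its own validation regardless.

```python
import json

# Inclusive ranges copied from the parameter table above.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def build_payload(model, messages, **options):
    """Return the request body as a JSON string, rejecting out-of-range options.

    Hypothetical helper for illustration; not provided by the API.
    """
    if not model or not messages:
        raise ValueError("model and messages are required")
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    body = {"model": model, "messages": messages}
    # Only include optional parameters the caller actually set.
    body.update({k: v for k, v in options.items() if v is not None})
    return json.dumps(body)


payload = build_payload(
    "llama-4-scout",
    [{"role": "user", "content": "Explain quantum computing briefly."}],
    temperature=0.7,
    max_tokens=200,
    stop=["\n\n"],  # stop accepts a single string or a list of strings
)
print(payload)
```

Leaving unset options out of the body lets the server apply its defaults, such as a temperature of 1.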
Message Object
| Field | Type | Description |
|---|---|---|
| role | string | system, user, assistant, or tool |
| content | string or null | The message content |
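
As an illustration of how message objects fit into a request, here is a hedged sketch that uses only the Python standard library. `BASE_URL`, `API_KEY`, `make_request`, and `first_reply` are hypothetical names, and the bearer-token header is an assumption borrowed from the OpenAI API; this page does not state how requests are authenticated.

```python
import json
import urllib.request

# Hypothetical values: replace with your deployment's host and credentials.
BASE_URL = "https://example.invalid"
API_KEY = "YOUR_API_KEY"


def make_request(model, messages):
    """Build (but do not send) a POST request for /api/v1/chat/completions."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth, as in the OpenAI API.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def first_reply(response):
    """Return the assistant text from a parsed non-streaming response."""
    return response["choices"][0]["message"]["content"]


messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain quantum computing briefly."},
]
request = make_request("llama-4-scout", messages)

# To actually send it (requires a reachable server):
# with urllib.request.urlopen(request) as resp:
#     print(first_reply(json.load(resp)))
```

The network call is left commented out so the sketch runs without a server; `first_reply` reads the `choices[0].message.content` field of the non-streaming response format.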
Non-Streaming Response
    {
      "id": "chatcmpl-abc123",
      "object": "chat.completion",
      "created": 1714000000,
      "model": "llama-4-scout",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Quantum computing uses..."
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 45,
        "total_tokens": 57
      }
    }

Streaming Response
When stream: true, the response is a stream of server-sent events. Each event has a data: prefix followed by a JSON chunk. The stream ends with data: [DONE].

    data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
    data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
    data: [DONE]
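
A client can reassemble the full reply by concatenating each chunk's `delta.content` until it sees `data: [DONE]`. The sketch below shows one way to do that over an iterable of text lines. `iter_chunks` and `collect_text` are hypothetical helper names, and the code assumes a single choice per chunk, as in the sample events above.

```python
import json


def iter_chunks(lines):
    """Yield parsed JSON chunks from an iterable of SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)


def collect_text(lines):
    """Concatenate delta.content across chunks; return (text, finish_reason)."""
    parts, finish_reason = [], None
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            # Keep the last non-null finish_reason seen.
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


sample = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))  # prints ('Hello', 'stop')
```

With a live connection, the same functions can be fed the decoded lines of the HTTP response body instead of the `sample` list.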