OpenAI-compatible Chat Completions API (with field-level annotations and streaming SSE details)
| Item | Value |
|---|---|
| Method | POST |
| Path | /v1/chat/completions |
| Authentication | Authorization: Bearer <API_KEY> |
| Content-Type | application/json |
| Response (non-streaming) | application/json |
| Response (streaming) | text/event-stream (SSE) |
Required fields: model and messages. Use stream=true to receive SSE chunks.
Supported models: surf-ask, surf-research, surf-1.5, surf-1.5-instant, surf-1.5-thinking.
surf-ask and surf-research are legacy models. surf-1.5, surf-1.5-instant, and surf-1.5-thinking are the new models. Legacy models remain available and the request format is unchanged.
Note: When using surf-research and surf-1.5, it is recommended to set the timeout to 10 minutes.
surf-1.5

surf-1.5 is the recommended, next-generation model. Compared to legacy models, it’s designed for more advanced workflows:
- Routing to surf-1.5-instant, providing faster and more accurate results compared to legacy models.
- Support for tools and orchestrating multi-step tool-augmented tasks.
- reasoning_effort (low / medium / high) to trade off speed vs. deeper analysis.
- ability (capability constraints) and citation (citation formats) in the same request shape.

surf-1.5, surf-1.5-instant, surf-1.5-thinking

The surf-1.5 family includes three model variants with different reasoning capabilities:
- surf-1.5-instant: A lightweight model optimized for fast responses to simple queries.
- surf-1.5-thinking: A more powerful model with deeper reasoning capabilities, designed to handle complex problems that require thorough analysis and multi-step reasoning.
- surf-1.5: An adaptive model that automatically selects between surf-1.5-instant and surf-1.5-thinking based on the request parameters and problem complexity, providing an optimal balance between speed and depth without requiring manual model selection.
Key points:

- Standard OpenAI-compatible request shape (model, messages, optional stream) and standard Authorization: Bearer <API_KEY>.
- Set stream=true to receive incremental chunks (text/event-stream); the stream terminates on data: [DONE].
- Supports tools (OpenAI function calling). The model may request tool executions during generation.
- Supports reasoning_effort: low / medium / high.
- Surf extensions: ability to constrain available capability domains, and citation to request output citation formats.
- Errors are returned with HTTP 400, 401, or 502 (both streaming and non-streaming).

Request body (model.CompletionsRequest)

Note: This endpoint follows the overall structure of OpenAI chat.completions, and additionally provides Surf extension fields such as ability and citation.
| Field | Type | Required | Description | Example |
|---|---|---|---|---|
model | string (enum) | Yes | Model identifier to use. | "surf-ask" / "surf-research" / "surf-1.5" / "surf-1.5-instant" / "surf-1.5-thinking" |
messages | array<object> | Yes | List of chat messages. | Minimal: [{"role":"user","content":"Hello!"}] · With system: [{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello!"}] |
stream | boolean | No | Enable streaming output. | true → SSE (text/event-stream) · false (default) → JSON |
reasoning_effort | string (enum) | No | Reasoning strength. | "low" / "medium" / "high" |
ability | array<string> (enum) | No | Surf extension: hints/constraints for which capability domains are available for this request. | ["search"] / ["evm_onchain"] / ["solana_onchain"] / ["market_analysis"] / ["calculate"] |
citation | array<string> (enum) | No | Surf extension: citation formats to include in the output. | ["source"] / ["chart"] |
tools | array<object> | No | Tool definitions compatible with OpenAI tools/function calling. The model may request tool calls during generation. | [{"type":"function","function":{"name":"get_weather","description":"Get current weather","parameters":{"type":"object","properties":{"city":{"type":"string"}},"required":["city"]}}}] |
messages[] (model.Message)

| Field | Type | Required | Description (Notes) |
|---|---|---|---|
role | string (enum) | Yes | Message role: system (system instructions) / user (user input) / assistant (model output). |
content | string | Yes | Message text content. system is used for rules/boundaries; user is used for questions/tasks; you typically do not need to include assistant messages in the request. |
tools[] (model.Tool / model.ToolFunction)

| Field | Type | Required | Description (Notes) |
|---|---|---|---|
type | string (enum) | Yes | Tool type. Currently function. |
function | object | Yes | Function tool definition. |
function object:
| Field | Type | Required | Description (Notes) |
|---|---|---|---|
name | string | Yes | Tool name (function name). Prefer snake_case. |
description | string | No | Tool purpose description, to help the model decide whether to call it. |
parameters | object (JSON Schema) | No | JSON Schema describing tool parameters (structure/types/required fields). |
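Putting the tool tables together, a request with one tool looks roughly like the following. The get_weather function is the hypothetical example from the table above, used only for illustration.

```python
# A tools entry per the schema above: type "function" plus a function object
# whose parameters field is a JSON Schema describing the arguments.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",                 # prefer snake_case
        "description": "Get current weather",  # helps the model decide when to call
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "surf-1.5",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [get_weather_tool],
}
```

If the model decides to call the tool, the response's finish_reason will be tool_calls (see the choices[] table below).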
Response body (model.CompletionsProxyResponse)

| Field | Type | Description (Notes) |
|---|---|---|
id | string | Unique identifier for this completion (e.g., chatcmpl-...). |
object | string | Object type, typically chat.completion. |
created | integer | Creation timestamp (seconds). |
model | string | Model identifier actually used. |
choices | array<object> | List of generated results (usually only 1). |
usage | object | Token usage statistics. |
choices[] (model.CompletionsChoice)

| Field | Type | Description (Notes) |
|---|---|---|
index | integer | Choice index (starting from 0). |
finish_reason | string \| null | Finish reason. Common values: stop (normal completion), length (reached token limit), tool_calls (triggered tool calls), content_filter (blocked by safety policy), error (aborted due to error). |
message | object | Final message (assistant). |
message (model.CompletionsMessage)

| Field | Type | Description (Notes) |
|---|---|---|
role | string | Role, typically assistant. |
content | string | Final model output text. |
reasoning | string | (If returned) field used to expose model reasoning/rationale. Note: depending on product policy, this may be omitted or simplified. |
usage (model.CompletionsUsage)

| Field | Type | Description (Notes) |
|---|---|---|
prompt_tokens | integer | Input tokens. |
completion_tokens | integer | Output tokens. |
total_tokens | integer | Total tokens (input + output). |
Streaming responses (text/event-stream)

When stream=true, the server continuously streams SSE events. Each event block typically looks like:
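For illustration, a representative event block in the OpenAI chunk shape (field values here are made up, not captured from a live response):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"surf-1.5","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: [DONE]
```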
Note: The current OpenAPI spec does not define a dedicated schema for streaming chunks. In the examples, each chunk’s object is typically chat.completion.chunk, and incremental output is delivered via choices[].delta (e.g., role / content). finish_reason is usually null until the stream ends.
Error response (model.BaseResponse)

Returned with HTTP status 400, 401, or 502 (both streaming and non-streaming).
| Field | Type | Description (Notes) |
|---|---|---|
success | boolean | Whether the request succeeded. |
message | string | Error message / hint. |
error_code | string | Error code (e.g., FORBIDDEN). |
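A sketch of handling the error envelope above, assuming errors arrive as documented ({"success": false, ...} with HTTP 400, 401, or 502); the example body is illustrative:

```python
import json

def raise_for_surf_error(status: int, body: str) -> dict:
    """Decode a response body and raise on the error envelope (model.BaseResponse)."""
    payload = json.loads(body)
    if status in (400, 401, 502) or payload.get("success") is False:
        raise RuntimeError(f"{payload.get('error_code')}: {payload.get('message')}")
    return payload

# Illustrative error body:
err = '{"success": false, "message": "invalid api key", "error_code": "FORBIDDEN"}'
```

Checking both the HTTP status and the success flag covers streaming requests too, where an error envelope can replace the SSE stream.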
Request body (OpenAI-compatible) — enum summary:

- ability: search, evm_onchain, solana_onchain, market_analysis, calculate (e.g. ["evm_onchain", "market_analysis", "calculate"])
- citation: source, chart (e.g. ["source", "chart"])
- model: surf-ask, surf-research, surf-1.5 (e.g. "surf-ask")
- reasoning_effort: low, medium, high (e.g. "medium")
- stream: defaults to false

Returns JSON when stream=false and SSE chunks when stream=true.
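An end-to-end non-streaming request can be sketched with the standard library alone. The API_URL and API_KEY values are placeholders (this document does not specify a base URL); the 600-second timeout follows the 10-minute recommendation above for surf-research and surf-1.5.

```python
import json
import urllib.request

# Placeholder values -- substitute your own deployment's URL and key.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

def chat(body: dict, timeout: float = 600.0) -> dict:
    """POST an OpenAI-compatible completions request and decode the JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

body = {
    "model": "surf-1.5",
    "messages": [{"role": "user", "content": "Hello!"}],
    "reasoning_effort": "medium",
    "stream": False,
}
# reply = chat(body)  # network call; uncomment against a live endpoint
# print(reply["choices"][0]["message"]["content"])
```

With stream set to True the same body would instead yield a text/event-stream response, which must be read incrementally rather than via a single JSON decode.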