POST https://api.firmware.ai/v1/chat/completions
Chat completions
curl --request POST \
  --url https://api.firmware.ai/v1/chat/completions \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "messages": [
    {}
  ],
  "stream": true,
  "temperature": 1,
  "max_tokens": 123,
  "tools": [
    {}
  ],
  "tool_choice": {},
  "mcp_servers": [
    "<string>"
  ],
  "thinking": {},
  "reasoning_effort": "<string>",
  "generation_config": {},
  "safety_settings": [
    {}
  ]
}
'
Create a chat completion. Supports streaming, tool calling, and MCP server integration across all providers.

Request body

model
string
required
Model ID to use for completion. See available models for the full list.
messages
array
required
Conversation history as an array of message objects.
[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "Hello!" }
]
stream
boolean
default:"false"
Enable Server-Sent Events streaming for the response.
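For example, a minimal streaming request might look like the sketch below (same endpoint and bearer-token auth as in the Examples section; curl's -N flag disables output buffering so events print as they arrive):
curl -N https://api.firmware.ai/v1/chat/completions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about the sea."}
    ],
    "stream": true
  }'
The response then arrives as a series of Server-Sent Events rather than a single JSON body.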
temperature
number
Sampling temperature between 0 and 2. Higher values make output more random.
max_tokens
integer
Maximum tokens to generate in the completion.
tools
array
List of tools the model may call. Supports OpenAI function tools.
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  }
]
tool_choice
string | object
Controls tool calling. Options: auto, none, required, or a specific tool.
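For example, both forms are sketched below; the object shape assumes the OpenAI-style convention that matches the OpenAI function tools accepted in tools, so adjust if your provider differs:
"tool_choice": "auto"
"tool_choice": {
  "type": "function",
  "function": { "name": "get_weather" }
}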
mcp_servers
array
MCP server addresses for server-side tool execution.
["dedalus-labs/brave-search", "dedalus-labs/github-api"]
thinking
object
Extended thinking configuration for Anthropic models.
{ "type": "enabled", "budget_tokens": 2048 }
reasoning_effort
string
Constrains reasoning effort for supported reasoning models. Higher values use more compute, improving quality at the cost of latency and token usage. Options: low, medium, high
"medium"
generation_config
object
Google generationConfig object. Merged with auto-generated config. Use for Google-specific params like candidateCount or responseMimeType.
{
  "candidateCount": 2,
  "responseMimeType": "application/json"
}
safety_settings
array
Google safety settings for harm categories and thresholds.
[
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_NONE"
  }
]

Response

id
string
Unique identifier for the completion.
object
string
Always chat.completion.
created
integer
Unix timestamp of when the completion was created.
model
string
The model used for the completion.
choices
array
Array of completion choices.
usage
object
Token usage statistics.
tools_executed
array
List of MCP tools executed server-side, if any.

Examples

curl https://api.firmware.ai/v1/chat/completions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
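Tool calling (a sketch: the request reuses the get_weather tool defined under tools; the response shape assumes the OpenAI-compatible tool_calls format suggested by the chat.completion schema above, and the id values are placeholders):
curl https://api.firmware.ai/v1/chat/completions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris right now?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string" }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
{
  "id": "chatcmpl-456",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Paris\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
Execute the tool client-side, append its output as a tool message, and send the conversation back to the same endpoint to get the model's final answer.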