POST https://api.firmware.ai/v1/chat/completions
Chat completions
curl --request POST \
  --url https://api.firmware.ai/v1/chat/completions \
  --header 'Authorization: Bearer <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "messages": [
    {}
  ],
  "stream": true,
  "temperature": 1,
  "max_tokens": 123,
  "tools": [
    {}
  ],
  "tool_choice": {},
  "mcp_servers": [
    "<string>"
  ],
  "thinking": {},
  "reasoning_effort": "<string>",
  "generation_config": {},
  "safety_settings": [
    {}
  ]
}
'
Create a chat completion. Supports streaming, tool calling, and MCP server integration across all providers.

Request body

model
string
required
Model ID to use for completion. See available models for the full list.
messages
array
required
Conversation history as an array of message objects.
[
  { "role": "system", "content": "You are a helpful assistant." },
  { "role": "user", "content": "Hello!" }
]
stream
boolean
default:"false"
Enable Server-Sent Events streaming for the response.
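For example, a minimal streaming request might look like the sketch below (same endpoint and bearer-token auth as in the Examples section; curl's -N flag disables output buffering so events print as they arrive):
curl -N https://api.firmware.ai/v1/chat/completions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about the sea."}
    ],
    "stream": true
  }'
The response then arrives as a series of Server-Sent Events rather than a single JSON body.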
temperature
number
Sampling temperature between 0 and 2. Higher values make output more random.
max_tokens
integer
Maximum tokens to generate in the completion.
tools
array
List of tools the model may call. Supports OpenAI function tools.
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": { "type": "string" }
        },
        "required": ["location"]
      }
    }
  }
]
tool_choice
string | object
Controls tool calling. Options: auto, none, required, or a specific tool.
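For example, both forms are sketched below; the object shape assumes the OpenAI-style convention that matches the OpenAI function tools accepted in tools, so adjust if your provider differs:
"tool_choice": "auto"
"tool_choice": {
  "type": "function",
  "function": { "name": "get_weather" }
}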
mcp_servers
array
MCP server addresses for server-side tool execution.
["dedalus-labs/brave-search", "dedalus-labs/github-api"]
thinking
object
Extended thinking configuration for Anthropic models.
{ "type": "enabled", "budget_tokens": 2048 }
reasoning_effort
string
Constrains reasoning effort for supported reasoning models. Higher values use more compute, improving quality at the cost of latency and token usage. Options: low, medium, high
"medium"
generation_config
object
Google generationConfig object. Merged with auto-generated config. Use for Google-specific params like candidateCount or responseMimeType.
{
  "candidateCount": 2,
  "responseMimeType": "application/json"
}
safety_settings
array
Google safety settings for harm categories and thresholds.
[
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "threshold": "BLOCK_NONE"
  }
]

Response

id
string
Unique identifier for the completion.
object
string
Always chat.completion.
created
integer
Unix timestamp of when the completion was created.
model
string
The model used for the completion.
choices
array
Array of completion choices.
usage
object
Token usage statistics.
tools_executed
array
List of MCP tools executed server-side, if any.

Examples

curl https://api.firmware.ai/v1/chat/completions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
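Tool calling (a sketch: the request reuses the get_weather tool defined under tools; the response shape assumes the OpenAI-compatible tool_calls format suggested by the chat.completion schema above, and the id values are placeholders):
curl https://api.firmware.ai/v1/chat/completions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris right now?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": { "type": "string" }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
{
  "id": "chatcmpl-456",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"Paris\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
Execute the tool client-side, append its output as a tool message, and send the conversation back to the same endpoint to get the model's final answer.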