Audio transcriptions

curl --request POST \
  --url https://app.firmware.ai/api/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --form 'file=@audio.mp3' \
  --form 'model=<string>' \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form 'response_format=<string>' \
  --form 'temperature=0'

{
  "text": "Hello, this is a sample transcription.",
  "task": "transcribe",
  "language": "en",
  "duration": 3.5,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.4 },
    { "word": "this", "start": 0.5, "end": 0.7 }
  ]
}

POST https://app.firmware.ai/api/v1/audio/transcriptions

Transcribe audio from a file into text. Accepts multipart form data.

Authenticate

Pass your API key as a Bearer token in the Authorization header.

curl https://app.firmware.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY"
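
The same header can be built in Python. This is a minimal sketch: the auth_headers helper is a hypothetical name, and it assumes the key is stored in the FIRMWARE_API_KEY environment variable, as in the curl command above.

```python
import os


def auth_headers(env=os.environ):
    """Build the Authorization header from FIRMWARE_API_KEY.

    `env` is any mapping; it defaults to the process environment.
    (Hypothetical helper, not part of any official SDK.)
    """
    key = env.get("FIRMWARE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRMWARE_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Failing early on a missing key avoids sending a request that is certain to be rejected.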

Request body

model (string, required)

Transcription model ID to use. See available transcription models for the full list. Examples: whisper-1, elevenlabs-scribe-v1

file (required)

The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm.

language (string)

Language of the audio in ISO-639-1 format (e.g. en, fr, es). Providing this improves accuracy and latency.

prompt (string)

Optional text to guide the model’s style or provide context. It should match the language of the audio.

response_format (string, default: json)

Format of the transcription output. Options: json, text, srt, verbose_json, vtt.

temperature (number, default: 0)

Sampling temperature between 0 and 1. Higher values produce more varied output.
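
The constraints listed above can be checked on the client before uploading. The sketch below is illustrative only: build_form_fields is a hypothetical helper, the extension check is a heuristic based on the file name, and the API may apply different or additional validation.

```python
from pathlib import PurePath

# Values taken from the parameter descriptions above.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(filename, model, language=None, prompt=None,
                      response_format=None, temperature=None):
    """Validate the parameters and return the non-file form fields.

    Optional fields left as None are omitted so the server-side
    defaults apply. (Hypothetical helper for illustration.)
    """
    ext = PurePath(filename).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if not model:
        raise ValueError("model is required")

    fields = {"model": model}
    if language is not None:
        # ISO-639-1 codes are two ASCII letters, e.g. "en".
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError(f"not an ISO-639-1 code: {language!r}")
        fields["language"] = language.lower()
    if prompt:
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"unknown response_format: {response_format!r}")
        fields["response_format"] = response_format
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        fields["temperature"] = str(temperature)
    return fields
```

For example, build_form_fields("audio.mp3", "whisper-1", temperature=123) raises ValueError because the temperature is outside the documented range.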

Response

text (string)

The transcribed text.

task (string)

Always "transcribe".

language (string)

Detected or specified language of the audio.

duration (number)

Duration of the audio in seconds.

words (array)

Word-level timestamps, if supported by the model. Each entry is a Word object with the following fields:

word (string)

The transcribed word.

start (number)

Start time in seconds.

end (number)

End time in seconds.
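
As an illustration of consuming these fields, the sketch below turns a JSON response into one line per word with its timing. The word_timeline helper is a hypothetical name, and the code assumes a JSON body shaped like the sample response on this page. Because words is only present when the model supports it, a missing array is treated as empty.

```python
import json


def word_timeline(response_body):
    """Return 'start-end word' lines from a JSON transcription response.

    Assumes `response_body` is a JSON string with the fields documented
    above. (Hypothetical helper for illustration.)
    """
    data = json.loads(response_body)
    lines = []
    # "words" may be absent or null when the model has no word timestamps.
    for w in data.get("words") or []:
        lines.append(f"{w['start']:.2f}-{w['end']:.2f} {w['word']}")
    return lines
```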

Examples

curl https://app.firmware.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -F file="@audio.mp3" \
  -F model="whisper-1"

{
  "text": "Hello, this is a sample transcription.",
  "task": "transcribe",
  "language": "en",
  "duration": 3.5,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.4 },
    { "word": "this", "start": 0.5, "end": 0.7 }
  ]
}
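
A rough Python equivalent of the curl request above, using only the standard library. This is a sketch under assumptions: build_request and transcribe are hypothetical helper names, and the file part is labelled application/octet-stream, which the API may or may not require to be a more specific audio type.

```python
import json
import os
import urllib.request
import uuid

URL = "https://app.firmware.ai/api/v1/audio/transcriptions"


def build_request(api_key, filename, audio, model="whisper-1", boundary=None):
    """Build a multipart/form-data POST carrying `model` and `file`."""
    boundary = boundary or uuid.uuid4().hex
    body = b"".join([
        # Text field: model
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # File field: file (content type is an assumption)
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio + b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(URL, data=body, headers=headers, method="POST")


def transcribe(api_key, path, model="whisper-1"):
    """Upload the file at `path` and return the parsed JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    req = build_request(api_key, os.path.basename(path), audio, model)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling transcribe(os.environ["FIRMWARE_API_KEY"], "audio.mp3") would send the request. Building the request is kept separate from sending it so the multipart body can be inspected without network access.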
