POST https://app.firmware.ai/api/v1/audio/transcriptions
Audio transcriptions
curl --request POST \
  --url https://app.firmware.ai/api/v1/audio/transcriptions \
  --header 'Authorization: Bearer <token>' \
  --form 'file=@<path>' \
  --form 'model=<string>' \
  --form 'language=<string>' \
  --form 'prompt=<string>' \
  --form 'response_format=<string>' \
  --form 'temperature=0'
{
  "text": "Hello, this is a sample transcription.",
  "task": "transcribe",
  "language": "en",
  "duration": 3.5,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.4 },
    { "word": "this", "start": 0.5, "end": 0.7 }
  ]
}
Transcribe audio from a file into text. Accepts multipart form data.

Authenticate

Use an API key in the Authorization header.
curl https://app.firmware.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY"
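
For callers not using curl, the same header can be assembled in code. The following is a minimal Python sketch, not an official client: the helper name auth_headers is hypothetical, and it assumes the key lives in the FIRMWARE_API_KEY environment variable used by the curl example.

```python
import os


def auth_headers(api_key=None):
    """Return the Authorization header for a request.

    Hypothetical helper: falls back to the FIRMWARE_API_KEY environment
    variable, the same one the curl example reads.
    """
    key = api_key or os.environ.get("FIRMWARE_API_KEY")
    if not key:
        raise RuntimeError("Set FIRMWARE_API_KEY or pass api_key explicitly")
    return {"Authorization": f"Bearer {key}"}
```

Failing early on a missing key gives a clearer error than sending a request with an empty bearer token.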

Request body

model
string
required
Transcription model ID to use. See available transcription models for the full list. Examples: whisper-1, elevenlabs-scribe-v1
file
file
required
The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm.
language
string
Language of the audio in ISO-639-1 format (e.g. en, fr, es). Providing this improves accuracy and latency.
prompt
string
Optional text to guide the model’s style or provide context. Should match the language of the audio.
response_format
string
default:"json"
Format of the transcription output. Options: json, text, srt, verbose_json, vtt.
temperature
number
default:"0"
Sampling temperature between 0 and 1. Higher values produce more varied output.
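
The constraints listed above can be checked on the client before uploading. This Python sketch is illustrative only: the function names are hypothetical, the two-letter check is a rough stand-in for real ISO-639-1 validation, and the server remains the authority on what it accepts.

```python
from pathlib import Path

# Values taken from the parameter descriptions above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def check_audio_filename(filename):
    """Raise ValueError if the file extension is not a supported format."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    return filename


def build_form_fields(model, language=None, prompt=None,
                      response_format="json", temperature=0.0):
    """Validate the text parameters and return them as form fields."""
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    if language is not None:
        # Rough shape check only; it does not confirm the code is assigned.
        if len(language) != 2 or not language.isalpha():
            raise ValueError("language must be a two-letter ISO-639-1 code")
        fields["language"] = language.lower()
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```

Optional fields are omitted when unset so that the server applies its own defaults.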

Response

text
string
The transcribed text.
task
string
Always "transcribe".
language
string
Detected or specified language of the audio.
duration
number
Duration of the audio in seconds.
words
array
Word-level timestamps, if supported by the model.
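
A JSON response with these fields can be read into typed objects. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, it assumes a JSON body (the text, srt, and vtt formats would need different handling), and it treats every field except text as optional.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Word:
    word: str
    start: float  # offset into the audio, as in the sample response
    end: float


@dataclass
class Transcription:
    text: str
    task: str = "transcribe"
    language: str = ""
    duration: float = 0.0
    words: list = field(default_factory=list)


def parse_transcription(raw):
    """Parse a JSON response body into a Transcription."""
    data = json.loads(raw)
    # "words" may be absent when the model has no word-level timestamps.
    words = [Word(w["word"], w["start"], w["end"])
             for w in data.get("words") or []]
    return Transcription(
        text=data["text"],
        task=data.get("task", "transcribe"),
        language=data.get("language", ""),
        duration=data.get("duration", 0.0),
        words=words,
    )
```

With the sample response shown in this page, parse_transcription(body).words would hold two Word entries, for "Hello" and "this".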

Examples

curl https://app.firmware.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -F file="@audio.mp3" \
  -F model="whisper-1"
{
  "text": "Hello, this is a sample transcription.",
  "task": "transcribe",
  "language": "en",
  "duration": 3.5,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.4 },
    { "word": "this", "start": 0.5, "end": 0.7 }
  ]
}
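
The curl request above can be approximated in Python with only the standard library. This is a sketch under stated assumptions rather than an official client: build_request and transcribe are hypothetical names, the multipart body is encoded by hand, and the file part is labeled application/octet-stream because the expected per-format content type is not documented here.

```python
import os
import urllib.request
import uuid

API_URL = "https://app.firmware.ai/api/v1/audio/transcriptions"


def build_request(file_name, file_bytes, model, api_key):
    """Build (but do not send) a multipart POST with file and model fields."""
    boundary = uuid.uuid4().hex
    delimiter = f"--{boundary}".encode()
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{file_name}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ])
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path, model="whisper-1"):
    """Send the file at `path` and return the raw response body as text."""
    with open(path, "rb") as audio:
        request = build_request(os.path.basename(path), audio.read(),
                                model, os.environ["FIRMWARE_API_KEY"])
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling transcribe("audio.mp3") would send the same two form fields as the curl example. Separating build_request from the network call makes the request easy to inspect without contacting the API.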