Transcribe audio from a file into text. Accepts multipart form data.
Authenticate
Pass your API key as a Bearer token in the Authorization header.
curl https://app.firmware.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY"
Request body
The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm.
Language of the audio in ISO 639-1 format (e.g. en, fr, es). Providing this improves accuracy and reduces latency.
Optional text to guide the model’s style or provide context. Should match the language of the audio.
Format of the transcription output. Options: json, text, srt, verbose_json, vtt.
Sampling temperature between 0 and 1. Higher values produce more varied output.
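The optional fields described above can be combined in a single multipart request. The following is a minimal sketch, assuming the form field names `language`, `response_format`, and `temperature`; these names are not listed in this section and follow a common convention for transcription APIs, so verify them before relying on them. `transcribe` is a hypothetical helper name, and the `whisper-1` model value is taken from the example request in this page.

```shell
# Hypothetical helper: transcribe the file in $1, with an optional
# ISO 639-1 language hint in $2 (defaults to "en" in this sketch).
# The field names language, response_format and temperature are assumptions.
transcribe() {
  curl -sS https://app.firmware.ai/api/v1/audio/transcriptions \
    -H "Authorization: Bearer $FIRMWARE_API_KEY" \
    -F file="@$1" \
    -F model="whisper-1" \
    -F language="${2:-en}" \
    -F response_format="verbose_json" \
    -F temperature="0"
}
```

With `FIRMWARE_API_KEY` exported, a call such as `transcribe audio.mp3 fr > transcript.json` would send the file with a French language hint and save the response. A temperature of 0 is used here to keep the output as stable as possible.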
Response
language: Detected or specified language of the audio.
duration: Duration of the audio in seconds.
words: Word-level timestamps, if supported by the model.
Examples
curl https://app.firmware.ai/api/v1/audio/transcriptions \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -F file="@audio.mp3" \
  -F model="whisper-1"
{
  "text": "Hello, this is a sample transcription.",
  "task": "transcribe",
  "language": "en",
  "duration": 3.5,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.4 },
    { "word": "this", "start": 0.5, "end": 0.7 }
  ]
}
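The `words` array in a response like the one above can be flattened into one line per word, which is convenient for subtitle or alignment tooling. This is a sketch that assumes `jq` is installed and that the response has been saved to a file; `print_word_timings` is a hypothetical helper name, not part of the API.

```shell
# Hypothetical helper: print "start<TAB>end<TAB>word" for each entry in .words.
# Assumes jq is available and $1 is a file holding the JSON response.
# Prints nothing if the response has no words array.
print_word_timings() {
  jq -r '.words[]? | [.start, .end, .word] | @tsv' "$1"
}
```

For example, `print_word_timings transcript.json` would emit one tab-separated row per word, with start and end times in seconds.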