Audio speech

POST https://app.firmware.ai/api/v1/audio/speech
curl --request POST \
  --url https://app.firmware.ai/api/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "input": "<string>",
  "voice": "<string>",
  "response_format": "<string>",
  "speed": 1.0
}
'
Generate audio from a text input. Returns an audio file stream.

Authenticate

Pass your API key as a bearer token in the Authorization header:
curl https://app.firmware.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $FIRMWARE_API_KEY"

Request body

model (string, required)
TTS model ID to use. See the available speech models for the full list. Examples: gpt-4o-mini-tts, elevenlabs-tts-multilingual-v2.

input (string, required)
The text to synthesize. Maximum length: 4096 characters.

voice (string, required)
Voice to use for synthesis. OpenAI voices: alloy, echo, fable, onyx, nova, shimmer. With ElevenLabs models, these names are mapped to ElevenLabs voices automatically, and raw ElevenLabs voice IDs are also accepted.

response_format (string, default: mp3)
Audio format of the output. Options: mp3, opus, aac, flac, pcm, wav.

speed (number, default: 1.0)
Playback speed of the generated audio. Range: 0.25 to 4.0.
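The limits above can be checked client-side before a request is sent. The bash function below, build_speech_payload, is a hypothetical helper (not part of the Firmware API) that rejects a missing model, input, or voice, input longer than 4096 characters, a format outside the six listed, or a speed outside 0.25 to 4.0, and then prints the JSON body. It assumes bash 4 or later and awk. Note that ${#input} counts characters in the current locale (bytes under LC_ALL=C), and the minimal escaping only covers backslashes, double quotes, newlines, carriage returns, and tabs; for arbitrary text, a real JSON tool such as jq is the safer choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validates parameters against the documented limits
# and prints a JSON request body for POST /api/v1/audio/speech.

# Minimal JSON string escaping (assumption: no other control characters in the text).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash (must come first)
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Usage: build_speech_payload MODEL INPUT VOICE [FORMAT] [SPEED]
build_speech_payload() {
  local model=$1 input=$2 voice=$3 format=${4:-mp3} speed=${5:-1.0}
  # A JSON-compatible decimal number: no leading "+", ".", or extra zeros.
  local num_re='^(0|[1-9][0-9]*)(\.[0-9]+)?$'

  if [[ -z $model || -z $input || -z $voice ]]; then
    echo "model, input, and voice are required" >&2
    return 1
  fi
  if (( ${#input} > 4096 )); then
    echo "input exceeds 4096 characters" >&2
    return 1
  fi
  case $format in
    mp3|opus|aac|flac|pcm|wav) ;;
    *) echo "unsupported response_format: $format" >&2; return 1 ;;
  esac
  # Bash has no floating-point comparison, so awk checks the range.
  if [[ ! $speed =~ $num_re ]] ||
     ! awk -v s="$speed" 'BEGIN { exit !(s >= 0.25 && s <= 4.0) }'; then
    echo "speed must be a number between 0.25 and 4.0" >&2
    return 1
  fi

  printf '{"model":"%s","input":"%s","voice":"%s","response_format":"%s","speed":%s}\n' \
    "$(json_escape "$model")" "$(json_escape "$input")" \
    "$(json_escape "$voice")" "$format" "$speed"
}
```

For example, build_speech_payload tts-1 "Hello, welcome to Firmware." alloy opus 1.25 prints a body that can be passed to curl with --data.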

Response

Returns the audio file as a binary stream. The Content-Type is audio/mpeg for the default mp3 format, or the corresponding MIME type for the requested response_format.
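Because the endpoint returns raw audio rather than JSON, a client has to write the stream to a file and avoid saving an error response as if it were audio. The bash function below, synthesize_speech, is a hypothetical sketch rather than an official client. It uses curl's --fail flag so HTTP errors are not written out as audio, reads the response Content-Type with --write-out '%{content_type}', and keeps the file only when that type starts with audio/. The audio/* check is an assumption generalized from the audio/mpeg case above; the exact MIME type for each response_format is not listed here, so adjust the pattern if, for example, pcm is served differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: POSTs a JSON body to the speech endpoint, saves the
# audio stream, and rejects responses whose Content-Type is not audio/*.
# Usage: synthesize_speech JSON_BODY OUTPUT_FILE
synthesize_speech() {
  local body=$1 out=$2 tmp content_type

  if [[ -z ${FIRMWARE_API_KEY:-} ]]; then
    echo "FIRMWARE_API_KEY is not set" >&2
    return 1
  fi

  tmp=$(mktemp) || return 1
  # --fail: exit non-zero on HTTP errors instead of saving the error body.
  # --write-out '%{content_type}': print the response Content-Type afterwards.
  if ! content_type=$(curl --silent --show-error --fail \
      --request POST \
      --url https://app.firmware.ai/api/v1/audio/speech \
      --header "Authorization: Bearer $FIRMWARE_API_KEY" \
      --header 'Content-Type: application/json' \
      --data "$body" \
      --output "$tmp" \
      --write-out '%{content_type}'); then
    rm -f "$tmp"
    return 1
  fi

  # Assumption: every supported format is served with an audio/* type.
  case $content_type in
    audio/*) mv "$tmp" "$out" ;;
    *)
      echo "unexpected Content-Type: $content_type" >&2
      rm -f "$tmp"
      return 1
      ;;
  esac
}
```

For example: synthesize_speech '{"model":"tts-1","input":"Hello, welcome to Firmware.","voice":"alloy"}' speech.mp3. Writing to a temporary file first means a failed request never leaves a truncated or non-audio speech.mp3 behind.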

Examples

curl https://app.firmware.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $FIRMWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, welcome to Firmware.",
    "voice": "alloy"
  }' \
  --output speech.mp3