Generate audio from a text input. Returns an audio file stream.
Authenticate
Use an API key in the Authorization header.
curl https://app.firmware.ai/api/v1/audio/speech \
-H "Authorization: Bearer $FIRMWARE_API_KEY"
Request body
TTS model ID to use. See available speech models for the full list. Examples: gpt-4o-mini-tts, elevenlabs-tts-multilingual-v2
The text to synthesize. Maximum length is 4096 characters.
Voice to use for synthesis. OpenAI voices: alloy, echo, fable, onyx, nova, shimmer. For ElevenLabs models, these names are mapped to ElevenLabs voices automatically; you can also pass a raw ElevenLabs voice ID.
Audio format for the output. Options: mp3, opus, aac, flac, pcm, wav.
Playback speed of the generated audio. Range: 0.25 to 4.0.
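The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_speech_body` is a hypothetical helper, and because this page does not name the format and speed fields, it assumes the OpenAI-compatible names `response_format` and `speed`. `model`, `input`, and `voice` match the curl example on this page.

```python
import json

# Limits and options as documented for this endpoint.
MAX_INPUT_CHARS = 4096
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "wav"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_body(model, text, voice, audio_format=None, speed=None):
    """Hypothetical helper: return a JSON request body after checking documented limits."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; maximum is {MAX_INPUT_CHARS}")
    body = {"model": model, "input": text, "voice": voice}
    if audio_format is not None:
        if audio_format not in AUDIO_FORMATS:
            raise ValueError(f"unsupported format: {audio_format!r}")
        # ASSUMPTION: field name follows the OpenAI-compatible "response_format".
        body["response_format"] = audio_format
    if speed is not None:
        if not MIN_SPEED <= speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
        # ASSUMPTION: field name follows the OpenAI-compatible "speed".
        body["speed"] = speed
    return json.dumps(body)
```

For example, `build_speech_body("gpt-4o-mini-tts", "Hello, welcome to Firmware.", "alloy", audio_format="wav", speed=1.25)` yields a body that can be passed to `curl -d`. The server remains the authority on validation; these checks only fail fast on inputs the documented limits already rule out.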
Response
Returns the audio file as a binary stream with Content-Type: audio/mpeg (or the appropriate MIME type for the requested format).
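Because the body is raw audio bytes rather than JSON, a client writes it straight to disk. The following is a minimal sketch using a hypothetical helper, `save_audio_stream`. It copies from any file-like object that supports `read(n)`, such as the response returned by Python's `urllib.request.urlopen`, and it reads in fixed-size chunks so long clips are never held in memory all at once.

```python
from typing import BinaryIO


def save_audio_stream(stream: BinaryIO, path: str, chunk_size: int = 64 * 1024) -> int:
    """Hypothetical helper: copy a binary audio stream to `path` in chunks.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

The standard library's `shutil.copyfileobj` performs the same chunked copy. The explicit loop is shown here only so the byte count can be returned.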
Examples
curl https://app.firmware.ai/api/v1/audio/speech \
-H "Authorization: Bearer $FIRMWARE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "tts-1",
  "input": "Hello, welcome to Firmware.",
  "voice": "alloy"
}' \
--output speech.mp3
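The same request can be made from Python using only the standard library. This is a sketch, not an official Firmware SDK. `make_speech_request` and `synthesize` are hypothetical helper names; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://app.firmware.ai/api/v1/audio/speech"


def make_speech_request(api_key: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: build a POST request matching the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, body: dict, path: str) -> None:
    """Hypothetical helper: send the request and write the audio bytes to `path`."""
    req = make_speech_request(api_key, body)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so an error body is never written to the output file.
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Usage: `synthesize(os.environ["FIRMWARE_API_KEY"], {"model": "tts-1", "input": "Hello, welcome to Firmware.", "voice": "alloy"}, "speech.mp3")`. This reads the whole clip into memory before writing it, which is fine for inputs up to the 4096-character limit.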