Speech to Text




Pricing

  • Base service: first thousand minutes free, then $0.02 USD per minute
  • Telephony model add-on: first thousand minutes free, then $0.02 USD per minute
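As a rough illustration of how the tiered pricing adds up, the sketch below computes a monthly estimate in integer cents to avoid floating-point drift. It assumes the rates listed above (first thousand minutes free, then $0.02 USD per minute). It also assumes, without confirmation from this page, that the telephony add-on is billed on top of the base rate and has its own free thousand minutes. The function names are hypothetical.

```python
FREE_MINUTES = 1000        # "First thousand minutes are free"
RATE_CENTS_PER_MINUTE = 2  # $0.02 USD per minute beyond the free tier


def line_item_cents(minutes: int) -> int:
    """Cost in US cents for one line item: free tier, then a flat per-minute rate."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    billable = max(0, minutes - FREE_MINUTES)
    return billable * RATE_CENTS_PER_MINUTE


def monthly_estimate_usd(minutes: int, telephony: bool = False) -> float:
    """Hypothetical monthly estimate in USD.

    Assumption: the telephony add-on is charged in addition to the base
    rate and has its own free first thousand minutes.
    """
    cents = line_item_cents(minutes)
    if telephony:
        cents += line_item_cents(minutes)
    return cents / 100


if __name__ == "__main__":
    print(monthly_estimate_usd(500))                   # 0.0
    print(monthly_estimate_usd(5000))                  # 80.0
    print(monthly_estimate_usd(5000, telephony=True))  # 160.0
```

For 5,000 minutes, 4,000 are billable, so the base estimate is 4,000 x $0.02 = $80.00.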


Watson Speech to Text converts spoken audio into written text with accurate, easy-to-use transcription. The service combines knowledge of grammar and language structure with the audio signal itself to transcribe speech. Using IBM's speech recognition capabilities in multiple languages, it can stream the transcription back to the client in near real time, correct the text based on context, and detect occurrences of one or more specified keywords in the stream. The service is available through a WebSocket connection or a REST API.
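For recorded audio, a single HTTP request is enough. The sketch below builds a POST to the `/v1/recognize` endpoint of the REST API using only the Python standard library. It assumes IBM Cloud API-key authentication (HTTP basic auth with the literal user name `apikey`) and uses the `keywords` and `keywords_threshold` query parameters for keyword spotting. The service URL, API key, and helper names are placeholders. The current IBM API reference is the authority on parameters and audio formats.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import List, Optional


def build_recognize_request(
    service_url: str,
    api_key: str,
    audio: bytes,
    content_type: str = "audio/flac",
    keywords: Optional[List[str]] = None,
    keywords_threshold: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to <service_url>/v1/recognize."""
    params = {}
    if keywords:
        # Keyword spotting needs both the keyword list and a confidence threshold.
        params["keywords"] = ",".join(keywords)
        params["keywords_threshold"] = str(keywords_threshold)
    url = service_url.rstrip("/") + "/v1/recognize"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    # IBM Cloud API keys are sent as basic auth with the user name "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": content_type}
    return urllib.request.Request(url, data=audio, headers=headers, method="POST")


def recognize(service_url: str, api_key: str, audio: bytes, **kwargs) -> dict:
    """Send the request and return the parsed JSON response."""
    request = build_recognize_request(service_url, api_key, audio, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like `recognize(SERVICE_URL, API_KEY, audio_bytes, keywords=["refund", "cancel"])`. Here `SERVICE_URL` and `API_KEY` are placeholders for your own instance credentials.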

Intended Use

The Speech to Text service is useful wherever voice interactivity is required. It is best suited to mobile experiences, media-file transcription, call-center transcription, voice control, and converting audio to text to make it searchable.

Supported languages include:

  • US English
  • UK English
  • Japanese
  • Spanish
  • Brazilian Portuguese
  • Modern Standard Arabic
  • Mandarin

Your Input

  • Streamed audio with intelligible speech
  • Recorded audio with intelligible speech
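Streamed audio is typically sent over the WebSocket interface. The client sends a JSON text message with `"action": "start"`, then sends the audio as binary frames, and ends with `{"action": "stop"}`. The service replies with state messages such as `{"state": "listening"}` and with interim or final results. The sketch below only builds and classifies those messages with the standard library. It deliberately leaves out the WebSocket connection and authentication. The default audio format, the chunk size, and the helper names are illustrative assumptions, not documented values.

```python
import json
from typing import Iterator, List, Optional


def start_message(
    content_type: str = "audio/l16;rate=16000",
    interim_results: bool = True,
    keywords: Optional[List[str]] = None,
    keywords_threshold: float = 0.5,
) -> str:
    """JSON text frame that opens a recognition request on the socket."""
    msg = {"action": "start", "content-type": content_type,
           "interim_results": interim_results}
    if keywords:
        msg["keywords"] = list(keywords)
        msg["keywords_threshold"] = keywords_threshold
    return json.dumps(msg)


# JSON text frame that tells the service the audio stream has ended.
STOP_MESSAGE = json.dumps({"action": "stop"})


def audio_chunks(audio: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Split captured audio into binary frames (chunk size is an arbitrary choice)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


def classify_server_message(text: str) -> str:
    """Label a message received from the service."""
    msg = json.loads(text)
    if "error" in msg:
        return "error"
    if "state" in msg:
        return "state"  # e.g. {"state": "listening"}
    results = msg.get("results", [])
    if any(r.get("final") for r in results):
        return "final"
    if results:
        return "interim"
    return "other"
```

Interim results let a client show text as the speaker talks. Results flagged as final are the ones to store.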

Service Output

  • Text transcriptions of the words recognized in the audio
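The transcription comes back as JSON. The sketch below assumes the response shape used by the v1 API: a `results` array whose entries carry `alternatives` with a `transcript`, a `final` flag, and, when keyword spotting is requested, a `keywords_result` map. It pulls out the final transcript and the spotted keywords. The sample response is made up for illustration.

```python
from typing import Any, Dict, List


def extract_transcript(response: Dict[str, Any]) -> str:
    """Join the top-ranked alternative of every final result."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if result.get("final") and alternatives:
            parts.append(alternatives[0]["transcript"].strip())
    return " ".join(parts)


def extract_keywords(response: Dict[str, Any]) -> List[str]:
    """List each requested keyword that was matched at least once."""
    found: List[str] = []
    for result in response.get("results", []):
        for keyword, matches in result.get("keywords_result", {}).items():
            if matches and keyword not in found:
                found.append(keyword)
    return found


# Illustrative response, not real service output.
sample = {
    "result_index": 0,
    "results": [
        {"final": True,
         "alternatives": [{"transcript": "i would like a refund ", "confidence": 0.91}],
         "keywords_result": {"refund": [{"normalized_text": "refund", "start_time": 1.2,
                                         "end_time": 1.6, "confidence": 0.95}]}},
        {"final": False, "alternatives": [{"transcript": "please"}]},
    ],
}

print(extract_transcript(sample))  # i would like a refund
print(extract_keywords(sample))    # ['refund']
```

Only final results are joined, so text from an interim hypothesis that the service may still revise is left out.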
