Introduction to Azure Text-to-Speech API


What is Azure Text-to-Speech API?

Azure Text-to-Speech API is a cloud-based service from Microsoft Azure, offered as part of the Azure AI Speech service, that lets developers convert text into spoken audio. You send text (or SSML markup) to the service and receive synthesized speech in the audio format you request. This makes it possible to integrate natural-sounding voice synthesis into applications such as accessibility features and voice assistants.


Getting Started

To use the Azure Text-to-Speech API, you'll need an Azure account and an API key. Here are the basic steps to get started:

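  1. Sign in to the Azure portal.
  2. Create a new Speech resource (the Azure resource type that provides Text-to-Speech).
  3. Retrieve your API key and endpoint, and note the resource's region, from the resource's keys and endpoint page.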
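
The values from step 3 are usually better kept out of source code. The sketch below is one way to do that, under two assumptions that are not part of the steps above: the key and region are stored in environment variables named SPEECH_KEY and SPEECH_REGION (names chosen here for illustration), and the REST endpoint follows the regional pattern https://<region>.tts.speech.microsoft.com/cognitiveservices/v1. Confirm the actual endpoint against your own resource before relying on it.

```python
import os


def tts_endpoint(region: str) -> str:
    """Build the regional Text-to-Speech REST URL (assumed URL pattern)."""
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def load_speech_settings(env=os.environ) -> dict:
    """Read the key and region from a mapping, by default the environment.

    SPEECH_KEY and SPEECH_REGION are illustrative variable names.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION first.")
    return {"key": key, "endpoint": tts_endpoint(region)}


if __name__ == "__main__":
    # Demo with placeholder values instead of real credentials.
    demo = load_speech_settings({"SPEECH_KEY": "example-key", "SPEECH_REGION": "westus"})
    print(demo["endpoint"])
```

Passing the mapping as a parameter keeps the helper easy to test without touching the real environment.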

Sample Code

Here's a simple Python example that posts SSML to the Text-to-Speech REST endpoint with the requests library and saves the returned audio as an MP3 file:

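import requests
from xml.sax.saxutils import escape

# Replace both placeholders with the values from your Speech resource.
subscription_key = 'YOUR_SUBSCRIPTION_KEY'
endpoint = 'YOUR_ENDPOINT'  # the Text-to-Speech REST URL for your resource
text_to_speak = 'Hello, this is a sample text to be synthesized.'

headers = {
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': 'audio-16khz-128kbitrate-mono-mp3',
    # A resource key goes in this header. 'Authorization: Bearer ...' is
    # only for a short-lived access token, not for the key itself.
    'Ocp-Apim-Subscription-Key': subscription_key,
    'User-Agent': 'tts-sample',  # any short application name
}

# Build the SSML body. The text is escaped so that characters such as
# & and < do not break the XML. Older standard voices such as
# en-US-Guy24kRUS may no longer be available, so a neural voice is used.
data = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    f'<voice name="en-US-GuyNeural">{escape(text_to_speak)}</voice>'
    '</speak>'
)

response = requests.post(endpoint, headers=headers, data=data.encode('utf-8'))

if response.status_code == 200:
    # The response body is the synthesized audio in the requested format.
    with open('output.mp3', 'wb') as audio_file:
        audio_file.write(response.content)
    print('Audio file created.')
else:
    print('Error:', response.status_code, response.text)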
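
The request body in the sample is SSML, which is XML, so text containing characters such as & or < has to be escaped before it is embedded. The helper below is a minimal sketch of that step using Python's standard xml.sax.saxutils module. The function name build_ssml and its parameters are hypothetical, introduced only for illustration, and the voice name in the usage line is an example value that you should check against the voices available to your resource.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str, lang: str = "en-US") -> str:
    """Return an SSML document that speaks `text` with the given voice.

    escape() protects element text; quoteattr() quotes and escapes
    attribute values.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>{escape(text)}</voice>'
        '</speak>'
    )


if __name__ == "__main__":
    # Example voice name; substitute one your resource supports.
    print(build_ssml("Tom & Jerry", voice="en-US-GuyNeural"))
```

The returned string can be passed as the data argument of the POST request in place of a hand-written SSML literal.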

Conclusion

The Azure Text-to-Speech API offers powerful voice synthesis capabilities that can enhance your applications and services. With a few simple steps, you can integrate natural-sounding speech into your projects.