How to use Azure "Text to Speech"

Previously, I used Azure "Speech To Text" to turn audio into text.
This time I do the reverse: take text as input and convert it to speech.

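Previous article ↓

Speech recognition with the Azure API using Python

How to use Azure "Speech To Text": some people want to turn audio files into text using speech recognition technology. Using speech recognition technology to turn audio files into text…

This article describes how to use the Microsoft Azure "Text To Speech" API.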

1. Preparing to use the Azure API
2. Writing the program
3. Writing the program (output to an audio file)
4. Azure speech recognition links
5. References

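1. Preparing to use the Azure API

Four things need to be done first.

Create an Azure account
Create an instance of the speech API
Get the API key and related settings
Set up a Python environment

These steps are the same as for speech recognition in the previous article,
so see steps 1 to 4 there.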

2. Writing the program

Call the Azure API from Python.
The program uses the "KEY" and "REGION" values prepared in advance.

KEY = "API key obtained from Azure"
REGION = "region of the instance"
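As one possible way to keep the key out of the script itself, the two values can be read from environment variables. The following is only a minimal sketch: the variable names SPEECH_KEY and SPEECH_REGION and the helper load_speech_credentials are hypothetical names chosen for this example, not something the Azure SDK requires.

```python
import os


def load_speech_credentials(env=os.environ):
    """Return (key, region) read from environment variables.

    SPEECH_KEY and SPEECH_REGION are hypothetical names used only in this sketch.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    # Collect the names of any variables that are unset or empty.
    missing = [
        name
        for name, value in (("SPEECH_KEY", key), ("SPEECH_REGION", region))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return key, region
```

With both variables set in the shell, `KEY, REGION = load_speech_credentials()` could replace the two hard-coded assignments.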

For the Japanese voice, this article chooses between two options: "ja-JP-NanamiNeural" and "ja-JP-KeitaNeural".

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioOutputConfig

#### API key obtained from Azure, instance region, language, and voice
KEY = "API key obtained from Azure"
REGION = "region of the instance"
LANGUAGE = "ja-JP"
VOICE = ["ja-JP-NanamiNeural", "ja-JP-KeitaNeural"][0]  # 0: Nanami (female), 1: Keita (male)

#### Initial setup
speech_config = speechsdk.SpeechConfig(subscription=KEY, region=REGION)
speech_config.speech_synthesis_language = LANGUAGE
speech_config.speech_synthesis_voice_name = VOICE
audio_config = AudioOutputConfig(use_default_speaker=True)  # play through the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

#### Speech output
# speak_text_async returns a future; .get() waits until synthesis has finished
synthesizer.speak_text_async("The text entered here will be spoken").get()
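The voice can also be named per request by sending SSML instead of plain text. The sketch below only builds the SSML string, using the Python standard library; build_ssml is a hypothetical helper written for this example, and the default voice and language are simply the values used in this article.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="ja-JP-NanamiNeural", language="ja-JP"):
    """Wrap text in a minimal SSML document that names the voice to use."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(language)}>"
        # escape() protects characters such as & and < inside the spoken text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

Assuming the synthesizer created above, the string could then be passed to the SDK with `synthesizer.speak_ssml_async(build_ssml("text to speak", voice="ja-JP-KeitaNeural")).get()`.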

3. Writing the program (output to an audio file)

Besides speaking on the spot, the program can also write the speech to an audio file.

Change the audio_config = ... line as follows.

audio_config = AudioOutputConfig(use_default_speaker=True)

↓

audio_config = speechsdk.audio.AudioOutputConfig(filename="speech.wav")

The full code is below ↓

import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech.audio import AudioOutputConfig

#### API key obtained from Azure, instance region, language, and voice
KEY = "API key obtained from Azure"
REGION = "region of the instance"
LANGUAGE = "ja-JP"
VOICE = ["ja-JP-NanamiNeural", "ja-JP-KeitaNeural"][0]  # 0: Nanami (female), 1: Keita (male)

#### Initial setup
speech_config = speechsdk.SpeechConfig(subscription=KEY, region=REGION)
speech_config.speech_synthesis_language = LANGUAGE
speech_config.speech_synthesis_voice_name = VOICE
audio_config = speechsdk.audio.AudioOutputConfig(filename="speech.wav")  # write to speech.wav
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

#### Speech output
# speak_text_async returns a future; .get() waits until the file has been written
synthesizer.speak_text_async("The text entered here will be spoken").get()
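To check that the file was actually written, its length can be read back with the standard wave module. This is a rough sketch that assumes speech.wav is an uncompressed PCM WAV file; wav_duration_seconds is a hypothetical helper name used only here.

```python
import wave


def wav_duration_seconds(path):
    """Return the playing time of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()
        frame_rate = wav_file.getframerate()
    # Duration is the number of frames divided by frames per second.
    return frame_count / float(frame_rate)
```

For example, `print(wav_duration_seconds("speech.wav"))` after running the program above would show how many seconds of audio were generated.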