Recognizing Recorded Audio and Converting It to Text in Python

Date: 2023-12-15

A Guide to Recognizing Recorded Audio and Converting It to Text in Python

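Introduction

In audio work, we often need to convert recordings into text so that the content can be processed and analyzed further. This guide shows how to convert an audio file to text with Python, using two examples:

Speech-to-text with the Google Cloud Speech-to-Text API;
Speech-to-text with the SpeechRecognition library.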

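Google Cloud Speech-to-Text API Example

To convert speech to text with the Google Cloud Speech-to-Text API, follow the steps below.

Step 1: Create a Google Cloud service account

Create a service account in the Google Cloud console so that your code can authenticate as that account. The steps are:

Sign in to the Google Cloud console;
Click "IAM & Admin" in the left navigation bar;
Click "Service Accounts" on the page;
Click "Create Service Account";
Fill in the name, ID, and description, then click "Create";
Assign the service account the roles it needs to do its work. For example, you can grant it a role such as "Speech-to-Text Admin" or "Project Editor".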

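Step 2: Generate a private key

Before generating a private key, make sure you have already created a Google Cloud service account. You then generate a private key for that account so that your code can use it to authenticate.

Click "IAM & Admin" in the left navigation bar of the Google Cloud console;
Click "Service Accounts" on the page;
Find the service account you created, then click "Edit" in the actions column;
Find the "Keys" tab toward the bottom of the page, then click "Add Key";
Select "JSON", then click "Create";
Download the generated private key file and store it somewhere safe.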
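
Before wiring the key into code, it can help to confirm that the downloaded file looks like a service account key. The sketch below uses only the standard library. The helper name `check_service_account_key` and the particular set of fields it checks are assumptions made for illustration; they are not part of any Google library.

```python
import json

# Fields assumed to be present in a service account key downloaded as JSON.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty if none)."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent.
    problems = [
        "missing field: {}".format(name)
        for name in REQUIRED_FIELDS
        if name not in data
    ]
    # A key for a service account is expected to declare that type.
    if "type" in data and data["type"] != "service_account":
        problems.append("type is not service_account")
    return problems
```

An empty list means the file has the expected shape; it does not prove that the key is still valid or that the account has the right roles.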

Step 3: Install the Python modules

First, install the Google Cloud authentication and speech modules for Python:

pip install google-auth google-auth-oauthlib google-auth-httplib2 google-cloud-speech
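
After installing, a quick way to confirm which versions ended up in the environment is to query the package metadata. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later); the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["google-auth", "google-cloud-speech"]
    for name, version in installed_versions(wanted).items():
        print("{}: {}".format(name, version or "not installed"))
```

Knowing the installed version of google-cloud-speech matters because the client library's interface changed between major versions.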

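Step 4: Write the code

With the private key file downloaded and the Google Cloud authentication and speech modules installed, we can write the code that performs the speech-to-text conversion. Here is a simple example: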

# Written against google-cloud-speech 2.x or later, where the old
# `enums` and `types` submodules are no longer available.
from google.cloud import speech

# Path to the private key file downloaded in step 2.
credential_path = "path/to/your/credential.json"
client = speech.SpeechClient.from_service_account_json(credential_path)

# Path to the recording to transcribe (16-bit PCM WAV for LINEAR16).
file_name = "path/to/your/audio_file.wav"

# Read the whole file into memory as raw bytes.
with open(file_name, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    language_code="en-US",
)

# Synchronous recognition; config and audio are passed as keyword arguments.
response = client.recognize(config=config, audio=audio)

# Each result covers a consecutive portion of the audio;
# alternatives[0] is the most likely transcript for that portion.
for result in response.results:
    best = result.alternatives[0]
    print("Transcript: {}".format(best.transcript))
    print("Confidence: {}".format(best.confidence))

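To run this example, set credential_path to the private key file generated in steps 1 and 2, and replace "path/to/your/audio_file.wav" with the path to your own audio file. The script then prints the transcribed text together with a confidence score for each result.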
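
The LINEAR16 encoding used in the example means uncompressed 16-bit PCM samples, so it is worth checking a WAV file's header before sending it. The sketch below reads the header with the standard library's `wave` module. The helper names `describe_wav` and `looks_like_linear16` are hypothetical, and treating a 2-byte sample width as "compatible" is a simplifying assumption rather than a full validation.

```python
import wave


def describe_wav(path):
    """Read basic properties from a WAV file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hertz": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }


def looks_like_linear16(info):
    """True when samples are 16 bits wide, which is what LINEAR16 expects."""
    return info["sample_width_bytes"] == 2
```

The reported duration is also useful for deciding whether a recording is short enough for a single synchronous request or should be split up or handled another way.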

SpeechRecognition Library Example

SpeechRecognition is another Python library you can use. It is mainly used to convert audio into text and supports several speech recognition engines, including Google Cloud Speech API, Microsoft Bing Voice Recognition, and IBM Speech to Text. The steps below show how to use it for speech-to-text:

Step 1: Install the Python module

pip install SpeechRecognition

Step 2: Write the code

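import speech_recognition as sr

file_name = "path/to/your/audio_file.wav"

r = sr.Recognizer()

# Load the entire file. record() reads to the end of the source,
# whereas listen() is meant for live input and stops after a pause.
with sr.AudioFile(file_name) as source:
    audio_data = r.record(source)

try:
    # Send the audio to the Google Web Speech API for recognition.
    text = r.recognize_google(audio_data)
    print("Transcript: {}".format(text))
except sr.UnknownValueError:
    # The service responded but could not make out any speech.
    print("Sorry! Unable to recognize speech")
except sr.RequestError as e:
    # The service could not be reached or rejected the request.
    print("Could not reach the recognition service: {}".format(e))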

To run this example, replace "path/to/your/audio_file.wav" with the path to your own audio file. The script then prints the transcribed text.
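
For longer recordings, one option is to transcribe the file in fixed-length pieces rather than in a single request. The following is a sketch under stated assumptions: `chunk_offsets` and `transcribe_in_chunks` are hypothetical helpers, the 30-second default is arbitrary, and the code relies on the `offset` and `duration` parameters of `Recognizer.record` and the `DURATION` attribute of an opened `AudioFile` as I understand them from the library.

```python
def chunk_offsets(total_seconds, chunk_seconds):
    """Split [0, total_seconds) into consecutive (offset, duration) pairs."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    offset = 0.0
    while offset < total_seconds:
        # The final chunk may be shorter than chunk_seconds.
        duration = min(chunk_seconds, total_seconds - offset)
        chunks.append((offset, duration))
        offset += chunk_seconds
    return chunks


def transcribe_in_chunks(path, chunk_seconds=30.0, language="en-US"):
    """Transcribe an audio file piece by piece with recognize_google."""
    # Imported lazily so chunk_offsets works without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        total = source.DURATION  # length of the file in seconds
    pieces = []
    for offset, duration in chunk_offsets(total, chunk_seconds):
        # Reopen the file for each chunk so offset counts from the start.
        with sr.AudioFile(path) as source:
            audio = recognizer.record(source, offset=offset, duration=duration)
        try:
            pieces.append(recognizer.recognize_google(audio, language=language))
        except sr.UnknownValueError:
            pieces.append("")  # nothing intelligible in this chunk
    return " ".join(piece for piece in pieces if piece)
```

Fixed-length chunks can cut a word in half at a boundary, so the joined transcript may contain small errors where two pieces meet.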

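Conclusion

The two examples above show how to convert an audio file to text in Python. Using the Google Cloud Speech-to-Text API requires a few extra steps, such as creating a service account and generating a private key. SpeechRecognition, by contrast, is a Python library that wraps several speech recognition engines. The recognize_google call used above runs without a service account or key of your own, so you only need to install the library and follow its documentation, although other engines it supports may require their own credentials.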
