Build Your Own Desktop Voice Assistant in Python

This article was published as a part of the Data Science Blogathon.

Introduction

How cool would it be to build your own personal assistant like Alexa or Siri? It is not very complicated, and it is quite easy to do in Python. Personal digital assistants have attracted a lot of attention lately, and chatbots are now common on most commercial websites. With the growing advances in artificial intelligence, training machines to handle day-to-day tasks has become the norm.

Voice-based personal assistants have gained a lot of popularity in this era of smart homes and smart devices. They can be configured to perform many of your regular tasks through simple voice commands. Google has popularized voice-based search, which is a boon for many people, such as senior citizens, who are not comfortable using a keypad or keyboard.

This article walks you through the steps to quickly develop a voice-based desktop assistant, Minchu (meaning Flash), and package it as an executable that you can run on other computers. The only prerequisite is a working knowledge of Python.

Any voice-based assistant needs two core functions: one that listens to your commands and one that responds to them. On top of these, you add the customized instructions that tell the assistant what to do for each command.
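The division of labour described above can be sketched as a tiny loop. This is a minimal, hypothetical skeleton, not the article's code: `run_assistant`, `listen`, `speak` and `handle` are illustrative names, and the speech functions are passed in as plain callables so the loop can be tried with scripted text before any audio libraries are wired up.

```python
from typing import Callable, Iterable


def run_assistant(
    listen: Callable[[], str],
    speak: Callable[[str], None],
    handle: Callable[[str], str],
    stop_words: Iterable[str] = ("stop", "exit", "bye"),
) -> None:
    """Hypothetical skeleton: listen -> decide -> respond, until a stop word is heard."""
    stop_words = tuple(stop_words)
    speak("Hi, I am Minchu, your personal desktop assistant")
    while True:
        command = listen().lower().strip()
        if not command:  # nothing was recognized, so ask again
            continue
        if any(word in command for word in stop_words):
            speak("Ok bye and take care")
            return
        speak(handle(command))  # the customized instructions live in handle()


if __name__ == "__main__":
    # Scripted stand-ins for the microphone (listen) and the speaker (speak)
    scripted = iter(["", "Hello", "bye"])
    run_assistant(
        listen=lambda: next(scripted),
        speak=print,
        handle=lambda cmd: f"You said: {cmd}",
    )
```

In the real assistant, `listen` would be the microphone function and `speak` the text-to-speech function; keeping them as parameters makes the command logic testable without audio hardware.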

The first step is to install and import the necessary libraries. Install each one with pip before importing it. The key libraries used in this program are:

  • The SpeechRecognition library lets Python capture audio from your system's microphone (microphone input requires the PyAudio package), save it, and transcribe it, for example with Google's Web Speech API.
  • gTTS (Google Text-to-Speech) converts text to speech: the answer returned by the look-up logic you write is turned into an MP3 audio file. The package uses Google Translate's text-to-speech API, so it needs an internet connection.
  • The playsound package plays that MP3 file, giving voice to the answer.
  • The webbrowser module, part of Python's standard library, provides a high-level interface for displaying web pages to users. Selenium is another option for displaying web pages; it drives the browser through a browser-specific web driver (recent Selenium releases can download this driver automatically, while older ones need you to install it and pass its location).
  • The wikipedia package fetches summaries and other information from the Wikipedia website.
  • Wolfram|Alpha is a computational knowledge engine (answer engine) that answers mathematical and factual questions using Wolfram's knowledge base and AI technology. To use the wolframalpha package, you need an App ID (API key) from Wolfram|Alpha.
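Because pip package names do not always match import names (you `pip install SpeechRecognition` but `import speech_recognition`), a quick check that everything is importable can save a confusing traceback later. Below is a small sketch, assuming the mapping shown; `missing_packages` is a hypothetical helper, and the list should be adjusted to the libraries you actually use.

```python
import importlib.util

# Import name -> name to use with "pip install" (assumed mapping; adjust as needed)
REQUIRED = {
    "speech_recognition": "SpeechRecognition",
    "pyaudio": "PyAudio",  # needed by sr.Microphone
    "gtts": "gTTS",
    "playsound": "playsound",
    "wikipedia": "wikipedia",
    "wolframalpha": "wolframalpha",
    "selenium": "selenium",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of modules that cannot be found, without importing them."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All libraries are installed.")
```

`importlib.util.find_spec` only locates a top-level module without executing it, so the check is cheap and does not trigger side effects such as opening the microphone.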

Implementation of the Personal Assistant

The entire application is written in Python, using the libraries described above.

Import required libraries:

import speech_recognition as sr  # convert speech to text
import datetime  # fetch the current date and time
import wikipedia  # look up Wikipedia summaries
import webbrowser  # open web pages in the default browser
import playsound  # play the saved MP3 file
from gtts import gTTS  # Google text-to-speech
import os  # save/remove files and launch applications
import wolframalpha  # query the Wolfram|Alpha computational engine
from selenium import webdriver  # control browser operations

Write a function to capture your requests/questions:

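def talk():
    """Listen on the default microphone and return the transcribed text ("" on failure)."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)
    data = ""
    try:
        # Send the recorded audio to Google's Web Speech API for transcription
        data = recognizer.recognize_google(audio)
        print("Your question is, " + data)
    except sr.UnknownValueError:
        print("Sorry, I did not catch your question. Please repeat it.")
    except sr.RequestError as e:
        print("Speech service unavailable: " + str(e))
    return data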
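`talk()` returns an empty string whenever recognition fails, so the caller has to decide how often to ask again. One way to centralize that is a small retry wrapper. `listen_until_heard` and `max_attempts` are hypothetical names, and the listening function is passed in so the logic can be tried with a fake before being pointed at `talk`.

```python
from typing import Callable, Optional


def listen_until_heard(listen: Callable[[], str], max_attempts: int = 3) -> Optional[str]:
    """Call listen() until it returns non-blank text; give up after max_attempts tries."""
    for _ in range(max_attempts):
        text = listen().strip()
        if text:
            return text
    return None  # the caller decides what to do when nothing was understood
```

In the assistant this would be used as `question = listen_until_heard(talk)`, followed by a spoken apology when the result is `None`.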

Next, write a function to respond to your questions:

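def respond(output):
    """Print the answer, speak it aloud with gTTS, then delete the temporary MP3."""
    print(output)
    response = gTTS(text=output, lang='en')  # needs an internet connection
    file = "response.mp3"
    response.save(file)
    playsound.playsound(file, True)  # block until playback finishes
    os.remove(file)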
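Writing the MP3 to a fixed name in the working directory works, but a unique temporary file avoids clashes with existing files, and a `finally` block guarantees cleanup even if playback raises an error. This is a sketch with hypothetical names: `speak_via_tempfile`, `save` and `play` are illustrative; in the assistant, `save` would wrap `gTTS(text=..., lang='en').save` and `play` would wrap `playsound.playsound`.

```python
import os
import tempfile
from typing import Callable


def speak_via_tempfile(text: str,
                       save: Callable[[str, str], None],
                       play: Callable[[str], None]) -> None:
    """Synthesize text into a unique temporary .mp3, play it, and always delete it."""
    fd, path = tempfile.mkstemp(suffix=".mp3")
    os.close(fd)  # only the path is needed; close the low-level handle mkstemp opened
    try:
        save(text, path)  # e.g. lambda t, p: gTTS(text=t, lang="en").save(p)
        play(path)        # e.g. lambda p: playsound.playsound(p, True)
    finally:
        os.remove(path)   # runs even if save() or play() raises
```

Injecting `save` and `play` keeps the function independent of the audio libraries, which also makes it easy to exercise without speakers.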

Finally, write the main loop, which maps each recognized command to a customized response:

if __name__ == '__main__':
    respond("Hi, I am Minchu, your personal desktop assistant")

    while True:
        respond("How can I help you?")
        text = talk().lower()

        if not text:  # nothing was recognized, so ask again
            continue

        if "stop" in text or "exit" in text or "bye" in text:
            respond("Ok bye and take care")
            break

        if 'wikipedia' in text:
            respond('Searching Wikipedia')
            query = text.replace("wikipedia", "").strip()
            try:
                results = wikipedia.summary(query, sentences=3)
                respond("According to Wikipedia")
                respond(results)  # respond() also prints the text
            except wikipedia.exceptions.WikipediaException:
                respond("Sorry, I could not find a clear Wikipedia match")

        elif 'time' in text:
            str_time = datetime.datetime.now().strftime("%H:%M:%S")
            respond(f"the time is {str_time}")

        elif 'search' in text:
            # Open a Google search for the remaining words in a new browser tab
            query = text.replace("search", "").strip()
            webbrowser.open_new_tab("https://www.google.com/search?q=" + query.replace(" ", "+"))

        elif "calculate" in text or "what is" in text:  # each phrase needs its own "in" test
            question = text.replace("calculate", "").strip()
            app_id = "Mention your API Key"  # your Wolfram|Alpha App ID
            client = wolframalpha.Client(app_id)
            try:
                res = client.query(question)
                answer = next(res.results).text
                respond("The answer is " + answer)
            except StopIteration:  # the query returned no result pods
                respond("Sorry, I could not find an answer")

        elif 'open google' in text:
            webbrowser.open_new_tab("https://www.google.com")
            respond("Google is open")

        elif 'youtube' in text:
            # Selenium 4.6+ finds/downloads ChromeDriver itself; older releases
            # need the driver location passed in through a Service object
            driver = webdriver.Chrome()
            driver.implicitly_wait(1)
            driver.maximize_window()
            respond("Opening in youtube")
            query = text.split("youtube", 1)[1].split()  # words spoken after "youtube"
            driver.get("https://www.youtube.com/results?search_query=" + '+'.join(query))

        elif "open word" in text:
            respond("Opening Microsoft Word")
            os.startfile('Mention location of Word in your system')  # Windows only

        else:
            respond("Application not available")
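A pitfall worth calling out in long `if`/`elif` chains: a condition written as `"calculate" or "what is" in text` is always true, because the non-empty string `"calculate"` is truthy on its own; each phrase needs its own `in` test. As the number of commands grows, one alternative (a swapped-in technique, not the article's code) is an ordered keyword dispatch table, so every trigger phrase is tested the same way. `COMMANDS`, `dispatch` and the handler lambdas are hypothetical; in the assistant each handler would call `respond()` or open a browser instead of returning a string.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]

# Ordered (trigger phrases, handler) pairs: the first matching entry wins
COMMANDS: List[Tuple[Tuple[str, ...], Handler]] = [
    (("wikipedia",), lambda text: "wiki: " + text.replace("wikipedia", "").strip()),
    (("time",), lambda text: "time requested"),
    (("calculate", "what is"), lambda text: "math: " + text),
    (("open google",), lambda text: "opening https://www.google.com"),
]


def dispatch(text: str, commands=COMMANDS) -> str:
    """Run the handler of the first entry whose trigger phrase appears in text."""
    text = text.lower()
    for triggers, handler in commands:
        if any(phrase in text for phrase in triggers):
            return handler(text)
    return "Application not available"


if __name__ == "__main__":
    print(bool("calculate" or "what is" in "open google"))  # True: the pitfall
    print(dispatch("Open Google"))  # matched by the "open google" entry
```

Because the list is ordered, more specific phrases should come before broader ones, exactly as with an `elif` chain.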

Once all the parts of your program are ready, run it and your assistant will start conversing with you. You can add more commands to suit your needs and make the assistant more intuitive. When it works to your satisfaction, you can package it as an executable so that it runs on other computers without a separate Python installation.

Generate an executable for your voice assistant

To create an executable from the Python script, you can use PyInstaller. If you wrote the assistant in a Jupyter notebook, first convert the .ipynb file to a .py script with nbconvert, then run PyInstaller on the .py file. Run the following commands in a command prompt opened in the folder that contains the notebook (with Python and pip on your PATH):

pip install nbconvert pyinstaller
jupyter nbconvert --to script minchu.ipynb #converts minchu.ipynb to minchu.py
pyinstaller minchu.py #builds the executable (add --onefile for a single file)

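nbconvert writes the .py file to the same folder as the .ipynb file. Once the build is complete, PyInstaller creates two folders, build and dist. By default the executable is placed in dist/minchu (minchu.exe on Windows); with --onefile it is written directly to dist. Run it to start your personal desktop assistant. Note that PyInstaller is not a cross-compiler: an executable built on Windows runs on Windows, so build it separately on each operating system you want to support. The target machine also needs a microphone and an internet connection for speech recognition and gTTS.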
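If you are curious what `--to script` does for a plain-Python notebook, the core idea is small: an .ipynb file is JSON whose `cells` list holds code and markdown cells. The sketch below is a rough, hypothetical stand-in (`notebook_to_script` is not part of nbconvert); unlike nbconvert, it does not translate IPython magics such as `%matplotlib` or keep markdown as comments, so use `jupyter nbconvert` for real conversions.

```python
import json
from pathlib import Path


def notebook_to_script(ipynb_path: str, py_path: str) -> int:
    """Write the code cells of a Jupyter notebook to a .py file; return how many were written."""
    notebook = json.loads(Path(ipynb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        chunks.append(text.rstrip() + "\n")
    Path(py_path).write_text("\n\n".join(chunks), encoding="utf-8")
    return len(chunks)
```

Usage: `notebook_to_script("minchu.ipynb", "minchu.py")` produces a script containing only the notebook's code cells, in order.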

Conclusion

This is how simple it is to build your own voice assistant. You can add many more features, such as playing your favorite songs, giving weather details, opening your email application, composing emails, or restarting your system. With some additional work, the same idea can be carried over to a phone or tablet. Have fun exploring and developing your own Alexa/Siri/Cortana.

The entire code, along with some additional features for this voice assistant, is available in my Git repository. You can check out GeeksforGeeks for more variations of Python-based personal assistants.

Author: admin
