Control your computer with a command-line voice interface. Uses NVIDIA's Parakeet-TDT model for speech recognition and supports clicking UI elements, typing text, reading text aloud, interacting with a local LLM, screen capture with OCR, and more.
- Speech recognition using NVIDIA Parakeet-TDT 0.6B V2 via NeMo toolkit, providing accurate transcription with punctuation and capitalization.
- Click commands: Find and click text/buttons on screen using OCR.
- Type commands: Type text using keyboard emulation.
- Read commands: Read highlighted text aloud using text-to-speech.
- Computer commands: Interact with your system (run shell commands, manage apps/windows, query about highlighted text) using a local LLM (Ollama).
- Scrap command: Select a screen area, perform OCR, and copy the extracted text.
- Stop command: Immediately halts any active text-to-speech playback.
- Rolling buffer: Captures audio just before hotkey activation to avoid missed words.
- Hotkey controls: Use keyboard shortcuts to trigger recording and interrupt actions.
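The rolling buffer listed above can be pictured as a fixed-length queue of recent audio chunks that is prepended to the recording when the hotkey fires. The following is a minimal sketch of that idea, not the project's actual implementation: the `RollingBuffer` class, the `start_recording` helper, and the chunk sizes are hypothetical names and values chosen for illustration.

```python
from collections import deque


class RollingBuffer:
    """Keep only the most recent audio chunks (a fixed-length pre-roll).

    Hypothetical helper for illustration; not taken from the project source.
    """

    def __init__(self, seconds: float, chunk_seconds: float) -> None:
        # Number of chunks needed to cover the requested pre-roll window.
        max_chunks = max(1, round(seconds / chunk_seconds))
        self._chunks = deque(maxlen=max_chunks)

    def push(self, chunk: bytes) -> None:
        # Once the deque is full, appending drops the oldest chunk.
        self._chunks.append(chunk)

    def snapshot(self) -> bytes:
        # Audio captured just before the hotkey, oldest chunk first.
        return b"".join(self._chunks)


def start_recording(pre_roll: RollingBuffer) -> list:
    # Seed the recording with the pre-roll so the first word is not clipped.
    return [pre_roll.snapshot()]
```

With a 1.5 second window and 0.5 second chunks, the buffer holds the last three chunks, so whatever was spoken up to 1.5 seconds before the hotkey press is still part of the recording.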
Follow these steps precisely to set up the project environment.
First, install pyenv for managing Python versions. Follow the official pyenv installation instructions. After that, install the necessary system packages for both building Python and running the application using zypper:
sudo zypper install git-core gcc automake make zlib-devel libbz2-devel libopenssl-devel readline-devel sqlite3-devel xz-devel libffi-devel tk-devel xdotool espeak xclip tesseract-ocr pkill wmctrl ffmpeg gnome-screenshot
Heavy dependencies such as nemo_toolkit require a specific Python version for which pre-compiled packages (wheels) are available. We will use pyenv to install Python 3.11.
# Install Python 3.11.10 (or latest 3.11.x)
pyenv install 3.11.10
# Create a dedicated virtual environment for the project
pyenv virtualenv 3.11.10 voice-command-311
Now, clone the repository and use the pyenv virtual environment you just created.
# Clone the repository (if you haven't already)
git clone https://github.com/ruapotato/Voice-Command
cd Voice-Command
# Set the local python version for this directory
pyenv local voice-command-311
# Upgrade pip and install the required packages
pip install --upgrade pip
pip install -r requirements.txt
This project uses Ollama for the computer command.
- Install Ollama from ollama.com.
- Pull your desired model. For example:
ollama pull mistral
- Ensure Ollama is running: Before starting the app, make sure the Ollama service is active in the background if you intend to use the computer command.
ollama serve
- Navigate and Run: Open a new terminal and go to the project directory. The pyenv environment should activate automatically. Then, run the main script.
cd /path/to/Voice-Command
python main.py
Note: The first time you run it, NeMo will download the Parakeet model, which may take some time.
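If the application fails on startup, a quick pre-flight check can confirm that the interpreter is Python 3.11 and that the command-line tools installed earlier are on PATH. This is a hedged sketch and not part of the project: the function names are hypothetical, and the executable names in `REQUIRED_TOOLS` are assumed from the package list (for example, the `tesseract-ocr` package provides the `tesseract` binary).

```python
import shutil
import sys

# Executables the system packages are expected to provide (assumed names).
REQUIRED_TOOLS = [
    "xdotool", "espeak", "xclip", "tesseract",
    "pkill", "wmctrl", "ffmpeg", "gnome-screenshot",
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_311(version_info=sys.version_info):
    """True when the interpreter is some Python 3.11.x release."""
    return tuple(version_info[:2]) == (3, 11)


if __name__ == "__main__":
    if not python_is_311():
        print("Warning: expected Python 3.11, found", sys.version.split()[0])
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Run it from inside the project directory so that the pyenv environment's interpreter is the one being checked.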
- Record Voice: Press and hold Ctrl+Shift
- Interrupt/Stop: Press Ctrl+C
- Exit: Type exit or quit at the prompt, or press Ctrl+D
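The controls above follow a common pattern for interactive prompts in Python: Ctrl+D closes standard input and surfaces as `EOFError`, while Ctrl+C raises `KeyboardInterrupt`. The sketch below shows one way such a loop could be written. It is an illustration under those assumptions, not the code in main.py, and `read_command` and `run_prompt` are hypothetical names.

```python
def read_command(read_line=input, prompt="> "):
    """Return the next command, or None when the user asks to exit."""
    try:
        line = read_line(prompt).strip()
    except EOFError:
        # Ctrl+D closes stdin, which input() reports as EOFError.
        return None
    if line in {"exit", "quit"}:
        return None
    return line


def run_prompt(handle, read_line=input):
    """Dispatch commands to handle() until the user exits."""
    while True:
        try:
            command = read_command(read_line)
            if command is None:
                break
            if command:  # Ignore empty lines.
                handle(command)
        except KeyboardInterrupt:
            # Ctrl+C interrupts the current action but keeps the prompt alive.
            continue
```

Passing `read_line` as a parameter keeps the loop testable without a real terminal; in normal use the default `input` is enough, for example `run_prompt(print)`.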
GPL3 by David Hamner