Skip to content

A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. This application provides a beautiful, native-looking interface for transcribing audio in real-time with support for multiple languages.

License

Notifications You must be signed in to change notification settings

phongthanhbuiit/whisper-realtime-gui

Repository files navigation

Whisper Realtime Transcription GUI

A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. This application provides a beautiful, native-looking interface for transcribing audio in real-time with support for multiple languages.

Demo Screenshot

Features

  • 🎙 Real-time audio transcription using OpenAI's Whisper
  • 🌈 Beautiful, modern UI with animated audio visualizer
  • 🚀 GPU acceleration support (Apple Silicon/CUDA)
  • 🌍 Multi-language support (English, French, Vietnamese)
  • 📊 Live audio waveform visualization with dynamic effects
  • 💫 Smooth animations and transitions
  • 🎯 Multiple Whisper model options (tiny, base, small, medium, large)
  • ⚡️ Optimized streaming for better real-time performance
  • 🎨 Enhanced visual feedback with glowing effects

Components

1. Real-time Transcription (whisper_gui.py)

  • Modern GUI application for real-time speech recognition
  • Dynamic waveform visualization with:
    • Smooth wave transitions
    • Responsive amplitude changes
    • Glowing effects during recording
  • Support for multiple languages and models
  • GPU acceleration for better performance
  • Optimized audio streaming with 0.3s chunks
  • Automatic model initialization

2. File Transcription (file-to-text.py)

  • Convert audio/video files to text
  • Supports multiple file formats:
    • Audio: mp3, wav, m4a, etc.
    • Video: mp4, mkv, avi, etc.
  • Batch processing capability
  • Output formats:
    • Plain text (.txt)
    • Microsoft Word (.docx)
    • Timestamps support

Requirements

  • Python 3.11+
  • macOS (tested on Apple Silicon)
  • GPU recommended for better performance

Installation

For macOS Users

  1. Download the latest .dmg file from the Releases page
  2. Open the downloaded .dmg file
  3. Drag the application to your Applications folder
  4. Double click to run the application

For Developers

  1. Clone the repository:
git clone https://github.com/phongthanhbuiit/whisper-realtime-gui.git
cd whisper-realtime-gui
  1. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On macOS/Linux
  1. Install the required packages:
pip install -r requirements.txt

Release Information

  • Version: 1.0.0
  • Release Date: 2023-02-20
  • Changes:
    • Initial release with real-time transcription and file transcription features

Usage

Real-time Transcription

  1. Activate the virtual environment if not already activated:
source venv/bin/activate  # On macOS/Linux
  1. Run the GUI application:
python whisper_gui.py
  1. Select your preferred model and language from the dropdown menus
  2. Click "Start Recording" to begin transcription
  3. Speak into your microphone
  4. Watch the beautiful waveform animation and real-time transcription

Model Selection

Choose from different Whisper models based on your needs:

  • tiny: Fastest, lowest accuracy (good for testing)
  • base: Good balance of speed and accuracy
  • small: Better accuracy, still reasonable speed
  • medium: High accuracy, slower processing
  • large: Best accuracy, requires more resources

Language Support

Currently supports:

  • English
  • Vietnamese
  • French

The language can be changed in real-time during transcription.

Performance Tips

  1. Model Selection:

    • Start with tiny or base model for testing
    • Use small for general use
    • Use medium or large only if you need highest accuracy
  2. GPU Acceleration:

    • The app automatically uses GPU if available
    • Recommended for medium and large models
  3. Audio Input:

    • Speak clearly and at a moderate pace
    • Keep microphone at a consistent distance
    • Avoid background noise for better accuracy

Troubleshooting

If you encounter issues:

  1. Audio Not Detected:

    • Check your microphone permissions
    • Verify input device in system settings
  2. Slow Performance:

    • Try a smaller model
    • Ensure GPU acceleration is working
    • Check CPU/Memory usage
  3. Transcription Issues:

    • Try changing the language setting
    • Speak more clearly
    • Adjust your microphone position

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Author

Support

If you found this project helpful, please give it a ⭐️!

About

A modern, real-time speech recognition application built with OpenAI's Whisper and PySide6. This application provides a beautiful, native-looking interface for transcribing audio in real-time with support for multiple languages.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages