Audio & Video Transcription with Whisper Large V3

A transcription system that converts speech in audio and video files into text using OpenAI’s Whisper Large V3 model, loaded through Hugging Face. It accepts multiple media formats and uses FFmpeg and Librosa to prepare the audio before transcription.
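Because the system accepts both audio and video, the first step is deciding whether a file needs its audio track extracted before it can be transcribed. A minimal sketch, assuming the routing is done by file extension: the extension sets cover only the formats named in this README, and the helper name `media_kind` is hypothetical, not part of the project.

```python
from pathlib import Path

# Extensions named in this README. The "etc." in the feature list means the
# real project likely accepts more formats; extend these sets as needed.
AUDIO_EXTENSIONS = {".mp3", ".wav"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def media_kind(path: str) -> str:
    """Return "audio" or "video" based on the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()  # case-insensitive match: ".MP4" -> ".mp4"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported media format: {suffix or '(no extension)'}")
```

Video files would then go through FFmpeg to extract their audio, while audio files can be loaded directly.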

Key Features:

🎤 Transcribes audio files (MP3, WAV, etc.) and video files (MP4, MOV, etc.)
🧠 Powered by the Whisper Large V3 model from Hugging Face for state-of-the-art transcription
🎬 Uses FFmpeg to extract audio from video files efficiently
🔊 Integrates Librosa for audio preprocessing and signal analysis
📁 Handles long files by automatically splitting them into segments, since Whisper processes audio in 30-second windows
💬 Produces clean, readable text suitable for subtitles, notes, or downstream NLP tasks
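The features above fit together as a short pipeline: FFmpeg pulls a mono 16 kHz track out of a video, Librosa loads it, and Whisper transcribes it in 30-second windows (the input length Whisper models are trained on). The sketch below illustrates that flow under those assumptions and is not this project's actual code. The helper names `ffmpeg_extract_cmd`, `segment_bounds`, and `transcribe` are hypothetical; the model id `openai/whisper-large-v3` and the transformers `pipeline` call come from the public Hugging Face library, not from confirmed project internals.

```python
import subprocess

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30.0  # Whisper processes audio in 30-second windows


def ffmpeg_extract_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build an FFmpeg command that drops the video stream and writes mono 16 kHz WAV."""
    return [
        "ffmpeg", "-y",           # overwrite the output file without prompting
        "-i", video_path,
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(SAMPLE_RATE),  # resample to 16 kHz
        "-acodec", "pcm_s16le",   # 16-bit PCM WAV
        wav_path,
    ]


def segment_bounds(duration: float, chunk: float = CHUNK_SECONDS,
                   overlap: float = 0.0) -> list[tuple[float, float]]:
    """Split [0, duration) into (start, end) windows of at most `chunk` seconds.

    A small overlap can reduce words being cut at chunk edges, at the cost of
    some duplicated text that would need merging afterwards.
    """
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("require chunk > 0 and 0 <= overlap < chunk")
    step = chunk - overlap
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        bounds.append((start, end))
        if end >= duration:  # stop once the last window reaches the end
            break
        start += step
    return bounds


def transcribe(media_path: str, is_video: bool) -> str:
    """Extract audio if needed, then transcribe it segment by segment.

    Requires ffmpeg on PATH plus the librosa and transformers packages.
    """
    import os
    import tempfile

    import librosa
    from transformers import pipeline

    with tempfile.TemporaryDirectory() as tmp:
        wav_path = media_path
        if is_video:
            wav_path = os.path.join(tmp, "audio.wav")
            subprocess.run(ffmpeg_extract_cmd(media_path, wav_path), check=True)
        # librosa resamples to 16 kHz and downmixes to mono while loading
        audio, sr = librosa.load(wav_path, sr=SAMPLE_RATE, mono=True)

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    pieces = []
    for start, end in segment_bounds(len(audio) / sr):
        segment = audio[int(start * sr):int(end * sr)]
        result = asr({"raw": segment, "sampling_rate": sr})
        pieces.append(result["text"].strip())
    return " ".join(p for p in pieces if p)
```

The explicit loop makes the segmentation step visible; as an alternative, passing `chunk_length_s=30` to `pipeline(...)` lets transformers chunk long audio internally.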

Last updated on Jan 10, 2025