TranscriptAI

AI/ML SaaS Application

A full-featured AI-powered transcription platform built in partnership with Pima Web Design. TranscriptAI converts audio and video files into accurate, speaker-labeled transcripts using OpenAI Whisper and advanced speaker diarization technology.

Technologies Used

Python, OpenAI Whisper, pyannote.audio, FFmpeg, librosa, noisereduce, PyTorch, React, TypeScript, Electron, Vite

Application Screenshots

Drag & drop upload with enterprise security

AI model selection & speaker detection

Audio waveform with speaker-labeled transcript

Usage stats & account management

Project Goals

Build an enterprise-grade transcription solution that goes beyond basic speech-to-text. The platform needed to handle diverse audio formats, automatically identify different speakers, support GPU acceleration for fast processing, and export professional documents with timestamps and speaker attribution.

My Solution

I built a four-stage audio processing pipeline. FFmpeg converts input files to 16 kHz mono WAV, noisereduce applies spectral gating to suppress background noise, pyannote.audio performs speaker diarization to identify who spoke when, and OpenAI Whisper (with model options from tiny to large) transcribes the audio. The system splits long audio files into chunks for memory-efficient processing and produces color-coded DOCX documents with timestamps, speaker labels, and a processing log.
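The conversion and chunking steps can be sketched roughly as follows. This is a minimal illustration, not the production code: the helper names `build_ffmpeg_command` and `plan_chunks`, the 30-second chunk length, and the 1-second overlap are all assumptions made for the example. The FFmpeg flags shown (`-ac`, `-ar`, `-vn`, `-c:a pcm_s16le`) are standard options for producing 16 kHz mono 16-bit PCM WAV.

```python
from dataclasses import dataclass
from typing import List

TARGET_SAMPLE_RATE = 16_000  # 16 kHz, the rate named in the pipeline description


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Build an FFmpeg argument list that converts src to 16 kHz mono WAV.

    Hypothetical helper: only the command is built here, nothing is executed.
    """
    return [
        "ffmpeg",
        "-y",                            # overwrite dst if it already exists
        "-i", src,                       # input file in any format FFmpeg can decode
        "-vn",                           # ignore video streams
        "-ac", "1",                      # downmix to a single channel
        "-ar", str(TARGET_SAMPLE_RATE),  # resample to 16 kHz
        "-c:a", "pcm_s16le",             # 16-bit little-endian PCM
        dst,
    ]


@dataclass(frozen=True)
class Chunk:
    start: float  # seconds from the beginning of the file
    end: float    # seconds from the beginning of the file


def plan_chunks(duration: float, chunk_len: float = 30.0,
                overlap: float = 1.0) -> List[Chunk]:
    """Split [0, duration] into windows of chunk_len seconds.

    Consecutive windows share `overlap` seconds so that words cut at a
    boundary appear whole in at least one chunk. Both defaults are
    illustrative assumptions, not values taken from the project.
    """
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    chunks: List[Chunk] = []
    step = chunk_len - overlap
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        chunks.append(Chunk(start, end))
        if end >= duration:
            break
        start += step
    return chunks
```

The argument list could be passed to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed. Planning chunk boundaries up front means only one window of samples needs to be held in memory at a time.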

Key Features

OpenAI Whisper transcription (tiny to large models)
Automatic speaker diarization (up to 10 speakers)
GPU acceleration with CUDA support
Audio enhancement & noise reduction
Support for 15+ audio/video formats
Auto-detect language or manual selection
Interactive transcript viewer with audio sync
AI-powered summary generation
Keyword extraction
Color-coded speaker labels in exports
DOCX export with timestamps
Batch upload for multiple files
Secure & encrypted file handling
Usage tracking & analytics dashboard
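To illustrate how speaker labels and timestamps might end up on each exported line, here is a small sketch that assigns each transcript segment to the diarization turn it overlaps most. The max-overlap rule, the `Turn` and `Segment` types, the `UNKNOWN` fallback, and the speaker names are assumptions for this example. The source does not say how TranscriptAI aligns the two outputs.

```python
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One diarization result: a speaker active from start to end (seconds)."""
    start: float
    end: float
    speaker: str


class Segment(NamedTuple):
    """One transcription result: text spoken from start to end (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(seg: Segment, turns: List[Turn], default: str = "UNKNOWN") -> str:
    """Return the speaker whose turn overlaps the segment the most."""
    best_label, best_overlap = default, 0.0
    for turn in turns:
        shared = overlap(seg.start, seg.end, turn.start, turn.end)
        if shared > best_overlap:
            best_label, best_overlap = turn.speaker, shared
    return best_label


def format_timestamp(seconds: float) -> str:
    """Format a time offset as HH:MM:SS, dropping fractional seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def label_transcript(segments: List[Segment], turns: List[Turn]) -> List[str]:
    """Produce one '[timestamp] speaker: text' line per segment."""
    return [
        f"[{format_timestamp(s.start)}] {assign_speaker(s, turns)}: {s.text.strip()}"
        for s in segments
    ]


turns = [Turn(0.0, 5.0, "SPEAKER_00"), Turn(5.0, 12.0, "SPEAKER_01")]
segments = [Segment(0.5, 4.0, " Hello."), Segment(4.5, 9.0, "Hi there.")]
lines = label_transcript(segments, turns)
# lines == ["[00:00:00] SPEAKER_00: Hello.", "[00:00:04] SPEAKER_01: Hi there."]
```

Lines in this shape could then be written to a DOCX file, with one color per speaker label.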

Results & Impact

TranscriptAI achieves 98.7% average accuracy on clear audio, with automatic speaker detection supporting up to 10 distinct speakers. The platform processes meetings, interviews, podcasts, and challenging recordings such as police calls with radio static. GPU acceleration (CUDA) enables real-time transcription, and the dashboard makes the technology accessible to non-technical users.