TranscriptAI

AI/ML SaaS Application

A full-featured AI transcription platform built in partnership with Pima Web Design. TranscriptAI converts audio and video files into accurate, speaker-labeled transcripts.

Technologies Used

Python, OpenAI Whisper, pyannote.audio, FFmpeg, librosa, noisereduce, PyTorch, React, TypeScript, Electron, Vite

Application Screenshots

Drag and drop upload with security controls

Model selection and speaker detection settings

Audio waveform with speaker-labeled transcript

Usage stats and account management

Project Goals

Build a production-grade transcription product that supports multiple media formats, speaker identification, fast processing, and export-ready outputs.

My Solution

I engineered a four-stage audio processing pipeline: FFmpeg for format conversion, audio enhancement and noise reduction for cleaner signals, speaker diarization to map speech to individual speakers, and Whisper for high-accuracy transcription.
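
The format conversion stage can be sketched roughly as follows. This is a minimal illustration, not the production code: the function names, the `work` directory, and the choice of 16 kHz mono 16-bit WAV as the target (the sample rate Whisper models operate on) are assumptions made for the example.

```python
from pathlib import Path
import subprocess


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that extracts mono 16-bit PCM audio.

    The target format (16 kHz, mono, WAV) is an assumption for this sketch.
    """
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", src,                 # input audio or video file
        "-vn",                     # drop any video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the target rate
        "-c:a", "pcm_s16le",       # encode as 16-bit PCM
        dst,
    ]


def normalize_media(src: str, work_dir: str = "work") -> Path:
    """Run the conversion and return the path of the resulting WAV file.

    Requires the `ffmpeg` binary on PATH; raises CalledProcessError on failure.
    """
    out_dir = Path(work_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / (Path(src).stem + ".wav")
    subprocess.run(build_ffmpeg_command(src, str(dst)), check=True)
    return dst
```

Keeping the command builder separate from the subprocess call makes the conversion settings easy to inspect and test without FFmpeg installed. Because every input is reduced to one known format first, the later enhancement, diarization, and transcription stages do not need to care which of the supported audio or video formats was uploaded.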

Key Features

OpenAI Whisper transcription (tiny to large models)

Automatic speaker diarization (up to 10 speakers)

GPU acceleration with CUDA support

Audio enhancement and noise reduction

Support for 15+ audio/video formats

Auto-detect language or manual selection

Interactive transcript viewer with audio sync

AI-powered summary generation

Keyword extraction

Color-coded speaker labels in exports

DOCX export with timestamps

Batch upload for multiple files

Secure and encrypted file handling

Usage tracking and analytics dashboard
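
One way to combine the diarization and transcription features above into speaker-labeled, timestamped lines is to give each transcript segment the speaker whose turns overlap it the most. The sketch below assumes that approach; the `Turn` and `Segment` types, the function names, and the `SPEAKER_00`-style labels are hypothetical stand-ins for whatever the diarization and transcription stages actually return.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization result: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """A transcription result: one piece of text with its time span (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time spans (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, unknown="UNKNOWN"):
    """Pair each segment with the speaker who overlaps it the longest."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg))
    return labeled


def format_timestamp(seconds: float) -> str:
    """Format a time offset as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def render(labeled):
    """Produce export-style lines such as '[00:00:03] SPEAKER_01: Hi there.'"""
    return [
        f"[{format_timestamp(seg.start)}] {speaker}: {seg.text}"
        for speaker, seg in labeled
    ]
```

Summing overlap per speaker, rather than taking the first matching turn, keeps a segment that straddles a speaker change attached to whoever spoke for most of it. Segments with no overlapping turn fall back to an explicit `UNKNOWN` label instead of being dropped.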

Results & Impact

TranscriptAI achieved 98.7 percent average accuracy on clear audio, supported up to 10 speakers, and delivered usable outputs through an interface accessible to non-technical users.
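
Transcription accuracy is often reported as word accuracy, meaning 1 minus the word error rate (WER) against a human-checked reference transcript. Whether the figure above was measured this way is an assumption; the sketch below only illustrates how such a number could be computed, and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length.

    Comparison is case-insensitive and splits on whitespace; punctuation is
    not stripped, which a real evaluation would likely normalize first.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER, floored at 0 for very poor transcripts."""
    return max(0.0, 1.0 - word_error_rate(reference, hypothesis))
```

Under this definition, one wrong word in a five-word reference gives a WER of 0.2 and a word accuracy of 80 percent. An average across files would then be the mean of the per-file values, or a single WER computed over all reference words pooled together; the two can differ when file lengths vary.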