Speech-to-Text
AI technology that converts spoken audio into written text (also called automatic speech recognition or ASR). Modern systems cope with a wide range of accents, moderate background noise, and multiple speakers, though accuracy still depends on audio quality.
Why It Matters
Speech-to-text enables voice commands, meeting transcription, voice search, and accessibility features. It is a core component of voice-first interfaces.
Example
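Whisper (by OpenAI) transcribing a one-hour meeting recording into punctuated, timestamped text. On clear English audio, word accuracy can exceed 95%; it drops with heavy background noise or overlapping speech. Whisper does not identify speakers on its own, so speaker labels come from a separate diarization step.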
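Accuracy figures for speech-to-text are commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. "95% accuracy" then corresponds roughly to a WER of 5%. The sketch below is a minimal illustration in Python; the function name `word_error_rate` and the sample sentences are made up for this example, and real evaluations usually normalize punctuation and numbers before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplifying assumption: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one substitution ("the" -> "a")
# and one dropped word ("morning") against a 7-word reference.
reference = "please schedule the meeting for monday morning"
hypothesis = "please schedule a meeting for monday"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 28.57%
```

Because WER counts insertions as errors, it can exceed 100% when the output contains many extra words, which is one reason it is reported instead of a plain accuracy percentage.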
Think of it like...
Like having a tireless stenographer who can transcribe a conversation in real time across many accents, though one who can still mishear words when the audio is poor.
Related Terms
Text-to-Speech
AI technology that converts written text into natural-sounding human speech. Modern TTS systems can generate voices with realistic intonation, emotion, and even clone specific voices.
Natural Language Processing
The branch of AI that deals with the interaction between computers and human language. NLP enables machines to read, understand, generate, and make sense of human language in a useful way.
Whisper
OpenAI's open-source automatic speech recognition model that can transcribe speech in many languages and translate it into English.