Native R 'torch' Implementation of 'OpenAI' 'Whisper'
Apply BPE Merges
Get Audio Duration
Convert Audio to Mel Spectrogram
Convert Byte to BPE Token
Clean Transcribed Text
Compute STFT Magnitude
Copy Weight if Exists
Create Decoder from Config
Create Encoder from Config
Create Mel Filterbank (Fallback)
Decode BPE Bytes Back to Text
Decode Timestamp Token
Download Tokenizer Files from HuggingFace
Download Model from HuggingFace
Ensure Tokenizer Files are Downloaded
Extract Segments with Timestamps
Get Initial Decoder Tokens
Get Model Cache Path
Get Path to Model Weights
Greedy Decoding
Convert Hz to Mel Scale
Check if Token is Timestamp
List Downloaded Models
List Available Models
Load Added Tokens from HuggingFace
Load and Preprocess Audio
Load Decoder Weights
Load Encoder Weights
Load Pre-computed Mel Filterbank
Load Whisper Model
Load Weights from Safetensors
Convert Mel Scale to Hz
Check if Model is Downloaded
Pad or Trim Audio to Fixed Length
Parse Device Argument
Parse Dtype Argument
Split Long Audio into Chunks
Decode Token IDs to Text
Encode Text to Token IDs
Transcribe Single Chunk
Transcribe Long Audio
Whisper Transcription
Whisper Encoder
Whisper Model Configurations
Whisper Decoder
Text Decoder
Device and Dtype Management
Get Default Dtype
Encoder Layer
Audio Encoder
Get Language Token ID
Whisper Model
Audio Preprocessing for Whisper
Special Token IDs
Whisper BPE Tokenizer
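
The 'Convert Hz to Mel Scale' and 'Convert Mel Scale to Hz' helpers listed above typically follow the standard HTK mel formula; a minimal R sketch of that convention is below. This is illustrative only — the package's actual helpers may use the Slaney variant that 'librosa' and the original Whisper filterbanks are built on.

```r
# HTK-style mel-scale conversion (one common convention; the package's
# own helpers may differ, e.g. the Slaney/librosa variant).
hz_to_mel <- function(hz) 2595 * log10(1 + hz / 700)
mel_to_hz <- function(mel) 700 * (10^(mel / 2595) - 1)

# The two functions are inverses, so a round trip recovers the input:
mel_to_hz(hz_to_mel(1000))  # 1000
```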
Speech-to-text transcription using a native R 'torch' implementation of the 'OpenAI' 'Whisper' model <https://github.com/openai/whisper>. Supports multiple model sizes, from tiny (39M parameters) to large-v3 (1.5B parameters), with integrated download from 'HuggingFace' <https://huggingface.co/> via the 'hfhub' package. Provides automatic speech recognition with optional language detection and translation to English. Audio preprocessing, mel spectrogram computation, and transformer-based encoder-decoder inference are all implemented in R using the 'torch' package.
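
The pipeline the description outlines — download model weights, load and preprocess audio, then decode — can be sketched as below. The function names `whisper_load_model()` and `whisper_transcribe()` are illustrative assumptions, not confirmed exports; consult the topic index above for the package's actual interface.

```r
library(whisper)  # package name assumed from the title above

# Download (if not cached) and load a small model from 'HuggingFace';
# model sizes range from "tiny" to "large-v3" per the description.
model <- whisper_load_model("tiny")  # function name illustrative

# Transcribe a WAV file: audio is loaded and preprocessed, converted to
# a mel spectrogram, encoded, then decoded to text with timestamps.
result <- whisper_transcribe(model, "speech.wav", language = "en")
result$text
```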