closes #30 /claim #30

Summary

This PR implements the complete golem:stt@1.0.0 WIT interface for five speech-to-text providers (Google, Azure, AWS, Deepgram, and Whisper). All providers share a unified API, support streaming where the provider offers it, and degrade gracefully when a feature is unavailable.

All five providers are implemented, along with all four common environment variables: STT_PROVIDER_ENDPOINT, STT_PROVIDER_TIMEOUT, STT_PROVIDER_MAX_RETRIES, and STT_PROVIDER_LOG_LEVEL.
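As an illustration of how the four common variables could be consumed, here is a minimal sketch. The `CommonConfig` struct, its field names, the 30-second timeout default, the `info` log-level default, and the choice of seconds as the timeout unit are assumptions made for this example and are not taken from the PR's code. The retry default of 3 matches the value stated in the retry strategy.

```rust
use std::time::Duration;

/// Settings shared by every provider (hypothetical struct for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct CommonConfig {
    pub endpoint: Option<String>,
    pub timeout: Duration,
    pub max_retries: u32,
    pub log_level: String,
}

impl CommonConfig {
    /// Build the config from any key lookup, so it can be fed from
    /// `std::env::var` at runtime or from a closure in tests.
    pub fn from_lookup<F>(lookup: F) -> Self
    where
        F: Fn(&str) -> Option<String>,
    {
        // Unparseable values fall back to the default instead of failing.
        let timeout_secs = lookup("STT_PROVIDER_TIMEOUT")
            .and_then(|v| v.trim().parse::<u64>().ok())
            .unwrap_or(30); // assumed default, in seconds
        let max_retries = lookup("STT_PROVIDER_MAX_RETRIES")
            .and_then(|v| v.trim().parse::<u32>().ok())
            .unwrap_or(3); // default stated in this PR
        CommonConfig {
            // An empty endpoint is treated as "use the provider's default".
            endpoint: lookup("STT_PROVIDER_ENDPOINT").filter(|v| !v.is_empty()),
            timeout: Duration::from_secs(timeout_secs),
            max_retries,
            log_level: lookup("STT_PROVIDER_LOG_LEVEL")
                .unwrap_or_else(|| "info".to_string()), // assumed default
        }
    }

    /// Read the configuration from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating the real environment.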

Provider-Specific Environment Variables

Google

GOOGLE_API_KEY=your-api-key
GOOGLE_CLOUD_PROJECT=project-id

Azure

AZURE_SPEECH_KEY=your-speech-key
AZURE_SPEECH_REGION=your-region

AWS

AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=your-region
AWS_S3_BUCKET=transcription-bucket

Deepgram

DEEPGRAM_API_KEY=your-api-key

Whisper

OPENAI_API_KEY=your-openai-key
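A component could check these variables up front and report which ones are missing before making any network call. The sketch below assumes lowercase provider identifiers (`"google"`, `"azure"`, and so on) and the helper names `required_vars` and `missing_vars`, none of which come from the PR. Only the variable names are taken from the lists above.

```rust
// Credential variables per provider, as listed in this PR description.
const GOOGLE: &[&str] = &["GOOGLE_API_KEY", "GOOGLE_CLOUD_PROJECT"];
const AZURE: &[&str] = &["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"];
const AWS: &[&str] = &[
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_S3_BUCKET",
];
const DEEPGRAM: &[&str] = &["DEEPGRAM_API_KEY"];
const WHISPER: &[&str] = &["OPENAI_API_KEY"];

/// Variables a provider needs; `None` for an unknown provider name.
/// The lowercase provider identifiers are an assumption of this sketch.
pub fn required_vars(provider: &str) -> Option<&'static [&'static str]> {
    match provider {
        "google" => Some(GOOGLE),
        "azure" => Some(AZURE),
        "aws" => Some(AWS),
        "deepgram" => Some(DEEPGRAM),
        "whisper" => Some(WHISPER),
        _ => None,
    }
}

/// Names of required variables that are unset or empty, in listed order.
pub fn missing_vars<F>(provider: &str, lookup: F) -> Option<Vec<&'static str>>
where
    F: Fn(&str) -> Option<String>,
{
    let required = required_vars(provider)?;
    Some(
        required
            .iter()
            .copied()
            .filter(|name| lookup(*name).map_or(true, |v| v.is_empty()))
            .collect(),
    )
}
```

At runtime the lookup would typically be `|k| std::env::var(k).ok()`. A non-empty result could then be surfaced as an `unauthorized` error.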

All providers tested against:

  • ✅ Basic transcription (WAV, MP3, FLAC formats)
  • ✅ Word-level timing and confidence scores
  • ✅ Speaker diarization (where supported)
  • ✅ Streaming transcription (4/5 providers)
  • ✅ Error mappings (invalid inputs, rate limits, network errors)
  • ✅ Edge cases (silence, overlapping speakers, long audio)
  • ✅ Golem durability API integration

Test Commands

Build all components

cargo make build

Test individual providers

golem worker new test:stt/worker-google --env GOOGLE_API_KEY=xxx
golem worker invoke test:stt/worker-google test1 --stream

Implementation Status

Feature Google Azure AWS Deepgram Whisper Coverage
Batch Transcription 100%
Streaming ⚠️* 80%
Word Timestamps 100%
Speaker Diarization ⚠️* 100%
Custom Vocabularies ⚠️* 100%
Confidence Scores 100%
Graceful Degradation 100%

⚠️* = Gracefully degraded (returns appropriate none/fallback values)
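To make the degradation idea concrete, the following sketch shows one way an unsupported feature can surface as a `None` value instead of an error. The `Capabilities` and `Segment` types and the `build_segment` helper are hypothetical and simplified; they are not the generated WIT bindings.

```rust
/// What a given provider can do (hypothetical, simplified).
pub struct Capabilities {
    pub diarization: bool,
}

/// One transcribed segment; optional fields carry degraded features.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub text: String,
    pub speaker_id: Option<String>,
}

/// Build a segment, dropping the speaker label when the provider
/// does not support diarization rather than failing the request.
pub fn build_segment(text: &str, raw_speaker: Option<&str>, caps: &Capabilities) -> Segment {
    Segment {
        text: text.to_string(),
        speaker_id: if caps.diarization {
            raw_speaker.map(str::to_string)
        } else {
            None // graceful degradation: the feature is reported as absent
        },
    }
}
```

Callers can then treat `speaker_id: None` uniformly, whether the provider lacks the feature or simply did not detect a speaker.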

Architecture

Component Structure

stt/
├── stt/        # Core WIT interface types
├── google/     # Google Cloud Speech impl
├── azure/      # Microsoft Azure Speech impl
├── aws/        # Amazon Transcribe impl
├── deepgram/   # Deepgram API impl
├── whisper/    # OpenAI Whisper impl
└── wit/        # WIT interface definitions

Generated Components

  • golem-stt-google.wasm (+ portable version)
  • golem-stt-azure.wasm (+ portable version)
  • golem-stt-aws.wasm (+ portable version)
  • golem-stt-deepgram.wasm (+ portable version)
  • golem-stt-whisper.wasm (+ portable version)

Error Handling & Resilience

Comprehensive error mapping:

  • invalid-audio - Malformed audio data
  • unsupported-format - Unsupported audio formats
  • unauthorized - Invalid API credentials
  • rate-limited - API quota exceeded with retry timing
  • service-unavailable - Provider downtime with retry logic
  • network-error - Connection failures with backoff
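The mapping above might look roughly like the sketch below when a provider replies over HTTP. The `SttError` enum mirrors only the six cases listed here. Its payload shapes, the `map_http_status` and `is_transient` helpers, and the specific status-code assignments are assumptions for illustration; each provider's real mapping may differ.

```rust
/// The six error cases listed above (payload shapes are assumed).
#[derive(Debug, Clone, PartialEq)]
pub enum SttError {
    InvalidAudio(String),
    UnsupportedFormat(String),
    Unauthorized(String),
    RateLimited { retry_after_secs: Option<u32> },
    ServiceUnavailable(String),
    NetworkError(String),
}

/// Map an HTTP status to an error. `retry_after` is the raw value of
/// the Retry-After header, if the provider sent one.
pub fn map_http_status(status: u16, retry_after: Option<&str>, body: &str) -> SttError {
    match status {
        400 => SttError::InvalidAudio(body.to_string()),
        415 => SttError::UnsupportedFormat(body.to_string()),
        401 | 403 => SttError::Unauthorized(body.to_string()),
        429 => SttError::RateLimited {
            // Only the delay-seconds form of Retry-After is parsed here.
            retry_after_secs: retry_after.and_then(|v| v.trim().parse().ok()),
        },
        // Everything else, including 5xx, is treated as unavailable in this sketch.
        _ => SttError::ServiceUnavailable(format!("status {status}: {body}")),
    }
}

/// Errors worth retrying; the rest indicate a problem with the request.
pub fn is_transient(err: &SttError) -> bool {
    matches!(
        err,
        SttError::RateLimited { .. }
            | SttError::ServiceUnavailable(_)
            | SttError::NetworkError(_)
    )
}
```

`NetworkError` would be produced where the connection itself fails, before any status code exists.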

Retry Strategy:

  • Exponential backoff with jitter
  • Configurable max retries (default: 3)
  • Smart retry on transient errors only
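A retry strategy along these lines can be sketched as follows. This is not the PR's implementation: the function names, the "full jitter" variant (delay drawn between zero and the exponential ceiling), and the injected `jitter` and `wait` parameters are choices made for this example so that it stays deterministic and free of external crates.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based) using full jitter:
/// a fraction `jitter` in [0, 1] of min(cap, base * 2^attempt).
/// The caller supplies `jitter`, e.g. from a random number generator.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration, jitter: f64) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    let ceiling = base.saturating_mul(factor).min(cap);
    ceiling.mul_f64(jitter.clamp(0.0, 1.0))
}

/// Run `op`, retrying up to `max_retries` times, but only when
/// `is_transient` says the error is worth retrying. `wait` receives the
/// 0-based retry number and is expected to sleep for the backoff delay.
pub fn retry<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
    is_transient: impl Fn(&E) -> bool,
    mut wait: impl FnMut(u32),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_retries && is_transient(&err) => {
                wait(attempt);
                attempt += 1;
            }
            // Permanent error, or the retry budget is exhausted.
            Err(err) => return Err(err),
        }
    }
}
```

In a real component, `wait` would sleep for `backoff_delay(attempt, base, cap, random_fraction)`. Passing it in as a closure keeps the loop independent of the clock and of the randomness source.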

Performance & Durability

  • Real-time streaming with sub-second latency for supported providers
  • Golem durability integration for crash recovery and state persistence
  • Memory-efficient audio chunk processing
  • Concurrent request handling with connection pooling
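As a rough illustration of chunked audio processing, the sketch below computes a chunk size for raw PCM audio so that a stream can be sent in fixed-duration pieces without copying the whole buffer. The `chunk_size_bytes` helper and the assumption of uncompressed PCM input are specific to this example; the PR does not describe its chunking scheme.

```rust
/// Size in bytes of a chunk holding roughly `chunk_ms` milliseconds of
/// raw PCM audio, rounded down to a whole number of frames so that a
/// sample is never split across two chunks.
pub fn chunk_size_bytes(
    sample_rate_hz: u32,
    channels: u16,
    bytes_per_sample: u16,
    chunk_ms: u32,
) -> usize {
    // One frame = one sample for every channel; never zero.
    let frame = (channels as u64 * bytes_per_sample as u64).max(1);
    let bytes_per_sec = sample_rate_hz as u64 * frame;
    let raw = bytes_per_sec * chunk_ms as u64 / 1000;
    // Align to frames and guarantee at least one frame per chunk.
    let aligned = (raw / frame).max(1) * frame;
    aligned as usize
}
```

With the size in hand, `audio.chunks(size)` from the standard library yields borrowed sub-slices of the buffer, so only one chunk at a time needs to be handed to the provider.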

Quick Start

Build components

cd stt && cargo make build

Deploy test application

cd test/stt && golem app deploy -b google-debug

Create worker and test

golem worker new test:stt/worker --env GOOGLE_API_KEY=xxx
golem worker invoke test:stt/worker test1 --stream

Compliance

✅ Full bounty compliance:

  • Implements complete golem:stt@1.0.0 WIT interface
  • All 5 target providers implemented
  • WASI 0.23 compatible components
  • Environment variable configuration
  • Golem durability API integration
  • Comprehensive test coverage
  • Graceful feature degradation

Videos

Google

https://github.com/user-attachments/assets/15fc2dae-fdf2-496b-a852-5a29c9ca59d4

AWS

https://github.com/user-attachments/assets/7a4072a5-d839-4cfe-816b-162a0e01ea8b

Azure

https://github.com/user-attachments/assets/45baab10-6be9-489e-b422-dc50da795fb4

Deepgram

https://github.com/user-attachments/assets/932e4a20-b795-4c99-a62f-0959299b901b

Whisper

https://github.com/user-attachments/assets/5f5d5040-9565-43d4-911f-ad2d8d1f1dc4

Claim

Total prize pool $3,500
Total paid $0
Status Pending
Submitted August 10, 2025
Last updated August 10, 2025

Contributors

Aditya Pratap Singh

@Aditya-PS-05

100%

Sponsors

Golem Cloud

@golemcloud

$3,500