Empathy Engine
Giving AI a Human Voice
Prepared for the Darwix AI Evaluation assignment
K V Jaya Harsha
Approach 1 — Implemented (This Demo)
Uses the Inworld TTS API with dynamic vocal parameter modulation driven by a HuggingFace emotion classifier:
- Emotion Detection: j-hartmann/emotion-english-distilroberta-base, a 7-class classifier (joy, surprise, anger, disgust, fear, sadness, neutral)
- Intensity Scaling: the model's confidence score (0–1) linearly scales the vocal parameters, so "I'm okay" sounds different from "THIS IS AMAZING"
- Vocal Modulation: speakingRate (speed) and temperature (expressiveness), both computed dynamically per emotion
- Emotion Mapping: each emotion has a defined base rate and temperature, scaled by the intensity score
- TTS Engine: Inworld TTS (inworld-tts-1.5-max, voice: Clive, output: .mp3)
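The detection step can be sketched as follows. This is a minimal, hedged version rather than the demo's actual code: detect_emotion and top_emotion are hypothetical helper names, and the classifier is any callable that returns Hugging Face text-classification-style output (a list of {"label", "score"} dicts). With the transformers library installed, one such callable would be pipeline("text-classification", model="j-hartmann/emotion-english-distilroberta-base", top_k=None); because the list nesting of that output has varied across library versions, the helper accepts both shapes.

```python
from typing import Callable, Tuple

# The seven labels emitted by j-hartmann/emotion-english-distilroberta-base.
EMOTION_LABELS = ("anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise")


def top_emotion(scores) -> Tuple[str, float]:
    """Return (label, confidence) for the highest-scoring emotion.

    Accepts a flat list of {"label", "score"} dicts or a list wrapping one
    such list (assumption: pipeline output nesting differs between versions).
    """
    if scores and isinstance(scores[0], list):
        scores = scores[0]  # unwrap the single-input batch
    if not scores:
        raise ValueError("classifier returned no scores")
    best = max(scores, key=lambda s: s["score"])
    label = str(best["label"]).lower()
    if label not in EMOTION_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label, float(best["score"])


def detect_emotion(text: str, classifier: Callable) -> Tuple[str, float]:
    """Hypothetical helper: run a pipeline-like callable and keep the top emotion."""
    return top_emotion(classifier(text))
```

Passing the classifier in as an argument keeps the sketch testable with a stub instead of a downloaded model.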
Approach 2 — Theoretical (Research-Grade)
Architecturally superior in principle and supported by published research, but not implemented here because of time and compute constraints:
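- Model: F5-TTS, a non-autoregressive TTS model trained with conditional flow matching to learn the distribution of human speech, including its prosody
- Emotion Transfer: zero-shot emotion cloning from reference audio; the model is conditioned on a clip that carries the target emotion and reproduces that delivery in the new utterance
- Why it's better: flow matching can capture fine-grained variation in pitch, rhythm, and timbre that two global API parameters (speaking rate and temperature) cannot express
- Papers: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (Chen et al., 2024) · Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale (Le et al., Meta AI, 2023)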
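For readers unfamiliar with flow matching, the training objective behind this family of models can be written in its simplest optimal-transport form (minimum noise level set to zero). This is a simplified sketch, not F5-TTS's exact formulation: x_1 is a target speech feature sequence (for example a mel spectrogram), x_0 is Gaussian noise, and c stands for the conditioning (text plus the reference-audio context that carries the target emotion).

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \quad t \sim \mathcal{U}[0, 1]

\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}\!\left[\big\lVert v_\theta(x_t, t, c) - (x_1 - x_0) \big\rVert^2\right]

\frac{\mathrm{d}x_t}{\mathrm{d}t} = v_\theta(x_t, t, c), \qquad \text{integrated from } t = 0 \text{ (noise) to } t = 1 \text{ (speech)}
```

The regression target x_1 - x_0 is exactly the time derivative of x_t along the straight path. Because the learned field v_θ acts on every frame of the spectrogram, it can shape pitch, rhythm, and timbre locally, whereas speakingRate and temperature are single utterance-level settings.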
This approach is not implemented in this submission, but given sufficient time and GPU resources it would be the production-grade choice.
Emotion to Voice Mapping
| Emotion | Rate | Temp |
|---|---|---|
| Joyful | 1.25–1.50 | 1.3–1.7 |
| Surprised | 1.20–1.40 | 1.4–1.75 |
| Fearful | 1.10–1.25 | 1.2–1.45 |
| Neutral | 1.00 | 1.00 |
| Angry | 0.75–0.85 | 0.35–0.50 |
| Disgusted | 0.70–0.80 | 0.45–0.55 |
| Sad | 0.60–0.70 | 0.35–0.45 |
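Rate = speakingRate (speed; 1.00 = normal) · Temp = temperature (expressiveness/energy) · Each range is scaled by the classifier's confidence score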
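To make the intensity scaling concrete, here is a minimal Python sketch that turns the table into parameters. It assumes that confidence moves each parameter linearly from the end of its range closest to neutral (1.00) toward the far end; the source does not say which end corresponds to high intensity, so that direction, the dictionary keys (the classifier's raw labels such as joy and sadness, standing in for the table's Joyful and Sad), and the helper names vocal_params and _scale are all illustrative.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]

# (speakingRate range, temperature range), copied from the table above,
# keyed by the classifier's label names (assumed mapping: joy -> Joyful, etc.).
EMOTION_RANGES: Dict[str, Tuple[Range, Range]] = {
    "joy": ((1.25, 1.50), (1.30, 1.70)),
    "surprise": ((1.20, 1.40), (1.40, 1.75)),
    "fear": ((1.10, 1.25), (1.20, 1.45)),
    "neutral": ((1.00, 1.00), (1.00, 1.00)),
    "anger": ((0.75, 0.85), (0.35, 0.50)),
    "disgust": ((0.70, 0.80), (0.45, 0.55)),
    "sadness": ((0.60, 0.70), (0.35, 0.45)),
}


def _scale(bounds: Range, confidence: float) -> float:
    """Interpolate from the bound nearest neutral (1.0) to the farthest one.

    Assumption: higher confidence pushes the voice further from neutral.
    """
    near, far = sorted(bounds, key=lambda v: abs(v - 1.0))
    return near + confidence * (far - near)


def vocal_params(emotion: str, confidence: float) -> Tuple[float, float]:
    """Hypothetical helper: map (emotion, confidence) to (speaking_rate, temperature)."""
    c = min(max(confidence, 0.0), 1.0)  # clamp so outputs stay inside the table ranges
    rate_bounds, temp_bounds = EMOTION_RANGES.get(emotion, EMOTION_RANGES["neutral"])
    return round(_scale(rate_bounds, c), 3), round(_scale(temp_bounds, c), 3)
```

Keeping the table as data means retuning an emotion is a one-line change, and clamping the confidence guarantees the output never leaves the documented ranges; unknown labels fall back to the neutral voice.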
Stack: Python · Gradio · HuggingFace Transformers · Inworld TTS API · httpx
Darwix AI Evaluation · 2025 · K V Jaya Harsha