Making a podcast without recording your own voice isn't just a "cheap substitute" anymore; it's a genre of its own. AI voices don't have bad days. No noise cleanup. No re-recording lines three times. You get finished audio in about a minute, with consistent tone and pace.
Quantium's audio generator now offers Google voices (including Charon, Aoede, Kore, and 28 others). I produced an 18-minute test podcast episode and tried 12 of the voices along the way. Below is a breakdown of what actually works.
Top Voices: Who Fits What
| Voice | Tone | Best For |
|---|---|---|
| Charon | Deep Male | Analytics, serious topics, documentaries |
| Aoede | Warm Female | Stories, lifestyle, interviews |
| Kore | Neutral Female | News, business, educational content |
| Puck | Playful Male | Humor, light formats |
| Fenrir | Confident Male | Tech reviews, breakdowns |
| Leda | Young Female | Ads, TikTok format |
Charon is my top pick for a finance podcast. It's low and never strained, and it sounds neither like a "news anchor" nor like a "corporate voice." It works well for 5-30 minute scripts without getting tiring.
Aoede is the one for a narrative podcast built on stories: she sounds as if the narrator is truly living the text, not just reading it.
Kore is the most "neutral" of the three, and that's her strength. For educational content where you don't want to distract from the meaning, she's simply the best.
Emotions & Intonation: How to Control Them
Unlike older TTS, Gemini models understand stylistic cues within the prompt. You don't just write "voice this text"; you write "voice this text with light irony, like you're telling a funny story to a friend," and the model adapts its delivery.
What works (tested on dozens of texts):
- «Spoken in a calm, thoughtful manner» — meditative pace, long pauses
- «Excited, slightly faster pace» — energetic delivery, no yelling
- «Whispering, intimate tone» — for intros or emotional moments
- «Slight British accent» — yep, accents work too
- «Read like a documentary narrator» — neutral, serious delivery
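If you keep your cues in a script or notes file, here is a minimal sketch of assembling such a prompt programmatically. It assumes the cue is simply written as a plain-language instruction ahead of the text; the `build_tts_prompt` helper, the `STYLE_CUES` keys, and the "cue: text" layout are hypothetical, and only the cue wording comes from the list above.

```python
# Hypothetical helper: prefix plain-language style cues to the text to voice.
# The "cue(s): text" layout is an assumption, not a documented Quantium format.

STYLE_CUES = {
    "calm": "Spoken in a calm, thoughtful manner",
    "excited": "Excited, slightly faster pace",
    "whisper": "Whispering, intimate tone",
    "british": "Slight British accent",
    "documentary": "Read like a documentary narrator",
}


def build_tts_prompt(text: str, *cues: str) -> str:
    """Join any non-empty cues with '; ' and place them before the text."""
    parts = [c.strip().rstrip(".") for c in cues if c and c.strip()]
    body = text.strip()
    if not parts:
        return body  # no cue: send the text as-is
    return "; ".join(parts) + ":\n" + body


prompt = build_tts_prompt(
    "Welcome back. Today we're talking about index funds.",
    STYLE_CUES["calm"],
    STYLE_CUES["british"],
)
print(prompt)
```

Keeping cues in one place like this also makes it easy to reuse the exact same wording for every chunk of an episode.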
Pauses & Pace
The biggest rookie mistake: text that fires off like a machine gun, with no pauses. AI voices can do pauses, but you gotta tell 'em how:
- Ellipsis mid-sentence = short pause (~0.4 sec)
- Dash = medium pause (~0.6 sec)
- Double line break = long pause (~1.2 sec) — for topic transitions
- Commas act as mini-pauses; don't overuse them
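To sanity-check how much silence a script will carry, here is a rough Python sketch that counts those markers and applies the approximate durations above. It is a hypothetical helper, not part of Quantium, and it simplifies: it treats `...` or `…` anywhere as an ellipsis (not only mid-sentence), counts only dashes with spaces on both sides so hyphenated words are ignored, and skips commas entirely.

```python
import re

# Approximate pause lengths (seconds) from the list above.
PAUSE_SECONDS = {"ellipsis": 0.4, "dash": 0.6, "paragraph": 1.2}

# Simplified markers: any "..." or the single-character ellipsis, a hyphen,
# en dash, or em dash with whitespace on both sides, and a blank line
# between paragraphs. Commas are not counted.
PATTERNS = {
    "ellipsis": re.compile(r"\.\.\.|\u2026"),
    "dash": re.compile(r"\s[-\u2013\u2014]\s"),
    "paragraph": re.compile(r"\n[ \t]*\n"),
}


def estimate_pauses(script: str) -> dict:
    """Count pause markers and estimate the total pause time they add."""
    counts = {name: len(p.findall(script)) for name, p in PATTERNS.items()}
    total = sum(counts[name] * PAUSE_SECONDS[name] for name in counts)
    return {"counts": counts, "total_seconds": round(total, 2)}


demo = "Welcome... to the show - today we talk money.\n\nFirst topic: budgets."
print(estimate_pauses(demo))
```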
You set the pace in the style cue: "slow, contemplative pace," "brisk and energetic," "conversational tempo."
Workflow: From Script to Finished Episode
Here's how I do it (an 18-minute episode takes about 35 minutes in total):
- Step 1. I write the script in a chat with ChatGPT 5.4, which remembers my tone.
- Step 2. I break the script into 2-3 minute chunks. Voice models work better with shorter segments — less intonation drift.
- Step 3. I generate each chunk separately, using the same style cue.
- Step 4. I download the files and stitch them together in an audio editor (Audacity or any other free one works).
- Step 5. I add music and background ambiance — that's a separate step in the audio editor.
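Steps 2 and 4 are easy to script if you'd rather not do them by hand. The sketch below is hypothetical (the function names and the 2.5-minute target are my assumptions): `split_script` packs paragraphs into chunks using the rough 250 characters ≈ 20 seconds ratio from the pricing section, and `stitch_wavs` uses Python's standard `wave` module to join the downloaded chunks, assuming they are WAV files with identical channel count, sample width, and sample rate.

```python
import re
import wave

# Rough ratio from the pricing section: ~250 characters ~ 20 seconds of speech.
CHARS_PER_SECOND = 250 / 20


def split_script(script: str, target_minutes: float = 2.5) -> list[str]:
    """Greedily pack paragraphs into chunks of about target_minutes of speech.

    A single paragraph longer than the target is kept whole, not split.
    """
    max_chars = int(target_minutes * 60 * CHARS_PER_SECOND)
    chunks: list[str] = []
    current = ""
    for para in re.split(r"\n[ \t]*\n", script.strip()):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def stitch_wavs(paths: list[str], out_path: str) -> None:
    """Concatenate WAV files with matching channels, sample width and rate."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        # wave patches the header's frame count to match what is written.
        out.setparams(params)
        for path in paths:
            with wave.open(path, "rb") as part:
                if part.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path} does not match the first file's format")
                out.writeframes(part.readframes(part.getnframes()))
```

In between, you would generate each chunk with the same voice and style cue (Step 3) and pass the saved files to `stitch_wavs` in order. If the generator hands you a format other than WAV, convert first or simply do the join in Audacity; music and ambiance (Step 5) still go on in the editor.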
Get more details on the basic mechanics in the TTS tutorial.
Quantium Price Per Minute
Quantium TTS charges per character. Roughly, 1 credit = ~250 characters = ~20 seconds of speech. An 18-minute episode (about 15,000 characters) = ~60 credits. That's about 2% of the monthly Basic plan.
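The arithmetic is easy to reproduce. A small sketch using the ratios above, assuming usage is rounded up to whole credits (the rounding rule is my assumption, not documented Quantium behavior):

```python
import math

# Rough ratios quoted above: 1 credit ~ 250 characters ~ 20 seconds of speech.
CHARS_PER_CREDIT = 250
SECONDS_PER_CREDIT = 20


def credits_for_chars(n_chars: int) -> int:
    """Estimate credits from script length, rounding up (assumed)."""
    return math.ceil(n_chars / CHARS_PER_CREDIT)


def credits_for_minutes(minutes: float) -> int:
    """Estimate credits from target episode length, rounding up (assumed)."""
    return math.ceil(minutes * 60 / SECONDS_PER_CREDIT)


print(credits_for_chars(15_000))  # 60
print(credits_for_minutes(18))    # 54
```

By character count the example episode comes to 60 credits; by duration (18 minutes at ~20 seconds per credit) it's 54. The gap just reflects how loose the "roughly" ratios are, so budgeting ~60 credits is the safer figure.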
For comparison: commercial TTS services (ElevenLabs, Resemble) cost $5-15 per hour of speech. With Quantium, you get the same volume for significantly less, as it's part of a bundled plan with chat, images, and video.
Try Quantium Free
20 credits monthly on the free plan. 30+ AI models in one Telegram bot.


