askill
skill-elevenlabs-tts-tool

skill-elevenlabs-tts-toolSafety 95Repository

ElevenLabs text-to-speech CLI tool guide

0 stars
1.2k downloads
Updated 2/5/2026

Package Files

Loading files...
SKILL.md

When to use

  • Converting text to speech with ElevenLabs API
  • Exploring available voices and models
  • Managing TTS subscriptions and usage
  • Integrating TTS into workflows and pipelines

ElevenLabs TTS Tool Skill

Purpose

Comprehensive guide for the elevenlabs-tts-tool CLI - a professional command-line interface for ElevenLabs text-to-speech synthesis. Provides both direct audio playback and file output with support for 42+ premium voices and multiple models.

When to Use This Skill

Use this skill when:

  • Converting text to speech for notifications, audiobooks, or content creation
  • Exploring and comparing different voice characteristics
  • Managing ElevenLabs subscription quotas and usage
  • Building voice-enabled workflows and automation
  • Integrating TTS into Claude Code hooks or other tools

Do NOT use this skill for:

  • Direct ElevenLabs API programming (use SDK docs instead)
  • Custom voice cloning (requires ElevenLabs web interface)
  • Real-time streaming TTS (tool focuses on file/playback generation)

CLI Tool: elevenlabs-tts-tool

Professional text-to-speech CLI tool built with Python 3.13+, uv, and the ElevenLabs SDK.

Installation

# Clone repository
git clone https://github.com/dnvriend/elevenlabs-tts-tool.git
cd elevenlabs-tts-tool

# Install globally with uv
uv tool install .

# Verify installation
elevenlabs-tts-tool --version

Prerequisites

Quick Start

# Set API key
export ELEVENLABS_API_KEY='your-api-key'

# Basic text-to-speech
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

Progressive Disclosure

synthesize - Convert Text to Speech

Convert text to speech using ElevenLabs API. Supports direct playback or file output.

Usage:

elevenlabs-tts-tool synthesize [TEXT] [OPTIONS]

Arguments:

  • TEXT: Text to synthesize (optional if --stdin used)
  • --stdin, -s: Read text from stdin instead of argument
  • --voice, -v NAME: Voice name or ID (default: rachel)
  • --model, -m ID: Model ID (default: eleven_turbo_v2_5)
  • --output, -o PATH: Save to audio file instead of playing
  • --format, -f FORMAT: Output format (default: mp3_44100_128)

Examples:

# Basic usage - play through speakers
elevenlabs-tts-tool synthesize "Hello world"

# Use different voice
elevenlabs-tts-tool synthesize "Hello" --voice adam

# Use specific model
elevenlabs-tts-tool synthesize "Hello" --model eleven_multilingual_v2

# Emotional expression (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Add pauses with SSML
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Read from stdin
echo "Text from pipeline" | elevenlabs-tts-tool synthesize --stdin

# Save to file
elevenlabs-tts-tool synthesize "Text" --output speech.mp3

# Pipeline integration
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audiobook.mp3

Output: Plays audio through default speakers or saves to specified file format.

Available Formats:

  • mp3_44100_128 (default): MP3, 44.1kHz, 128kbps
  • mp3_44100_64: MP3, 44.1kHz, 64kbps
  • mp3_22050_32: MP3, 22.05kHz, 32kbps
  • pcm_44100: PCM WAV, 44.1kHz (requires Pro tier)

list-voices - Show Available Voices

List all available ElevenLabs voices with characteristics.

Usage:

elevenlabs-tts-tool list-voices

Examples:

# List all voices
elevenlabs-tts-tool list-voices

# Filter by gender
elevenlabs-tts-tool list-voices | grep female
elevenlabs-tts-tool list-voices | grep male

# Filter by accent
elevenlabs-tts-tool list-voices | grep British
elevenlabs-tts-tool list-voices | grep American

# Filter by age
elevenlabs-tts-tool list-voices | grep young
elevenlabs-tts-tool list-voices | grep middle_aged

# Combine filters
elevenlabs-tts-tool list-voices | grep "female.*young.*British"

Output:

Voice           Gender     Age          Accent          Description
====================================================================================================
rachel          female     young        American        Calm and friendly American voice...
adam            male       middle_aged  American        Deep, authoritative American male...
charlotte       female     middle_aged  British         Smooth, professional British voice...
...
====================================================================================================
Total: 42 voices available

Popular Voices:

  • rachel: Calm, friendly American female (default)
  • adam: Deep, authoritative American male
  • charlotte: Professional British female
  • josh: Young, casual American male
  • bella: Expressive Italian female

list-models - Show TTS Models

List all available ElevenLabs TTS models with characteristics and use cases.

Usage:

elevenlabs-tts-tool list-models

Examples:

# List all models
elevenlabs-tts-tool list-models

# Filter by status
elevenlabs-tts-tool list-models | grep stable
elevenlabs-tts-tool list-models | grep deprecated

# Find low-latency models
elevenlabs-tts-tool list-models | grep -i "ultra-low"

# Find multilingual models
elevenlabs-tts-tool list-models | grep -i "multilingual"

Output: Comprehensive model information including:

  • Model ID and version
  • Quality and latency characteristics
  • Language support (mono vs multilingual)
  • Character limits
  • Best use cases
  • Special features (emotions, etc.)

Key Models:

  • eleven_turbo_v2_5: Fast, high-quality (default, best value)
  • eleven_flash_v2_5: Ultra-low latency (real-time applications)
  • eleven_multilingual_v2: 29 languages, production quality
  • eleven_v3: Most expressive with emotion tags (alpha, 2x cost)

Cost Multipliers:

  • Turbo/Flash models: 1x cost
  • Multilingual v2: 1x cost
  • v3 models: 2x cost (half the minutes/tokens)

info - Show Subscription Info

Display subscription tier, character usage, quota limits, and historical usage.

Usage:

elevenlabs-tts-tool info [--days N]

Arguments:

  • --days, -d N: Number of days of historical usage to display (default: 7)

Examples:

# View subscription with last 7 days of usage
elevenlabs-tts-tool info

# View last 30 days of usage
elevenlabs-tts-tool info --days 30

# Quick quota check (1 day)
elevenlabs-tts-tool info --days 1

# Check usage before long generation
elevenlabs-tts-tool info --days 1 && elevenlabs-tts-tool synthesize "Long text..."

Output Information:

  • Subscription tier and status
  • Character usage (used/limit/remaining)
  • Quota reset date
  • Historical usage breakdown by day
  • Average daily usage
  • Projected monthly usage
  • Warnings when approaching quota limits

Use Cases:

  • Monitor character quota consumption
  • Track usage patterns over time
  • Plan when to upgrade subscription tier
  • Avoid hitting quota limits unexpectedly
  • Identify high-usage periods

update-voices - Update Voice Table

Fetch latest voices from ElevenLabs API and update local lookup table.

Usage:

elevenlabs-tts-tool update-voices [--output PATH]

Arguments:

  • --output, -o PATH: Output file path (default: ~/.config/elevenlabs-tts-tool/voices_lookup.json)

Examples:

# Update default voice lookup (user config directory)
elevenlabs-tts-tool update-voices

# Save to custom location
elevenlabs-tts-tool update-voices --output custom_voices.json

# Update before listing voices
elevenlabs-tts-tool update-voices && elevenlabs-tts-tool list-voices

Behavior:

  • Fetches all premade voices from ElevenLabs API
  • Saves to user config directory by default (~/.config/elevenlabs-tts-tool/)
  • Creates config directory if it doesn't exist
  • Updates take precedence over package default
  • Persists across package reinstalls

pricing - Show Pricing Information

Display ElevenLabs pricing tiers and feature comparison.

Usage:

elevenlabs-tts-tool pricing

Examples:

# View full pricing table
elevenlabs-tts-tool pricing

# Find specific tier information
elevenlabs-tts-tool pricing | grep Creator
elevenlabs-tts-tool pricing | grep "44.1kHz PCM"

Output Information:

  • Pricing tiers (Free, Starter, Creator, Pro, Scale, Business)
  • Minutes included per tier
  • Additional minute costs
  • Audio quality options
  • Concurrency limits
  • Priority levels
  • API formats by tier
  • Model cost multipliers

Key Insights:

  • Free tier: 10,000-20,000 characters/month
  • v3 models cost 2x (half the minutes/tokens)
  • Use Flash v2.5 for high-volume integrations
  • Reserve v3 for content requiring emotional expression
  • PCM 44.1kHz requires Pro tier

completion - Shell Completion

Generate shell completion scripts for bash, zsh, or fish.

Usage:

elevenlabs-tts-tool completion [bash|zsh|fish]

Installation:

# Bash (add to ~/.bashrc)
eval "$(elevenlabs-tts-tool completion bash)"

# Zsh (add to ~/.zshrc)
eval "$(elevenlabs-tts-tool completion zsh)"

# Fish (save to completion file)
elevenlabs-tts-tool completion fish > ~/.config/fish/completions/elevenlabs-tts-tool.fish

Features:

  • Tab-complete commands and subcommands
  • Tab-complete options and flags
  • Context-aware completion for file paths

Emotion Control (v3 Models)

ElevenLabs v3 model (eleven_v3) supports Audio Tags for emotional expression.

Available Emotion Tags:

  • Basic emotions: [happy], [excited], [sad], [angry], [nervous], [curious]
  • Delivery styles: [cheerfully], [playfully], [mischievously], [resigned tone], [flatly], [deadpan]
  • Speech characteristics: [whispers], [laughs], [gasps], [sighs], [pauses], [hesitates], [stammers], [gulps]

Usage Examples:

# Basic emotion (requires eleven_v3 model)
elevenlabs-tts-tool synthesize "[happy] Welcome to our service!" --model eleven_v3

# Multiple emotions in sequence
elevenlabs-tts-tool synthesize "[excited] Great news! [cheerfully] Your project is approved!" --model eleven_v3

# Combine emotions with pauses
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [curious] How are you today?" --model eleven_v3

# Whispered speech
elevenlabs-tts-tool synthesize "[whispers] This is a secret message." --model eleven_v3

# Playful delivery
elevenlabs-tts-tool synthesize "[playfully] Guess what I found!" --model eleven_v3

Best Practices:

  • Place tags at the beginning of phrases
  • Align text content with emotional intent
  • Test with different voices for best results
  • Use sparingly - let AI infer emotion from context when possible
  • Remember: v3 models cost 2x as much (half the minutes/tokens)

Pause Control (SSML)

Add natural pauses using SSML <break> tags.

Syntax:

<break time="X.Xs" />

Examples:

# 1-second pause
elevenlabs-tts-tool synthesize "Welcome <break time=\"1.0s\" /> to our service."

# Multiple pauses
elevenlabs-tts-tool synthesize "Point one <break time=\"0.5s\" /> Point two <break time=\"0.5s\" /> Point three."

# Short pause for emphasis
elevenlabs-tts-tool synthesize "Think about this <break time=\"0.3s\" /> carefully."

# Combine with emotions (requires eleven_v3)
elevenlabs-tts-tool synthesize "[happy] Hello! <break time=\"0.5s\" /> [cheerfully] How are you?" --model eleven_v3

Limitations:

  • Maximum pause duration: 3 seconds
  • Recommended: 2-4 breaks per generation
  • Too many breaks can cause:
    • AI speedup
    • Audio artifacts
    • Background noise
    • Generation instability

Alternative Methods:

  • Dashes (- or ) for shorter pauses (less consistent)
  • Ellipses (...) for hesitation (may add nervous tone)
  • SSML <break> is most reliable

Verbosity Control

Multi-level verbosity for progressive detail control.

Verbosity Levels:

  • No flag (default): WARNING level - only critical issues
  • -v: INFO level - high-level operations, important events
  • -vv: DEBUG level - detailed operations, API calls, validation steps
  • -vvv: TRACE level - full HTTP requests/responses, ElevenLabs SDK internals

Usage:

# Quiet mode (warnings only)
elevenlabs-tts-tool synthesize "Hello world"

# INFO level
elevenlabs-tts-tool -v synthesize "Hello world"

# DEBUG level (detailed operations)
elevenlabs-tts-tool -vv synthesize "Hello world"

# TRACE level (shows HTTP requests/responses)
elevenlabs-tts-tool -vvv synthesize "Hello world"

Dependent Library Logging: At trace level (-vvv), the following libraries enable DEBUG logging:

  • elevenlabs - ElevenLabs SDK internals
  • httpx / httpcore - HTTP request/response details
  • urllib3 - Low-level HTTP operations

Pipeline Integration

The tool is designed for composition with other CLI tools.

Design Principles:

  • JSON output to stdout, logs/errors to stderr
  • Stdin support for text input
  • Exit codes for success/failure detection
  • Shell completion for productivity

Examples:

# Read from file
cat document.txt | elevenlabs-tts-tool synthesize --stdin --output audio.mp3

# Combine with other tools
gemini-google-search-tool query "AI news" | \
    elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

# Conditional execution
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" || \
    elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs."

# Process multiple texts
for text in "First" "Second" "Third"; do
    elevenlabs-tts-tool synthesize "$text" --output "${text}.mp3"
done

Claude Code Integration

Use elevenlabs-tts-tool as notification system for Claude Code hooks.

Use Cases:

  1. Task Completion Alerts
# After long-running task
elevenlabs-tts-tool synthesize "[excited] Task completed successfully!"
  1. Error Notifications
# On build failure
elevenlabs-tts-tool synthesize "[nervous] Error detected. Please check output."
  1. Custom Workflows
# Shell script integration
make build && elevenlabs-tts-tool synthesize "[cheerfully] Build successful!" || \
    elevenlabs-tts-tool synthesize "[sad] Build failed. Check logs."
  1. Multi-Tool Integration
# Combine with other CLI tools
gemini-google-search-tool query "AI news" | \
    elevenlabs-tts-tool synthesize --stdin --voice charlotte --output news.mp3

Hook Configuration:

Create hooks in ~/.config/claude-code/hooks.json:

{
  "hooks": {
    "after_command": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[happy] Task completed!\" --voice rachel"
    },
    "on_error": {
      "type": "bash",
      "command": "elevenlabs-tts-tool synthesize \"[nervous] Error occurred!\" --voice adam"
    }
  }
}

Benefits:

  • Audio alerts for completed tasks without monitoring terminal
  • Error notifications while away from screen
  • Multi-step automation with voice feedback
  • Voice-enabled AI agent pipelines

Common Issues

Issue: "API key not found" error

# Symptom
Error: ELEVENLABS_API_KEY environment variable not set

Solution:

  1. Get API key from https://elevenlabs.io/app/settings/api-keys
  2. Export as environment variable:
    export ELEVENLABS_API_KEY='your-api-key'
    
  3. Add to shell profile for persistence:
    echo 'export ELEVENLABS_API_KEY="your-api-key"' >> ~/.bashrc
    source ~/.bashrc
    

Issue: "Voice not found" error

# Symptom
ValueError: Voice 'unknown' not found in lookup table

Solution:

  1. List available voices:
    elevenlabs-tts-tool list-voices
    
  2. Update voice table if needed:
    elevenlabs-tts-tool update-voices
    
  3. Use correct voice name (case-insensitive):
    elevenlabs-tts-tool synthesize "Hello" --voice rachel
    

Issue: Character quota exceeded

# Symptom
Error: Character quota exceeded for this month

Solution:

  1. Check current usage:
    elevenlabs-tts-tool info
    
  2. Wait until quota reset date
  3. Consider upgrading subscription tier:
    elevenlabs-tts-tool pricing
    
  4. Use more efficient models (Flash/Turbo vs v3)

Issue: Audio quality issues

Symptom: Poor audio quality or artifacts

Solution:

  1. Try different output format:
    elevenlabs-tts-tool synthesize "Text" --format mp3_44100_128
    
  2. Use higher-quality model:
    elevenlabs-tts-tool synthesize "Text" --model eleven_multilingual_v2
    
  3. For professional content, use PCM format (requires Pro tier):
    elevenlabs-tts-tool synthesize "Text" --format pcm_44100
    

Issue: Emotional tags not working

Symptom: Emotion tags like [happy] are spoken literally

Solution:

  1. Ensure using v3 model:
    elevenlabs-tts-tool synthesize "[happy] Text" --model eleven_v3
    
  2. Place tags at beginning of phrases
  3. Test with different voices (some work better than others)

Issue: Too many SSML breaks causing issues

Symptom: Audio artifacts, speedup, or noise with multiple <break> tags

Solution:

  1. Limit to 2-4 breaks per generation
  2. Use maximum 3 seconds per break
  3. Consider splitting into multiple generations:
    elevenlabs-tts-tool synthesize "Part 1" --output part1.mp3
    elevenlabs-tts-tool synthesize "Part 2" --output part2.mp3
    

Getting Help

# Main help
elevenlabs-tts-tool --help

# Command-specific help
elevenlabs-tts-tool synthesize --help
elevenlabs-tts-tool list-voices --help
elevenlabs-tts-tool info --help

# Version information
elevenlabs-tts-tool --version

Additional Resources:

Free Tier Limitations

ElevenLabs Free Tier (2024-2025):

  • ✅ 10,000-20,000 characters per month
  • ✅ All 42 premade voices
  • ✅ Create up to 3 custom voices
  • ✅ MP3 formats (all bitrates)
  • ✅ Basic SSML support (<break>, phonemes)
  • ✅ Emotional tags (v3 models)
  • ✅ Full API access
  • ❌ No commercial license (personal/experimentation only)
  • ❌ PCM 44.1kHz format (requires Pro tier)
  • ⚠️ Max 2,500 characters per single generation

Upgrade Tiers:

  • Starter ($5/month): 30,000 characters, commercial license
  • Creator ($22/month): 100,000 characters, PCM formats
  • Pro ($99/month): 500,000 characters, PCM 44.1kHz, highest priority
  • Scale ($330/month): 2,000,000 characters
  • Business (custom): Custom limits and features

Rate Limits: Not publicly documented - expect reasonable use restrictions on free tier

Exit Codes

  • 0: Success
  • 1: General error (validation, API error, etc.)

Output Formats

Audio Formats:

  • mp3_44100_128: MP3, 44.1kHz, 128kbps (default, best quality)
  • mp3_44100_64: MP3, 44.1kHz, 64kbps (good quality, smaller)
  • mp3_22050_32: MP3, 22.05kHz, 32kbps (acceptable quality, smallest)
  • pcm_44100: PCM WAV, 44.1kHz, uncompressed (requires Pro tier)

Text Formats:

  • Human-readable tables for list commands
  • Structured output with clear sections
  • Errors to stderr, audio/data to stdout

Best Practices

  1. Use Turbo v2.5 for High Volume: Default model offers best value (1x cost, fast, high quality)
  2. Reserve v3 for Emotional Content: Use v3 only when emotion tags needed (costs 2x)
  3. Monitor Quota Regularly: Check info command before large generations
  4. Update Voices Periodically: Run update-voices monthly to get latest voices
  5. Test Voices for Your Use Case: Different voices work better for different content types
  6. Use SSML Breaks Sparingly: Limit to 2-4 breaks per generation for stability
  7. Pipeline for Efficiency: Combine with other tools for automated workflows
  8. Set Verbosity Appropriately: Use -vv or -vvv for debugging, default for production

Resources

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

98/100Analyzed 2/7/2026

An exceptionally well-documented skill for the ElevenLabs TTS CLI tool. It provides clear installation steps, comprehensive command references, and advanced usage examples like emotion tags and SSML.

95
100
95
100
100

Metadata

Licenseunknown
Version-
Updated2/5/2026
Publishermajiayu000

Tags

apici-cdgithubllmobservabilitysecuritytesting