audio-mixing-patterns

ffmpeg audio mixing patterns for video production. Use when mixing narration with music, implementing ducking, or balancing volume levels for demos

Updated 2/8/2026

Audio Mixing Patterns

Comprehensive guide to audio mixing for video production using ffmpeg. Covers narration/music balancing, automatic ducking, timing control, and loudness normalization.

Core Principle

Quality Audio = Clear Narration + Supportive Music + Appropriate Levels

The fundamental frequency of the human voice spans roughly 85-255 Hz, with harmonics and consonant detail extending up to about 8 kHz. Music must support the voice, not compete with it.

Volume Balancing Formula

Standard Video Mix Ratios:
--------------------------
Narration:  100% (reference level)
Music:      15-20% of narration level
SFX:        70-100% of narration level (contextual)

dB Relationships:
-----------------
Narration:  -14 LUFS (common streaming delivery target)
Music bed:  -30 to -35 LUFS (under narration)
Music only: -16 LUFS (no narration sections)
SFX:        -18 to -20 LUFS

Volume Multiplier Quick Reference

Ratio | Multiplier | Use Case
------|------------|------------------
100%  | 1.0        | Full volume (narration)
70%   | 0.7        | Prominent SFX
50%   | 0.5        | Equal blend
30%   | 0.3        | Noticeable background
20%   | 0.2        | Subtle bed (recommended music)
15%   | 0.15       | Minimal presence
10%   | 0.1        | Barely audible
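
The multipliers above are linear amplitude factors, while the level targets in this guide are given in dB. As a rough sketch of how the two relate, the helpers below convert between them using dB = 20 * log10(gain). The function names gain_to_db and db_to_gain are hypothetical, and awk is assumed to be available.

```shell
# Convert a linear volume multiplier to decibels: dB = 20 * log10(gain)
gain_to_db() {
  awk -v g="$1" 'BEGIN { printf "%.2f\n", 20 * log(g) / log(10) }'
}

# Convert decibels back to a linear multiplier: gain = 10^(dB / 20)
db_to_gain() {
  awk -v d="$1" 'BEGIN { printf "%.3f\n", exp(d / 20 * log(10)) }'
}

gain_to_db 0.15   # prints -16.48
gain_to_db 0.2    # prints -13.98
db_to_gain -6     # prints 0.501
```

By this arithmetic, a music bed at 15-20% sits about 14 to 16.5 dB below the narration. The ffmpeg volume filter also accepts dB values directly (for example volume=-16.5dB), which can be easier to reason about than multipliers.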

Basic ffmpeg Mixing Commands

Two-Track Mix (Narration + Music)

# Basic mix: narration at full, music at 15%
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]volume=1.0[narr];[1:a]volume=0.15[music];[narr][music]amix=inputs=2:duration=first" \
  -c:a aac -b:a 192k output.m4a

Three-Track Mix (Narration + Music + SFX)

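# Note: amix weights multiply on top of the volume filters, so the
# effective ratios differ from the volume values alone. Drop the weights
# option if the volume values should be the final ratios.
ffmpeg -i narration.mp3 -i music.mp3 -i sfx.mp3 \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]volume=0.15[music];\
    [2:a]volume=0.7[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=first:weights='3 1 2'" \
  -c:a aac -b:a 192k output.m4a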
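
When a volume filter and an amix weight are both applied to a track, the two factors multiply. The sketch below works out the resulting level of a track relative to narration for the three-track command above. The helper name effective_ratio is hypothetical, and the calculation assumes amix scales each input linearly by its weight.

```shell
# effective_ratio VOLUME WEIGHT NARR_VOLUME NARR_WEIGHT
# Level of a track relative to narration once volume and weight are combined.
effective_ratio() {
  awk -v v="$1" -v w="$2" -v nv="$3" -v nw="$4" \
    'BEGIN { printf "%.3f\n", (v * w) / (nv * nw) }'
}

effective_ratio 0.15 1 1.0 3   # music: prints 0.050
effective_ratio 0.7 2 1.0 3    # sfx:   prints 0.467
```

With weights '3 1 2', the music therefore lands near 5% of the narration rather than 15%, and the SFX near 47% rather than 70%.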

Timing with adelay Filter

The adelay filter positions audio at precise timestamps.

Syntax

adelay=delays[|delays...][:all=1]
# delays: one value per channel, in milliseconds by default or in samples (with 'S' suffix)
# all=1: apply the last given delay to all remaining channels

Position Music at Specific Time

# Start music at 5 seconds
ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]adelay=5000|5000,volume=0.15[music];\
    [narr][music]amix=inputs=2:duration=first" \
  output.m4a

Multiple Timed Audio Cues

# Narration starts at 0, music at 2s, SFX at 5.5s
ffmpeg -i narration.mp3 -i music.mp3 -i sfx.wav \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]adelay=2000|2000,volume=0.15[music];\
    [2:a]adelay=5500|5500,volume=0.7[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=longest" \
  output.m4a
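
Cue sheets are usually written in seconds, while adelay expects milliseconds per channel. A minimal sketch of a converter follows. The name adelay_at is hypothetical, and it relies on the all=1 option so that mono and stereo inputs are handled the same way, which assumes an ffmpeg build recent enough to support that option.

```shell
# adelay_at SECONDS
# Emit an adelay filter that starts a clip at the given time on every channel.
adelay_at() {
  awk -v s="$1" 'BEGIN { printf "adelay=%d:all=1\n", int(s * 1000 + 0.5) }'
}

adelay_at 5     # prints adelay=5000:all=1
adelay_at 5.5   # prints adelay=5500:all=1
```

The result can be spliced into a filter chain, for example "[2:a]$(adelay_at 5.5),volume=0.7[sfx]".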

Audio Ducking

Automatically lower music when speech is present.

Simple Sidechain Compression (Ducking)

ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]asplit=2[narr][sc];\
    [1:a][sc]sidechaincompress=threshold=0.02:ratio=10:attack=50:release=500[ducked];\
    [narr][ducked]amix=inputs=2:duration=first" \
  output.m4a

Parameters Explained

Parameter | Value                | Effect
----------|----------------------|------------------
threshold | 0.02 (default 0.125) | Lower = more sensitive to speech
ratio     | 10:1                 | How much to reduce (10:1 = significant duck)
attack    | 50 ms                | How fast to duck when speech starts
release   | 500 ms               | How fast to return after speech ends
knee      | 2.82843              | Softness of compression curve
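
To get a feel for how threshold and ratio interact, the sketch below estimates the steady-state gain reduction applied to the music for a given narration level. It uses the textbook hard-knee compressor formula, reduction = (level - threshold) * (1 - 1/ratio), and ignores the knee, the attack and release times, and the detection mode, so treat the result as an approximation. The helper name duck_reduction_db is hypothetical. Levels are in dBFS, and a linear threshold of 0.02 is roughly -34 dBFS.

```shell
# duck_reduction_db LEVEL_DB THRESHOLD_DB RATIO
# Approximate dB of attenuation once the sidechain level exceeds the threshold.
duck_reduction_db() {
  awk -v lvl="$1" -v thr="$2" -v r="$3" 'BEGIN {
    over = lvl - thr            # how far the narration is above the threshold
    if (over < 0) over = 0      # below threshold: no ducking
    printf "%.1f\n", over * (1 - 1 / r)
  }'
}

duck_reduction_db -10 -34 10   # prints 21.6
duck_reduction_db -40 -34 10   # prints 0.0
```

Under these assumptions, narration peaking around -10 dBFS pulls the music down by roughly 21.6 dB at a 10:1 ratio, while passages below the threshold leave it untouched.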

Advanced Ducking with Precise Control

ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "\
    [0:a]asplit=2[narr][sc];\
    [1:a]volume=0.5[music_pre];\
    [music_pre][sc]sidechaincompress=\
      threshold=0.015:\
      ratio=15:\
      attack=30:\
      release=800:\
      makeup=1:\
      knee=6[ducked];\
    [narr][ducked]amix=inputs=2:duration=first:weights='1 0.4'" \
  output.m4a

Mix Ratios by Content Type

Content Type          | Narration | Music | SFX  | Notes
---------------------|-----------|-------|------|------------------
Tutorial/How-to      | 100%      | 10%   | 50%  | Voice clarity critical
Corporate/Business   | 100%      | 15%   | 60%  | Professional presence
Social Media         | 100%      | 20%   | 80%  | Higher energy
Documentary          | 100%      | 25%   | 100% | Cinematic feel
Promo/Advertising    | 100%      | 30%   | 100% | Impactful
Music Video          | 50%       | 100%  | 80%  | Music dominant
Podcast              | 100%      | 5%    | 30%  | Minimal distraction
E-learning           | 100%      | 8%    | 40%  | Focus on retention
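
One possible way to use the table above in a script is a small preset lookup that emits a ready-made filter graph. This is only a sketch: the function names mix_levels and preset_filter and the preset keys are hypothetical, inputs are assumed to be ordered narration, music, SFX, and normalize=0 assumes an ffmpeg build that supports that amix option.

```shell
# mix_levels TYPE -> "narration music sfx" multipliers from the table above
mix_levels() {
  case "$1" in
    tutorial)    echo "1.0 0.10 0.5" ;;
    corporate)   echo "1.0 0.15 0.6" ;;
    social)      echo "1.0 0.20 0.8" ;;
    documentary) echo "1.0 0.25 1.0" ;;
    promo)       echo "1.0 0.30 1.0" ;;
    music-video) echo "0.5 1.0 0.8" ;;
    podcast)     echo "1.0 0.05 0.3" ;;
    elearning)   echo "1.0 0.08 0.4" ;;
    *) echo "unknown content type: $1" >&2; return 1 ;;
  esac
}

# preset_filter TYPE -> filter_complex string for a three-track mix
preset_filter() {
  levels=$(mix_levels "$1") || return 1
  set -- $levels   # split into $1 (narration), $2 (music), $3 (sfx)
  printf '[0:a]volume=%s[narr];[1:a]volume=%s[music];[2:a]volume=%s[sfx];[narr][music][sfx]amix=inputs=3:duration=first:normalize=0\n' "$1" "$2" "$3"
}

preset_filter podcast
```

The output can be passed straight to ffmpeg, for example -filter_complex "$(preset_filter podcast)". Because normalize=0 sums the tracks without scaling, it is worth following the mix with loudnorm or a limiter.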

Loudness Normalization (LUFS)

LUFS (Loudness Units relative to Full Scale) is the broadcast standard for measuring perceived loudness.

Target Levels by Platform

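Platform     | Target LUFS     | True Peak | Notes
-------------|-----------------|-----------|------------------
YouTube      | -14 LUFS        | -1 dBTP   | Auto-normalized
Spotify      | -14 LUFS        | -1 dBTP   | Loudness penalty applied
Apple Music  | -16 LUFS        | -1 dBTP   | Sound Check
Broadcast TV | -23 LUFS        | -1 dBTP   | EBU R128 (ATSC A/85 in the US: -24 LKFS, -2 dBTP)
Podcast      | -16 to -19 LUFS | -1 dBTP   | Apple spec
TikTok/Reels | -14 LUFS        | -1 dBTP   | Mobile optimization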

Loudness Normalization Command

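# Normalize to -14 LUFS (YouTube/Spotify standard)
# Single-pass loudnorm resamples to 192 kHz internally, so -ar sets the
# output back to a conventional rate.
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11 \
  -ar 48000 -c:a aac -b:a 192k output.m4a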
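
For scripted exports, the per-platform targets can be wrapped in a small lookup. The sketch below is illustrative only: loudnorm_for and the platform keys are hypothetical names, the integrated targets are the streaming values listed in this guide, and TP=-1 and LRA=11 are carried over from the command above.

```shell
# loudnorm_for PLATFORM -> loudnorm filter string for that platform's target
loudnorm_for() {
  case "$1" in
    youtube|spotify|tiktok) i=-14 ;;
    apple-music|podcast)    i=-16 ;;
    *) echo "unknown platform: $1" >&2; return 1 ;;
  esac
  echo "loudnorm=I=${i}:TP=-1:LRA=11"
}

loudnorm_for youtube   # prints loudnorm=I=-14:TP=-1:LRA=11
loudnorm_for podcast   # prints loudnorm=I=-16:TP=-1:LRA=11
```

A call such as ffmpeg -i input.mp3 -af "$(loudnorm_for podcast)" output.m4a then applies the matching target.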

Two-Pass Normalization (More Accurate)

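# Pass 1: Analyze (prints the measured values as JSON; no file is written)
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11:print_format=json \
  -f null - 2>&1 | grep -A 10 '"input_i"'

# Pass 2: Apply measured values
# Replace the measured_* numbers with input_i, input_tp, input_lra, and
# input_thresh from pass 1 (the values shown here are examples).
# target_offset from pass 1 can also be passed as offset=.
ffmpeg -i input.mp3 \
  -af loudnorm=I=-14:TP=-1:LRA=11:\
measured_I=-18.5:measured_TP=-2.3:measured_LRA=8.2:\
measured_thresh=-28.5:\
linear=true \
  -c:a aac -b:a 192k output.m4a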
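
Copying the measured values by hand is error-prone, so the two passes are often glued together in a script. The sketch below shows one way to turn the pass 1 JSON into the pass 2 filter string. The function name loudnorm_pass2 is hypothetical, the JSON layout is assumed to match what print_format=json prints (one "key" : "value" pair per line), and the sample numbers are made up for illustration.

```shell
# loudnorm_pass2 TARGET_I TARGET_TP TARGET_LRA   (pass 1 JSON on stdin)
# Build the second-pass loudnorm filter from the measured input_* values.
loudnorm_pass2() {
  awk -F'"' -v I="$1" -v TP="$2" -v LRA="$3" '
    $2 == "input_i"       { mi = $4 }
    $2 == "input_tp"      { mtp = $4 }
    $2 == "input_lra"     { mlra = $4 }
    $2 == "input_thresh"  { mth = $4 }
    $2 == "target_offset" { off = $4 }
    END {
      printf "loudnorm=I=%s:TP=%s:LRA=%s:measured_I=%s:measured_TP=%s:measured_LRA=%s:measured_thresh=%s:offset=%s:linear=true\n", I, TP, LRA, mi, mtp, mlra, mth, off
    }'
}

# Example with made-up measurements standing in for real pass 1 output
loudnorm_pass2 -14 -1 11 <<'EOF'
{
	"input_i" : "-18.50",
	"input_tp" : "-2.30",
	"input_lra" : "8.20",
	"input_thresh" : "-28.50",
	"target_offset" : "0.30"
}
EOF
```

In a real pipeline, the stderr of the analysis pass would be piped into loudnorm_pass2 and the resulting string passed to -af in the second ffmpeg call.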

Multi-Track Production Pipeline

Complete Video Audio Mix

ffmpeg -i video.mp4 -i narration.wav -i music.mp3 -i sfx.wav \
  -filter_complex "\
    [1:a]volume=1.0,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[narr];\
    [2:a]volume=0.15,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[music];\
    [3:a]adelay=3000|3000,volume=0.7,aformat=sample_fmts=fltp:sample_rates=48000:channel_layouts=stereo[sfx];\
    [narr][music][sfx]amix=inputs=3:duration=first:normalize=0[mixed];\
    [mixed]loudnorm=I=-14:TP=-1:LRA=11[final]" \
  -map 0:v -map "[final]" \
  -c:v copy -c:a aac -b:a 192k \
  output.mp4

Audio-Only Master Mix

ffmpeg -i narration.wav -i music.mp3 -i intro_sfx.wav -i outro_sfx.wav \
  -filter_complex "\
    [0:a]volume=1.0[narr];\
    [1:a]volume=0.15[music];\
    [2:a]adelay=0|0,volume=0.8[intro];\
    [3:a]adelay=55000|55000,volume=0.8[outro];\
    [narr][music][intro][outro]amix=inputs=4:duration=longest:weights='3 1 2 2'[mix];\
    [mix]loudnorm=I=-14:TP=-1[final]" \
  -map "[final]" -c:a aac -b:a 256k master_audio.m4a

Quick Reference: Common Patterns

Pattern 1: Narration + Background Music

ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]volume=1.0[n];[1:a]volume=0.15[m];[n][m]amix=inputs=2:duration=first" \
  output.m4a

Pattern 2: Music with Auto-Duck

ffmpeg -i narration.mp3 -i music.mp3 \
  -filter_complex "[0:a]asplit=2[n][sc];[1:a][sc]sidechaincompress=threshold=0.02:ratio=10[d];[n][d]amix=inputs=2" \
  output.m4a

Pattern 3: Timed Intro Music Fade

ffmpeg -i narration.mp3 -i intro_music.mp3 \
  -filter_complex "\
    [1:a]afade=t=out:st=8:d=2,volume=0.3[intro];\
    [0:a]adelay=10000|10000[narr];\
    [intro][narr]amix=inputs=2:duration=longest" \
  output.m4a

Pattern 4: Crossfade Between Segments

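# Assumes segment1.mp3 is 30 s long. The fade-in is applied before adelay
# so it shapes the start of segment2 rather than the silent padding, and
# normalize=0 keeps amix from changing the level when segment1 ends.
ffmpeg -i segment1.mp3 -i segment2.mp3 \
  -filter_complex "\
    [0:a]afade=t=out:st=28:d=2[s1];\
    [1:a]afade=t=in:d=2,adelay=28000|28000[s2];\
    [s1][s2]amix=inputs=2:duration=longest:normalize=0" \
  output.m4a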
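
The fade start and the delay in a crossfade both depend on the length of the first segment, so it can help to derive them instead of hard-coding them. The following is a minimal sketch under a few assumptions: crossfade_filter is a hypothetical name, the first segment's length is supplied in seconds by the caller, and the ffmpeg build supports adelay's all=1 and amix's normalize=0. The fade-in is placed before the delay so that it acts on the audio itself and not on the inserted silence.

```shell
# crossfade_filter SEG1_LENGTH_SECONDS FADE_SECONDS
# Emit a filter graph that overlaps the end of input 0 with the start of input 1.
crossfade_filter() {
  awk -v len="$1" -v d="$2" 'BEGIN {
    st = len - d   # where the overlap begins on the output timeline
    printf "[0:a]afade=t=out:st=%g:d=%g[s1];[1:a]afade=t=in:d=%g,adelay=%d:all=1[s2];[s1][s2]amix=inputs=2:duration=longest:normalize=0\n", st, d, d, int(st * 1000 + 0.5)
  }'
}

crossfade_filter 30 2
```

ffmpeg also ships a dedicated acrossfade filter, which may be simpler when only two segments are joined end to end.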

Troubleshooting

Issue               | Cause                    | Solution
--------------------|--------------------------|------------------
Clipping/distortion | Combined levels too high | Reduce individual volumes or add limiter
Narration buried    | Music too loud           | Reduce music to 10-15%, add ducking
Hollow/thin sound   | Phase cancellation       | Check mono compatibility
Pumping artifacts   | Aggressive ducking       | Increase attack/release times
Inconsistent levels | No normalization         | Apply loudnorm filter

Add Limiter to Prevent Clipping

ffmpeg -i input.mp3 \
  -af "alimiter=level_in=1:level_out=0.9:limit=0.95:attack=5:release=50" \
  output.m4a

Related Skills

  • video-pacing: Video rhythm and timing patterns
  • remotion-composer: Programmatic video generation
  • demo-producer: Product demo video production
  • thumbnail-first-frame: Video thumbnail optimization

Metadata

License: unknown
Version: 1.0.0
Updated: 2/8/2026
Publisher: NeverSight

Tags

ci-cd