Comprehensive guide to GPU-accelerated encoding and decoding with NVIDIA, Intel, AMD, Apple, and Vulkan.
ffmpeg-hardware-acceleration-2025 follows the SKILL.md standard. Use the install command to add it to your agent stack.
---
name: ffmpeg-hardware-acceleration
description: Complete GPU-accelerated encoding/decoding system for FFmpeg 7.1 LTS and 8.0.1 (latest stable, released 2025-11-20). PROACTIVELY activate for: (1) NVIDIA NVENC/NVDEC encoding, (2) Intel Quick Sync Video (QSV), (3) AMD AMF encoding, (4) Apple VideoToolbox, (5) Linux VAAPI setup, (6) Vulkan Video 8.0 (FFv1, AV1, VP9, ProRes RAW), (7) VVC/H.266 hardware decoding (VAAPI/QSV), (8) GPU pipeline optimization with pad_cuda, (9) Docker GPU containers, (10) Performance benchmarking. Provides: Platform-specific commands, preset comparisons, quality tuning, full GPU pipeline examples, Vulkan compute codecs, VVC decoding, troubleshooting guides. Ensures: Maximum encoding speed with optimal quality using GPU acceleration.
---
## CRITICAL GUIDELINES
### Windows File Path Requirements
**MANDATORY: Always Use Backslashes on Windows for File Paths**
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
---
## Quick Reference
| Platform | Encoder | Decoder | Detect Command |
|----------|---------|---------|----------------|
| NVIDIA | `h264_nvenc`, `hevc_nvenc`, `av1_nvenc` | `h264_cuvid`, `hevc_cuvid` | `ffmpeg -encoders \| grep nvenc` |
| Intel QSV | `h264_qsv`, `hevc_qsv`, `av1_qsv` | `h264_qsv`, `hevc_qsv` | `ffmpeg -encoders \| grep qsv` |
| AMD AMF | `h264_amf`, `hevc_amf`, `av1_amf` | N/A (use software) | `ffmpeg -encoders \| grep amf` |
| Apple | `h264_videotoolbox`, `hevc_videotoolbox` | `h264_videotoolbox` | macOS only |
| VAAPI | `h264_vaapi`, `hevc_vaapi`, `av1_vaapi` | with `-hwaccel vaapi` | Linux only |
## When to Use This Skill
Use when **GPU acceleration is needed**:
- Encoding speed is critical (10-30x faster than CPU)
- Processing large batches of videos
- Real-time encoding for streaming
- Server-side transcoding at scale
- Docker containers with GPU passthrough
**Key decision**: GPU encoding trades some quality for massive speed. Use `-cq` or `-qp` for quality control.
---
# FFmpeg Hardware Acceleration (2025)
Comprehensive guide to GPU-accelerated encoding and decoding with NVIDIA, Intel, AMD, Apple, and Vulkan.
**Current Latest**: FFmpeg 8.0.1 (released 2025-11-20) - Check with `ffmpeg -version`
## Hardware Acceleration Overview
Hardware acceleration uses dedicated GPU/SoC components for video processing:
- **NVENC/NVDEC** (NVIDIA): Dedicated video encode/decode engines
- **QSV** (Intel): Quick Sync Video on Intel CPUs with integrated graphics
- **AMF** (AMD): Advanced Media Framework for AMD GPUs
- **VideoToolbox** (Apple): macOS/iOS hardware acceleration
- **VAAPI** (Linux): Video Acceleration API (Intel, AMD on Linux)
- **Vulkan Video** (Cross-platform): FFmpeg 7.1+/8.0 GPU acceleration
### Performance Comparison (2025 Benchmarks)
| Method | Speed | Quality | Power | Use Case |
|--------|-------|---------|-------|----------|
| libx264 (CPU) | 1x | Best | High | Quality-critical |
| libx265 (CPU) | 0.3x | Best | Very High | Archival |
| h264_nvenc | 10-20x | Good | Low | Real-time, streaming |
| hevc_nvenc | 8-15x | Good | Low | 4K streaming |
| h264_qsv | 8-15x | Good | Very Low | Laptop, efficiency |
| h264_amf | 8-15x | Good | Low | AMD systems |
## NVIDIA NVENC/NVDEC
### Requirements
- NVIDIA GPU (GTX 600+ / Quadro K series+)
- NVIDIA drivers 450+
- FFmpeg built with `--enable-nvenc --enable-cuda --enable-cuvid`
### Check NVIDIA Support
```bash
# Check available NVIDIA codecs
ffmpeg -encoders | grep nvenc
ffmpeg -decoders | grep cuvid
# Check GPU info
nvidia-smi
# List hardware accelerators
ffmpeg -hwaccels
```
### Basic NVENC Encoding
```bash
# H.264 NVENC encoding
ffmpeg -i input.mp4 -c:v h264_nvenc -preset p4 -b:v 5M output.mp4
# H.265/HEVC NVENC encoding
ffmpeg -i input.mp4 -c:v hevc_nvenc -preset p4 -b:v 4M output.mp4
# AV1 NVENC (RTX 40 series+)
ffmpeg -i input.mp4 -c:v av1_nvenc -preset p4 -b:v 3M output.mp4
```
### NVENC Presets (FFmpeg 7+)
| Preset | Speed | Quality | Use Case |
|--------|-------|---------|----------|
| p1 | Fastest | Lowest | Real-time capture |
| p2 | Faster | Low | Screen recording |
| p3 | Fast | Medium | General streaming |
| p4 | Medium | Good | **Recommended** |
| p5 | Slow | Better | High-quality streaming |
| p6 | Slower | Best | Offline encoding |
| p7 | Slowest | Highest | Maximum quality |
### Full GPU Pipeline (Decode + Encode)
```bash
# Keep frames in GPU memory (fastest)
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-c:v h264_nvenc \
-preset p4 \
-b:v 5M \
-c:a copy \
output.mp4
```
### GPU Scaling and Filtering
```bash
# GPU-based scaling with scale_cuda
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-vf scale_cuda=1280:720 \
-c:v h264_nvenc \
-preset p4 \
output.mp4
# GPU overlay with overlay_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i main.mp4 \
-hwaccel cuda -hwaccel_output_format cuda -i overlay.mp4 \
-filter_complex "[0:v][1:v]overlay_cuda=10:10" \
-c:v h264_nvenc output.mp4
# GPU padding with pad_cuda (FFmpeg 8.0+)
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-vf "pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2:black" \
-c:v h264_nvenc output.mp4
# Letterbox with pad_cuda
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-vf "scale_cuda=1920:-2,pad_cuda=1920:1080:(ow-iw)/2:(oh-ih)/2" \
-c:v h264_nvenc output.mp4
```
### CUDA Filters Reference (FFmpeg 8.0+)
| Filter | Description | Key Parameters |
|--------|-------------|----------------|
| `scale_cuda` | GPU video scaling | `w`, `h`, `format`, `interp_algo` |
| `scale_npp` | NPP-based scaling (higher quality) | `w`, `h`, `interp_algo` |
| `overlay_cuda` | GPU overlay/compositing | `x`, `y`, `eof_action` |
| `pad_cuda` | GPU padding (8.0+) | `w`, `h`, `x`, `y`, `color` |
| `chromakey_cuda` | GPU chroma keying | `color`, `similarity`, `blend` |
| `colorspace_cuda` | Color space conversion | `all`, `space`, `trc`, `primaries` |
| `bilateral_cuda` | Bilateral filter | `sigmaS`, `sigmaR`, `window_size` |
| `bwdif_cuda` | Deinterlacing | `mode`, `parity`, `deint` |
| `hwupload_cuda` | Upload to GPU | - |
| `hwdownload` | Download from GPU | - |
### scale_npp (NVIDIA Performance Primitives)
For higher-quality scaling, use scale_npp with optimized interpolation algorithms:
```bash
# High-quality downscaling with super-sampling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "scale_npp=1280:720:interp_algo=super" \
-c:v h264_nvenc output.mp4
# Lanczos interpolation for upscaling
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "scale_npp=1920:1080:interp_algo=lanczos" \
-c:v h264_nvenc output.mp4
```
**scale_npp Interpolation Algorithms:**
| Algorithm | Quality | Speed | Best For |
|-----------|---------|-------|----------|
| `nn` | Low | Fastest | Pixel art, nearest neighbor |
| `linear` | Medium | Fast | General use |
| `cubic` | Good | Medium | Smooth scaling |
| `lanczos` | High | Slow | High quality upscaling |
| `super` | Best | Slowest | Downscaling |
### GPU Chromakey (Green Screen)
```bash
# Green screen removal on GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i greenscreen.mp4 \
-hwaccel cuda -hwaccel_output_format cuda -i background.mp4 \
-filter_complex "[0:v]chromakey_cuda=0x00FF00:0.3:0.1[fg];[1:v][fg]overlay_cuda" \
-c:v h264_nvenc output.mp4
```
### Hybrid GPU + CPU Filter Pipelines
When mixing CPU and GPU filters, use `hwupload_cuda` and `hwdownload`:
```bash
# CPU decode -> CPU filter -> GPU encode
ffmpeg -i input.mp4 \
-vf "fade=in:0:30,hwupload_cuda" \
-c:v h264_nvenc output.mp4
# GPU decode -> CPU filter (drawtext) -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "hwdownload,format=nv12,drawtext=text='Hello':fontsize=48,hwupload_cuda" \
-c:v h264_nvenc output.mp4
# Complete hybrid pipeline: GPU scale -> CPU text -> GPU encode
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-filter_complex "\
[0:v]scale_cuda=1920:1080[scaled];\
[scaled]hwdownload,format=nv12[cpu];\
[cpu]drawtext=text='Watermark':fontsize=48:x=10:y=10[text];\
[text]hwupload_cuda[gpu]" \
-map "[gpu]" \
-c:v h264_nvenc output.mp4
```
### NVENC Quality Optimization
```bash
# High quality with lookahead
ffmpeg -i input.mp4 \
-c:v hevc_nvenc \
-preset p5 \
-tune hq \
-rc vbr \
-cq 23 \
-b:v 0 \
-rc-lookahead 32 \
-spatial-aq 1 \
-temporal-aq 1 \
-c:a copy \
output.mp4
# Constant quality mode (CQP)
ffmpeg -i input.mp4 \
-c:v h264_nvenc \
-preset p4 \
-rc constqp \
-qp 23 \
-c:a copy \
output.mp4
```
### NVENC Two-Pass Encoding
```bash
# Two-pass for best quality (ABR)
ffmpeg -i input.mp4 \
-c:v h264_nvenc \
-preset p5 \
-2pass 1 \
-b:v 5M \
-c:a copy \
output.mp4
```
### NVENC B-Frames and GOP
```bash
# Enable B-frames for better compression
ffmpeg -i input.mp4 \
-c:v hevc_nvenc \
-preset p4 \
-bf 4 \
-b_ref_mode 2 \
-g 250 \
-c:a copy \
output.mp4
```
## GPU Memory Management
### Critical Concepts
PCIe transfers between CPU and GPU memory are the primary bottleneck. Keeping frames in GPU memory throughout the pipeline is critical for performance.
**Memory Flow Patterns:**
```
Pattern 1: Full GPU Pipeline (OPTIMAL)
Input File -> GPU Decode -> GPU Filter -> GPU Encode -> Output File
| | |
v v v
[GPU Memory - No PCIe Transfer]
Pattern 2: CPU Filter Insertion (SUBOPTIMAL)
Input -> GPU Decode -> hwdownload -> CPU Filter -> hwupload -> GPU Encode -> Output
| |
v v
[PCIe Transfer] [PCIe Transfer]
Pattern 3: Software Decode + GPU Encode
Input -> CPU Decode -> hwupload -> GPU Encode -> Output
|
v
[PCIe Transfer]
```
### Command Patterns Comparison
```bash
# OPTIMAL: Full GPU pipeline - no CPU-GPU transfers
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-vf scale_cuda=1280:720 \
-c:v h264_nvenc \
output.mp4
# SUBOPTIMAL: GPU decode, CPU filter, GPU encode - 2 transfers
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf "hwdownload,format=nv12,drawtext=text='Hello',hwupload_cuda" \
-c:v h264_nvenc output.mp4
# AVOID: Missing -hwaccel_output_format - frame copies to CPU after decode
ffmpeg -hwaccel cuda -i input.mp4 \
-vf scale=1280:720 \
-c:v h264_nvenc output.mp4
```
**Why `-hwaccel_output_format cuda` matters:** Without it, decoded frames transfer from GPU to CPU via PCIe, get processed, then transfer back to GPU for encoding. This can reduce throughput by up to 50%.
### Memory Monitoring
```bash
# NVIDIA GPU memory and utilization
nvidia-smi dmon -s m
# Watch GPU utilization in real-time
watch -n 1 nvidia-smi
# Intel GPU monitoring
intel_gpu_top
```
### Best Practices
1. **Use `-hwaccel_output_format`** to keep decoded frames in GPU memory
2. **Use GPU filters when available** (scale_cuda, overlay_cuda, pad_cuda)
3. **Batch similar operations** to minimize filter graph complexity
4. **Monitor GPU memory** for high-resolution content
5. **Use `-vsync 0`** to prevent frame duplication overhead
## Multi-GPU Encoding
### NVIDIA Multi-GPU
```bash
# Specify GPU by index (parallel processing on different GPUs)
ffmpeg -hwaccel cuda \
-hwaccel_device 0 \
-hwaccel_output_format cuda \
-i input1.mp4 \
-c:v h264_nvenc output1.mp4 &
ffmpeg -hwaccel cuda \
-hwaccel_device 1 \
-hwaccel_output_format cuda \
-i input2.mp4 \
-c:v h264_nvenc output2.mp4 &
wait
# Initialize specific CUDA device for filters
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 \
-init_hw_device cuda=gpu1:1 \
-filter_complex "[0:v]scale_cuda=1280:720[scaled]" \
-map "[scaled]" \
-c:v h264_nvenc output.mp4
```
### 1:N Encoding (Single Input, Multiple Outputs)
More efficient than separate processes - shares CUDA context:
```bash
# Single input to multiple resolutions (ABR ladder)
ffmpeg -y -vsync 0 \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i input.mp4 \
-filter_complex "[0:v]split=3[v1][v2][v3];\
[v1]scale_cuda=1920:1080[hd];\
[v2]scale_cuda=1280:720[sd];\
[v3]scale_cuda=640:360[mobile]" \
-map "[hd]" -c:v h264_nvenc -b:v 5M output_1080p.mp4 \
-map "[sd]" -c:v h264_nvenc -b:v 2M output_720p.mp4 \
-map "[mobile]" -c:v h264_nvenc -b:v 500k output_360p.mp4
```
### Parallel Encoding Strategy
For maximum throughput:
1. Run 4+ parallel FFmpeg sessions
2. Use inputs with 15+ seconds to amortize initialization
3. Measure aggregate FPS across all sessions
```bash
# Environment variables for faster initialization
export CUDA_VISIBLE_DEVICES=0
export CUDA_DEVICE_MAX_CONNECTIONS=2
# Parallel encoding script
for i in {1..4}; do
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input_$i.mp4 \
-c:v h264_nvenc \
output_$i.mp4 &
done
wait
```
## Intel Quick Sync Video (QSV)
### Requirements
- Intel CPU with integrated graphics (Sandy Bridge+)
- Intel Media SDK or oneVPL
- FFmpeg built with `--enable-libmfx` or `--enable-libvpl`
### Check QSV Support
```bash
# Check available QSV codecs
ffmpeg -encoders | grep qsv
ffmpeg -decoders | grep qsv
# Verify Intel GPU access (Linux)
ls /dev/dri/
vainfo # Check VAAPI support
```
### Basic QSV Encoding
```bash
# Initialize QSV device
ffmpeg -init_hw_device qsv=hw \
-filter_hw_device hw \
-i input.mp4 \
-c:v h264_qsv \
-preset medium \
-b:v 5M \
output.mp4
# H.265/HEVC QSV
ffmpeg -init_hw_device qsv=hw \
-filter_hw_device hw \
-i input.mp4 \
-c:v hevc_qsv \
-preset medium \
-b:v 4M \
output.mp4
# AV1 QSV (Intel Arc, 12th gen+)
ffmpeg -init_hw_device qsv=hw \
-filter_hw_device hw \
-i input.mp4 \
-c:v av1_qsv \
-preset medium \
-b:v 3M \
output.mp4
```
### Full QSV Pipeline
```bash
# Decode and encode on GPU
ffmpeg -hwaccel qsv \
-hwaccel_output_format qsv \
-i input.mp4 \
-c:v h264_qsv \
-preset medium \
-b:v 5M \
output.mp4
```
### QSV Quality Settings
```bash
# Constant quality (recommended)
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i input.mp4 \
-c:v hevc_qsv \
-preset slow \
-global_quality 22 \
-look_ahead 1 \
output.mp4
# VBR with lookahead
ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
-i input.mp4 \
-c:v h264_qsv \
-preset medium \
-b:v 5M \
-maxrate 7M \
-bufsize 10M \
-look_ahead 1 \
-look_ahead_depth 40 \
output.mp4
```
**QSV Quality Parameters:**
| Parameter | Description | Range |
|-----------|-------------|-------|
| `-global_quality` | Quality level (like CRF) | 1-51 (lower = better) |
| `-preset` | Encoding speed/quality | `veryfast`, `faster`, `fast`, `medium`, `slow`, `slower`, `veryslow` |
| `-look_ahead` | Enable lookahead | 0 or 1 |
| `-look_ahead_depth` | Lookahead frames | 10-100 |
### Multi-GPU QSV (Linux)
```bash
# Select specific GPU by render device
ffmpeg -init_hw_device qsv=hw:/dev/dri/renderD129 \
-filter_hw_device hw \
-i input.mp4 \
-c:v h264_qsv output.mp4
```
### QSV with Scaling
```bash
# GPU scaling with vpp_qsv
ffmpeg -hwaccel qsv \
-hwaccel_output_format qsv \
-i input.mp4 \
-vf "vpp_qsv=w=1280:h=720" \
-c:v h264_qsv \
-preset medium \
output.mp4
```
### VVC/H.266 QSV Decoding (FFmpeg 7.1+)
```bash
# Hardware VVC decoding
ffmpeg -hwaccel qsv \
-hwaccel_output_format qsv \
-c:v vvc_qsv \
-i input.vvc \
-c:v h264_qsv \
output.mp4
```
## AMD AMF
### Requirements
- AMD GPU (GCN or newer)
- AMD drivers with AMF support
- FFmpeg built with `--enable-amf`
### Check AMF Support
```bash
ffmpeg -encoders | grep amf
```
### Basic AMF Encoding
```bash
# H.264 AMF
ffmpeg -i input.mp4 \
-c:v h264_amf \
-quality balanced \
-b:v 5M \
output.mp4
# H.265/HEVC AMF
ffmpeg -i input.mp4 \
-c:v hevc_amf \
-quality balanced \
-b:v 4M \
output.mp4
# AV1 AMF (RDNA3+)
ffmpeg -i input.mp4 \
-c:v av1_amf \
-quality balanced \
-b:v 3M \
output.mp4
```
### AMF Quality Presets
| Preset | Description |
|--------|-------------|
| speed | Fastest encoding |
| balanced | **Recommended** |
| quality | Best quality |
### AMD Hardware Upscaling (FFmpeg 8.0+)
```bash
# Super resolution upscaling
ffmpeg -i input.mp4 \
-vf "sr_amf=4096:2160:algorithm=sr1-1" \
-c:v hevc_amf \
output.mp4
```
## VAAPI (Linux)
### Requirements
- Intel, AMD, or NVIDIA GPU on Linux
- VAAPI drivers (intel-media-driver, mesa-va-drivers)
- FFmpeg built with `--enable-vaapi`
### Check VAAPI Support
```bash
# Check VAAPI info
vainfo
# List VAAPI codecs
ffmpeg -encoders | grep vaapi
ffmpeg -decoders | grep vaapi
```
### Basic VAAPI Encoding
```bash
# H.264 VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
-i input.mp4 \
-vf 'format=nv12,hwupload' \
-c:v h264_vaapi \
-b:v 5M \
output.mp4
# H.265/HEVC VAAPI
ffmpeg -vaapi_device /dev/dri/renderD128 \
-i input.mp4 \
-vf 'format=nv12,hwupload' \
-c:v hevc_vaapi \
-b:v 4M \
output.mp4
```
### Full VAAPI Pipeline
```bash
# Decode and encode on GPU
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.mp4 \
-c:v h264_vaapi \
-b:v 5M \
output.mp4
```
### VAAPI Scaling
```bash
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.mp4 \
-vf 'scale_vaapi=w=1280:h=720' \
-c:v h264_vaapi \
output.mp4
```
### VVC VAAPI Decoding (FFmpeg 8.0+)
FFmpeg 8.0 adds VVC/H.266 hardware decoding on Intel and AMD GPUs via VAAPI.
```bash
# Hardware VVC/H.266 decoding
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.vvc \
-c:v h264_vaapi \
output.mp4
# VVC decode + transcode to H.265
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi \
-i input.mkv \
-c:v hevc_vaapi \
-b:v 4M \
output.mp4
# VVC with Screen Content Coding (SCC) support
# FFmpeg 8.0 adds full SCC support including:
# - IBC (Inter Block Copy)
# - Palette Mode
# - ACT (Adaptive Color Transform)
ffmpeg -hwaccel vaapi \
-hwaccel_device /dev/dri/renderD128 \
-i screen_recording.vvc \
output.mp4
```
**VVC VAAPI Requirements:**
- Intel Xe2 graphics (Lunar Lake) or newer for full VVC support
- FFmpeg 8.0 or later
- Intel media driver with VVC support
## Apple VideoToolbox
### Requirements
- macOS 10.8+ or iOS 8+
- FFmpeg built with `--enable-videotoolbox`
### Check VideoToolbox Support
```bash
ffmpeg -encoders | grep videotoolbox
ffmpeg -decoders | grep videotoolbox
```
### Basic VideoToolbox Encoding
```bash
# H.264 VideoToolbox
ffmpeg -i input.mp4 \
-c:v h264_videotoolbox \
-b:v 5M \
output.mp4
# H.265/HEVC VideoToolbox
ffmpeg -i input.mp4 \
-c:v hevc_videotoolbox \
-b:v 4M \
-tag:v hvc1 \
output.mp4
# ProRes VideoToolbox
ffmpeg -i input.mp4 \
-c:v prores_videotoolbox \
-profile:v 3 \
output.mov
```
### VideoToolbox Quality Settings
```bash
# Quality-based encoding
ffmpeg -i input.mp4 \
-c:v h264_videotoolbox \
-q:v 65 \
output.mp4
# Hardware accelerated decode + encode
ffmpeg -hwaccel videotoolbox \
-i input.mp4 \
-c:v h264_videotoolbox \
-b:v 5M \
output.mp4
```
## Vulkan Video (FFmpeg 7.1+/8.0)
### Overview
Vulkan Video provides cross-platform GPU acceleration using Vulkan compute shaders. Unlike proprietary hardware accelerators, Vulkan codecs are based on compute shaders and work on any implementation of Vulkan 1.3.
### FFmpeg 7.1 Vulkan Features
- H.264 Vulkan encoding
- H.265/HEVC Vulkan encoding
### FFmpeg 8.0 Vulkan Features (New)
- **AV1 Vulkan encoding** - GPU-accelerated AV1 via compute shaders
- **VP9 Vulkan decoding** - Hardware-accelerated VP9 decode
- **FFv1 Vulkan encode/decode** - Lossless codec for archival/capture
- **ProRes RAW Vulkan decode** - Apple ProRes RAW hardware decode
**Benefits of Vulkan Compute Codecs:**
- Cross-platform: Same code works on AMD, Intel, and NVIDIA
- No vendor lock-in: Works with any Vulkan 1.3 driver
- Ideal for: Lossless screen capture, high-throughput archival, professional workflows
### Basic Vulkan Encoding
```bash
# H.264 Vulkan
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v h264_vulkan \
-b:v 5M \
output.mp4
# H.265/HEVC Vulkan
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v hevc_vulkan \
-b:v 4M \
output.mp4
# AV1 Vulkan (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v av1_vulkan \
-b:v 3M \
output.mp4
# FFv1 Vulkan Lossless (FFmpeg 8.0+)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-c:v ffv1_vulkan \
output.mkv
```
### Full Vulkan Pipeline
```bash
# Complete Vulkan decode-filter-encode
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan \
-hwaccel_output_format vulkan \
-i input.mp4 \
-vf "scale_vulkan=1280:720" \
-c:v h264_vulkan \
output.mp4
```
### VP9 Vulkan Decoding (FFmpeg 8.0+)
```bash
# Hardware decode VP9 with Vulkan
ffmpeg -init_hw_device vulkan \
-hwaccel vulkan \
-hwaccel_output_format vulkan \
-i input.webm \
-c:v h264_vulkan \
output.mp4
```
### ProRes RAW Vulkan Decoding (FFmpeg 8.0+)
```bash
# Hardware decode ProRes RAW with Vulkan
ffmpeg -init_hw_device vulkan \
-hwaccel vulkan \
-i input.mov \
-c:v libx264 \
output.mp4
```
### Lossless Screen Capture with FFv1 Vulkan
```bash
# High-throughput lossless capture (Linux X11)
ffmpeg -init_hw_device vulkan \
-f x11grab -framerate 60 -i :0.0 \
-c:v ffv1_vulkan \
screen_capture.mkv
# Lossless screen recording (Windows)
ffmpeg -init_hw_device vulkan \
-f gdigrab -framerate 60 -i desktop \
-c:v ffv1_vulkan \
screen_capture.mkv
```
### Upcoming Vulkan Codecs
The next minor update will add:
- ProRes (encode and decode)
- VC-2 (encode and decode)
## Vulkan Filters (FFmpeg 8.0+)
Vulkan filters provide cross-platform GPU acceleration for video processing.
### Vulkan Filter Reference
| Filter | Description | Key Parameters |
|--------|-------------|----------------|
| `scale_vulkan` | GPU scaling | `w`, `h`, `scaler` |
| `overlay_vulkan` | Compositing | `x`, `y` |
| `transpose_vulkan` | Rotate/transpose | `dir`, `passthrough` |
| `flip_vulkan` | Horizontal flip | - |
| `avgblur_vulkan` | Box blur | `sizeX`, `sizeY` |
| `gblur_vulkan` | Gaussian blur | `sigma`, `sigmaV`, `planes` |
| `chromaber_vulkan` | Chromatic aberration | `dist_x`, `dist_y` |
| `nlmeans_vulkan` | NLMeans denoising | `s`, `p`, `r` |
| `bwdif_vulkan` | Deinterlacing | `mode`, `parity`, `deint` |
| `xfade_vulkan` | Transitions | `transition`, `duration`, `offset` |
| `libplacebo` | Libplacebo processing | `w`, `h`, `colorspace` |
### Vulkan Scaling
```bash
# Basic Vulkan scaling
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-i input.mp4 \
-vf "hwupload,scale_vulkan=1920:1080,hwdownload,format=yuv420p" \
output.mp4
# Full Vulkan pipeline with scaling
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan \
-i input.mp4 \
-vf "scale_vulkan=1280:720" \
-c:v h264_vulkan output.mp4
# Vulkan scaling with specific scaler
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,scale_vulkan=w=1920:h=1080:scaler=lanczos,hwdownload,format=yuv420p" \
output.mp4
```
### Vulkan Compositing (overlay_vulkan)
```bash
# Picture-in-picture with Vulkan
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-i main.mp4 -i overlay.mp4 \
-filter_complex "\
[0:v]hwupload[main];\
[1:v]hwupload,scale_vulkan=320:180[pip];\
[main][pip]overlay_vulkan=x=10:y=10[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
# Full Vulkan overlay pipeline
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan -i main.mp4 \
-hwaccel vulkan -hwaccel_output_format vulkan -i overlay.mp4 \
-filter_complex "[0:v][1:v]overlay_vulkan=x=W-w-10:y=10" \
-c:v h264_vulkan output.mp4
```
### Vulkan Rotation and Transpose
```bash
# Transpose (rotate 90° clockwise)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,transpose_vulkan=dir=clock,hwdownload,format=yuv420p" \
output.mp4
# Horizontal flip
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,flip_vulkan,hwdownload,format=yuv420p" \
output.mp4
```
**transpose_vulkan directions:**
| dir | Rotation |
|-----|----------|
| `cclock_flip` | 90° counter-clockwise + vertical flip |
| `clock` | 90° clockwise |
| `cclock` | 90° counter-clockwise |
| `clock_flip` | 90° clockwise + vertical flip |
### Vulkan Blur Effects
```bash
# Gaussian blur
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,gblur_vulkan=sigma=5,hwdownload,format=yuv420p" \
output.mp4
# Box blur (averaging)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,avgblur_vulkan=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
output.mp4
```
### Vulkan Chromatic Aberration
```bash
# Add chromatic aberration effect
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,chromaber_vulkan=dist_x=-0.002:dist_y=0.002,hwdownload,format=yuv420p" \
output.mp4
```
### Vulkan Denoising (nlmeans_vulkan)
```bash
# Non-local means denoising on GPU
ffmpeg -init_hw_device vulkan \
-i noisy.mp4 \
-vf "hwupload,nlmeans_vulkan=s=3:p=7:r=15,hwdownload,format=yuv420p" \
output.mp4
# Full Vulkan denoising pipeline
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan \
-i noisy.mp4 \
-vf "nlmeans_vulkan=s=4" \
-c:v h264_vulkan output.mp4
```
### Vulkan Deinterlacing (bwdif_vulkan)
```bash
# Bob-Weaver deinterlacing
ffmpeg -init_hw_device vulkan \
-i interlaced.mp4 \
-vf "hwupload,bwdif_vulkan=mode=send_frame,hwdownload,format=yuv420p" \
output.mp4
# Full Vulkan deinterlace pipeline
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-hwaccel vulkan -hwaccel_output_format vulkan \
-i interlaced.mp4 \
-vf "bwdif_vulkan=1" \
-c:v h264_vulkan output.mp4
```
### Vulkan Transitions (xfade_vulkan)
```bash
# GPU-accelerated crossfade between two clips
ffmpeg -init_hw_device vulkan=vk \
-filter_hw_device vk \
-i clip1.mp4 -i clip2.mp4 \
-filter_complex "\
[0:v]hwupload[v1];\
[1:v]hwupload[v2];\
[v1][v2]xfade_vulkan=transition=fade:duration=1:offset=4[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
```
**xfade_vulkan transitions:** `fade`, `dissolve`, `wipeleft`, `wiperight`, `wipeup`, `wipedown`, `slideleft`, `slideright`, `slideup`, `slidedown`, and more.
### Libplacebo Processing (Advanced)
Libplacebo provides advanced color management and processing:
```bash
# HDR tonemapping with libplacebo
ffmpeg -init_hw_device vulkan \
-i hdr_input.mp4 \
-vf "hwupload,libplacebo=tonemapping=hable:colorspace=bt709:color_primaries=bt709:color_trc=bt709,hwdownload,format=yuv420p" \
sdr_output.mp4
# Scaling with libplacebo (high quality)
ffmpeg -init_hw_device vulkan \
-i input.mp4 \
-vf "hwupload,libplacebo=w=3840:h=2160:upscaler=ewa_lanczos,hwdownload,format=yuv420p10le" \
output.mp4
```
## OpenCL Filters
OpenCL provides GPU acceleration that works across vendors (NVIDIA, AMD, Intel).
### OpenCL Filter Reference
| Filter | Description | Key Parameters |
|--------|-------------|----------------|
| `scale_opencl` | GPU scaling | `w`, `h` |
| `overlay_opencl` | Compositing | `x`, `y` |
| `pad_opencl` | Padding | `w`, `h`, `x`, `y`, `color` |
| `colorkey_opencl` | Chroma keying | `color`, `similarity`, `blend` |
| `unsharp_opencl` | Sharpening | `lx`, `ly`, `la`, `cx`, `cy`, `ca` |
| `nlmeans_opencl` | NLMeans denoising | `s`, `p`, `r` |
| `deshake_opencl` | Stabilization | `tripod`, `smooth`, `filename` |
| `tonemap_opencl` | HDR tonemapping | `tonemap`, `transfer`, `matrix` |
| `avgblur_opencl` | Box blur | `sizeX`, `sizeY` |
| `convolution_opencl` | Custom kernel | `0m`, `1m`, etc. |
### OpenCL Initialization
```bash
# List OpenCL devices
ffmpeg -init_hw_device opencl=gpu:0.0 -filter_hw_device gpu -f lavfi -i nullsrc -vf "hwupload,avgblur_opencl=2" -t 1 -f null -
# Basic OpenCL filter pattern
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,<filter_opencl>,hwdownload,format=yuv420p" \
output.mp4
```
### OpenCL Scaling
```bash
# GPU scaling with OpenCL
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,scale_opencl=1920:1080,hwdownload,format=yuv420p" \
output.mp4
```
### OpenCL Compositing
```bash
# Picture-in-picture with OpenCL
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i main.mp4 -i overlay.mp4 \
-filter_complex "\
[0:v]hwupload[main];\
[1:v]hwupload,scale_opencl=320:180[pip];\
[main][pip]overlay_opencl=x=10:y=10[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
```
### OpenCL Chroma Keying (colorkey_opencl)
```bash
# Green screen removal with OpenCL
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i greenscreen.mp4 -i background.mp4 \
-filter_complex "\
[0:v]hwupload,colorkey_opencl=0x00FF00:0.3:0.1[fg];\
[1:v]hwupload[bg];\
[bg][fg]overlay_opencl[out];\
[out]hwdownload,format=yuv420p" \
-map "[out]" output.mp4
```
### OpenCL Denoising (nlmeans_opencl)
```bash
# Non-local means denoising on GPU
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i noisy.mp4 \
-vf "hwupload,nlmeans_opencl=s=3:p=7:r=15,hwdownload,format=yuv420p" \
output.mp4
# Strong denoising
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i noisy.mp4 \
-vf "hwupload,nlmeans_opencl=s=5:p=7:r=21,hwdownload,format=yuv420p" \
output.mp4
```
### OpenCL Stabilization (deshake_opencl)
```bash
# GPU-accelerated video stabilization
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i shaky.mp4 \
-vf "hwupload,deshake_opencl,hwdownload,format=yuv420p" \
output.mp4
# Tripod mode (camera stays in one place)
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i shaky.mp4 \
-vf "hwupload,deshake_opencl=tripod=1,hwdownload,format=yuv420p" \
output.mp4
```
### OpenCL Sharpening (unsharp_opencl)
```bash
# Basic sharpening
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,unsharp_opencl=lx=5:ly=5:la=1.0,hwdownload,format=yuv420p" \
output.mp4
# Subtle sharpening for HD video
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,unsharp_opencl=lx=3:ly=3:la=0.5,hwdownload,format=yuv420p" \
output.mp4
```
### OpenCL HDR Tonemapping (tonemap_opencl)
```bash
# HDR to SDR conversion with GPU
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i hdr_input.mp4 \
-vf "hwupload,tonemap_opencl=tonemap=hable:transfer=bt709:matrix=bt709:primaries=bt709,hwdownload,format=yuv420p" \
sdr_output.mp4
# Reinhard tonemapping
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i hdr_input.mp4 \
-vf "hwupload,tonemap_opencl=tonemap=reinhard:param=0.5,hwdownload,format=yuv420p" \
sdr_output.mp4
```
**tonemap_opencl algorithms:**
| Algorithm | Description |
|-----------|-------------|
| `none` | No tonemapping |
| `clip` | Simple clipping |
| `linear` | Linear scaling |
| `gamma` | Gamma-based |
| `reinhard` | Reinhard operator |
| `hable` | Hable/Filmic (most popular) |
| `mobius` | Mobius (preserves blacks) |
| `bt2390` | ITU-R BT.2390 (broadcast standard) |
### OpenCL Blur
```bash
# Box blur
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,avgblur_opencl=sizeX=5:sizeY=5,hwdownload,format=yuv420p" \
output.mp4
```
### OpenCL Custom Convolution
```bash
# Edge detection kernel
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,convolution_opencl=0m='-1 -1 -1 -1 8 -1 -1 -1 -1':0rdiv=1:0bias=128,hwdownload,format=yuv420p" \
output.mp4
# Sharpen kernel
ffmpeg -init_hw_device opencl=gpu \
-filter_hw_device gpu \
-i input.mp4 \
-vf "hwupload,convolution_opencl=0m='0 -1 0 -1 5 -1 0 -1 0':0rdiv=1:0bias=0,hwdownload,format=yuv420p" \
output.mp4
```
## GPU Filter Comparison
### Cross-Platform Filter Availability
| Filter Type | CUDA | Vulkan | OpenCL | VAAPI | QSV |
|-------------|------|--------|--------|-------|-----|
| Scaling | ✅ `scale_cuda` | ✅ `scale_vulkan` | ✅ `scale_opencl` | ✅ `scale_vaapi` | ✅ `vpp_qsv` |
| Overlay | ✅ `overlay_cuda` | ✅ `overlay_vulkan` | ✅ `overlay_opencl` | - | - |
| Denoising | ✅ `bilateral_cuda` | ✅ `nlmeans_vulkan` | ✅ `nlmeans_opencl` | - | - |
| Deinterlace | ✅ `bwdif_cuda` | ✅ `bwdif_vulkan` | - | ✅ `deinterlace_vaapi` | ✅ `vpp_qsv` |
| Chromakey | ✅ `chromakey_cuda` | - | ✅ `colorkey_opencl` | - | - |
| Tonemapping | - | ✅ `libplacebo` | ✅ `tonemap_opencl` | ✅ `tonemap_vaapi` | - |
| Stabilization | - | - | ✅ `deshake_opencl` | - | - |
| Padding | ✅ `pad_cuda` | - | ✅ `pad_opencl` | - | - |
### Performance Comparison
| API | Platform Support | Filter Variety | Performance |
|-----|------------------|----------------|-------------|
| CUDA | NVIDIA only | Excellent | Best on NVIDIA |
| Vulkan | Cross-platform | Good (growing) | Very good |
| OpenCL | Cross-platform | Good | Good |
| VAAPI | Linux only | Limited | Good on Linux |
| QSV | Intel only | Limited | Good on Intel |
## Docker with Hardware Acceleration
### NVIDIA GPU in Docker
```bash
# Run with NVIDIA GPU support
docker run --gpus all \
--rm \
-v $(pwd):/data \
jrottenberg/ffmpeg:nvidia \
-hwaccel cuda \
-hwaccel_output_format cuda \
-i /data/input.mp4 \
-c:v h264_nvenc \
/data/output.mp4
```
### Intel QSV in Docker
```bash
# Run with Intel GPU access
docker run --rm \
--device=/dev/dri:/dev/dri \
-v $(pwd):/data \
jrottenberg/ffmpeg:vaapi \
-hwaccel qsv \
-i /data/input.mp4 \
-c:v h264_qsv \
/data/output.mp4
```
## Troubleshooting
### Common Issues
**"No NVENC capable devices found"**
```bash
# Check GPU support
nvidia-smi
# Verify driver version
nvidia-smi --query-gpu=driver_version --format=csv
# Check CUDA version
nvcc --version
```
**"Cannot load libcuda.so"**
```bash
# Set library path (Linux)
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```
**QSV "Unsupported format"**
```bash
# Force pixel format conversion
ffmpeg -i input.mp4 \
-vf "format=nv12" \
-c:v h264_qsv \
output.mp4
```
**VAAPI permission denied**
```bash
# Add user to video/render group
sudo usermod -aG video $USER
sudo usermod -aG render $USER
# Re-login or use newgrp
```
### Performance Debugging
```bash
# Benchmark encode speed
ffmpeg -benchmark -i input.mp4 -c:v h264_nvenc -f null -
# Show hardware decode stats
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -benchmark -i input.mp4 -f null -
# Monitor GPU usage (NVIDIA)
nvidia-smi dmon -s u
# Monitor GPU usage (Intel)
intel_gpu_top
```
## Best Practices
1. **Use full GPU pipelines** when possible to avoid CPU-GPU memory transfers
2. **Match decode and encode hardware** for best performance
3. **Use appropriate presets** - faster isn't always better for quality
4. **Enable lookahead and AQ** for quality-critical encodes
5. **Test on target hardware** - quality varies by GPU generation
6. **Monitor GPU memory** for high-resolution content
7. **Consider power efficiency** for laptops and servers
8. **Update drivers regularly** for performance and feature improvements
## Recommended Settings by Use Case
### Live Streaming
```bash
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input \
-c:v h264_nvenc -preset p3 -tune ll -zerolatency 1 -b:v 6M \
-f flv rtmp://server/live/stream
```
### VOD Encoding (Quality)
```bash
ffmpeg -i input.mp4 \
-c:v hevc_nvenc -preset p6 -tune hq \
-rc vbr -cq 22 -b:v 0 \
-rc-lookahead 32 -spatial-aq 1 \
output.mp4
```
### Batch Processing
```bash
# Multiple parallel streams on one GPU
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input1.mp4 -c:v h264_nvenc output1.mp4 &
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input2.mp4 -c:v h264_nvenc output2.mp4 &
wait
```
This guide covers hardware acceleration fundamentals. For specific platform optimizations and advanced configurations, consult the respective vendor documentation.