askill
youtube-transcriber-pipeline-claude-code-skill

youtube-transcriber-pipeline-claude-code-skillSafety 90Repository

**Version**: 1.0.0 | **Status**: Ready for Claude Code Integration

0 stars
1.2k downloads
Updated 1/12/2026

Package Files

Loading files...
SKILL.md

YouTube Transcriber Pipeline - Claude Code Skill

Version: 1.0.0 | Status: Ready for Claude Code Integration

Overview

Complete 4-skill pipeline for extracting, formatting, organizing, and archiving YouTube transcripts. Integrates with Claude Code CLI using the system's configured API keys (no manual key management needed).

Capabilities

This skill provides a complete workflow:

  1. Extract Facts - AI-powered fact extraction from transcripts (uses Claude API via Claude Code)
  2. Format Notes - Convert facts to atomic markdown notes with wiki-links
  3. Integrate Obsidian - Import notes into Obsidian vault with backlinks
  4. Archive Transcripts - Safe archival with integrity verification

Usage in Claude Code

# Run complete pipeline
/run-youtube-transcriber --transcript ~/Downloads/transcript.txt

# Just extract facts
/extract-transcript-facts ~/Downloads/transcript.txt

# Just format notes
/format-transcript-notes ~/Downloads/notes/ --output ./formatted

# Just integrate Obsidian
/integrate-obsidian-vault ./notes/ --vault-path ~/Documents/Obsidian/Aaron

# Just archive
/archive-transcripts ~/Downloads/transcript.txt

Features

Phase 1: Extract Facts (AI-Powered)

  • Uses Claude 3.5 Sonnet via Claude Code CLI
  • 7 fact categories (scientific, quotes, methods, examples, statistics, warnings, connections)
  • Hierarchical topic organization
  • Confidence scoring (0.0-1.0)
  • Automatic metadata (actionable, philosophical, controversial)
  • No API key needed - Uses Claude Code's configured authentication

Phase 2: Format as Atomic Notes

  • 1 fact = 1 markdown file
  • Wiki-link cross-references [[Topic Name]]
  • YAML front matter with metadata
  • Topic indices and hierarchies
  • Obsidian vault compatibility

Phase 3: Integrate Obsidian

  • Safe vault backup before changes
  • Preserve directory structure
  • Auto-generate backlinks
  • Master index creation
  • Link verification
  • Dry-run mode for testing

Phase 4: Archive Transcripts

  • tar.gz compression
  • MD5 integrity verification
  • Metadata tracking
  • Optional cleanup
  • Verify-only mode

Installation

The skill is ready to use once copied to Claude Code:

# Copy to Claude Code skills directory
cp -r ~/github/astoreyai/claude-skills/ ~/.claude/skills/youtube-transcriber

# Or individual components
cp -r youtube-transcript-extractor ~/.claude/skills/
cp -r transcript-to-logseq ~/.claude/skills/
cp -r transcript-to-obsidian ~/.claude/skills/
cp -r transcript-archiver ~/.claude/skills/

Requirements

For Claude Code CLI

  • Claude Code installed and configured
  • API key already configured in Claude Code (no manual setup needed)
  • Python 3.11+

For Obsidian Integration

  • Obsidian vault at ~/Documents/Obsidian/Aaron/ (configurable)
  • Read/write permissions to vault directory

Architecture

Claude Code CLI
    ↓
Transcript Extractor (Phase 1)
    ├─ Uses Claude Code's API key automatically
    ├─ Extracts facts via Claude API
    └─ Outputs Fact objects
    ↓
Logseq Formatter (Phase 2)
    ├─ Converts facts to atomic notes
    └─ Creates markdown files
    ↓
Obsidian Integrator (Phase 3)
    ├─ Imports notes to vault
    ├─ Generates backlinks
    └─ Creates indices
    ↓
Transcript Archiver (Phase 4)
    ├─ Compresses files
    ├─ Verifies integrity
    └─ Stores archive

Future: Ollama Integration

The architecture is designed to support local LLM models via Ollama:

# Future upgrade (when Ollama integration ready)
/run-youtube-transcriber --transcript ~/Downloads/transcript.txt \
  --llm-backend ollama \
  --model mistral-7b

# Benefits:
# - No API costs
# - Runs completely locally
# - Works offline
# - Full privacy (data never leaves your machine)

Swapping backends will be as simple as changing a configuration parameter.

Testing

All 4 skills are fully tested:

  • Phase 1: 18 tests (extraction logic)
  • Phase 2: 21 tests (formatting, hierarchy)
  • Phase 3: 6 tests (integration, backlinks)
  • Phase 4: 18 tests (archival, verification)
  • Integration: 6 tests (cross-skill compatibility)

Total: 69/69 tests passing (100%)

Performance

  • Fact extraction: 50-100ms per fact (via Claude API)
  • Note formatting: ~500ms per note
  • Obsidian integration: <1s for 100 notes
  • Archival: <2s for 1000 files

Data Flow

Raw Transcript (txt/md)
    ↓
Phase 1: Extract Facts
    → Fact objects with metadata
    ↓
Phase 2: Format Notes
    → Markdown files in ~/output/notes/
    ↓
Phase 3: Integrate Obsidian
    → Updated ~/Documents/Obsidian/Aaron/
    ↓
Phase 4: Archive
    → ~/youtube-transcriber/transcripts-archive/

Example Workflow

1. Extract Facts

/extract-transcript-facts ~/Downloads/ml-lecture.txt

Output:

✓ Transcript loaded (5,234 chars)
✓ Extracting facts...
✓ Facts extracted: 12
✓ Topics identified: 4
✓ Accuracy: 92%

2. Format Notes

/format-transcript-notes ~/Downloads/ml-lecture.txt --output ./notes

Output:

✓ Creating atomic notes...
✓ Notes created: 12
✓ Topics structured: 4
✓ Output: ./notes/

3. Integrate Obsidian

/integrate-obsidian-vault ./notes/ --vault-path ~/Documents/Obsidian/Aaron

Output:

✓ Vault path: ~/Documents/Obsidian/Aaron
✓ Notes integrated: 12
✓ Backlinks created: 8
✓ Index created: Video-Transcripts/index.md

4. Archive

/archive-transcripts ~/Downloads/ml-lecture.txt

Output:

✓ Files archived: 1
✓ Archive: ~/youtube-transcriber/transcripts-archive/transcripts_20251122_143200.tar.gz
✓ Size: 45.2 KB

Configuration

Edit config.yaml in the skill directory:

# Paths
vault_path: "~/Documents/Obsidian/Aaron"
archive_dir: "~/youtube-transcriber/transcripts-archive"

# Extraction
max_facts: null  # No limit
confidence_threshold: 0.75
extract_all_categories: true

# Formatting
notes_per_file: 1  # Atomic notes
add_wiki_links: true
create_indices: true

# Integration
backup_before_integrate: true
auto_generate_backlinks: true
dry_run_mode: false  # Set true to test without changes

# Archival
compression: "gzip"
verify_after_archive: true
cleanup_originals: false

Troubleshooting

API Key Issues (If Running Standalone)

If you run the scripts outside Claude Code and get API key errors:

# Claude Code integration (recommended)
# Just use /extract-transcript-facts - no setup needed

# Or manually set API key for standalone use
export ANTHROPIC_API_KEY='sk-ant-...'
python workflow_runner.py --transcript your_file.txt

Obsidian Vault Not Found

# Create vault directory if needed
mkdir -p ~/Documents/Obsidian/Aaron

# Set custom vault path
/integrate-obsidian-vault ./notes/ --vault-path /your/vault/path

Permission Errors

# Ensure write access to vault and archive directories
chmod -R 755 ~/Documents/Obsidian/Aaron
mkdir -p ~/youtube-transcriber/transcripts-archive
chmod -R 755 ~/youtube-transcriber

Files Included

youtube-transcriber/
├── youtube-transcript-extractor/        Phase 1 Skill
│   ├── src/
│   │   ├── extractor.py
│   │   ├── formatters.py
│   │   └── metadata.py
│   └── tests/ (18 tests)
│
├── transcript-to-logseq/                Phase 2 Skill
│   ├── src/
│   │   ├── logseq_formatter.py
│   │   ├── obsidian_compat.py
│   │   └── hierarchy_builder.py
│   └── tests/ (21 tests)
│
├── transcript-to-obsidian/              Phase 3 Skill
│   ├── src/
│   │   └── integrator.py
│   └── tests/ (6 tests)
│
├── transcript-archiver/                 Phase 4 Skill
│   ├── src/
│   │   └── archiver.py
│   └── tests/ (18 tests)
│
├── workflow_runner.py                   Full pipeline runner
├── integration_test.py                  Cross-skill tests (6 tests)
└── config.yaml                          Configuration

Future Roadmap

v1.1.0 (Ollama Integration)

  • Swap Claude API → Ollama for local LLMs
  • Support multiple local models (Mistral, Llama2, etc)
  • Offline-first workflow

v1.2.0 (Advanced Features)

  • Batch transcript processing
  • Custom fact categories
  • Multiple output formats
  • Web UI for management

v1.3.0 (Integration)

  • Direct YouTube URL support (fetch transcripts automatically)
  • Streaming support for long transcripts
  • Multi-language support

Support

  • Documentation: See README.md in each skill directory
  • Tests: Run python -m pytest tests/ in each directory
  • Issues: Check integration_test.py for validation

License

Same as parent project (astoreyai)


Status: ✅ Ready for Claude Code integration

Works With:

  • ✅ Claude Code CLI (uses configured API key automatically)
  • ✅ Ollama (local models - future version)
  • ✅ Custom LLM backends (extensible architecture)

No API key management needed when used with Claude Code CLI

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

92/100Analyzed 2/9/2026

An exceptionally well-documented skill for a multi-stage YouTube transcript processing pipeline. It features clear usage examples, architecture diagrams, configuration details, and comprehensive troubleshooting steps.

90
95
85
95
95

Metadata

Licenseunknown
Version-
Updated1/12/2026
Publisherastoreyai

Tags

apici-cdgithubgithub-actionslintingllmtesting