Pandoc Document Conversion Skill
Convert documents between formats using pandoc, the universal document converter.
Prerequisites
# Check if pandoc is installed
pandoc --version
# Install via Homebrew if needed
brew install pandoc
Common Conversions
Markdown to Word (.docx)
# Basic conversion
pandoc input.md -o output.docx
# With table of contents
pandoc input.md --toc -o output.docx
# With custom reference doc (for styling)
pandoc input.md --reference-doc=template.docx -o output.docx
# Standalone with metadata
pandoc input.md -s --metadata title="Document Title" -o output.docx
Markdown to PDF
# Requires LaTeX - install one of:
# brew install --cask basictex # Smaller (~100MB)
# brew install --cask mactex-no-gui # Full (~4GB)
# After install: eval "$(/usr/libexec/path_helper)" or new terminal
# Basic conversion (uses pdflatex)
pandoc input.md -o output.pdf
# With table of contents and custom margins
pandoc input.md -s --toc --toc-depth=2 -V geometry:margin=1in -o output.pdf
# Using xelatex (better Unicode support - box drawings, arrows, etc.)
export PATH="/Library/TeX/texbin:$PATH"
pandoc input.md --pdf-engine=xelatex -V geometry:margin=1in -o output.pdf
PDF Engine Selection:
| Engine | Use When |
|---|---|
pdflatex | Default, ASCII content only |
xelatex | Unicode characters (arrows, box-drawing, emojis) |
lualatex | Complex typography, OpenType fonts |
Markdown to HTML
# Basic HTML
pandoc input.md -o output.html
# Standalone HTML with CSS
pandoc input.md -s -c style.css -o output.html
# Self-contained (embeds images/CSS) - note: --self-contained is deprecated
pandoc input.md -s --embed-resources --standalone -o output.html
HTML for Print-to-PDF (No LaTeX Required)
When LaTeX isn't available, create styled HTML and print to PDF from browser:
# Create inline CSS file
cat > /tmp/print-style.css << 'EOF'
body { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
max-width: 800px; margin: 0 auto; padding: 2em; line-height: 1.6; }
h1 { border-bottom: 2px solid #333; padding-bottom: 0.3em; }
h2 { border-bottom: 1px solid #ccc; padding-bottom: 0.2em; margin-top: 1.5em; }
table { border-collapse: collapse; width: 100%; margin: 1em 0; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #f5f5f5; }
code { background-color: #f4f4f4; padding: 2px 6px; border-radius: 3px; }
pre { background-color: #f4f4f4; padding: 1em; overflow-x: auto; border-radius: 5px; }
blockquote { border-left: 4px solid #ddd; margin: 1em 0; padding-left: 1em; color: #666; }
@media print { body { max-width: none; } }
EOF
# Convert with embedded styles
pandoc input.md -s --toc --toc-depth=2 -c /tmp/print-style.css --embed-resources --standalone -o output.html
# Open and print to PDF (Cmd+P > Save as PDF)
open output.html
Word to Markdown
# Extract markdown from docx
pandoc input.docx -o output.md
# With ATX-style headers
pandoc input.docx --atx-headers -o output.md
Useful Options
| Option | Description |
|---|---|
-s / --standalone | Produce standalone document with header/footer |
--toc | Generate table of contents |
--toc-depth=N | TOC depth (default: 3) |
-V key=value | Set template variable |
--metadata key=value | Set metadata field |
--reference-doc=FILE | Use FILE for styling (docx/odt) |
--template=FILE | Use custom template |
--highlight-style=STYLE | Syntax highlighting (pygments, tango, etc.) |
--number-sections | Number section headings |
-f FORMAT | Input format (if not auto-detected) |
-t FORMAT | Output format (if not auto-detected) |
Format Identifiers
| Format | Identifier |
|---|---|
| Markdown | markdown, gfm (GitHub), commonmark |
| Word | docx |
pdf | |
| HTML | html, html5 |
| LaTeX | latex |
| RST | rst |
| EPUB | epub |
| ODT | odt |
| RTF | rtf |
Google Docs Workflow
To get markdown into Google Docs with formatting preserved:
# 1. Convert to docx
pandoc document.md -o document.docx
# 2. Upload to Google Drive
# 3. Right-click > Open with > Google Docs
Google Docs imports .docx files well and preserves:
- Headings
- Bold/italic
- Lists (bulleted and numbered)
- Tables
- Links
- Code blocks (as monospace)
PSI Document Conversion
For PSI documents with tables and complex formatting:
# Convert PSI markdown to Word
pandoc PSI-document.md \
--standalone \
--toc \
--toc-depth=2 \
-o PSI-document.docx
# Open for review
open PSI-document.docx
Troubleshooting
Tables Not Rendering
Pandoc requires proper markdown table syntax:
| Header 1 | Header 2 |
|----------|----------|
| Cell 1 | Cell 2 |
Code Blocks Missing Highlighting
Use fenced code blocks with language identifier:
```python
def example():
pass
### PDF Generation Fails
**"pdflatex not found"** - Install LaTeX:
```bash
# Smaller option (~100MB)
brew install --cask basictex
# Full option (~4GB)
brew install --cask mactex-no-gui
# After install, update PATH
eval "$(/usr/libexec/path_helper)"
# Or open a new terminal
Unicode character errors (box-drawing, arrows, emojis):
# Use xelatex instead of pdflatex
export PATH="/Library/TeX/texbin:$PATH"
pandoc input.md --pdf-engine=xelatex -o output.pdf
No LaTeX available - Use HTML print-to-PDF workflow:
pandoc input.md -s --toc -o output.html
open output.html
# Then Cmd+P > Save as PDF
Self-Test
# Verify pandoc installation
pandoc --version | head -1
# Test basic conversion
echo "# Test\n\nHello **world**" | pandoc -f markdown -t html
