Document Creation

Core Principle

Documents are deliverables. Match the format to the audience: Word for collaborative editing, PDF for final distribution, PowerPoint for presentations, Excel for data and models. Every document should look like it was crafted by a professional, not generated by a script.

Format Selection

Need	Format	Why
Collaborative editing, track changes	`.docx` (Word)	Native revision tracking, comments, universal compatibility
Final distribution, no editing	`.pdf`	Preserves formatting, works everywhere, print-ready
Presentations, pitch decks	`.pptx` (PowerPoint)	Slide-based, speaker notes, animations
Data, calculations, models	`.xlsx` (Excel)	Formulas, charts, pivot tables, dynamic recalculation

Word Documents (.docx)

Reading Content

# Text extraction with pandoc
pandoc document.docx -o output.md

# With tracked changes preserved
pandoc --track-changes=all document.docx -o output.md
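
Without pandoc, plain paragraph text can still be pulled out with the Python standard library. A .docx file is a ZIP archive whose body lives in word/document.xml, and each paragraph is stored as a w:p element holding w:t text runs. The sketch below relies only on that structure. The helper name docx_paragraphs is hypothetical, and the helper does no tracked-change handling beyond skipping deleted text.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the WordprocessingML elements in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the text of every w:p (paragraph) element in a .docx body.

    Joins the w:t runs inside each paragraph. Deleted tracked text is stored
    in w:delText, so it is skipped. Table cells come out as separate
    paragraphs. Headers, footers, and footnotes live in other XML parts and
    are not read.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    ]
```

Calling docx_paragraphs("document.docx") returns one string per paragraph. The same structure explains the rule against \n further down: Word starts a new paragraph only at a new w:p element, and that is what a separate Paragraph object produces.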

Creating New Documents

Use the docx npm package (JavaScript) for creating Word documents programmatically.

npm install docx

const { Document, Packer, Paragraph, TextRun, HeadingLevel } = require('docx');
const fs = require('fs');

const doc = new Document({
  sections: [{
    properties: {
      page: {
        size: { width: 12240, height: 15840 },  // US Letter in DXA
        margin: { top: 1440, right: 1440, bottom: 1440, left: 1440 }
      }
    },
    children: [
      new Paragraph({
        heading: HeadingLevel.HEADING_1,
        children: [new TextRun("Document Title")]
      }),
      new Paragraph({
        children: [new TextRun("Body text goes here.")]
      }),
    ]
  }]
});

Packer.toBuffer(doc).then(buffer => fs.writeFileSync("output.docx", buffer));
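
The sizes in that example are in DXA, which is twentieths of a point, so 1 inch is 1440 DXA. That makes 12240 × 15840 equal to 8.5 × 11 in, and each 1440 margin equal to 1 inch. The Python arithmetic below reproduces those numbers and converts A4's 210 × 297 mm. It also computes the text width left between the margins, which is the total that table column widths should add up to. The helper names are illustrative and are not part of docx.

```python
DXA_PER_INCH = 1440  # 1 inch = 72 points, 1 point = 20 DXA
MM_PER_INCH = 25.4


def inches_to_dxa(inches):
    """Convert inches to whole DXA (twentieths of a point)."""
    return round(inches * DXA_PER_INCH)


def mm_to_dxa(mm):
    """Convert millimetres to whole DXA."""
    return round(mm / MM_PER_INCH * DXA_PER_INCH)


def content_width_dxa(page_width_dxa, left_margin_dxa, right_margin_dxa):
    """Usable text width between the left and right margins."""
    return page_width_dxa - left_margin_dxa - right_margin_dxa


letter = (inches_to_dxa(8.5), inches_to_dxa(11))
a4 = (mm_to_dxa(210), mm_to_dxa(297))
one_inch = inches_to_dxa(1)

print(letter)                                             # (12240, 15840)
print(a4)                                                 # (11906, 16838)
print(content_width_dxa(letter[0], one_inch, one_inch))  # 9360
```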

Critical Rules for Word Documents

- Set page size explicitly (defaults to A4, not US Letter)
- Never use \n for line breaks — use separate Paragraph elements
- Never use unicode bullets — use LevelFormat.BULLET with numbering config
- Tables need dual widths: columnWidths on table AND width on each cell
- Always use WidthType.DXA, never PERCENTAGE (breaks in Google Docs)
- ImageRun requires the 'type' parameter (png, jpg, etc.)
- Override built-in heading styles with exact IDs: "Heading1", "Heading2"
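
One way to satisfy the dual-width rule is to compute the DXA widths once. The same list then goes to the table's columnWidths and to each cell's width. The Python sketch below divides a usable width by integer weights, for example the 9360 DXA between 1-inch margins on US Letter. Any rounding remainder goes to the last column so the widths sum exactly. The helper split_width_dxa is hypothetical.

```python
def split_width_dxa(total_dxa, weights):
    """Split total_dxa into integer column widths proportional to weights.

    Widths are floored; the DXA lost to rounding is added to the last
    column so that sum(result) == total_dxa exactly.
    """
    if not weights or any(not isinstance(w, int) or w <= 0 for w in weights):
        raise ValueError("weights must be positive integers")
    total_weight = sum(weights)
    widths = [total_dxa * w // total_weight for w in weights]
    widths[-1] += total_dxa - sum(widths)
    return widths


print(split_width_dxa(9360, [1, 2, 1]))  # [2340, 4680, 2340]
print(split_width_dxa(9360, [1] * 7))    # [1337, 1337, 1337, 1337, 1337, 1337, 1338]
```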

PDF Documents

Reading PDFs

# Text extraction
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        print(page.extract_text())

# Table extraction
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            for row in table:
                print(row)
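
page.extract_tables() returns each table as a list of rows, and each row is a list of cell strings. A position with no cell in the detected grid comes back as None; this is typically part of a merged cell. The standard-library sketch below saves one such table as CSV. The name table_to_csv is a hypothetical helper.

```python
import csv
import io


def table_to_csv(table):
    """Serialize one pdfplumber-style table (a list of rows) to CSV text.

    None cells become empty strings. Newlines inside a cell are replaced
    with spaces so each table row stays on one CSV line.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        writer.writerow(
            "" if cell is None else cell.replace("\n", " ") for cell in row
        )
    return buffer.getvalue()
```

Inside the pdfplumber loop above, the returned string can be written to one .csv file per table.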

Creating PDFs

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []

story.append(Paragraph("Report Title", styles['Title']))
story.append(Spacer(1, 12))
story.append(Paragraph("Body text content.", styles['Normal']))

doc.build(story)

PDF Operations

from pypdf import PdfReader, PdfWriter

# Merge PDFs
writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Critical Rules for PDFs

- Never use Unicode subscript/superscript characters in ReportLab: the standard
  fonts lack most of these glyphs, so they render as black boxes. Use <sub> and <super> tags instead
- Use pdfplumber for reading, reportlab for creating, pypdf for manipulation
- For scanned PDFs, use pytesseract OCR (convert to images first)
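
Incoming text may already contain Unicode subscripts or superscripts, as in H₂O or x⁻¹. Such text can be rewritten into ReportLab markup before it reaches a Paragraph. The standard-library sketch below assumes only digits and plus/minus signs need mapping. It escapes &, < and > first, because ReportLab parses Paragraph text as markup and a stray < would otherwise break it. The name to_reportlab_markup is hypothetical.

```python
import re
from xml.sax.saxutils import escape

SUBSCRIPTS = "₀₁₂₃₄₅₆₇₈₉₊₋"
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻"
PLAIN = "0123456789+-"

SUB_TO_PLAIN = str.maketrans(SUBSCRIPTS, PLAIN)
SUP_TO_PLAIN = str.maketrans(SUPERSCRIPTS, PLAIN)


def to_reportlab_markup(text):
    """Escape &, <, > and turn runs of Unicode sub/superscripts into tags."""
    text = escape(text)  # ReportLab parses Paragraph text as markup
    text = re.sub(
        f"[{SUBSCRIPTS}]+",
        lambda m: "<sub>" + m.group().translate(SUB_TO_PLAIN) + "</sub>",
        text,
    )
    return re.sub(
        f"[{SUPERSCRIPTS}]+",
        lambda m: "<super>" + m.group().translate(SUP_TO_PLAIN) + "</super>",
        text,
    )


print(to_reportlab_markup("H₂O and CO₂"))  # H<sub>2</sub>O and CO<sub>2</sub>
print(to_reportlab_markup("x⁻¹ & y"))      # x<super>-1</super> &amp; y
```

The result can be passed directly to Paragraph(to_reportlab_markup(text), styles['Normal']).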

PowerPoint Presentations (.pptx)

Reading Content

# Text extraction
python -m markitdown presentation.pptx

Design Principles

BEFORE STARTING:
  → Pick a bold, content-informed color palette
  → One color dominates (60-70%), 1-2 supporting, one accent
  → Dark backgrounds for title/conclusion, light for content
  → Commit to ONE visual motif and repeat it across slides

EVERY SLIDE NEEDS:
  → A visual element — image, chart, icon, or shape
  → Text-only slides are forgettable
  → Vary layouts: two-column, icon rows, grids, half-bleed images

TYPOGRAPHY:
  → Choose a header font with personality + clean body font
  → Titles: 36-44pt bold | Body: 14-16pt | Captions: 10-12pt
  → Never default to Arial — use Georgia, Cambria, Trebuchet MS

AVOID:
  → Same layout on every slide
  → Centered body text (left-align paragraphs and lists)
  → Accent lines under titles (hallmark of AI-generated slides)
  → Text-only slides with plain bullets
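
These rules are easier to keep when a slide plan is checked before any slides are generated. The Python sketch below does that check. The plan format (dicts with layout, title_pt, body_pt and has_visual keys) and the function name are invented for illustration. The numeric limits are the title and body sizes from TYPOGRAPHY above.

```python
def lint_slides(slides):
    """Check a slide plan against the design rules; return warning strings.

    Hypothetical plan format: each slide is a dict with optional keys
    "layout" (str), "title_pt" and "body_pt" (font sizes in points), and
    "has_visual" (bool).
    """
    warnings = []
    for number, slide in enumerate(slides, start=1):
        if not slide.get("has_visual", False):
            warnings.append(f"slide {number}: no visual element")
        if not 36 <= slide.get("title_pt", 0) <= 44:
            warnings.append(f"slide {number}: title outside 36-44pt")
        if "body_pt" in slide and not 14 <= slide["body_pt"] <= 16:
            warnings.append(f"slide {number}: body text outside 14-16pt")
    layouts = {slide.get("layout") for slide in slides}
    if len(slides) > 1 and len(layouts) == 1:
        warnings.append("every slide uses the same layout")
    return warnings
```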

Creating from Scratch

Use pptxgenjs for programmatic creation:

npm install pptxgenjs
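
pptxgenjs positions text, shapes and images with x, y, w and h values in inches. Its default 16:9 layout is 10 × 5.625 in. Working out a grid once and reusing it keeps placements consistent across slides. The Python sketch below only does that arithmetic, for a title bar over two columns. The function name and the margin, gutter and title-height values are assumptions. Each box would then supply the x, y, w and h options of slide.addText or slide.addImage.

```python
SLIDE_W, SLIDE_H = 10.0, 5.625  # pptxgenjs default 16:9 layout, in inches


def two_column_boxes(margin=0.5, gutter=0.4, title_h=1.0):
    """Return x/y/w/h boxes (inches) for a title bar and two equal columns."""
    content_w = SLIDE_W - 2 * margin
    col_w = (content_w - gutter) / 2
    body_y = margin + title_h
    body_h = SLIDE_H - body_y - margin
    return {
        "title": {"x": margin, "y": margin, "w": content_w, "h": title_h},
        "left": {"x": margin, "y": body_y, "w": col_w, "h": body_h},
        "right": {"x": margin + col_w + gutter, "y": body_y, "w": col_w, "h": body_h},
    }


for name, box in two_column_boxes().items():
    print(name, {key: round(value, 3) for key, value in box.items()})
```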

Excel Spreadsheets (.xlsx)

Reading Data

import pandas as pd

df = pd.read_excel('file.xlsx')            # First sheet
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)  # All sheets

Creating Spreadsheets

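from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment

wb = Workbook()
sheet = wb.active

# Add data: header row, then append() writes to the next empty row (row 2)
sheet['A1'] = 'Revenue'
sheet['B1'] = 'Q1'
sheet.append(['Product A', 50000])
sheet['B2'].font = Font(color='0000FF')  # hardcoded input: blue text

# ALWAYS use formulas, not hardcoded calculations
sheet['A10'] = 'Total'
sheet['B10'] = '=SUM(B2:B9)'

# Formatting: bold, shaded, centered header row
header_fill = PatternFill(start_color='DDEBF7', end_color='DDEBF7', fill_type='solid')
for cell in sheet[1]:
    cell.font = Font(bold=True)
    cell.fill = header_fill
    cell.alignment = Alignment(horizontal='center')

# Currency with negatives in parentheses, e.g. ($1,200)
for row in range(2, 11):
    sheet[f'B{row}'].number_format = '$#,##0;($#,##0)'
sheet.column_dimensions['A'].width = 20

wb.save('output.xlsx')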

Critical Rules for Excel

- ALWAYS use Excel formulas, never hardcode calculated values
  BAD:  sheet['B10'] = 5000  (calculated in Python)
  GOOD: sheet['B10'] = '=SUM(B2:B9)'

- Financial models: blue text for inputs, black for formulas,
  green for cross-sheet links, red for external links

- Number formatting:
  → Currency: $#,##0 with units in headers
  → Percentages: 0.0% (one decimal)
  → Negatives: parentheses (123) not -123
  → Years: text format ("2024" not "2,024")

- Place all assumptions in separate cells, never inline in formulas
- Document data sources for hardcoded values
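- Recalculate after creating: openpyxl writes formulas as strings with no cached results, so open and recalculate the file (in Excel or LibreOffice, for example) before anything reads the computed values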
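
The color convention for financial models can be applied by inspecting what each cell holds. The Python sketch below is a heuristic, and both font_color_for and the exact hex values are assumptions; the source only names the colors. It treats numbers as inputs and formulas that name a bracketed workbook as external links. Other formulas containing ! are treated as pulling from another sheet, and everything else keeps its default color.

```python
import re

INPUT_BLUE = "0000FF"         # hardcoded inputs
FORMULA_BLACK = "000000"      # formulas within the sheet
CROSS_SHEET_GREEN = "008000"  # formulas that pull from another sheet
EXTERNAL_RED = "FF0000"       # formulas that pull from another workbook

# A bracketed workbook name followed by a sheet reference, e.g.
# '[Budget.xlsx]Sheet1'!B4 or the indexed [1]Sheet1!B4 form
EXTERNAL_REF = re.compile(r"\[[^\]]+\][^!]*!")


def font_color_for(value):
    """Return a hex font color for a cell value, or None for labels and blanks.

    Heuristic only: formula text is not fully parsed, so unusual references
    (such as a Table[Column] reference combined with a cross-sheet reference)
    may be misclassified.
    """
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return INPUT_BLUE
    if isinstance(value, str) and value.startswith("="):
        if EXTERNAL_REF.search(value):
            return EXTERNAL_RED
        if "!" in value:
            return CROSS_SHEET_GREEN
        return FORMULA_BLACK
    return None
```

With openpyxl, a non-None result could be applied as cell.font = Font(color=color) while looping over sheet.iter_rows(). Assigning a new Font replaces the cell's other font settings, such as bold.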

Anti-Patterns

Anti-Pattern	Why It Hurts	Do This Instead
Hardcoding calculations in Python	Spreadsheet becomes static, can't update	Use Excel formulas for all calculations
Generic blue slides with bullets	Looks AI-generated, forgettable	Bold color palette, visual elements, varied layouts
PDFs with Unicode subscripts in ReportLab	Renders as black boxes	Use `<sub>` and `<super>` markup tags
Word tables with percentage widths	Breaks in Google Docs	Always use `WidthType.DXA`
No tracked changes in collaborative docs	Edits are invisible, trust erodes	Use Word's tracked changes for all modifications
Dumping raw data into Excel	Unreadable, no insights	Format with headers, formulas, conditional formatting

Power Move

"Create a [document type] for [purpose]. Use professional formatting, proper typography, and make it look like it was designed by someone who cares about the details. Include [specific content requirements]. Output as [format]."

The agent becomes your document production team — creating polished, professional deliverables in any format.
