askill
docx

Safety score: 92

Creates, edits, and analyzes Word documents with tracked changes, comments, and formatting preservation. Use when working with .docx files for document creation, modification, redlining, or text extraction.

Updated 2/13/2026

SKILL.md

DOCX Creation, Editing, and Analysis

Read the relevant reference file completely before starting work:

  • Creating a new document: read references/docx-js.md
  • Editing an existing document: read references/ooxml.md

Workflow Decision Tree

| Task | Workflow | Reference |
|---|---|---|
| Read/analyse content | Text extraction (pandoc) or raw XML | None needed |
| Create new document | docx-js (JavaScript) | references/docx-js.md |
| Edit your own doc (simple) | OOXML editing | references/ooxml.md |
| Edit someone else's doc | Redlining workflow (recommended) | references/ooxml.md |
| Legal/business/government | Redlining workflow (required) | references/ooxml.md |

Reading and Analysing Content

Text Extraction (Default)

Convert the document to markdown with pandoc:

pandoc --track-changes=all path-to-file.docx -o output.md
# Options: --track-changes=accept (default) / reject / all

Default to --track-changes=all to preserve revision history. Use accept only when the user wants clean text without markup.

Raw XML Access

Use raw XML when you need: comments, complex formatting, document structure, embedded media, or metadata.

python ooxml/scripts/unpack.py <office_file> <output_directory>

Key files after unpacking:

  • word/document.xml -- main document body
  • word/comments.xml -- comments referenced in document.xml
  • word/media/ -- embedded images and media
  • Tracked changes use <w:ins> (insertions) and <w:del> (deletions) tags
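Because a .docx is a ZIP package, tracked changes can also be listed without unpacking to disk. The following is a minimal stdlib sketch, not part of the repo's scripts: the function name `tracked_changes` is hypothetical, it ignores moves (`<w:moveFrom>`/`<w:moveTo>`) and does not separate nested revisions, and for untrusted files `defusedxml.ElementTree.fromstring` can stand in for the `ET.fromstring` call.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace behind the "w:" prefix in WordprocessingML.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_changes(path):
    """Return (kind, author, text) for each text-bearing <w:ins>/<w:del>."""
    with zipfile.ZipFile(path) as z:  # a .docx is a ZIP package
        root = ET.fromstring(z.read("word/document.xml"))
    changes = []
    for el in root.iter():
        if el.tag not in (W + "ins", W + "del"):
            continue
        # Inserted text sits in <w:t>, deleted text in <w:delText>.
        text = "".join(t.text or "" for t in el.iter()
                       if t.tag in (W + "t", W + "delText"))
        if text:  # skip paragraph-mark revisions, which carry no text
            changes.append((el.tag[len(W):], el.get(W + "author", ""), text))
    return changes
```

Each tuple is in document order, e.g. `("del", "Jane", "30")` followed by `("ins", "Jane", "60")` for a "30" to "60" replacement.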

Creating a New Word Document

Use docx-js (JavaScript/TypeScript) for new documents.

  1. Read references/docx-js.md completely
  2. Write a script using Document, Paragraph, TextRun components
  3. Export with Packer.toBuffer()
  4. Verify the output opens in Word/LibreOffice without errors

Example (creating a memo with bulleted lists):

  1. Read references/docx-js.md
  2. Create memo.js using Document, Paragraph, and TextRun, plus a numbering config for the bullets
  3. Run: node memo.js
  4. Verify: soffice --headless --convert-to pdf memo.docx && pdftoppm -jpeg -r 150 memo.pdf preview
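Before opening the result in Word or LibreOffice, a quick structural check can catch a truncated or incomplete package. This is a minimal sketch assuming the conventional part names that docx-js and Word produce; `check_docx` is a hypothetical helper, and passing it does not guarantee that Word will open the file.

```python
import zipfile

# Parts a typical WordprocessingML package contains (conventional names).
REQUIRED = ("[Content_Types].xml", "_rels/.rels", "word/document.xml")


def check_docx(path):
    """Return a list of problems found in a .docx package (empty list = looks sane)."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive"]
    with zipfile.ZipFile(path) as z:
        names = set(z.namelist())
        problems = [f"missing {p}" for p in REQUIRED if p not in names]
        bad = z.testzip()  # name of the first member with a bad CRC, or None
        if bad:
            problems.append(f"corrupt member {bad}")
    return problems
```

For example, `check_docx("memo.docx")` should return `[]` before moving on to the PDF preview.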

Editing an Existing Word Document

Use the Document library (Python) from scripts/document.py. It handles infrastructure setup automatically (people.xml, RSIDs, settings.xml, comments, relationships, content types).

Standard Editing Workflow

  1. Read references/ooxml.md completely (focus on "Document Library" section)
  2. Unpack: python ooxml/scripts/unpack.py <file.docx> <output_dir>
  3. Edit using Document library methods
  4. Pack: python ooxml/scripts/pack.py <output_dir> <result.docx>
  5. Verify: convert to markdown with pandoc and check the output
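Unpacking and packing work because a .docx is a ZIP archive of XML parts. As a rough stdlib illustration of what those two steps amount to (not a replacement for ooxml/scripts/unpack.py and pack.py, which may do more, such as suggesting an RSID), here is a sketch; `unpack` and `pack` are hypothetical names, and writing [Content_Types].xml first follows common convention rather than a documented requirement of the repo's scripts.

```python
import os
import zipfile


def unpack(docx, out_dir):
    """Extract every part of the .docx package into out_dir."""
    with zipfile.ZipFile(docx) as z:
        z.extractall(out_dir)


def pack(in_dir, docx):
    """Zip in_dir back into a .docx, writing [Content_Types].xml first."""
    files = []
    for root, _, names in os.walk(in_dir):
        for n in names:
            full = os.path.join(root, n)
            # Archive member names are relative to the package root and use "/".
            files.append(os.path.relpath(full, in_dir).replace(os.sep, "/"))
    files.sort(key=lambda f: (f != "[Content_Types].xml", f))
    with zipfile.ZipFile(docx, "w", zipfile.ZIP_DEFLATED) as z:
        for f in files:
            z.write(os.path.join(in_dir, f), f)
```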

Example (changing "30 days" to "60 days" as a tracked change):

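from scripts.document import Document

doc = Document('unpacked', track_revisions=True)
# Find the run to edit (this example assumes its text is exactly "within 30 days")
node = doc["word/document.xml"].get_node(tag="w:r", contains="30 days")
# Reuse the run's formatting (<w:rPr>) and its original RSID for the unchanged text
rpr = tags[0].toxml() if (tags := node.getElementsByTagName("w:rPr")) else ""
rsid = f' w:rsidR="{r}"' if (r := node.getAttribute("w:rsidR")) else ""
# [unchanged] + [deletion] + [insertion] + [unchanged];
# xml:space="preserve" keeps the leading/trailing spaces inside <w:t>
replacement = (
    f'<w:r{rsid}>{rpr}<w:t xml:space="preserve">within </w:t></w:r>'
    f'<w:del><w:r>{rpr}<w:delText>30</w:delText></w:r></w:del>'
    f'<w:ins><w:r>{rpr}<w:t>60</w:t></w:r></w:ins>'
    f'<w:r{rsid}>{rpr}<w:t xml:space="preserve"> days</w:t></w:r>'
)
doc["word/document.xml"].replace_node(node, replacement)
doc.save()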
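The same replacement string can be generated for arbitrary text with a small helper that escapes XML special characters and always sets `xml:space="preserve"`. This is a sketch under assumptions: `make_run` and `tracked_replace` are hypothetical names, and it assumes, as the example above appears to, that the Document library with `track_revisions=True` fills in the `<w:ins>`/`<w:del>` attributes (`w:id`, `w:author`, `w:date`).

```python
from xml.sax.saxutils import escape


def make_run(text, rpr="", rsid="", deleted=False):
    """Build one <w:r>; xml:space="preserve" keeps edge spaces intact."""
    tag = "w:delText" if deleted else "w:t"
    attr = f' w:rsidR="{rsid}"' if rsid else ""
    return f'<w:r{attr}>{rpr}<{tag} xml:space="preserve">{escape(text)}</{tag}></w:r>'


def tracked_replace(before, old, new, after, rpr="", rsid=""):
    """[unchanged] + <w:del>old</w:del> + <w:ins>new</w:ins> + [unchanged]."""
    parts = []
    if before:
        parts.append(make_run(before, rpr, rsid))  # keeps the original RSID
    if old:
        parts.append(f"<w:del>{make_run(old, rpr, deleted=True)}</w:del>")
    if new:
        parts.append(f"<w:ins>{make_run(new, rpr)}</w:ins>")
    if after:
        parts.append(make_run(after, rpr, rsid))
    return "".join(parts)
```

With it, the `replacement` above could be written as `tracked_replace("within ", "30", "60", " days", rpr, rsid_value)`, where `rsid_value` is the bare RSID string read from the original run.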

Redlining Workflow (Document Review with Tracked Changes)

Plan tracked changes in markdown before implementing in OOXML. Group related changes into batches of 3-10 for manageable debugging.

Principle: Minimal, Precise Edits. Only mark text that actually changes. Repeating unchanged text makes edits harder to review. Break replacements into: [unchanged text] + [deletion] + [insertion] + [unchanged text]. Preserve the original run's RSID for unchanged text.
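Finding those four pieces can be automated. Below is a minimal sketch (the `split_edit` function is hypothetical, not part of the Document library) that compares word and whitespace tokens rather than characters, so changing "30" to "60" never yields a mid-word split such as "3"/"6" plus "0 days".

```python
import re


def split_edit(old, new):
    """Split old -> new into (unchanged prefix, deleted, inserted, unchanged suffix).

    Compares word/whitespace tokens so that no edit splits a word in half.
    """
    a, b = re.split(r"(\s+)", old), re.split(r"(\s+)", new)
    p = 0  # number of shared leading tokens
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    s = 0  # number of shared trailing tokens, never overlapping the prefix
    while s < min(len(a), len(b)) - p and a[-1 - s] == b[-1 - s]:
        s += 1
    return ("".join(a[:p]), "".join(a[p:len(a) - s]),
            "".join(b[p:len(b) - s]), "".join(a[len(a) - s:]))
```

For example, `split_edit("within 30 days", "within 60 days")` returns `("within ", "30", "60", " days")`, the four strings used in the editing example.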

Step-by-Step

  1. Get markdown representation:

    pandoc --track-changes=all path-to-file.docx -o current.md
    
  2. Identify and group changes. Organise into batches by section, type, or proximity. Use these location methods for finding text in XML:

    • Section/heading numbers (e.g., "Section 3.2")
    • Grep patterns with unique surrounding text
    • Document structure (e.g., "first paragraph after Heading 2")
    • Do NOT use markdown line numbers -- they do not map to XML structure
  3. Read documentation and unpack:

    • Read references/ooxml.md -- focus on "Document Library" and "Tracked Change Patterns"
    • Unpack: python ooxml/scripts/unpack.py <file.docx> <dir>
    • Note the suggested RSID from the unpack script
  4. Implement changes in batches. For each batch:

    • Grep word/document.xml to verify current text and line numbers (they shift after each script)
    • Write a script using get_node to find nodes, then replace_node, suggest_deletion, or insert_after
    • Run the script and verify with doc.save()
  5. Pack the document:

    python ooxml/scripts/pack.py unpacked reviewed-document.docx
    
  6. Final verification:

    pandoc --track-changes=accept --wrap=none reviewed-document.docx -o verification.md
    grep "original phrase" verification.md   # Should NOT match (accept drops deleted text)
    grep "replacement phrase" verification.md # Should match
    
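When locating text for steps 2 and 4, keep in mind that Word often splits one phrase across several `<w:r>` runs, so grepping word/document.xml for the whole phrase can miss it. A minimal sketch (hypothetical `find_phrase`, stdlib ElementTree) that joins each paragraph's `<w:t>` text before searching; paragraph indices count every `<w:p>` in document order, including those inside tables, and deleted text (`<w:delText>`) is ignored.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def find_phrase(document_xml, phrase):
    """Yield (paragraph_index, run_texts) for paragraphs whose text contains phrase."""
    root = ET.parse(document_xml).getroot()
    for i, p in enumerate(root.iter(W + "p")):
        # One entry per run, so you can see where Word split the phrase.
        runs = ["".join(t.text or "" for t in r.iter(W + "t")) for r in p.iter(W + "r")]
        if phrase in "".join(runs):
            yield i, runs
```

For example, `list(find_phrase("unpacked/word/document.xml", "30 days"))` might show the phrase split as `["within 3", "0 days"]`, which tells you the edit has to span two runs.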

Example batch plan:

  • Batch 1 (Term changes): "2 years" to "1 year" in Section 5
  • Batch 2 (Jurisdiction): "New York" to "Delaware" in Section 8

Per batch: grep for text, write script, run, verify. After all batches, pack and do final verification.
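The batch plan can double as a checklist for the final verification. This is a sketch assuming the accepted text comes from `pandoc --track-changes=accept --wrap=none`, so deleted text is dropped and phrases are not wrapped across lines; `BATCHES` and `unapplied` are hypothetical names. The section labels are informational only, and plain substring checks give wrong answers when the original phrase is part of its replacement or legitimately appears elsewhere.

```python
# Hypothetical encoding of the example batch plan: (section, original, replacement).
BATCHES = {
    "Batch 1 (Term changes)": [("Section 5", "2 years", "1 year")],
    "Batch 2 (Jurisdiction)": [("Section 8", "New York", "Delaware")],
}


def unapplied(accepted_text, batches):
    """Return changes whose original phrase survives or whose replacement is missing."""
    text = " ".join(accepted_text.split())  # collapse line breaks so phrases match
    problems = []
    for batch, changes in batches.items():
        for section, old, new in changes:
            if old in text or new not in text:
                problems.append((batch, section, old, new))
    return problems
```

An empty list from `unapplied(open("verification.md").read(), BATCHES)` means every planned change appears to have landed.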

Method Selection Guide

| Scenario | Method |
|---|---|
| Change part of regular text | replace_node() with <w:del>/<w:ins> |
| Delete entire run or paragraph | suggest_deletion() |
| Reject another author's insertion | revert_insertion() (NOT suggest_deletion()) |
| Restore another author's deletion | revert_deletion() |
| Partially modify another author's change | replace_node() with nested <w:ins>/<w:del> |
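Choosing a row depends on whether the target run already sits inside a tracked change, and whose. A minimal sketch for the minidom nodes that `get_node` appears to return (the editing example calls `getElementsByTagName` and `toxml` on them); `enclosing_change` is a hypothetical helper, not a Document library method.

```python
def enclosing_change(node):
    """Return (tag, author) of the nearest enclosing <w:ins>/<w:del>, or None.

    None: plain text, so replace_node() or suggest_deletion() applies.
    ("w:ins", other author): revert_insertion() rejects it.
    ("w:del", other author): revert_deletion() restores it.
    """
    n = node.parentNode
    while n is not None and n.nodeType == n.ELEMENT_NODE:
        if n.tagName in ("w:ins", "w:del"):
            return n.tagName, n.getAttribute("w:author")
        n = n.parentNode
    return None
```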

Converting Documents to Images

Two-step process for visual analysis:

# Step 1: DOCX to PDF
soffice --headless --convert-to pdf document.docx

# Step 2: PDF pages to JPEG
pdftoppm -jpeg -r 150 document.pdf page
# Creates page-1.jpg, page-2.jpg, etc. (zero-padded, e.g. page-01.jpg, when the PDF has 10+ pages)

# For specific pages only:
pdftoppm -jpeg -r 150 -f 2 -l 5 document.pdf page

Use -r 150 for a good quality/size balance. Increase to 300 for print-quality output.
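The two steps can also be driven from Python. This is a sketch with hypothetical `render_commands`/`render` names; it relies on soffice writing `<name>.pdf` into the current working directory, which is its default when `--outdir` is not given.

```python
import subprocess
from pathlib import Path


def render_commands(docx, dpi=150, first=None, last=None, prefix="page"):
    """Build the argv lists for DOCX -> PDF (soffice) and PDF -> JPEG (pdftoppm)."""
    to_pdf = ["soffice", "--headless", "--convert-to", "pdf", str(docx)]
    pdf = Path(docx).with_suffix(".pdf").name  # soffice writes into the cwd
    to_jpg = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first is not None:
        to_jpg += ["-f", str(first)]
    if last is not None:
        to_jpg += ["-l", str(last)]
    return [to_pdf, to_jpg + [pdf, prefix]]


def render(docx, **options):
    """Run both steps in order; raises CalledProcessError if either tool fails."""
    for cmd in render_commands(docx, **options):
        subprocess.run(cmd, check=True)
```

`render("document.docx", first=2, last=5)` corresponds to the page-range commands above.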


Code Style

Write concise code. Avoid verbose variable names, redundant operations, and unnecessary print statements.

Dependencies

Install if not available:

| Dependency | Install | Purpose |
|---|---|---|
| pandoc | brew install pandoc or apt-get install pandoc | Text extraction |
| docx | npm install -g docx | Creating new documents |
| LibreOffice | brew install --cask libreoffice or apt-get install libreoffice | PDF conversion |
| Poppler | brew install poppler or apt-get install poppler-utils | PDF to images |
| defusedxml | pip install defusedxml | Secure XML parsing |
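To see what is missing before starting, a standard-library check is enough. This is a sketch: `missing_dependencies` is a hypothetical name, it looks for the executables pandoc, soffice, pdftoppm and node (the docx package itself is an npm module and is not checked), and on macOS soffice may be installed but not on PATH, in which case it is reported as missing.

```python
import importlib.util
import shutil

# Executable to look for on PATH -> dependency it belongs to.
TOOLS = {
    "pandoc": "pandoc",
    "soffice": "LibreOffice",
    "pdftoppm": "Poppler",
    "node": "Node.js (needed for docx)",
}


def missing_dependencies():
    """Return the names of dependencies that do not appear to be installed."""
    missing = [name for exe, name in TOOLS.items() if shutil.which(exe) is None]
    if importlib.util.find_spec("defusedxml") is None:
        missing.append("defusedxml")
    return missing
```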

References

| File | Purpose |
|---|---|
| references/docx-js.md | docx-js API patterns for creating new documents |
| references/ooxml.md | OOXML XML patterns, Document library API, tracked changes |

Install
Requires askill CLI v1.0+

AI Quality Score

78/100 (analyzed 2/19/2026)

Well-structured skill for DOCX manipulation with clear workflows, decision trees, and practical examples. Covers reading, creating, editing, redlining with tracked changes, and conversion. The skill is comprehensive and actionable, though it depends heavily on external reference files and some internal scripts. Good clarity and safety, with moderate reusability due to project-specific tool references.

Metadata

License: unknown
Version: -
Updated: 2/13/2026
Publisher: costa-marcello

Tags

api, github-actions