
Updated 1/28/2026

Package Files

Loading files...
SKILL.md

Conversion Mapping Rules: HTML Elements to Markdown

Overview

This skill documents how html-to-markdown maps 60+ HTML element types to their Markdown equivalents. The conversion logic respects Markdown syntax variations (ATX vs Setext headings, fenced vs indented code, etc.) and maintains semantic accuracy.

Heading Elements (h1-h6)

ATX Style (Default)

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6

Implementation:

  • Option: HeadingStyle::Atx (default)
  • Each heading level uses n hashes
  • Single space after hashes required
  • Trailing hashes optional (ATX closed style adds them)

HTML Example:

<h1>Title</h1>          → # Title
<h2 id="intro">Intro</h2> → ## Intro
<h3>Detail</h3>         → ### Detail

Setext/Underlined Style

Heading 1
=========

Heading 2
---------

Implementation:

  • Option: HeadingStyle::Underlined
  • H1: = characters for full line width
  • H2: - characters for full line width
  • H3+ not supported in Setext (fallback to ATX)

HTML Example:

<h1>Main Title</h1>  → Main Title\n==========
<h2>Subtitle</h2>    → Subtitle\n--------
<h3>Detail</h3>      → ### Detail (fallback to ATX)

ATX Closed Style

# Heading 1 #
## Heading 2 ##
### Heading 3 ###

Implementation:

  • Option: HeadingStyle::AtxClosed
  • Closing hashes must match opening count
  • Single space before closing hashes
  • Less common, but valid Markdown
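
The three heading styles above can be sketched in a few lines. This is a minimal illustration, not the crate's code: the `HeadingStyle` enum here is a local stand-in that mirrors the option names, `render_heading` is a hypothetical helper, and the trailing newline and the one-underline-character-per-`char` rule are assumptions.

```rust
/// Local stand-in for the HeadingStyle option named above
/// (hypothetical mirror, not imported from the crate).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HeadingStyle {
    Atx,
    Underlined,
    AtxClosed,
}

/// Render a heading of `level` (clamped to 1..=6) in the requested style.
pub fn render_heading(level: usize, text: &str, style: HeadingStyle) -> String {
    let level = level.clamp(1, 6);
    let text = text.trim();
    let hashes = "#".repeat(level);
    match style {
        // Setext only defines underlines for levels 1 and 2.
        HeadingStyle::Underlined if level <= 2 => {
            let underline_char = if level == 1 { "=" } else { "-" };
            // Assumption: one underline character per Unicode scalar value.
            let width = text.chars().count().max(1);
            format!("{text}\n{}\n", underline_char.repeat(width))
        }
        HeadingStyle::AtxClosed => format!("{hashes} {text} {hashes}\n"),
        // Plain ATX, which is also the fallback for Underlined at level 3+.
        HeadingStyle::Atx | HeadingStyle::Underlined => format!("{hashes} {text}\n"),
    }
}
```

With this sketch, `render_heading(3, "Detail", HeadingStyle::Underlined)` falls back to `### Detail`, matching the Setext fallback rule described earlier. Counting `char`s is a simplification; wide characters such as CJK text would need a display-width calculation.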

Block-Level Elements

Paragraph (<p>)

Mapping:

  • Text content extracted and escaped
  • Trailing/leading whitespace trimmed
  • Single newline after paragraph

Example:

<p>This is a paragraph with <strong>bold</strong> text.</p>
→ This is a paragraph with **bold** text.\n

Division (<div>)

Behavior:

  • Transparent wrapper for Markdown
  • Content treated as block-level
  • No wrapping markers in output
  • Preserves child semantics

Example:

<div>
  <p>Paragraph inside div</p>
</div>
→ Paragraph inside div\n

Blockquote (<blockquote>)

Mapping:

  • Each line prefixed with >
  • Nested blockquotes: > >
  • Handles multiple paragraphs

Example:

<blockquote>
  <p>Quote line 1</p>
  <p>Quote line 2</p>
</blockquote>
→ > Quote line 1\n>\n> Quote line 2\n
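
As a rough sketch of the prefixing rule, the helper below takes the already-converted inner Markdown and marks every line. `render_blockquote` is a hypothetical name, and emitting a bare `>` for blank lines is an assumption taken from the example output above.

```rust
/// Prefix every line of already-converted Markdown with a blockquote marker.
/// Blank lines get a bare `>` so the quote is not interrupted.
pub fn render_blockquote(inner: &str) -> String {
    let mut out = String::new();
    for line in inner.trim_end_matches('\n').lines() {
        if line.trim().is_empty() {
            out.push('>');
        } else {
            out.push_str("> ");
            out.push_str(line);
        }
        out.push('\n');
    }
    out
}
```

Applying the function twice yields the nested form: `render_blockquote(&render_blockquote("x\n"))` gives `> > x`.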

Preformatted Text (<pre>)

Behavior:

  • Whitespace preserved exactly
  • Treated as code block (see Code Blocks below)
  • No entity decoding in content
  • Trimmed and indented

Example:

<pre>    code with spaces</pre>
→ (indented code or fenced, depends on CodeBlockStyle)

Code Blocks

Indented Style (Default):

    line 1
    line 2
    line 3

Implementation:

  • Option: CodeBlockStyle::Indented
  • Each line prefixed with 4 spaces
  • Requires blank line before/after
  • CommonMark default

Fenced Backtick Style:

```language
code here
```

Implementation:

  • Option: CodeBlockStyle::Backticks
  • Triple backticks with optional language specifier
  • Language from HTML class (e.g., language-rust → rust)
  • Can contain blank lines

Fenced Tilde Style:

~~~rust
code here
~~~

Implementation:

  • Option: CodeBlockStyle::Tildes
  • Triple tildes with optional language specifier
  • Less common variant of fenced style

HTML Mapping:

<pre><code>simple code</code></pre>
<pre><code class="language-python">def foo(): pass</code></pre>
<pre>indented code</pre>
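
The sketch below shows one way the class-to-language rule and the three block styles could fit together. It is illustrative only: `CodeBlockStyle` is a local stand-in for the option named above, and `language_from_class` and `render_code_block` are hypothetical helpers rather than functions from the crate.

```rust
/// Local stand-in for the CodeBlockStyle option (hypothetical mirror).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CodeBlockStyle {
    Indented,
    Backticks,
    Tildes,
}

/// Pull a language name out of a class attribute such as "hljs language-rust".
pub fn language_from_class(class_attr: &str) -> Option<&str> {
    class_attr
        .split_whitespace()
        .find_map(|class| class.strip_prefix("language-"))
        .filter(|lang| !lang.is_empty())
}

/// Render `code` as a Markdown code block in the chosen style.
pub fn render_code_block(code: &str, class_attr: Option<&str>, style: CodeBlockStyle) -> String {
    let code = code.trim_end_matches('\n');
    if style == CodeBlockStyle::Indented {
        // Indented blocks cannot carry a language, so the class is ignored.
        return code.lines().map(|line| format!("    {line}\n")).collect();
    }
    let fence_char = match style {
        CodeBlockStyle::Tildes => "~",
        _ => "`",
    };
    let fence = fence_char.repeat(3);
    let lang = class_attr.and_then(|c| language_from_class(c)).unwrap_or("");
    format!("{fence}{lang}\n{code}\n{fence}\n")
}
```

For `<pre><code class="language-python">def foo(): pass</code></pre>` with the tilde style, this produces `~~~python`, the code line, and a closing `~~~`. The sketch does not lengthen the fence when the code itself contains a fence sequence, which a real converter would need to handle.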

Horizontal Rule (<hr>)

Output: ---\n (three dashes)

Alternatives: ***, ___ all valid but standardized to ---

List Elements

Unordered Lists (<ul>)

Default Syntax (dashes):

- Item 1
- Item 2
  - Nested item
    - Deeply nested

Implementation:

  • - marker (could be * or +, but - is default)
  • Indentation for nesting: spaces or tabs
  • Option: ListIndentType::Spaces (default) or ListIndentType::Tabs

Ordered Lists (<ol>)

1. First item
2. Second item
3. Third item

Implementation:

  • Items are numbered sequentially (1., 2., 3., ...), restarting at 1. for each list
  • Number must be followed by ". " (a dot, then a space)
  • Indentation matches unordered for nesting

List Items (<li>)

Behavior:

  • Content can include block elements (paragraphs, code blocks)
  • Continuation lines indented to match marker
  • Multi-line items:
- First paragraph

  Second paragraph (indented)

HTML Example:

<ul>
  <li>
    <p>Item with paragraph</p>
    <p>Second paragraph</p>
  </li>
</ul>
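
The continuation-indent rule for multi-paragraph items can be sketched as follows. `render_list_item` is a hypothetical helper, and it assumes the marker is passed in with its trailing space (for example "- " or "1. ") so that the indent width can be derived from it.

```rust
/// Render one list item whose content is a sequence of paragraphs.
/// `marker` is the full marker including its trailing space, e.g. "- " or "1. ".
pub fn render_list_item(marker: &str, paragraphs: &[&str]) -> String {
    // Continuation content lines up with the first character after the marker.
    let indent = " ".repeat(marker.chars().count());
    let mut out = String::new();
    for (index, paragraph) in paragraphs.iter().enumerate() {
        if index > 0 {
            out.push('\n'); // blank line between paragraphs of the same item
        }
        for (line_no, line) in paragraph.lines().enumerate() {
            if index == 0 && line_no == 0 {
                out.push_str(marker);
            } else {
                out.push_str(&indent);
            }
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

For the HTML example above, `render_list_item("- ", &["Item with paragraph", "Second paragraph"])` returns the first paragraph after the marker, a blank line, and the second paragraph indented by two spaces.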

Definition Lists (<dl>, <dt>, <dd>)

Term
:   Definition

Another Term
:   Definition 1
:   Definition 2

Implementation:

  • <dt>: Term on its own line
  • <dd>: Definition with : prefix and indentation
  • Multiple definitions per term supported

Tables (<table>, <tr>, <td>, <th>)

Mapping:

| Header 1 | Header 2 |
|----------|----------|
| Cell 1   | Cell 2   |
| Cell 3   | Cell 4   |

Implementation:

  • <table> → GFM (GitHub Flavored Markdown) table
  • <thead> content becomes header row
  • <tbody> rows become data rows
  • Cells separated by | pipes
  • Separator row: |---|---| (minimum 3 dashes)
  • Alignment markers in the separator row: left :---, right ---:, center :---:

Cell Content:

  • Pipe characters escaped (| → \|)
  • Nested elements converted (e.g., <strong> → **)
  • Newlines converted to <br> representation
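
A minimal sketch of the table rules, assuming cells arrive as plain strings that have already been converted to inline Markdown. `escape_cell` and `render_table` are hypothetical names; column alignment and width padding are left out to keep the example short.

```rust
/// Escape a cell so it cannot break the row: pipes are backslash-escaped
/// and newlines become <br>.
pub fn escape_cell(cell: &str) -> String {
    cell.trim().replace('|', "\\|").replace('\n', "<br>")
}

/// Join one row of cells with pipe separators.
fn render_row(cells: &[&str]) -> String {
    let escaped: Vec<String> = cells.iter().map(|c| escape_cell(c)).collect();
    format!("| {} |\n", escaped.join(" | "))
}

/// Render a GFM table from a header row and data rows.
pub fn render_table(headers: &[&str], rows: &[Vec<&str>]) -> String {
    let mut out = render_row(headers);
    // Separator row: three dashes per column, as described above.
    let separator: Vec<&str> = headers.iter().map(|_| "---").collect();
    out.push_str(&format!("| {} |\n", separator.join(" | ")));
    for row in rows {
        out.push_str(&render_row(row));
    }
    out
}
```

A cell containing `a|b` comes out as `a\|b`, and a cell with an embedded newline is flattened to a single line with `<br>`.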

Semantic HTML5 Elements

Article (<article>)

  • Treated as transparent block wrapper
  • No semantic markers in Markdown
  • Content flows as-is

Section (<section>)

  • Transparent block wrapper
  • Could insert heading separator in future

Nav (<nav>)

  • List-like wrapper
  • Children converted normally
  • Could insert navigation markers

Aside (<aside>)

  • Optional blockquote prefix (configurable)
  • Or treated as transparent block

Header (<header>)

  • Transparent wrapper
  • Content converted normally

Footer (<footer>)

  • Transparent wrapper
  • Could insert footer marker (e.g., ---\n)

Main (<main>)

  • Transparent wrapper
  • Content flows normally

Inline Elements

Emphasis (<em>, <i>)

Mapping: *text* or _text_

Implementation:

  • Default: * (asterisk italic)
  • No underscore escaping needed in this context
  • Trimmed of excess whitespace

Example:

<em>emphasized</em>  → *emphasized*
<i>italic</i>        → *italic*

Strong (<strong>, <b>)

Mapping: **text**

Implementation:

  • Double asterisks (bold)
  • Trimmed of excess whitespace
  • Can be nested with emphasis

Example:

<strong>bold</strong>           → **bold**
<b>bold</b>                     → **bold**
<strong><em>bold italic</em></strong> → ***bold italic***

Code (<code>)

Mapping: `text` (backtick inline code)

Implementation:

  • Single backticks for inline
  • Delimiter widened (e.g., to double backticks, padded with spaces) if backticks are present in the content
  • No entity decoding within code

Example:

<code>variable_name</code>        → `variable_name`
<code>don't</code>                → `don't`
<code>`already_quoted`</code>     → `` `already_quoted` ``
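
The backtick handling in the last example follows the usual CommonMark approach: pick a delimiter one backtick longer than the longest run inside the content. The sketch below is an assumption about how this could be written; `render_inline_code` is a hypothetical helper, not the crate's function.

```rust
/// Wrap `content` in a backtick delimiter that is longer than any
/// backtick run inside it.
pub fn render_inline_code(content: &str) -> String {
    // Find the longest run of consecutive backticks in the content.
    let mut longest = 0usize;
    let mut current = 0usize;
    for ch in content.chars() {
        if ch == '`' {
            current += 1;
            longest = longest.max(current);
        } else {
            current = 0;
        }
    }
    let delimiter = "`".repeat(longest + 1);
    // Pad with a space when the content touches a backtick at either end,
    // otherwise the delimiter and the content would merge.
    let needs_padding = content.starts_with('`') || content.ends_with('`');
    let pad = if needs_padding { " " } else { "" };
    format!("{delimiter}{pad}{content}{pad}{delimiter}")
}
```

Content without backticks keeps the single-backtick form, while content wrapped in its own backticks gets a double-backtick delimiter and padding spaces, as in the third example above.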

Link (<a href>)

Mapping: [link text](url "title")

Implementation:

  • href attribute becomes URL
  • Text content becomes link text
  • title attribute becomes optional title (in quotes)
  • URL preserved as-is (no extra encoding)
  • Special link types:
    • href="#section" → Anchor link
    • href="/page" → Internal link (relative)
    • href="https://external.com" → External link
    • href="mailto:user@example.com" → Email link
    • href="tel:+1234567890" → Phone link

Examples:

<a href="https://example.com">Link</a>
→ [Link](https://example.com)

<a href="/page" title="My Page">Internal</a>
→ [Internal](/page "My Page")

<a href="#section">Anchor</a>
→ [Anchor](#section)

<a href="mailto:test@example.com">Email</a>
→ [Email](mailto:test@example.com)
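
The link mapping can be sketched as a small formatting function. `render_link` is a hypothetical helper; it keeps the URL as-is, as described above, and the quote escaping in the title is an added assumption rather than documented behavior.

```rust
/// Render an inline Markdown link; the title is optional.
pub fn render_link(text: &str, href: &str, title: Option<&str>) -> String {
    match title {
        Some(title) if !title.is_empty() => {
            // Escape double quotes so the title cannot end early.
            let title = title.replace('"', "\\\"");
            format!("[{text}]({href} \"{title}\")")
        }
        // No title, or an empty one: emit the short form.
        _ => format!("[{text}]({href})"),
    }
}
```

Because the URL is passed through unchanged, a destination containing spaces or unbalanced parentheses would need angle brackets or escaping, which this sketch does not attempt.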

Image (<img>)

Mapping: ![alt text](url "title")

Implementation:

  • src attribute becomes URL
  • alt attribute becomes alt text
  • title attribute becomes optional title
  • Dimensions (width, height) captured in metadata
  • Data URIs: ![alt](data:image/png;base64,...)
  • Relative paths preserved

Examples:

<img src="photo.jpg" alt="A photo">
→ ![A photo](photo.jpg)

<img src="image.png" alt="Image" title="My Image" width="200" height="150">
→ ![Image](image.png "My Image")

<img src="data:image/png;base64,..." alt="Embedded">
→ ![Embedded](data:image/png;base64,...)

Line Break (<br>)

Mapping:

  • Two spaces + newline: "  \n" (two trailing spaces, then the newline)
  • Or backslash + newline: \\\n

Option: NewlineStyle::Spaces (default) or NewlineStyle::Backslash

Example:

<p>Line 1<br>Line 2</p>
→ Line 1  \nLine 2\n

Strikethrough (<s>, <del>, <strike>)

Mapping: ~~strikethrough~~

Implementation:

  • GFM strikethrough syntax (double tilde)
  • Not standard Markdown, but widely supported
  • Trimmed of excess whitespace

Example:

<del>removed text</del>  → ~~removed text~~
<s>strikethrough</s>     → ~~strikethrough~~

Subscript/Superscript (<sub>, <sup>)

Behavior:

  • No native Markdown support
  • Typically converted to plain text or HTML passthrough
  • Implementation: Extract text content, no markup

Example:

H<sub>2</sub>O          → H2O (plain text)
E=mc<sup>2</sup>        → E=mc2 (plain text)

Mark/Highlight (<mark>)

Options:

  1. HighlightStyle::DoubleEqual: ==text==
  2. HighlightStyle::Html: <mark>text</mark>
  3. HighlightStyle::Bold: **text**
  4. HighlightStyle::None: plain text

Example:

<mark>highlighted</mark>
→ ==highlighted==  (DoubleEqual mode)
→ <mark>highlighted</mark>  (Html mode)
→ **highlighted**  (Bold mode)
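
The four highlight options map naturally onto a single match. In this sketch `HighlightStyle` is a local stand-in that mirrors the option names listed above, and `render_mark` is a hypothetical helper.

```rust
/// Local stand-in for the HighlightStyle option (hypothetical mirror).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum HighlightStyle {
    DoubleEqual,
    Html,
    Bold,
    None,
}

/// Render the text content of a <mark> element in the chosen style.
pub fn render_mark(text: &str, style: HighlightStyle) -> String {
    match style {
        HighlightStyle::DoubleEqual => format!("=={text}=="),
        HighlightStyle::Html => format!("<mark>{text}</mark>"),
        HighlightStyle::Bold => format!("**{text}**"),
        // No markup at all: the text passes through unchanged.
        HighlightStyle::None => text.to_string(),
    }
}
```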

Ruby Annotations (<ruby>, <rt>, <rp>)

Mapping:

  • Japanese ruby text support
  • Format: base text followed by the annotation in parentheses, e.g. text (rt_text)
  • Implementation: Extract base text with rt annotation

Example:

<ruby>漢字<rt>かんじ</rt></ruby>
→ 漢字 (かんじ)

Media Elements

Audio (<audio>)

Behavior:

  • No direct Markdown equivalent
  • Typically extracted as metadata or skipped
  • Could insert link to source if src attribute

Handling:

<audio src="sound.mp3">Audio</audio>
→ (Skipped or converted to link in metadata)

Video (<video>)

Behavior:

  • Similar to audio
  • Could extract poster image
  • Typically skipped in markdown output

Picture/Source (<picture>, <source>)

Behavior:

  • Responsive image container
  • Extract from child <img> inside
  • Or use first source src

Form Elements

Input (<input>)

Behavior:

  • Generally skipped or marked as form element
  • Could convert to metadata about form structure
  • Types: text, checkbox, radio, button, hidden

Implementation:

  • Placeholder preserved in metadata
  • Value not typically included in markdown

Select/Option (<select>, <option>)

Behavior:

  • Converted to list or metadata
  • Option text extracted
  • Selected state noted

Button (<button>)

Behavior:

  • Text content extracted (ignores <button> wrapper)
  • Click handlers ignored
  • Treated as inline text

Textarea (<textarea>)

Behavior:

  • Content treated as code block or preformatted
  • Whitespace preserved

Special Elements

SVG (<svg>)

Behavior:

  • Can be preserved as inline image or skipped
  • Feature: inline-images can extract inline SVG
  • Typically rendered as-is in compatible markdown renderers

MathML (<math>)

Behavior:

  • Skipped in standard markdown
  • Could be preserved with feature gate
  • Converted to LaTeX or plain text fallback

iframe (<iframe>)

Behavior:

  • Generally skipped
  • Could extract as metadata (video embeds, etc.)
  • URL captured if needed

Whitespace and Formatting Context

Whitespace Mode

Normalized (default):

  • Multiple spaces collapsed to single space
  • Multiple newlines → single newline
  • Leading/trailing whitespace trimmed per element

Strict:

  • All whitespace preserved exactly
  • Multiple spaces and newlines intact
  • Useful for poetry, ASCII art, etc.
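
One possible reading of the Normalized rules is sketched below: any whitespace run collapses to a single space, or to a single newline when the run contains a newline. `normalize_whitespace` is a hypothetical helper, and the newline-wins tie-break is an assumption, not documented behavior.

```rust
/// Collapse whitespace runs and trim both ends of the input.
pub fn normalize_whitespace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    // Separator owed before the next visible character, if any.
    let mut pending: Option<char> = None;
    for ch in input.trim().chars() {
        if ch.is_whitespace() {
            // A run containing any newline collapses to '\n', otherwise to ' '.
            if ch == '\n' || pending.is_none() {
                pending = Some(if ch == '\n' { '\n' } else { ' ' });
            }
        } else {
            if let Some(sep) = pending.take() {
                out.push(sep);
            }
            out.push(ch);
        }
    }
    out
}
```

Strict mode would simply skip this step and pass the text through unchanged.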

Text Escaping

Options:

  • escape_asterisks: * → \*
  • escape_underscores: _ → \_
  • escape_misc: Special chars \ & < ` [ > ~ # = + | -
  • escape_ascii: All ASCII punctuation (CommonMark spec)

Example:

<p>Price: $10 & free shipping *limited time*</p>

escape_misc=true:
→ Price: $10 \& free shipping *limited time*

escape_asterisks=true:
→ Price: $10 & free shipping \*limited time\*

escape_ascii=true:
→ Price\: \$10 \& free shipping \*limited time\*
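
The escaping flags can be modeled as a per-character decision. This is a sketch under stated assumptions: `EscapeOptions` and `escape_text` are hypothetical names that mirror the four flags, the `escape_misc` character set is taken from the bullet above (including the backtick), and `escape_ascii` is assumed to override the other flags.

```rust
/// Hypothetical option bundle mirroring the four flags listed above.
#[derive(Clone, Copy, Debug, Default)]
pub struct EscapeOptions {
    pub escape_asterisks: bool,
    pub escape_underscores: bool,
    pub escape_misc: bool,
    pub escape_ascii: bool,
}

/// Backslash-escape characters in `text` according to `opts`.
pub fn escape_text(text: &str, opts: EscapeOptions) -> String {
    // Assumed character set, based on the escape_misc bullet above.
    const MISC: &str = "\\&<`[>~#=+|-";
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        let escape = if opts.escape_ascii {
            // Every ASCII punctuation character, in the CommonMark sense.
            ch.is_ascii_punctuation()
        } else {
            (opts.escape_asterisks && ch == '*')
                || (opts.escape_underscores && ch == '_')
                || (opts.escape_misc && MISC.contains(ch))
        };
        if escape {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

With only `escape_misc` set, the sample sentence keeps its asterisks and gains a backslash before the ampersand; with only `escape_asterisks` set, the reverse holds.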

Implementation Details Location

Key Files:

  • /crates/html-to-markdown/src/converter.rs - Element dispatch and conversion
  • /crates/html-to-markdown/src/options.rs - Style configuration enums
  • /crates/html-to-markdown/src/text.rs - Text escaping and normalization

Element Dispatch Example

// From converter.rs pattern
match element.tag_name() {
    "h1" | "h2" | "h3" | "h4" | "h5" | "h6" => convert_heading(...),
    "p" => convert_paragraph(...),
    "a" => convert_link(...),
    "img" => convert_image(...),
    "strong" | "b" => convert_strong(...),
    "em" | "i" => convert_em(...),
    "code" => convert_code(...),
    "pre" => convert_pre(...),
    "blockquote" => convert_blockquote(...),
    "ul" | "ol" => convert_list(...),
    "li" => convert_list_item(...),
    "table" => convert_table(...),
    "br" => convert_br(...),
    "hr" => convert_hr(...),
    // ... 40+ more elements
    _ => convert_generic_element(...)
}

Complete Element Reference

See /crates/html-to-markdown/src/visitor.rs for the exhaustive NodeType enum, which covers all 60+ supported elements.

Quick Reference Table

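| HTML Element | Markdown Output | Notes |
|--------------|-----------------|-------|
| <h1> | # text | ATX style default |
| <p> | text\n | Paragraph |
| <strong> | **text** | Bold |
| <em> | *text* | Italic |
| <a href> | [text](url) | Link |
| <img> | ![alt](src) | Image |
| <ul> | - item | Unordered list |
| <ol> | 1. item | Ordered list |
| <code> | `text` | Inline code |
| <pre> | Indented or fenced | Code block |
| <blockquote> | > text | Quote |
| <table> | GFM table | Pipe-delimited |
| <br> | "  \n" (two spaces + newline) | Line break |
| <hr> | --- | Horizontal rule |
| <del> | ~~text~~ | Strikethrough |
| <mark> | ==text== | Highlight (configurable) |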

Metadata

License: unknown
Version: -
Updated: 1/28/2026
Publisher: kreuzberg-dev
