smolvlm (Safety: 95)

Local vision-language model for image analysis using SmolVLM-2B

2 stars
1.2k downloads
Updated 2/15/2026

Package Files

SKILL.md

SmolVLM - Local Image Analysis

Analyze images locally using SmolVLM-2B, a compact vision-language model that runs on Apple Silicon through mlx-vlm.

Quick Usage

Describe an Image

python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png

Ask a Question About an Image

python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png "What text is visible?"

Specific Tasks

# Extract text (OCR)
python ~/.claude/skills/smolvlm/scripts/view_image.py screenshot.png "Extract all text"

# UI analysis
python ~/.claude/skills/smolvlm/scripts/view_image.py ui.png "Describe the UI elements"

# Detailed description
python ~/.claude/skills/smolvlm/scripts/view_image.py photo.jpg --detailed
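
The commands above can also be assembled from Python, for example to run the same task over several images. The following is a minimal sketch, not part of the skill itself: `build_command` and `describe` are hypothetical helper names, and the sketch assumes that `view_image.py` prints its answer to standard output. Only the two invocation forms shown above (image plus prompt, image plus `--detailed`) are used; combining them is not documented, so the helper rejects it.

```python
import subprocess
from pathlib import Path

# Script location as given in the usage examples above.
SCRIPT = Path("~/.claude/skills/smolvlm/scripts/view_image.py").expanduser()


def build_command(image, prompt=None, detailed=False):
    """Return the argument list for one view_image.py call (hypothetical helper)."""
    if prompt and detailed:
        # The skill documents each form separately, not the combination.
        raise ValueError("pass either a prompt or detailed=True, not both")
    cmd = ["python", str(SCRIPT), str(image)]
    if prompt:
        cmd.append(prompt)
    if detailed:
        cmd.append("--detailed")
    return cmd


def describe(image, prompt=None, detailed=False):
    """Run the script and return its output (assumes the answer goes to stdout)."""
    result = subprocess.run(
        build_command(image, prompt, detailed),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling `describe("screenshot.png", "Extract all text")` would then mirror the OCR example, and `describe("photo.jpg", detailed=True)` the detailed-description example.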

Effective Prompts

General Description

  • "Describe this image" - Basic description
  • "Describe this image in detail, including colors, composition, and any text" - Comprehensive

Text Extraction (OCR)

  • "Extract all visible text from this image"
  • "What text appears in this screenshot?"
  • "Read the text in this document"

UI/Screenshot Analysis

  • "Describe the user interface elements"
  • "What buttons and controls are visible?"
  • "Identify the application and its current state"

Visual Question Answering

  • "How many [objects] are in this image?"
  • "What color is the [object]?"
  • "Is there a [object] in this image?"
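
The bracketed placeholders in these questions are meant to be replaced with a concrete object before the prompt is sent. One way to do that in Python is sketched below; the `VQA_PROMPTS` table and `make_prompt` function are illustrative names, not part of the skill.

```python
# Templates taken from the list above, with [object] turned into a format field.
VQA_PROMPTS = {
    "count": "How many {obj} are in this image?",
    "color": "What color is the {obj}?",
    "presence": "Is there a {obj} in this image?",
}


def make_prompt(kind, obj):
    """Fill one of the question templates with a concrete object name."""
    return VQA_PROMPTS[kind].format(obj=obj)
```

For instance, `make_prompt("count", "cars")` yields a prompt that can be passed as the second argument to `view_image.py`.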

Code/Technical

  • "What programming language is shown?"
  • "Describe what this code does"
  • "Identify any errors in this code screenshot"

Model Details

Spec | Value
Model | SmolVLM-2B-Instruct
Size | ~4GB
Peak Memory | 5.8GB
Speed | ~94 tok/s (M-series)
Supported Formats | PNG, JPG, JPEG, GIF, WebP
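
When processing a folder of files, it can help to skip anything outside the supported formats before calling the model. The sketch below checks file extensions only (it does not inspect file contents); `SUPPORTED_EXTENSIONS`, `is_supported`, and `filter_supported` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Extensions corresponding to the supported formats listed above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def is_supported(path):
    """Return True if the file extension matches a supported image format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def filter_supported(paths):
    """Keep only the paths whose extension is in the supported set."""
    return [p for p in paths if is_supported(p)]
```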

Requirements

  • macOS with Apple Silicon (M1 or later)
  • Python 3.10+
  • mlx-vlm package: uv pip install mlx-vlm --system
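
These requirements can be checked up front with a small preflight script. The sketch below is an assumption-laden illustration rather than something the skill ships: `check_requirements` is a hypothetical name, `mlx_vlm` is the import name of the mlx-vlm package, and the `arm64` check relies on `platform.machine()`, which reports `x86_64` if Python itself is running under Rosetta.

```python
import importlib.util
import platform
import sys


def check_requirements(system=None, machine=None, version=None, has_mlx_vlm=None):
    """Return a list of unmet requirements; an empty list means all checks passed.

    Arguments default to values detected on the current machine and can be
    overridden, which makes the function easy to exercise on any platform.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = sys.version_info[:2] if version is None else tuple(version)
    if has_mlx_vlm is None:
        has_mlx_vlm = importlib.util.find_spec("mlx_vlm") is not None

    problems = []
    if system != "Darwin":
        problems.append("macOS is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    if version < (3, 10):
        problems.append("Python 3.10+ is required")
    if not has_mlx_vlm:
        problems.append("mlx-vlm is not installed")
    return problems
```

Running `check_requirements()` with no arguments inspects the current environment and returns any problems as human-readable strings.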

Troubleshooting

"Model not found": First run downloads the model (~4GB). Wait for completion.

Out of memory: Close other applications. The model needs ~6GB of free RAM.

Slow first inference: Model loading takes 10-15s on first use; subsequent calls are faster.

Install

Requires askill CLI v1.0+

AI Quality Score

82/100 (analyzed 2/20/2026)

Well-structured skill for local image analysis with SmolVLM-2B. Provides clear command examples, comprehensive prompt suggestions, model specs, and troubleshooting. Scores well on actionability and clarity. Slight deduction for missing explicit installation steps and 'when to use' trigger. The skill is appropriately general and reusable for Apple Silicon users despite being in a nested personal skills folder.

Metadata

License: unknown
Version: 1.0.0
Updated: 2/15/2026
Publisher: tdimino

Tags

llm