image-ocr

Safety score: 95

Extract text from images using Python OCR. Use when the user wants to read text from screenshots, photos of documents, scanned pages, or any image containing text. Supports PNG, JPEG, TIFF, BMP, and WebP formats.

311 stars
6.2k downloads
Updated 3/29/2026

Package Files

SKILL.md

Image OCR

Extract text from images using Tesseract OCR via Python.

When to use this skill

  • User asks to read or extract text from an image
  • User has a screenshot with text they want to process
  • User has scanned documents that need text extraction
  • User wants to digitize text from photos

Scripts overview

| Script | Purpose | Dependencies |
|---|---|---|
| ocr_extract.py | Extract text from images with multiple options | pytesseract, Pillow |

Steps

1. Install dependencies (first time only)

Install the Python packages:

pip install pytesseract Pillow

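Install the Tesseract OCR engine (pytesseract only calls the external tesseract program, so the engine must be installed separately and be on PATH):

  • Windows: Run the Windows installer linked from the Tesseract documentation (the UB Mannheim build) and add its install folder to PATH
  • Linux (Debian/Ubuntu): sudo apt install tesseract-ocr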

For additional language support:

  • Windows: Select languages during installation
  • Linux: sudo apt install tesseract-ocr-chi-sim (Chinese Simplified), tesseract-ocr-jpn (Japanese), etc.
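Before passing a combined spec such as eng+chi_sim, it can help to confirm that the language data is actually installed. The sketch below assumes the `tesseract` program is on PATH and that `tesseract --list-langs` prints one language code per line after a header line; the helper names are hypothetical and not part of the skill.

```python
import re
import shutil
import subprocess

# Language codes look like "eng", "chi_sim", or "script/Latin";
# header and warning lines contain spaces, so they never match.
_LANG_CODE = re.compile(r"[A-Za-z0-9_/.-]+")


def parse_list_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    return {line.strip() for line in output.splitlines() if _LANG_CODE.fullmatch(line.strip())}


def missing_langs(lang_spec: str, available: set[str]) -> list[str]:
    """Return the codes in a spec such as 'eng+chi_sim' that are not installed."""
    return [code for code in lang_spec.split("+") if code not in available]


def installed_langs() -> set[str]:
    """Ask the local tesseract program for its languages; empty set if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return set()
    result = subprocess.run(
        [exe, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case the list is printed to stderr
        text=True,
    )
    return parse_list_langs(result.stdout)
```

For example, `missing_langs("eng+chi_sim", installed_langs())` returns `["chi_sim"]` on a machine with only English data installed, which points to the tesseract-ocr-chi-sim package above. Recent pytesseract versions also expose `pytesseract.get_languages()` for the same query.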

CRITICAL: Dependency error recovery. If the script fails with an ImportError or a "tesseract not found" error (pytesseract raises TesseractNotFoundError when the tesseract program is missing or not on PATH), install the missing dependencies using the commands above, then re-run the EXACT SAME script command that failed.
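A small pre-flight check can report which dependency is missing before the OCR run fails. This is a sketch, not part of the skill; it assumes the import names `pytesseract` and `PIL` (the module name Pillow installs) and that the engine is found through PATH.

```python
import importlib.util
import shutil

# Import name -> pip package that provides it.
PY_DEPS = {"pytesseract": "pytesseract", "PIL": "Pillow"}


def find_missing_dependencies() -> list[str]:
    """Return a suggested fix for each missing OCR dependency (empty list if all present)."""
    fixes = []
    for module, pip_name in PY_DEPS.items():
        # find_spec returns None when the module cannot be found on sys.path.
        if importlib.util.find_spec(module) is None:
            fixes.append(f"pip install {pip_name}")
    # pytesseract only wraps the tesseract program, which must be on PATH.
    if shutil.which("tesseract") is None:
        fixes.append("install the Tesseract OCR engine and add it to PATH")
    return fixes
```

If Tesseract is installed but not on PATH, pytesseract can instead be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.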

2. Extract text from an image

python scripts/ocr_extract.py "IMAGE_PATH"
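The internals of ocr_extract.py are not shown here. As a minimal sketch, assuming the script uses pytesseract and Pillow as its dependency list suggests, the core of a single-image extraction might look like this; `extract_text` is a hypothetical helper name.

```python
from pathlib import Path


def extract_text(image_path: str, lang: str = "eng", config: str = "") -> str:
    """Hypothetical helper: OCR one image file with pytesseract and Pillow."""
    path = Path(image_path)
    if not path.is_file():
        # Fail early with a clear message instead of a Pillow or Tesseract error.
        raise FileNotFoundError(f"Image not found: {path}")

    # Imported lazily so this module can be imported before the OCR stack is installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        # image_to_string runs the external tesseract program and returns its text output.
        return pytesseract.image_to_string(img, lang=lang, config=config)
```

`extract_text("screenshot.png")` corresponds to the basic command, and `extract_text("document.jpg", lang="chi_sim")` to `--lang chi_sim`.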

Options:

  • --lang LANG — OCR language (default: eng). Use chi_sim for Chinese, jpn for Japanese, eng+chi_sim for multiple.
  • --save OUTPUT_PATH — Save extracted text to a file
  • --preprocess MODE — Image preprocessing: none (default), grayscale, threshold, blur
  • --dpi DPI — Set image DPI for better accuracy (default: auto-detect)
  • --psm MODE — Tesseract page segmentation mode (0-13, default: 3 = auto)
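One way to express these options in Python is argparse. The parser below is hypothetical: it reproduces the documented flags and defaults but is not the script's actual code, and `--dpi` defaulting to None is an assumed stand-in for "auto-detect".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented ocr_extract.py options."""
    parser = argparse.ArgumentParser(description="Extract text from an image with Tesseract OCR.")
    parser.add_argument("image_path", help="PNG, JPEG, TIFF, BMP, or WebP file")
    parser.add_argument("--lang", default="eng", help="Tesseract language(s), e.g. eng+chi_sim")
    parser.add_argument("--save", metavar="OUTPUT_PATH", help="write the extracted text to this file")
    parser.add_argument(
        "--preprocess",
        default="none",
        choices=["none", "grayscale", "threshold", "blur"],
    )
    # None means no DPI is forced, leaving resolution detection to Tesseract.
    parser.add_argument("--dpi", type=int, default=None)
    # range(0, 14) accepts exactly the modes 0-13.
    parser.add_argument("--psm", type=int, default=3, choices=range(0, 14), metavar="MODE")
    return parser
```

If you write the `--save` file yourself, open it with `encoding="utf-8"`: the default encoding on Windows is often not UTF-8, which can break Chinese or Japanese output.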

Examples:

# Basic text extraction
python scripts/ocr_extract.py "screenshot.png"

# Chinese text extraction
python scripts/ocr_extract.py "document.jpg" --lang chi_sim

# Mixed English and Chinese
python scripts/ocr_extract.py "mixed.png" --lang eng+chi_sim

# Preprocess noisy image for better accuracy
python scripts/ocr_extract.py "noisy_scan.png" --preprocess threshold

# Save output to file
python scripts/ocr_extract.py "scan.tiff" --save output.txt

# Single line of text (e.g., license plate, serial number)
python scripts/ocr_extract.py "plate.jpg" --psm 7

Page Segmentation Modes (PSM)

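| Mode | Description | Use case |
|---|---|---|
| 3 | Fully automatic (default) | General documents |
| 4 | Assume a single column | Single-column text |
| 6 | Assume a single uniform block | Uniform text block |
| 7 | Single line | One line of text |
| 8 | Single word | One word |
| 11 | Sparse text | Text scattered across the image |
| 13 | Raw line | Single line, bypassing Tesseract-specific heuristics |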
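The PSM, and optionally the DPI, reach Tesseract as command-line options. A minimal sketch, assuming the script forwards them through pytesseract's `config` string; the `PSM_BY_LAYOUT` labels are hypothetical shorthand for the table rows.

```python
from typing import Optional

# Hypothetical labels for the modes listed in the table.
PSM_BY_LAYOUT = {
    "auto": 3,
    "single_column": 4,
    "block": 6,
    "line": 7,
    "word": 8,
    "sparse": 11,
    "raw_line": 13,
}


def tesseract_config(psm: int = 3, dpi: Optional[int] = None) -> str:
    """Build the extra-options string passed as pytesseract's `config` argument."""
    if not 0 <= psm <= 13:
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    parts = [f"--psm {psm}"]
    if dpi is not None:
        # Tells Tesseract the resolution when the image metadata lacks or misstates it.
        parts.append(f"--dpi {dpi}")
    return " ".join(parts)
```

With an image opened in Pillow, `pytesseract.image_to_string(img, config=tesseract_config(7))` would run single-line recognition, the same intent as `--psm 7` in the license-plate example. Tesseract 4.x and later accept `--dpi`.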

Edge cases

  • Low quality images: Use --preprocess threshold or --preprocess blur to improve results
  • Rotated text: Tesseract handles slight rotation; for heavily rotated images, rotate first
  • Very small text: Increase DPI with --dpi 300 or higher
  • Mixed languages: Combine with +, e.g., --lang eng+chi_sim+jpn
  • Empty results: Try different PSM modes or preprocessing options
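The preprocessing modes are documented only by name. To show what each typically does, here is a dependency-free sketch operating on an image stored as nested lists of pixel values. A real implementation would more likely use Pillow (for example `Image.convert("L")` for grayscale and `ImageFilter.MedianFilter(3)` for denoising); the fixed cutoff of 128 and the use of a median filter for `blur` are assumptions.

```python
def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to 0-255 luminance using ITU-R 601-2 weights."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row] for row in rgb]


def threshold(gray, cutoff=128):
    """Binarize: pixels brighter than the cutoff become white (255), the rest black (0)."""
    return [[255 if p > cutoff else 0 for p in row] for row in gray]


def median_blur(gray):
    """3x3 median filter; out-of-bounds neighbors are clamped to the nearest edge pixel."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                gray[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(sorted(window)[4])  # middle of the 9 values
        out.append(row)
    return out
```

Thresholding turns faint gray background noise into solid white, which is why it helps on noisy scans; a median filter removes isolated specks while keeping character edges sharper than an averaging blur would.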

Requires askill CLI v1.0+

AI Quality Score

84/100 (analyzed 3/9/2026)

High-quality, actionable OCR skill with comprehensive coverage of Tesseract usage. Provides clear installation instructions for multiple platforms, detailed command options, PSM mode references, and edge case handling. Well-structured with tables and examples. Bonus points for 'when to use' section, structured steps, tags, skills folder location, and technical depth. No internal-only indicators present.

Metadata

License: unknown
Version: -
Updated: 3/29/2026
Publisher: AIDotNet

Tags

github