Image OCR
Extract text from images using Tesseract OCR via Python.
When to use this skill
- User asks to read or extract text from an image
- User has a screenshot with text they want to process
- User has scanned documents that need text extraction
- User wants to digitize text from photos
Scripts overview
| Script | Purpose | Dependencies |
|---|---|---|
| ocr_extract.py | Extract text from images with multiple options | pytesseract, Pillow |
Steps
1. Install dependencies (first time only)
Install the Python packages:
pip install pytesseract Pillow
Install Tesseract OCR engine:
- Windows: Download installer from https://github.com/UB-Mannheim/tesseract/wiki
- macOS: brew install tesseract
- Linux (Ubuntu/Debian): sudo apt install tesseract-ocr
- Linux (Fedora): sudo dnf install tesseract
For additional language support:
- Windows: Select languages during installation
- Linux: sudo apt install tesseract-ocr-chi-sim (Chinese Simplified), tesseract-ocr-jpn (Japanese), etc.
CRITICAL (Dependency Error Recovery): If the script fails with an ImportError or a "tesseract not found" error, install the missing dependencies using the commands above, then re-run the EXACT SAME script command that failed.
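The recovery rule above can be paired with a quick pre-flight check. The following is a minimal sketch, not part of ocr_extract.py: the function name missing_dependencies is hypothetical, and it assumes the Tesseract executable is reachable on PATH under the name tesseract (on Windows the install directory may need to be added to PATH manually).

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return human-readable names of dependencies that cannot be found."""
    missing = []
    # Pillow is installed as "Pillow" but imported as "PIL".
    for package, module in (("pytesseract", "pytesseract"), ("Pillow", "PIL")):
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    # The OCR engine is a separate executable, not a Python package.
    if shutil.which("tesseract") is None:
        missing.append("tesseract (OCR engine)")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("All OCR dependencies found.")
```

If the list is non-empty, install the named items with the commands from this step and then re-run the original command unchanged.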
2. Extract text from an image
python scripts/ocr_extract.py "IMAGE_PATH"
Options:
- --lang LANG: OCR language (default: eng). Use chi_sim for Simplified Chinese, jpn for Japanese, eng+chi_sim for multiple.
- --save OUTPUT_PATH: Save extracted text to a file
- --preprocess MODE: Image preprocessing: none (default), grayscale, threshold, blur
- --dpi DPI: Set image DPI for better accuracy (default: auto-detect)
- --psm MODE: Tesseract page segmentation mode (0-13, default: 3 = auto)
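To make the --preprocess modes more concrete, here is one way they could be implemented with Pillow. This is an illustrative sketch only: the function names, the cutoff of 128, and the 3x3 median filter are assumptions, and the actual ocr_extract.py may use different filters or values.

```python
def binarize(value, cutoff=128):
    """Map one 8-bit grayscale value to pure black (0) or pure white (255)."""
    return 255 if value > cutoff else 0


def preprocess(image, mode="none"):
    """Return a preprocessed version of a PIL image for the given mode."""
    if mode == "none":
        return image
    if mode == "grayscale":
        return image.convert("L")
    if mode == "threshold":
        # Image.point applies the function to every possible pixel value.
        return image.convert("L").point(binarize)
    if mode == "blur":
        # Imported here so the helpers above load even without Pillow installed.
        from PIL import ImageFilter

        # A small median filter removes speckle noise while keeping edges.
        return image.convert("L").filter(ImageFilter.MedianFilter(size=3))
    raise ValueError(f"unknown preprocess mode: {mode!r}")
```

Thresholding tends to help with uneven backgrounds, while a median blur targets salt-and-pepper noise. Which one works better depends on the image.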
Examples:
# Basic text extraction
python scripts/ocr_extract.py "screenshot.png"
# Chinese text extraction
python scripts/ocr_extract.py "document.jpg" --lang chi_sim
# Mixed English and Chinese
python scripts/ocr_extract.py "mixed.png" --lang eng+chi_sim
# Preprocess noisy image for better accuracy
python scripts/ocr_extract.py "noisy_scan.png" --preprocess threshold
# Save output to file
python scripts/ocr_extract.py "scan.tiff" --save output.txt
# Single line of text (e.g., license plate, serial number)
python scripts/ocr_extract.py "plate.jpg" --psm 7
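When the script needs to be driven from other Python code (for example, to process several images in a loop), the command lines above can be assembled programmatically. The sketch below assumes the script lives at scripts/ocr_extract.py relative to the working directory and prints the extracted text to standard output. The helper names build_command and run_ocr are hypothetical.

```python
import subprocess
import sys


def build_command(image_path, lang=None, save=None, preprocess=None,
                  dpi=None, psm=None):
    """Build the argument list for one ocr_extract.py invocation."""
    command = [sys.executable, "scripts/ocr_extract.py", str(image_path)]
    options = (
        ("--lang", lang),
        ("--save", save),
        ("--preprocess", preprocess),
        ("--dpi", dpi),
        ("--psm", psm),
    )
    for flag, value in options:
        # Only pass flags the caller actually set; the script supplies defaults.
        if value is not None:
            command += [flag, str(value)]
    return command


def run_ocr(image_path, **options):
    """Run the script and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_command(image_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.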
Page Segmentation Modes (PSM)
| Mode | Description | Use Case |
|---|---|---|
| 3 | Fully automatic (default) | General documents |
| 4 | Assume single column | Single-column text |
| 6 | Assume single block | Uniform text block |
| 7 | Single line | One line of text |
| 8 | Single word | One word |
| 11 | Sparse text | Text scattered on image |
| 13 | Raw line | Single line, bypassing Tesseract-specific processing |
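For reference, this is roughly how a PSM value reaches Tesseract when pytesseract is called directly. It is a sketch under assumptions: the helper names tesseract_config and ocr_image are hypothetical, and ocr_extract.py may build its configuration differently. The pytesseract.image_to_string call with its lang and config parameters, and the --psm and --dpi engine flags, are the real interfaces.

```python
def tesseract_config(psm=3, dpi=None):
    """Build the config string that pytesseract forwards to the tesseract CLI."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    parts = ["--psm", str(psm)]
    if dpi is not None:
        parts += ["--dpi", str(dpi)]
    return " ".join(parts)


def ocr_image(path, lang="eng", psm=3, dpi=None):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so tesseract_config stays usable without the OCR packages.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(
            image, lang=lang, config=tesseract_config(psm, dpi)
        )
```

For example, ocr_image("plate.jpg", psm=7) would treat the whole image as a single line of text.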
Edge cases
- Low quality images: Use --preprocess threshold or --preprocess blur to improve results
- Rotated text: Tesseract handles slight rotation; for heavily rotated images, rotate first
- Very small text: Increase DPI with --dpi 300 or higher
- Mixed languages: Combine with +, e.g., --lang eng+chi_sim+jpn
- Empty results: Try different PSM modes or preprocessing options
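The "empty results" advice can be automated by trying several PSM modes in turn. The following is a small sketch: the function name first_non_empty and the default mode order are assumptions, and recognize stands for any callable that takes a PSM number and returns text (for instance, a wrapper around the script or around pytesseract).

```python
def first_non_empty(recognize, psm_modes=(3, 6, 11, 7)):
    """Try each PSM mode in order and return (psm, text) for the first hit.

    Returns (None, "") when every attempt yields only whitespace.
    """
    for psm in psm_modes:
        text = recognize(psm).strip()
        if text:
            return psm, text
    return None, ""


if __name__ == "__main__":
    # Stand-in recognizer for demonstration: only sparse-text mode finds text.
    def fake_recognize(psm):
        return "SERIAL 12345\n" if psm == 11 else "\n"

    print(first_non_empty(fake_recognize))
```

The same pattern extends to preprocessing: loop over preprocessing modes in an outer loop and stop at the first combination that returns text.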
Scripts
- ocr_extract.py — Extract text from images using Tesseract OCR
