mistral-document-ai

Safety score: 95

Use Mistral Document AI (mistral-document-ai-2505) deployed on Azure AI Foundry for OCR of PDF and image documents. This skill provides Python code patterns for authenticating and calling the REST API endpoint to extract text and structured data from documents.

Updated 2/1/2026

SKILL.md

Mistral Document AI

Mistral Document AI (model: mistral-document-ai-2505) is deployed on Azure AI Foundry and accessed via REST API for OCR and document understanding.

Authentication & Endpoint

  • Authentication: API key, sent as a Bearer token
  • Endpoint format: https://<endpoint_name>.services.ai.azure.com/providers/mistral/azure/ocr
  • Model: mistral-document-ai-2505
  • Important: only base64-encoded content is supported; document URLs are not.
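The two request requirements above, a Bearer-token header and a base64 data URL instead of a document URL, can be sketched as small helpers. The names ocr_endpoint, to_data_url and auth_headers are hypothetical, and ocr_endpoint assumes the resource name is exactly the <endpoint_name> part of the URL format shown.

```python
import base64

# Path taken from the endpoint format above; the helpers themselves are hypothetical.
OCR_PATH = "/providers/mistral/azure/ocr"


def ocr_endpoint(resource_name: str) -> str:
    """Build the OCR endpoint URL from an Azure AI Foundry resource name."""
    return f"https://{resource_name}.services.ai.azure.com{OCR_PATH}"


def to_data_url(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URL (document URLs are not accepted)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def auth_headers(api_key: str) -> dict:
    """JSON content type plus the API key sent as a Bearer token."""
    return {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
```

For example, to_data_url(open("scan.png", "rb").read(), "image/png") yields the string that goes into the image_url field of the request.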

Environment Setup

import os
import base64
import requests
from typing import Optional, Dict, Any

# Configuration
MISTRAL_ENDPOINT = os.getenv("MISTRAL_DOCUMENT_AI_ENDPOINT")  
MISTRAL_API_KEY = os.getenv("MISTRAL_DOCUMENT_AI_KEY")

Environment variables:

  • MISTRAL_DOCUMENT_AI_ENDPOINT: Full endpoint URL (e.g., https://nextgen-project-1-resource.services.ai.azure.com/providers/mistral/azure/ocr)
  • MISTRAL_DOCUMENT_AI_KEY: API key for authentication
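Because the configuration is read with os.getenv, a missing variable only shows up later as a confusing request error. A minimal sketch of a fail-fast loader, using a hypothetical load_mistral_config() function and MistralConfig dataclass; the env argument exists only so the function can be tested without touching the real environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MistralConfig:
    endpoint: str
    api_key: str


def load_mistral_config(env: Optional[Mapping[str, str]] = None) -> MistralConfig:
    """Read and validate the two environment variables described above."""
    env = os.environ if env is None else env
    names = ("MISTRAL_DOCUMENT_AI_ENDPOINT", "MISTRAL_DOCUMENT_AI_KEY")
    # Treat unset and empty variables the same way
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    endpoint = env["MISTRAL_DOCUMENT_AI_ENDPOINT"].rstrip("/")
    if not endpoint.startswith("https://"):
        raise RuntimeError("MISTRAL_DOCUMENT_AI_ENDPOINT must be an https:// URL")
    return MistralConfig(endpoint=endpoint, api_key=env["MISTRAL_DOCUMENT_AI_KEY"])
```

Calling load_mistral_config() once at startup surfaces configuration mistakes before any document is read.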

Core OCR Function

def ocr_document(
    file_path: str,
    include_image_base64: bool = False,
    document_annotation_format: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """
    Extract text from PDF or image using Mistral Document AI.
    
    Args:
        file_path: Path to PDF or image file
        include_image_base64: Include base64 images in response
        document_annotation_format: Optional structured output schema
        
    Returns:
        OCR response with extracted text and metadata
    """
    # Read and encode file
    with open(file_path, 'rb') as f:
        file_content = base64.b64encode(f.read()).decode('utf-8')
    
    # Determine MIME type and payload field from the file extension
    ext = os.path.splitext(file_path)[1].lower().lstrip('.')
    if ext == 'pdf':
        content_type = 'application/pdf'
        doc_type = key = 'document_url'
    else:
        # Supports: jpeg, jpg, png, gif, bmp, tiff, webp
        # Map extensions whose MIME subtype differs from the extension itself
        mime_subtype = {'jpg': 'jpeg', 'tif': 'tiff'}.get(ext, ext)
        content_type = f'image/{mime_subtype}'
        doc_type = key = 'image_url'
    
    # Build request payload
    payload = {
        "model": "mistral-document-ai-2505",
        "document": {
            "type": doc_type,
            key: f"data:{content_type};base64,{file_content}"
        },
        "include_image_base64": include_image_base64
    }
    
    # Add optional structured output
    if document_annotation_format:
        payload["document_annotation_format"] = document_annotation_format
    
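    # Fail fast with a clear message if configuration is missing
    if not MISTRAL_ENDPOINT or not MISTRAL_API_KEY:
        raise RuntimeError(
            "Set MISTRAL_DOCUMENT_AI_ENDPOINT and MISTRAL_DOCUMENT_AI_KEY"
        )

    # Make API request
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {MISTRAL_API_KEY}"
    }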
    
    response = requests.post(
        MISTRAL_ENDPOINT,
        headers=headers,
        json=payload,
        timeout=120
    )
    
    response.raise_for_status()
    return response.json()
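ocr_document() expects a file path, so callers that already hold bytes (an upload, or a page produced in memory) must go through a temporary file. One way around that, sketched here as a hypothetical build_ocr_payload(), is to build the same request body from bytes plus an explicit MIME type. The field layout mirrors ocr_document() above and is not taken from separate API documentation.

```python
import base64
from typing import Any, Dict, Optional

MODEL = "mistral-document-ai-2505"


def build_ocr_payload(
    data: bytes,
    mime_type: str,
    include_image_base64: bool = False,
    document_annotation_format: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the same request body as ocr_document(), but from in-memory bytes."""
    # PDFs go in a document_url field, images in an image_url field
    doc_type = "document_url" if mime_type == "application/pdf" else "image_url"
    encoded = base64.b64encode(data).decode("ascii")
    payload: Dict[str, Any] = {
        "model": MODEL,
        "document": {"type": doc_type, doc_type: f"data:{mime_type};base64,{encoded}"},
        "include_image_base64": include_image_base64,
    }
    if document_annotation_format:
        payload["document_annotation_format"] = document_annotation_format
    return payload
```

The result can be sent with requests.post(MISTRAL_ENDPOINT, headers=headers, json=payload, timeout=120), using the same headers as ocr_document().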

Usage Patterns

Basic OCR (Extract All Text)

result = ocr_document("document.pdf")
# Each page's text is returned as Markdown; join the pages in order
extracted_text = "\n\n".join(page["markdown"] for page in result["pages"])

OCR with Structured Output

Extract specific fields using JSON schema:

annotation_schema = {
    "type": "json_schema",
    "json_schema": {
        "schema": {
            "type": "object",
            "properties": {
                "language": {"type": "string", "description": "Detected language"},
                "title": {"type": "string", "description": "Document title"},
                "summary": {"type": "string", "description": "Brief summary"}
            },
            "required": ["language", "title", "summary"],
            "additionalProperties": False
        },
        "name": "document_annotation",
        "strict": True
    }
}

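import json

result = ocr_document("document.pdf", document_annotation_format=annotation_schema)
# The annotation comes back as a JSON string in the top-level "document_annotation" field
structured_data = json.loads(result["document_annotation"])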
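"strict": True asks the model to follow the schema, but checking the parsed result before using it is cheap. The sketch below is a hypothetical parse_annotation() that accepts the annotation as a JSON string (or an already-parsed dict) and checks only the schema's required and additionalProperties keywords; a full validator such as the jsonschema package would also check value types.

```python
import json
from typing import Any, Dict, Union


def parse_annotation(raw: Union[str, Dict[str, Any]], annotation_format: Dict[str, Any]) -> Dict[str, Any]:
    """Parse an annotation and check it against the schema's required/additional keys."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(data, dict):
        raise ValueError("annotation is not a JSON object")
    schema = annotation_format["json_schema"]["schema"]
    # Every field listed under "required" must be present
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"annotation missing required fields: {missing}")
    # With additionalProperties: False, only declared properties are allowed
    if schema.get("additionalProperties") is False:
        extra = sorted(set(data) - set(schema.get("properties", {})))
        if extra:
            raise ValueError(f"annotation has unexpected fields: {extra}")
    return data
```

Pass in the raw annotation from the response together with annotation_schema from above.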

Process Specific PDF Pages

Extract pages first, then OCR each page:

import os
import tempfile
import fitz  # PyMuPDF

def ocr_pdf_pages(pdf_path: str, pages: list[int]) -> dict[int, str]:
    """OCR specific pages from a PDF. Page numbers are 0-based (PyMuPDF convention)."""
    results = {}
    with fitz.open(pdf_path) as doc:
        for page_num in pages:
            # Copy a single page into a new in-memory PDF
            with fitz.open() as single_page_doc:
                single_page_doc.insert_pdf(doc, from_page=page_num, to_page=page_num)
                pdf_bytes = single_page_doc.tobytes()

            # ocr_document() takes a path, so write to a unique temporary file
            fd, temp_path = tempfile.mkstemp(suffix=".pdf")
            try:
                with os.fdopen(fd, "wb") as f:
                    f.write(pdf_bytes)
                result = ocr_document(temp_path)
            finally:
                # Cleanup runs even if the OCR request fails
                os.remove(temp_path)

            results[page_num] = "\n\n".join(p["markdown"] for p in result["pages"])
    return results
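ocr_pdf_pages() takes 0-based page indices, because PyMuPDF numbers pages from 0, while people usually describe pages as "1-3, 5". A hypothetical parse_page_spec() helper can convert such a 1-based spec into the index list, dropping duplicates and rejecting ranges that run past the document.

```python
from typing import List, Optional


def parse_page_spec(spec: str, page_count: Optional[int] = None) -> List[int]:
    """Convert 1-based page ranges like "1-3,5" into 0-based indices for PyMuPDF."""
    indices: List[int] = []
    seen = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        if page_count is not None and end > page_count:
            raise ValueError(f"page {end} is beyond the last page ({page_count})")
        for page in range(start, end + 1):
            index = page - 1  # PyMuPDF page numbers start at 0
            if index not in seen:
                seen.add(index)
                indices.append(index)
    return indices
```

For example, ocr_pdf_pages("report.pdf", parse_page_spec("1-3,5", page_count)), where page_count could come from len(fitz.open("report.pdf")).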

Response Structure

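{
    "pages": [
        {
            "index": 0,
            "markdown": "# Page title\n\nExtracted page text as Markdown...",
            "images": [],
            "dimensions": {"dpi": 200, "height": 2200, "width": 1700}
        }
    ],
    "model": "mistral-document-ai-2505",
    "document_annotation": null,
    "usage_info": {
        "pages_processed": 1,
        "doc_size_bytes": 123456
    }
}

document_annotation holds the structured output as a JSON string when document_annotation_format is supplied, and is null otherwise. Entries under images carry base64 image data only when include_image_base64 is true.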
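A small accessor keeps the rest of the code independent of the response layout. This hypothetical extract_markdown() assumes the per-page pages/markdown layout used by Mistral's OCR API and falls back to a choices[0].message.content layout, so it returns text for either shape.

```python
from typing import Any, Dict


def extract_markdown(result: Dict[str, Any], separator: str = "\n\n") -> str:
    """Return the document text from an OCR response dict."""
    pages = result.get("pages")
    if isinstance(pages, list):
        # Per-page layout: sort by "index" in case pages arrive out of order
        ordered = sorted(pages, key=lambda page: page.get("index", 0))
        return separator.join(page.get("markdown", "") for page in ordered)
    # Fallback: chat-completion layout
    return result["choices"][0]["message"]["content"]
```

With this helper, callers such as ocr_document_safe() can return extract_markdown(result) instead of indexing into the response directly.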

Error Handling

def ocr_document_safe(file_path: str) -> str:
    """OCR with error handling; returns the Markdown of all pages joined together."""
    try:
        result = ocr_document(file_path)
        return "\n\n".join(page["markdown"] for page in result["pages"])
    except requests.exceptions.HTTPError as e:
        status = e.response.status_code
        if status == 401:
            raise ValueError("Invalid API key") from e
        elif status == 429:
            raise ValueError("Rate limit exceeded") from e
        else:
            raise ValueError(f"OCR failed ({status}): {e.response.text}") from e
    except Exception as e:
        # Chain the original exception so the traceback keeps the root cause
        raise ValueError(f"OCR error: {e}") from e
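Besides reacting to HTTP errors, a pre-flight size check can turn an oversized upload into a clear local error. Since the file travels base64-encoded, the request body is about 4/3 the size of the file. A sketch with hypothetical helper names; max_encoded_bytes is a parameter because the real limit depends on your deployment, and the few bytes of the data: prefix and JSON wrapper are not counted.

```python
import os


def base64_encoded_size(num_bytes: int) -> int:
    """Length of the base64 encoding of num_bytes raw bytes (4 chars per 3-byte group, padded)."""
    return 4 * ((num_bytes + 2) // 3)


def check_upload_size(file_path: str, max_encoded_bytes: int) -> int:
    """Raise before uploading if the encoded file would exceed max_encoded_bytes."""
    encoded = base64_encoded_size(os.path.getsize(file_path))
    if encoded > max_encoded_bytes:
        raise ValueError(
            f"{file_path}: ~{encoded} bytes after base64 encoding exceeds limit of {max_encoded_bytes}"
        )
    return encoded
```

Calling check_upload_size(file_path, your_limit) before ocr_document() avoids spending a request on a file that cannot be accepted.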

Dependencies

# Add to pyproject.toml
dependencies = [
    "requests>=2.31.0",
    "pymupdf>=1.23.0",  # For PDF page extraction
]

Best Practices

  1. File size limits: Check your endpoint's request-size limit against the base64-encoded size, which is about 4/3 of the file size
  2. Timeout: Adjust timeout based on document size (larger PDFs need more time)
  3. Rate limiting: Retry HTTP 429 (and transient 5xx) responses with exponential backoff
  4. Environment variables: Never hardcode API keys
  5. Page-by-page processing: For large PDFs, process pages individually to avoid timeouts
  6. Structured output: Use JSON schemas when you need specific fields extracted
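The retry advice in item 3 could look like the following sketch. send_with_backoff() is a hypothetical wrapper that retries on 429 and a few transient 5xx statuses (an assumed list, adjust as needed), honors a Retry-After header given in seconds, and otherwise sleeps for a randomized exponential delay ("full jitter"). sleep is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Mapping, Protocol


class ResponseLike(Protocol):
    """The two attributes of requests.Response this helper relies on."""
    status_code: int
    headers: Mapping[str, str]


# Assumed set of statuses worth retrying
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def send_with_backoff(
    send: Callable[[], ResponseLike],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> ResponseLike:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        response = send()
        attempt += 1
        if response.status_code not in RETRYABLE_STATUS or attempt >= max_attempts:
            return response  # caller still decides whether to raise_for_status()
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            # Server-suggested wait in seconds, capped at max_delay
            delay = min(max_delay, float(retry_after))
        else:
            # Full jitter: uniform in [0, base * 2^(attempt-1)], capped at max_delay
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
        sleep(delay)
```

Wrapping the request as send_with_backoff(lambda: requests.post(MISTRAL_ENDPOINT, headers=headers, json=payload, timeout=120)) and then calling response.raise_for_status() keeps the existing error handling intact.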

Integration with LangChain Agents

from langchain_core.tools import Tool

def create_ocr_tool() -> Tool:
    """Create a LangChain tool for OCR."""
    return Tool(
        name="ocr_document",
        description="Extract text from a PDF or image file (given its local file path) using Mistral Document AI OCR",
        func=ocr_document_safe,
    )

Install

Requires askill CLI v1.0+

AI Quality Score

96/100 (analyzed 2/11/2026)

A high-quality, comprehensive technical guide for integrating Mistral Document AI on Azure. It provides production-ready Python code, handles edge cases like multi-page PDFs, and includes structured output examples.


Metadata

License: unknown
Version: -
Updated: 2/1/2026
Publisher: sjuratov

Tags

api, ci-cd, testing