Gemini Image Skill

Invoke Google Gemini models for image generation, image understanding, and visual analysis using the Python google-genai SDK.

Available Models

Model ID	Description	Best For	Output Format
`gemini-3-pro-image-preview`	Best image generation + understanding	High-quality image gen, complex visual analysis	JPEG
`gemini-2.5-flash-image`	Fast image generation	Quick image creation	PNG
`gemini-3-pro-preview`	Multimodal understanding	Image analysis without generation	Text only
`gemini-2.5-flash`	Fast vision	Quick image analysis	Text only

Configuration

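API key: read from the GEMINI_API_KEY environment variable (for example, run `export GEMINI_API_KEY=your-api-key` in the shell before running the examples).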
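
If GEMINI_API_KEY is unset, the examples below pass `None` as the key and fail later, inside the SDK or at the API, with a less specific error. A minimal fail-fast sketch, assuming the key is supplied only through that environment variable; `require_api_key` is a hypothetical helper, not part of the google-genai SDK:

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY from the environment, or raise a clear error.

    Hypothetical helper (not part of google-genai). `env` defaults to
    os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get('GEMINI_API_KEY', '').strip()
    if not key:
        raise RuntimeError(
            'GEMINI_API_KEY is not set; run '
            'export GEMINI_API_KEY=your-api-key before the examples'
        )
    return key
```

Each example could then create its client with `genai.Client(api_key=require_api_key())`.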

Usage

Image Generation

python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Returns JPEG | Use gemini-2.5-flash-image for PNG
    contents='Generate an image of a sunset over mountains',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    )
)

# Map MIME types to file extensions
mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}

# Save generated image parts; print any text parts
if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts or []:
        if part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            filename = f'output{ext}'
            # Data is already raw bytes - no base64 decode needed
            with open(filename, 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to {filename} ({part.inline_data.mime_type})')
        elif part.text:
            print(part.text)
"

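The loop above writes every image part to the same `output{ext}` path, so a response with more than one image part would overwrite the earlier files. A small sketch of a naming helper that keeps the same MIME-to-extension mapping and the same `.png` fallback but numbers later images; `image_filename` is a hypothetical name, not an SDK function:

```python
# Same mapping as the generation example.
MIME_TO_EXT = {
    'image/png': '.png',
    'image/jpeg': '.jpg',
    'image/gif': '.gif',
    'image/webp': '.webp',
}


def image_filename(stem: str, mime_type: str, index: int) -> str:
    """Build a filename for the index-th image part of a response.

    The first image keeps the plain stem (output.jpg); later ones get a
    numeric suffix (output_1.jpg, output_2.jpg) so nothing is overwritten.
    Unknown MIME types fall back to '.png', as in the example.
    """
    # Ignore parameters such as '; charset=...' and normalize case.
    base_type = (mime_type or '').split(';', 1)[0].strip().lower()
    ext = MIME_TO_EXT.get(base_type, '.png')
    suffix = '' if index == 0 else f'_{index}'
    return f'{stem}{suffix}{ext}'
```

Inside the loop, keep a counter that starts at 0, call `image_filename('output', part.inline_data.mime_type, counter)` for each image part, and increment the counter after each save.
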
Image Understanding (Analyze Image from File)

python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Read the image as raw bytes; the SDK base64-encodes it for transport
with open('IMAGE_PATH', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Describe this image in detail'),
            # mime_type must match the file: image/png, image/jpeg, image/gif or image/webp
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_bytes))
        ])
    ]
)
print(response.text)
"

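The example hardcodes `image/png`, so a JPEG, GIF, or WebP file would be sent with the wrong label. A sketch that derives the MIME type from the file extension and rejects anything outside the input formats listed under Supported Image Formats; `input_mime_type` is a hypothetical helper, and it trusts the extension rather than inspecting the file contents:

```python
import os

# Extensions for the input formats listed under "Supported Image Formats".
EXT_TO_MIME = {
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.gif': 'image/gif',
    '.webp': 'image/webp',
}


def input_mime_type(path: str) -> str:
    """Return the MIME type for an image path based on its extension.

    Raises ValueError for extensions outside the supported input formats,
    so the caller can convert the file (e.g. to PNG or JPEG) first.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in EXT_TO_MIME:
        raise ValueError(
            f'Unsupported image extension {ext!r} for {path!r}; convert to PNG or JPEG'
        )
    return EXT_TO_MIME[ext]
```

With a real file path, the Blob would then be built as `types.Blob(mime_type=input_mime_type(path), data=image_bytes)`.
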
Image Understanding (From URL)

python -c "
import os
import urllib.request
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Fetch the image as raw bytes (no base64 encoding needed)
url = 'IMAGE_URL_HERE'
with urllib.request.urlopen(url) as resp:
    image_bytes = resp.read()

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='What is in this image?'),
            # Set mime_type to the image's actual type (check the URL or its Content-Type header)
            types.Part(inline_data=types.Blob(mime_type='image/jpeg', data=image_bytes))
        ])
    ]
)
print(response.text)
"

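The URL example assumes `image/jpeg`. Servers usually report the real type in the `Content-Type` header, which may carry parameters (for example `image/png; charset=binary`). A sketch that prefers the header and falls back to the extension in the URL path; `url_mime_type` is a hypothetical helper, and the fallback table is an assumption covering only the formats listed in this skill:

```python
import os
from typing import Optional
from urllib.parse import urlparse

EXT_TO_MIME = {
    '.png': 'image/png',
    '.jpg': 'image/jpeg',
    '.jpeg': 'image/jpeg',
    '.gif': 'image/gif',
    '.webp': 'image/webp',
}
SUPPORTED_TYPES = set(EXT_TO_MIME.values())


def url_mime_type(url: str, content_type: Optional[str]) -> str:
    """Pick the MIME type for a downloaded image.

    Prefers the Content-Type header (ignoring parameters after ';') when it
    names a supported image type; otherwise uses the URL path's extension.
    """
    if content_type:
        base_type = content_type.split(';', 1)[0].strip().lower()
        if base_type in SUPPORTED_TYPES:
            return base_type
    # urlparse drops the query string, so 'cat.jpg?size=large' still works.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext in EXT_TO_MIME:
        return EXT_TO_MIME[ext]
    raise ValueError(
        f'Cannot determine a supported image type for {url!r} (Content-Type: {content_type!r})'
    )
```

Given the object `resp` returned by `urllib.request.urlopen(url)`, the call would be `url_mime_type(url, resp.headers.get('Content-Type'))`.
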
Workflow

When this skill is invoked:

1. Determine the task type:
   - Image Generation: User wants to create an image
   - Image Understanding: User wants to analyze an existing image
   - Image Editing: User wants to modify an image (generation with a reference image)
2. Select the appropriate model:
   - Image generation or editing → gemini-3-pro-image-preview (JPEG) or gemini-2.5-flash-image (PNG)
   - Image analysis → gemini-3-pro-preview or gemini-2.5-flash
3. Prepare the input:
   - For generation: a text prompt describing the desired image
   - For editing: the reference image (raw bytes) plus a text instruction describing the change
   - For understanding: the image file read as raw bytes, with its MIME type
4. Execute and handle the output:
   - Generation or editing: save the returned image bytes to a file
   - Understanding: return the text response
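
The task-to-model selection above can be sketched as a small lookup. This is only an illustration: the task names and the `prefer_fast` flag are hypothetical, editing is assumed to use the same models as generation (as the workflow describes it as generation with a reference), and the model IDs are the ones from the Available Models table, which may change as preview models are updated:

```python
def select_model(task: str, prefer_fast: bool = False) -> str:
    """Map a workflow task type to a model ID from the Available Models table.

    Hypothetical helper: 'generation', 'editing' and 'understanding' are
    illustrative task names, not SDK values.
    """
    models = {
        # task: (higher-quality choice, faster choice)
        'generation': ('gemini-3-pro-image-preview', 'gemini-2.5-flash-image'),
        'editing': ('gemini-3-pro-image-preview', 'gemini-2.5-flash-image'),
        'understanding': ('gemini-3-pro-preview', 'gemini-2.5-flash'),
    }
    try:
        quality, fast = models[task]
    except KeyError:
        raise ValueError(f'Unknown task {task!r}; expected one of {sorted(models)}') from None
    return fast if prefer_fast else quality
```
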

Example Invocations

Generate Product Image

python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Create a professional product photo of a sleek wireless headphone on a white background, studio lighting',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    )
)

mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}

if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts or []:
        if part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            with open(f'headphone{ext}', 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to headphone{ext}')
"

Analyze Screenshot

python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Read the screenshot as raw bytes (no base64 encoding needed)
with open('screenshot.png', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Analyze this UI screenshot. Identify any usability issues and suggest improvements.'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_bytes))
        ])
    ]
)
print(response.text)
"

OCR / Extract Text from Image

python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Read the document image as raw bytes (no base64 encoding needed)
with open('document.png', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Extract all text from this image. Preserve formatting where possible.'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=image_bytes))
        ])
    ]
)
print(response.text)
"

Compare Two Images

python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Read both images as raw bytes (no base64 encoding needed)
with open('image1.png', 'rb') as f:
    img1_bytes = f.read()
with open('image2.png', 'rb') as f:
    img2_bytes = f.read()

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Compare these two images. What are the key differences?'),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=img1_bytes)),
            types.Part(inline_data=types.Blob(mime_type='image/png', data=img2_bytes))
        ])
    ]
)
print(response.text)
"

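Every inline image adds to the request size, so comparing several large images can exceed what the API accepts inline. A sketch that totals the file sizes before building the request. The 20 MB default is an assumption based on the total inline request size Gemini documented at the time of writing (prompt text included), so confirm it for your model and leave headroom; `check_inline_size` is a hypothetical helper:

```python
import os
from typing import Iterable

# Assumed inline request limit (about 20 MB); verify against current Gemini docs.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def check_inline_size(paths: Iterable[str], limit: int = INLINE_LIMIT_BYTES) -> int:
    """Return the combined size of the files in bytes, or raise if over `limit`.

    Only raw file sizes are counted; prompt text and request encoding add
    overhead on top, so a total close to the limit may still be rejected.
    """
    total = sum(os.path.getsize(p) for p in paths)
    if total > limit:
        raise ValueError(
            f'Images total {total} bytes, over the {limit}-byte inline limit; '
            'resize them or upload them via the Files API instead'
        )
    return total
```

Calling `check_inline_size(['image1.png', 'image2.png'])` before the request surfaces the problem locally instead of as an API error.
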
Image Generation Parameters

When generating images, you can customize:

config=types.GenerateContentConfig(
    response_modalities=['IMAGE', 'TEXT'],  # Request both image and description
    temperature=1.0,  # Higher = more creative
    # Additional parameters may be model-specific
)

Supported Image Formats

Input (for understanding):

PNG (image/png)
JPEG (image/jpeg)
GIF (image/gif)
WebP (image/webp)

Output (from generation):

PNG (image/png) or JPEG (image/jpeg), depending on the model (see Available Models)
The API returns raw bytes in part.inline_data.data (NOT base64 encoded)
Check part.inline_data.mime_type to determine the actual format returned
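
Because `part.inline_data.data` holds raw bytes, the file signature ("magic bytes") can double-check the reported format, and the same check catches an input file whose extension does not match its contents. A sketch covering only the four formats listed above; `sniff_image_type` is a hypothetical helper:

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None.

    Recognizes PNG, JPEG, GIF and WebP signatures only. Base64 text (for
    example b'iVBORw0KGgo...') is not raw image data and returns None.
    """
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'image/png'
    if data.startswith(b'\xff\xd8\xff'):
        return 'image/jpeg'
    if data.startswith((b'GIF87a', b'GIF89a')):
        return 'image/gif'
    # WebP is a RIFF container whose form type (bytes 8-11) is 'WEBP'.
    if len(data) >= 12 and data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return 'image/webp'
    return None
```

For generated images, `sniff_image_type(part.inline_data.data)` would be expected to agree with `part.inline_data.mime_type`; a `None` result suggests the bytes are not raw image data (for example, they were base64-encoded an extra time).
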

Error Handling

Common errors and solutions:

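Image too large: Resize or compress the image before sending; for large files, upload them with the Files API (client.files.upload) instead of sending inline bytes (size limits vary by model)
Unsupported format: Convert to PNG/JPEG
Generation blocked: Adjust the prompt to comply with safety guidelines; response.prompt_feedback and the candidate's finish_reason indicate why a request was blocked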
Rate limiting: Implement retry with exponential backoff
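
A minimal retry sketch for rate limiting, using exponential backoff with "full jitter" (a random delay up to a doubling cap). The caller supplies an `is_retryable` predicate; with google-genai, one option is `lambda e: getattr(e, 'code', None) == 429`, on the assumption that the SDK's `APIError` exposes the HTTP status as `code` (verify this for your SDK version). `call_with_backoff` is a hypothetical helper:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with exponential backoff.

    Before retry n (n = 1, 2, ...) it sleeps a random time in
    [0, min(max_delay, base_delay * 2 ** (n - 1))]. Non-retryable errors,
    and the error from the final attempt, are re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError('max_attempts must be at least 1')
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_attempts or not is_retryable(exc):
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError('unreachable')
```

A request would be wrapped as `call_with_backoff(lambda: client.models.generate_content(...), is_retryable=...)`; the `sleep` parameter exists so tests can record delays instead of waiting.
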

Notes

Image generation requires response_modalities=['IMAGE', 'TEXT'] in config
For best results with generation, be specific and descriptive in prompts
Image understanding works with both local files and URLs (for a URL, download the image and send its bytes, as in the URL example)
Multiple images can be sent in a single request for comparison
Gemini 3 Pro Image is NOT available via CLI - must use Python SDK

Tools to Use

Bash: Execute Python commands
Read: Load image files (binary mode)
Write: Save generated images
Glob: Find image files in directories
