Gemini Computer Use - Web Browser Automation

You are an expert web application testing assistant using Gemini 2.5 Computer Use - Google's AI that can see and control web browsers.

What This Skill Does

This skill implements the Gemini Computer Use loop as described in Google's official documentation:

Gemini AI analyzes screenshots of your browser
Gemini decides what actions to take (where to click, what to type)
Actions execute on YOUR local browser using Playwright
You WATCH it happen in real-time on your screen
A new screenshot is sent back to Gemini to continue the loop

✅ AI-powered decision making (Gemini)
✅ Visible browser on your screen (Playwright)
✅ Best of both worlds!

Purpose

Web Application Testing: Automated testing with AI understanding
Browser Automation: Let AI navigate complex workflows
Debugging: Watch AI interact with your site to find issues
Demos: Show intelligent browser automation in action

How It Works

┌─────────────┐
│   Gemini AI │  Analyzes screenshot
│             │  Decides: "Click search box at (821, 202)"
└──────┬──────┘
       │
       ↓ function_call: click(821, 202)
       │
┌──────┴──────┐
│  Playwright │  Executes click on YOUR screen
│   (Visible) │  Captures new screenshot
└──────┬──────┘
       │
       ↓ new screenshot + result
       │
┌──────┴──────┐
│   Gemini AI │  Sees result, plans next action
│             │  Loop continues...
└─────────────┘
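
The diagram above can be read as a simple turn loop. The following is a minimal sketch of that control flow only, not the code in gemini_browser.py: `Action`, `run_loop`, `decide`, and `execute` are hypothetical names, and the model and browser are replaced by stand-ins so the example runs without Gemini or Playwright.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    """One function call requested by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def run_loop(
    decide: Callable[[bytes], Optional[Action]],
    execute: Callable[[Action], bytes],
    first_screenshot: bytes,
    max_turns: int = 20,
) -> list[Action]:
    """Alternate between model decisions and local execution.

    decide:  takes the latest screenshot and returns the next Action,
             or None when the model considers the task finished.
    execute: performs the Action in the browser and returns a new screenshot.
    """
    history: list[Action] = []
    screenshot = first_screenshot
    for _ in range(max_turns):
        action = decide(screenshot)
        if action is None:  # the model reports the task is complete
            break
        screenshot = execute(action)  # the new screenshot feeds the next turn
        history.append(action)
    return history


# Stand-ins for Gemini and Playwright (assumptions for illustration only).
scripted = iter([Action("click", {"x": 821, "y": 202}), None])
performed = run_loop(
    decide=lambda shot: next(scripted),
    execute=lambda action: b"fake-png-bytes",
    first_screenshot=b"fake-png-bytes",
)
print([a.name for a in performed])
```

The `max_turns` cap mirrors the `--max-turns` option: it bounds the number of model calls even if the model never reports completion.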

Variables

{URL}: Target URL to test/automate
{TASK}: What you want Gemini to do (in natural language)

Usage

Basic Command (Windows)

IMPORTANT: Use the absolute path directly - DO NOT use cd commands on Windows!

python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "{URL}" --task "{TASK}"

Example Commands (Windows)

# Search Wikipedia for cats (VISIBLE BROWSER)
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://en.wikipedia.org" --task "Search for cats and tell me the first paragraph about them"

# Test a login flow
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "http://localhost:3000" --task "Test the login flow with username 'test' and password 'demo123'"

# Check console errors
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://google.com" --task "Navigate to the site and check for any console errors"

# Fill out a form
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://example.com/contact" --task "Fill out the contact form with test data"

# Run with custom slow motion (1 second per action)
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://wikipedia.org" --task "Search for dogs" --slow 1000

# Run in headless mode (no visible browser)
python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://google.com" --task "Check console" --headless

Command Options

--task / -t: Required - Natural language description of task
--slow: Slow motion delay in milliseconds (default: 500ms)
--headless: Run without visible browser (default: visible)
--max-turns: Maximum conversation turns (default: 20)
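
For reference, the options listed above could be declared with `argparse` roughly as follows. This is a sketch built from the option list alone; the actual parser in gemini_browser.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(
        description="Gemini Computer Use browser runner (sketch)"
    )
    parser.add_argument("url", help="Target URL to test or automate")
    parser.add_argument("--task", "-t", required=True,
                        help="Natural-language description of the task")
    parser.add_argument("--slow", type=int, default=500,
                        help="Slow-motion delay per action, in milliseconds")
    parser.add_argument("--headless", action="store_true",
                        help="Run without a visible browser window")
    parser.add_argument("--max-turns", type=int, default=20,
                        help="Maximum number of conversation turns")
    return parser


args = build_parser().parse_args(
    ["https://en.wikipedia.org", "--task", "Search for cats"]
)
print(args.slow, args.headless, args.max_turns)  # 500 False 20
```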

Workflow for Claude Code

When user asks to test a web application or automate browser tasks:

Step 1: Parse Request

Extract:

URL: Target website
Task: What to do (user's natural language description)

Step 2: Run Gemini Computer Use (Windows-Optimized)

CRITICAL: Use the absolute path with quoted arguments - NO cd commands!

python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "{URL}" --task "{TASK}"

Token-Efficient Pattern:

✅ Single command execution
✅ Absolute path in quotes
✅ No directory changes needed
✅ Works on Windows without path errors
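
When the command is launched from Python rather than typed into a shell, the same pattern can be expressed as an argument list, which avoids both quoting problems and directory changes. This is a sketch under assumptions: `build_command` and `run` are hypothetical helpers, and the script location is taken to be under the current user's home directory, as the path in this document suggests.

```python
import subprocess
import sys
from pathlib import Path


def build_command(script: Path, url: str, task: str) -> list[str]:
    """Return the argument list for one run; no shell quoting or cd required."""
    if not script.is_absolute():
        raise ValueError(f"expected an absolute path, got {script}")
    # Each element is passed to the process as-is, so spaces and quotes
    # inside the task text need no escaping.
    return [sys.executable, str(script), url, "--task", task]


def run(script: Path, url: str, task: str) -> int:
    """Execute the script and return its exit code."""
    return subprocess.run(build_command(script, url, task)).returncode


# Assumed location, mirroring the path shown in this document.
script = Path.home() / ".claude" / "skills" / "web-app-testing" / "scripts" / "gemini_browser.py"
command = build_command(
    script,
    "http://localhost:3000",
    "Test the login flow with username 'test' and password 'demo123'",
)
print(command[2:])
```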

Step 3: Observe Output

The script will:

✅ Launch visible browser (maximized window)
✅ Show Gemini's decisions in terminal
✅ Execute actions in slow motion (you can watch)
✅ Display console logs when done
✅ Keep browser open 10 seconds for inspection
✅ Return final results

Step 4: Report Results

Summarize what Gemini accomplished, any errors found, and console logs.

Example Session

User: "Go to Wikipedia and search for cats"

Claude Code executes:
  python "C:\Users\USERNAME\.claude\skills\web-app-testing\scripts\gemini_browser.py" "https://en.wikipedia.org" --task "Search for cats"

Output shows:
  [BROWSER] Launching VISIBLE browser...
  [BROWSER] ✓ Browser ready

  TURN 1
  [EXECUTING] navigate({"url": "https://en.wikipedia.org"})
    → Navigating to: https://en.wikipedia.org

  TURN 2
  [GEMINI] I can see the Wikipedia homepage. I'll search for "cats" now.
  [EXECUTING] type_text_at({"x": 821, "y": 202, "text": "cats", "press_enter": true})
    → Clicking at (821, 202) then typing: 'cats'
    → Typing: 'cats'
    → Pressing Enter

  TURN 3
  [GEMINI] I've successfully navigated to the Cat article on Wikipedia.
  [COMPLETE] Task finished!

  BROWSER CONSOLE LOGS
  ✓ No console errors

  [BROWSER] Keeping browser open for 10 seconds...

User sees:

✅ Browser window opens on their screen
✅ Watches Wikipedia load
✅ Sees search box get clicked
✅ Watches "cats" being typed
✅ Sees search submit and results appear
✅ Browser stays open to inspect
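
A note on the coordinates in the session above: Google's Computer Use documentation describes the model's coordinates as normalized to a 0-999 grid, which the client scales to real pixels before acting. Whether gemini_browser.py logs normalized or pixel values is not stated here, so treat the following as a sketch under that assumption. The `denormalize` helper and the 1440x900 viewport are illustrative choices, not values taken from the script.

```python
def denormalize(x: int, y: int, width: int, height: int) -> tuple[int, int]:
    """Map model coordinates on a 0-999 grid to viewport pixels.

    Integer arithmetic keeps the result deterministic (no float rounding).
    """
    return x * width // 1000, y * height // 1000


# Assumed viewport size of 1440x900 for illustration.
print(denormalize(821, 202, 1440, 900))  # (1182, 181)
```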


## Key Features

### AI Intelligence
- Gemini analyzes page visually (like a human)
- Adapts to different page layouts
- Makes intelligent decisions about what to click
- Understands context and intent

### Visible Execution
- Browser opens on YOUR screen (maximized)
- Actions happen in slow motion (configurable)
- You can watch every step
- Browser stays open for inspection

### Console Log Capture
- Captures errors, warnings, and info messages
- Displays organized summary at end
- Helps identify JavaScript issues
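
As an illustration of how such a summary could be produced, the sketch below groups console messages by type. `ConsoleLog` is a hypothetical class, not the script's own code; with Playwright's Python API it could be fed from the page's `console` event, as noted in the comment.

```python
from collections import defaultdict


class ConsoleLog:
    """Collect browser console messages and group them by type."""

    def __init__(self) -> None:
        self.by_type: dict[str, list[str]] = defaultdict(list)

    def add(self, msg_type: str, text: str) -> None:
        self.by_type[msg_type].append(text)

    def summary(self) -> str:
        errors = self.by_type.get("error", [])
        if not errors:
            return "No console errors"
        lines = [f"{len(errors)} console error(s):"]
        lines += [f"  - {text}" for text in errors]
        return "\n".join(lines)


log = ConsoleLog()
# With Playwright this would be wired up as:
#   page.on("console", lambda msg: log.add(msg.type, msg.text))
log.add("warning", "Deprecated API used")
log.add("error", "Uncaught TypeError: x is undefined")
print(log.summary())
```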

### Screenshot Loop
- Every action triggers new screenshot
- Gemini sees the updated page state
- Enables accurate decision-making

## Important Notes

### This is NOT a Hybrid System
This skill follows the **official Gemini Computer Use pattern** described in Google's documentation. The pattern is:
1. Screenshot → Gemini
2. Gemini → Function call
3. Execute function locally
4. New screenshot → back to Gemini
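
Step 3, executing the function locally, usually comes down to mapping the function name in the model's response to a local handler. The sketch below shows one way to do that with a dispatch table. The function names `navigate` and `type_text_at` come from the example session; `RecordingBrowser` and `dispatch` are hypothetical, and the browser is a stand-in that only records what it was asked to do.

```python
from typing import Callable


class RecordingBrowser:
    """Stand-in for a Playwright page; records requested actions."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def navigate(self, url: str) -> None:
        self.calls.append(f"goto {url}")

    def type_text_at(self, x: int, y: int, text: str,
                     press_enter: bool = False) -> None:
        self.calls.append(f"click ({x}, {y})")
        self.calls.append(f"type {text!r}")
        if press_enter:
            self.calls.append("press Enter")


def dispatch(browser: RecordingBrowser, name: str, args: dict) -> None:
    """Run the handler that matches the model's function call."""
    handlers: dict[str, Callable[..., None]] = {
        "navigate": browser.navigate,
        "type_text_at": browser.type_text_at,
    }
    if name not in handlers:
        raise ValueError(f"unsupported function call: {name}")
    handlers[name](**args)


browser = RecordingBrowser()
dispatch(browser, "navigate", {"url": "https://en.wikipedia.org"})
dispatch(browser, "type_text_at",
         {"x": 821, "y": 202, "text": "cats", "press_enter": True})
print(browser.calls)
```

Rejecting unknown names explicitly keeps an unexpected function call from failing silently.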

### Browser Visibility
- **Default**: Visible browser (headless=False)
- **Option**: Can run headless with `--headless` flag
- **Recommended**: Keep visible for debugging/demos

### API Costs
- Each Gemini API call incurs costs
- Screenshots are sent with each turn
- Complex tasks = more API calls
- Monitor usage in Google AI Studio

### Best Practices
- ✅ Use specific, clear task descriptions
- ✅ Test on localhost first before production
- ✅ Watch the browser to understand AI behavior
- ✅ Keep tasks focused and achievable
- ❌ Don't test production without permission
- ❌ Don't use for CAPTCHA bypass or scraping at scale

## Troubleshooting

### Browser doesn't open
- Check that Playwright is installed: `pip install playwright`
- Install browsers: `playwright install chromium`

### Gemini not finding elements
- Increase `--slow` to give the page time to load
- Check if page uses dynamic content
- Verify URL is accessible

### API errors
- Check that the API key is valid
- Verify that the quota has not been exceeded
- Check internet connectivity
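
A quick preflight check can rule out a missing key before any browser is launched. This document does not say how the script reads its key, so the variable names below are assumptions: `GEMINI_API_KEY` and `GOOGLE_API_KEY` are the names commonly used with Google's Gen AI SDK. `api_key_source` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Assumed variable names; adjust to whatever the script actually reads.
CANDIDATE_VARS = ("GEMINI_API_KEY", "GOOGLE_API_KEY")


def api_key_source(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the first variable holding a non-empty key.

    The key itself is never returned, so it cannot end up in logs.
    """
    for name in CANDIDATE_VARS:
        if env.get(name, "").strip():
            return name
    return None


source = api_key_source()
if source is None:
    print("No API key found in:", ", ".join(CANDIDATE_VARS))
else:
    print("Using API key from", source)
```

This only confirms that a key is present; validity and quota still have to be checked against the API itself.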

## Version History

- **v3.0.0**: Complete rewrite with proper Gemini Computer Use implementation
- **v2.1.0**: Added local Playwright mode (deprecated)
- **v2.0.0**: Initial Gemini integration (simulated, deprecated)

---

**Created by**: Custom Skill Builder
**Last Updated**: 2025-10-19
**Version**: 3.0.0
**Implementation**: Official Gemini Computer Use pattern
