ollama-client

Phi-4 LLM interaction skill for generating text completions via the Ollama API. Use it for all LLM inference tasks, including section detection, summarization, recommendation generation, and quality evaluation.

Updated 2/5/2026

Ollama Client Skill

Overview

This skill provides a Python wrapper for interacting with Ollama's REST API to generate text completions using the Phi-4 model (14B parameters, 16K context window). It handles timeouts, retries, and structured logging for all LLM operations.

When to Use

Use this skill when you need to:

  • Generate text completions from Phi-4
  • Run prompts for clinical analysis tasks
  • Generate JSON-structured outputs from the LLM
  • Handle LLM inference with timeout protection
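
For the JSON-structured case, model output often arrives wrapped in a Markdown fence or surrounded by prose. The following is a minimal sketch of a tolerant parser; `parse_json_response` is a hypothetical helper for illustration and is not part of this skill's documented API.

```python
import json
import re

# Build the three-backtick fence marker programmatically to keep this example readable.
FENCE = "`" * 3


def parse_json_response(text: str) -> dict:
    """Extract the first JSON object from raw model output (hypothetical helper)."""
    # Prefer the contents of a fenced block if the model added one.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Fall back to the outermost {...} span when extra prose surrounds the object.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start:end + 1])
```

Ollama's generate endpoint also accepts a `format` field set to `"json"`; whether this wrapper exposes that field is not stated here, so a parser like this one is a reasonable safety net either way.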

Installation

IMPORTANT: This skill has its own isolated virtual environment (.venv) managed by uv. Do NOT use system Python.

Initialize the skill's environment:

# From the skill directory
cd .agent/skills/ollama-client
uv sync  # Creates .venv and installs dependencies from pyproject.toml

Dependencies are in pyproject.toml:

  • requests - HTTP client for Ollama API

Usage

CRITICAL: Always use uv run to execute code with this skill's .venv, NOT system Python.

Basic Text Generation

# From .agent/skills/ollama-client/ directory
# Run with: uv run python -c "..."
from ollama_client import OllamaClient

# Initialize client
client = OllamaClient(
    host="http://localhost:11434",  # Default from OLLAMA_HOST env var
    model="phi4:14b",                 # Default from OLLAMA_MODEL env var
    timeout=300                       # 5 minutes default
)

# Generate completion
result = client.generate(
    prompt="Summarize the following clinical note: ...",
    temperature=0.1,      # Low temperature for more consistent outputs
    max_tokens=1000,      # Optional token limit
    stop_sequences=["END"]  # Optional stop sequences
)

print(result["response"])
print(f"Execution time: {result['execution_time_ms']}ms")

With Environment Variables

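import os

from ollama_client import OllamaClient

# Normally set in .env or docker-compose.yml; set here only for illustration
os.environ['OLLAMA_HOST'] = 'http://localhost:11434'
os.environ['OLLAMA_MODEL'] = 'phi4:14b'

# With no arguments, the client reads the env vars above
client = OllamaClient()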

Using from Another Module

When importing this skill from agents or other code:

import sys
from pathlib import Path

# Add skill to path (use relative path from your location)
skill_path = Path(__file__).parent.parent.parent / ".agent/skills/ollama-client"
sys.path.insert(0, str(skill_path))

from ollama_client import OllamaClient
client = OllamaClient()

Health Check

# Check if Ollama server is accessible (assumes `client` was created as in the earlier examples)
if client.is_available():
    print("Ollama server is healthy")
else:
    print("Ollama server unavailable")
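
How `is_available()` is implemented is not shown here. As a rough sketch of what such a check could look like, the function below sends a GET request to Ollama's `/api/tags` endpoint (which lists locally installed models) using only the standard library. The name `ollama_is_available` and the 5-second default are assumptions for illustration.

```python
import urllib.error
import urllib.request


def ollama_is_available(host: str = "http://localhost:11434", timeout: float = 5.0) -> bool:
    """Return True if the Ollama server answers GET /api/tags (assumed check)."""
    url = f"{host.rstrip('/')}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error, or malformed URL.
        return False
```

Returning a boolean instead of raising keeps the check easy to use as a guard before longer-running generation calls.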

Configuration

Environment Variables:

  • OLLAMA_HOST: Server URL (default: http://localhost:11434)
  • OLLAMA_MODEL: Model name (default: phi4:14b)
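
A minimal sketch of how these defaults could be resolved, assuming explicit arguments take priority over environment variables, which in turn take priority over the built-in defaults. `resolve_settings` is a hypothetical helper; the skill's actual resolution order is not documented here.

```python
import os

DEFAULT_HOST = "http://localhost:11434"
DEFAULT_MODEL = "phi4:14b"


def resolve_settings(host=None, model=None, env=None):
    """Explicit arguments win, then environment variables, then defaults."""
    # Accept an injectable mapping so the logic can be tested without touching os.environ.
    env = os.environ if env is None else env
    return {
        "host": host or env.get("OLLAMA_HOST", DEFAULT_HOST),
        "model": model or env.get("OLLAMA_MODEL", DEFAULT_MODEL),
    }
```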

Parameters:

  • temperature: Sampling temperature (0.0-1.0, default: 0.1 for near-deterministic outputs)
  • max_tokens: Maximum tokens to generate (optional)
  • stop_sequences: List of strings to stop generation (optional)
  • timeout: Request timeout in seconds (default: 300)
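
To make the parameters concrete, here is a sketch of how they might map onto the JSON body of Ollama's `POST /api/generate` endpoint. The field names `stream`, `options`, `num_predict`, and `stop` come from the Ollama REST API; the mapping from this skill's `max_tokens` and `stop_sequences` is an assumption, and `build_generate_payload` is a hypothetical helper.

```python
def build_generate_payload(prompt, model="phi4:14b", temperature=0.1,
                           max_tokens=None, stop_sequences=None):
    """Build a request body for POST /api/generate (assumed parameter mapping)."""
    options = {"temperature": temperature}
    if max_tokens is not None:
        # Ollama calls the generation limit num_predict.
        options["num_predict"] = max_tokens
    if stop_sequences:
        options["stop"] = list(stop_sequences)
    # stream=False asks for one complete JSON response instead of chunks.
    return {"model": model, "prompt": prompt, "stream": False, "options": options}
```

The `timeout` parameter is not part of the payload; it would apply to the HTTP request itself.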

Error Handling

The skill raises exceptions for:

  • Timeout: If request exceeds timeout duration
  • Connection Error: If Ollama server is unreachable
  • API Error: If Ollama returns an error response

All errors include execution time for debugging.
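
The exception classes this skill raises are not named above. As an illustration of attaching execution time to an error, the sketch below wraps any callable and re-raises failures with the elapsed milliseconds; `OllamaCallError` and `call_with_timing` are hypothetical names, not the skill's own.

```python
import time


class OllamaCallError(Exception):
    """Hypothetical error type carrying the elapsed time of the failed call."""

    def __init__(self, message, execution_time_ms):
        super().__init__(f"{message} (after {execution_time_ms} ms)")
        self.execution_time_ms = execution_time_ms


def call_with_timing(fn, *args, **kwargs):
    """Run fn and, on failure, re-raise with the elapsed time attached."""
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        # Chain the original exception so the root cause stays visible in tracebacks.
        raise OllamaCallError(f"{type(exc).__name__}: {exc}", elapsed_ms) from exc
```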

Best Practices

  1. Low Temperature: Use temperature=0.1 for clinical tasks requiring consistency
  2. Timeouts: Set appropriate timeouts based on prompt complexity (simple: 60s, complex: 300s)
  3. Health Checks: Verify server availability before critical operations
  4. Error Logging: Always log errors with execution time for troubleshooting
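
The overview mentions retries, but the retry policy itself is not described. One common approach, sketched here under that caveat, is exponential backoff: retry a failing call a fixed number of times and double the wait between attempts. `retry_with_backoff` is a hypothetical helper, and its defaults are illustrative.

```python
import time


def retry_with_backoff(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn until it succeeds or attempts run out, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** attempt)
```

In practice this would wrap a call such as `lambda: client.generate(prompt=...)`, ideally after a health check has confirmed the server is reachable.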

Integration with Agents

Agents use this skill for all LLM operations:

  • ToC Subagent: Section topic segmentation
  • Summary Subagent: Clinical entity extraction
  • Recommendation Subagent: Treatment plan generation
  • Evaluator Agent: Quality validation reasoning

Implementation

See ollama_client.py for the full Python implementation.

Metadata

License: unknown
Version: -
Updated: 2/5/2026
Publisher: majiayu000

Tags

api, ci-cd, llm, observability, prompting