askill
clinicaltrials-database-search


Query ClinicalTrials.gov API v2 for clinical study data. Search trials by condition, drug/intervention, location, sponsor, or phase. Retrieve detailed study information by NCT ID. Filter by recruitment status, paginate large result sets, export to CSV. For clinical research, patient matching, drug development tracking, and trial portfolio analysis.

Updated 2/20/2026

SKILL.md

ClinicalTrials.gov Database — Clinical Trial Search

Overview

Query the ClinicalTrials.gov API v2 (public, no authentication) to search and retrieve clinical trial data worldwide. Supports searching by condition, intervention, location, sponsor, and status; retrieving detailed study information by NCT ID; paginating large result sets; and exporting to CSV.

When to Use

  • Searching for recruiting clinical trials for a specific condition or disease
  • Finding trials testing a specific drug, device, or intervention
  • Locating trials in a specific geographic region for patient referral
  • Tracking a sponsor's or institution's clinical trial portfolio
  • Retrieving detailed eligibility criteria, outcomes, and contacts for a specific trial
  • Analyzing clinical trial trends (phases, enrollment, timelines) across a therapeutic area
  • Exporting trial data for systematic reviews or meta-analyses
  • Monitoring trial status changes and results postings
  • Not for chemical compound bioactivity data (use chembl-database-bioactivity instead) or published literature (use pubmed-database)

Prerequisites

uv pip install requests pandas

API details:

  • Base URL: https://clinicaltrials.gov/api/v2
  • Authentication: None required (public API)
  • Rate limit: ~50 requests/minute per IP
  • Response formats: JSON (default), CSV
  • Max page size: 1000 studies per request
  • Date format: ISO 8601; text fields use CommonMark Markdown
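The ~50 requests/minute figure above implies a minimum spacing of about 1.2 seconds between calls. The following is a minimal sketch of a client-side throttle under that assumption. `Throttle` is a hypothetical helper, not part of `requests` or the API, and the clock and sleep functions are injectable only so the behavior can be checked without actually waiting.

```python
import time


class Throttle:
    """Space calls at least 60 / per_minute seconds apart (hypothetical helper)."""

    def __init__(self, per_minute=50, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until the minimum interval since the last call has elapsed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request keeps a loop under the stated limit without hard-coding a sleep duration.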

Quick Start

import requests
import time

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    """Reusable helper for ClinicalTrials.gov searches."""
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Search for recruiting breast cancer trials
results = ct_search({
    "query.cond": "breast cancer",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 10,
    "sort": "LastUpdatePostDate:desc"
})
print(f"Found {results['totalCount']} trials")
for study in results['studies'][:3]:
    nct = study['protocolSection']['identificationModule']['nctId']
    title = study['protocolSection']['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")

Key Concepts

Response Data Structure

ClinicalTrials.gov returns deeply nested JSON. Key navigation paths:

| Data | Path |
| --- | --- |
| NCT ID | study['protocolSection']['identificationModule']['nctId'] |
| Title | study['protocolSection']['identificationModule']['briefTitle'] |
| Status | study['protocolSection']['statusModule']['overallStatus'] |
| Phase | study['protocolSection']['designModule']['phases'] |
| Enrollment | study['protocolSection']['designModule']['enrollmentInfo']['count'] |
| Eligibility | study['protocolSection']['eligibilityModule'] |
| Locations | study['protocolSection']['contactsLocationsModule']['locations'] |
| Interventions | study['protocolSection']['armsInterventionsModule']['interventions'] |
| Results | study.get('resultsSection') (None if no results posted) |
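To make the nesting concrete, the sketch below walks a few of these paths over a hand-written record. The `sample_study` values (including the `NCT00000000` placeholder ID) are illustrative, not real API output, and `dig` is a hypothetical helper.

```python
from functools import reduce

# Hand-written record that mimics the nesting described above; values are placeholders.
sample_study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example trial"},
        "statusModule": {"overallStatus": "RECRUITING"},
        "designModule": {"phases": ["PHASE2"], "enrollmentInfo": {"count": 120}},
    },
    # No "resultsSection" key: this record has no posted results.
}


def dig(record, path):
    """Follow a sequence of dict keys; return None as soon as one is missing."""
    return reduce(
        lambda node, key: node.get(key) if isinstance(node, dict) else None,
        path,
        record,
    )


print(dig(sample_study, ["protocolSection", "identificationModule", "nctId"]))
print(dig(sample_study, ["protocolSection", "designModule", "enrollmentInfo", "count"]))
print(dig(sample_study, ["resultsSection"]))  # None: nothing posted
```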

Study Status Values

| Status | Description |
| --- | --- |
| RECRUITING | Currently recruiting participants |
| NOT_YET_RECRUITING | Approved but not yet open |
| ENROLLING_BY_INVITATION | Invitation-only enrollment |
| ACTIVE_NOT_RECRUITING | Active, enrollment closed |
| SUSPENDED | Temporarily halted |
| TERMINATED | Stopped prematurely |
| COMPLETED | Study concluded |
| WITHDRAWN | Withdrawn before enrollment |
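Because `filter.overallStatus` expects these exact upper-case tokens, it can help to normalize user input before building a request. This is a minimal sketch that assumes only the eight values listed above; the live API may accept additional statuses not covered here, and `status_filter` is a hypothetical helper.

```python
# Status values taken from the list above; the live API may define more.
KNOWN_STATUSES = {
    "RECRUITING", "NOT_YET_RECRUITING", "ENROLLING_BY_INVITATION",
    "ACTIVE_NOT_RECRUITING", "SUSPENDED", "TERMINATED", "COMPLETED", "WITHDRAWN",
}


def status_filter(statuses):
    """Normalize status names and join them for filter.overallStatus."""
    normalized = []
    for raw in statuses:
        value = raw.strip().upper().replace(" ", "_").replace("-", "_")
        if value not in KNOWN_STATUSES:
            raise ValueError(f"Unrecognized status: {raw!r}")
        if value not in normalized:  # drop duplicates, keep input order
            normalized.append(value)
    return ",".join(normalized)


print(status_filter(["recruiting", "Not yet recruiting", "RECRUITING"]))
```

Rejecting unknown values locally surfaces a typo before it turns into an HTTP error, at the cost of needing to extend `KNOWN_STATUSES` if the API adds values.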

Study Phase Values

| Phase | Description |
| --- | --- |
| EARLY_PHASE1 | Early Phase 1 (formerly Phase 0) |
| PHASE1 | Phase 1: safety and dosing |
| PHASE2 | Phase 2: efficacy and side effects |
| PHASE3 | Phase 3: large-scale efficacy |
| PHASE4 | Phase 4: post-market surveillance |
| NA | Not applicable (non-drug studies) |
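The code elsewhere in this document treats `phases` as a list rather than a single value, so a trial spanning two phases carries two entries. Below is a small sketch for turning that list into a display label; `phase_label` is a hypothetical helper and the label wording is an assumption.

```python
PHASE_LABELS = {
    "EARLY_PHASE1": "Early Phase 1",
    "PHASE1": "Phase 1",
    "PHASE2": "Phase 2",
    "PHASE3": "Phase 3",
    "PHASE4": "Phase 4",
    "NA": "Not applicable",
}


def phase_label(phases):
    """Join phase codes into a readable label; unknown codes pass through unchanged."""
    if not phases:
        return "Not specified"
    return "/".join(PHASE_LABELS.get(code, code) for code in phases)


print(phase_label(["PHASE1", "PHASE2"]))
print(phase_label([]))
```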

Query Parameters Reference

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| query.cond | string | Condition/disease | lung cancer |
| query.intr | string | Intervention/drug | Pembrolizumab |
| query.locn | string | Geographic location | New York |
| query.spons | string | Sponsor name | National Cancer Institute |
| query.term | string | General full-text search | immunotherapy |
| filter.overallStatus | string | Status filter (comma-separated) | RECRUITING,COMPLETED |
| filter.phase | string | Phase filter | PHASE2,PHASE3 |
| filter.ids | string | NCT ID filter | NCT04852770 |
| sort | string | Sort order | LastUpdatePostDate:desc |
| pageSize | int | Results per page (max 1000) | 100 |
| pageToken | string | Pagination token | (from previous response) |
| format | string | Response format | json or csv |

Sort options: LastUpdatePostDate, EnrollmentCount, StartDate, StudyFirstPostDate — each with :asc or :desc.
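One way to keep these parameters consistent across calls is to assemble them in a single function. The sketch below covers only a subset of the parameters above and validates `sort` against the four fields just listed; `build_params` is a hypothetical helper, and the API may accept sort fields beyond those checked here.

```python
# Sort fields listed in this document; the API may support others.
SORT_FIELDS = {"LastUpdatePostDate", "EnrollmentCount", "StartDate", "StudyFirstPostDate"}


def build_params(cond=None, intr=None, locn=None, spons=None, term=None,
                 statuses=None, sort=None, page_size=10):
    """Assemble a query dict, skipping anything left unset."""
    if not 1 <= page_size <= 1000:
        raise ValueError("pageSize must be between 1 and 1000")
    params = {"pageSize": page_size}
    text_fields = {"query.cond": cond, "query.intr": intr, "query.locn": locn,
                   "query.spons": spons, "query.term": term}
    params.update({key: value for key, value in text_fields.items() if value})
    if statuses:
        # Multiple statuses travel as one comma-separated value
        params["filter.overallStatus"] = ",".join(statuses)
    if sort:
        field, _, direction = sort.partition(":")
        if field not in SORT_FIELDS or direction not in ("asc", "desc"):
            raise ValueError(f"Unsupported sort: {sort!r}")
        params["sort"] = sort
    return params


print(build_params(cond="lung cancer", statuses=["RECRUITING", "COMPLETED"],
                   sort="LastUpdatePostDate:desc", page_size=100))
```

The returned dict can be passed directly as the `params` argument of the `ct_search` helper from Quick Start.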

Core API

1. Search by Condition

results = ct_search({
    "query.cond": "type 2 diabetes",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20,
    "sort": "LastUpdatePostDate:desc"
})
print(f"Found {results['totalCount']} recruiting diabetes trials")
for study in results['studies'][:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")

2. Search by Intervention/Drug

# Find Phase 3 trials testing Pembrolizumab
results = ct_search({
    "query.intr": "Pembrolizumab",
    "filter.overallStatus": "RECRUITING,ACTIVE_NOT_RECRUITING",
    "filter.phase": "PHASE3",
    "pageSize": 50
})
print(f"Phase 3 Pembrolizumab trials: {results['totalCount']}")

3. Search by Location

results = ct_search({
    "query.cond": "cancer",
    "query.locn": "New York",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20
})

# Extract location details
for study in results['studies'][:3]:
    locs = study['protocolSection'].get('contactsLocationsModule', {}).get('locations', [])
    for loc in locs:
        if 'New York' in loc.get('city', ''):
            print(f"  {loc.get('facility')}: {loc['city']}, {loc.get('state', '')}")

4. Search by Sponsor

results = ct_search({
    "query.spons": "National Cancer Institute",
    "pageSize": 20
})

for study in results['studies'][:5]:
    sponsor_mod = study['protocolSection']['sponsorCollaboratorsModule']
    lead = sponsor_mod['leadSponsor']['name']
    collabs = [c['name'] for c in sponsor_mod.get('collaborators', [])]
    print(f"  Lead: {lead}, Collaborators: {collabs}")

5. Retrieve Study Details by NCT ID

nct_id = "NCT04852770"
response = requests.get(f"{CT_API}/studies/{nct_id}", timeout=30)
response.raise_for_status()
study = response.json()

# Extract key information
proto = study['protocolSection']
print(f"Title: {proto['identificationModule']['briefTitle']}")
print(f"Status: {proto['statusModule']['overallStatus']}")

# Eligibility criteria
elig = proto.get('eligibilityModule', {})
print(f"Ages: {elig.get('minimumAge')} - {elig.get('maximumAge')}")
print(f"Sex: {elig.get('sex')}")
print(f"Criteria:\n{elig.get('eligibilityCriteria', 'N/A')[:300]}")

6. Pagination for Large Result Sets

all_studies = []
page_token = None
max_pages = 10

for page in range(max_pages):
    params = {
        "query.cond": "cancer",
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1000,
    }
    if page_token:
        params["pageToken"] = page_token

    results = ct_search(params)
    all_studies.extend(results['studies'])
    page_token = results.get('nextPageToken')

    if not page_token:
        break
    time.sleep(1.5)  # respect rate limits

print(f"Retrieved {len(all_studies)} studies across {page + 1} pages")
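The same token-following loop can be wrapped in a generator so callers can stop early without holding every page in memory. This is a sketch, not the original script's implementation: `iter_studies` is a hypothetical helper, and the fetch function is passed in so the logic can be demonstrated offline with fake pages.

```python
def iter_studies(fetch_page, params, max_pages=10):
    """Yield studies page by page, following nextPageToken until it is absent.

    fetch_page is any callable that takes a params dict and returns the parsed
    JSON response (for example, the ct_search helper from Quick Start).
    """
    page_params = dict(params)  # copy so the caller's dict is not mutated
    for _ in range(max_pages):
        page = fetch_page(page_params)
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            return
        page_params["pageToken"] = token


# Offline demonstration with a fake two-page response instead of a live call.
_fake_pages = {
    None: {"studies": [{"id": 1}, {"id": 2}], "nextPageToken": "abc"},
    "abc": {"studies": [{"id": 3}]},
}


def fake_fetch(params):
    return _fake_pages[params.get("pageToken")]


print([s["id"] for s in iter_studies(fake_fetch, {"pageSize": 2})])
```

The generator does not sleep between pages; against the live API, the delay belongs inside whatever `fetch_page` callable is supplied.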

7. Export to CSV

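response = requests.get(f"{CT_API}/studies", params={
    "query.cond": "heart disease",
    "filter.overallStatus": "RECRUITING",
    "format": "csv",
    "pageSize": 1000
}, timeout=60)
response.raise_for_status()  # fail loudly instead of writing an error body to disk

# newline="" keeps the CSV's own line endings intact
with open("heart_disease_trials.csv", "w", encoding="utf-8", newline="") as f:
    f.write(response.text)
print("Exported to heart_disease_trials.csv")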
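When the server-side CSV layout is not what you need, an alternative is to flatten the JSON yourself. The sketch below uses only the standard library `csv` module; the column choice and the `studies_to_csv` name are assumptions, and the sample record is a placeholder rather than real API output.

```python
import csv
import io


def studies_to_csv(studies, out):
    """Write one row per study with a few commonly used fields."""
    writer = csv.DictWriter(out, fieldnames=["nct_id", "title", "status", "phases"])
    writer.writeheader()
    for study in studies:
        proto = study.get("protocolSection", {})
        ident = proto.get("identificationModule", {})
        writer.writerow({
            "nct_id": ident.get("nctId", ""),
            "title": ident.get("briefTitle", ""),
            "status": proto.get("statusModule", {}).get("overallStatus", ""),
            # phases is a list, so join it into a single cell
            "phases": "|".join(proto.get("designModule", {}).get("phases", [])),
        })


buffer = io.StringIO()
studies_to_csv([{
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example, with comma"},
        "statusModule": {"overallStatus": "RECRUITING"},
        "designModule": {"phases": ["PHASE2", "PHASE3"]},
    }
}], buffer)
print(buffer.getvalue())
```

To write to disk instead of a buffer, pass a file opened with `newline=""`, as the `csv` module documentation recommends.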

Common Workflows

Workflow 1: Multi-Criteria Trial Discovery

import requests, time

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Step 1: Search with multiple filters
results = ct_search({
    "query.cond": "lung cancer",
    "query.intr": "immunotherapy",
    "query.locn": "California",
    "filter.overallStatus": "RECRUITING,NOT_YET_RECRUITING",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc"
})
print(f"Total matches: {results['totalCount']}")

# Step 2: Filter by phase
phase23 = [
    s for s in results['studies']
    if any(p in ['PHASE2', 'PHASE3']
           for p in s['protocolSection'].get('designModule', {}).get('phases', []))
]
print(f"Phase 2/3 trials: {len(phase23)}")

# Step 3: Extract summaries
for study in phase23[:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    enrollment = proto.get('designModule', {}).get('enrollmentInfo', {}).get('count', 'N/A')
    print(f"  {nct}: {title} (n={enrollment})")

Workflow 2: Completed Trials with Results Analysis

# Step 1: Find completed trials with posted results
results = ct_search({
    "query.cond": "alzheimer disease",
    "filter.overallStatus": "COMPLETED",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc"
})

with_results = [s for s in results['studies'] if s.get('hasResults', False)]
print(f"Completed with results: {len(with_results)} / {len(results['studies'])}")

# Step 2: Get detailed results for top trial
if with_results:
    nct = with_results[0]['protocolSection']['identificationModule']['nctId']
    detail_resp = requests.get(f"{CT_API}/studies/{nct}", timeout=30)
    detail_resp.raise_for_status()  # surface HTTP errors before parsing JSON
    detail = detail_resp.json()

    if 'resultsSection' in detail:
        outcomes = detail['resultsSection'].get('outcomeMeasuresModule', {})
        measures = outcomes.get('outcomeMeasures', [])
        for m in measures[:3]:
            print(f"  Outcome: {m.get('title')}")
            print(f"  Type: {m.get('type')}")

Workflow 3: Sponsor Portfolio Comparison

sponsors = ["Pfizer", "Novartis", "Roche"]
for sponsor in sponsors:
    results = ct_search({
        "query.spons": sponsor,
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1
    })
    print(f"{sponsor}: {results['totalCount']} recruiting trials")
    time.sleep(1.5)

Common Recipes

Recipe: Rate-Limited Bulk Search

def ct_search_with_retry(params, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:
                wait = 60
                print(f"Rate limited. Waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Max retries ({max_retries}) exceeded")
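The recipe above waits a fixed 60 seconds on HTTP 429. A variant sketch that computes the delay instead: it prefers a numeric `Retry-After` header when one is present and otherwise falls back to capped exponential backoff. Whether ClinicalTrials.gov actually sends `Retry-After` is an assumption here, not something this document confirms, and `retry_delay` is a hypothetical helper.

```python
def retry_delay(attempt, headers=None, base=2.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    retry_after = (headers or {}).get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # header held an HTTP date or junk; fall back to backoff
    return min(base * (2 ** attempt), cap)


print([retry_delay(n) for n in range(6)])
print(retry_delay(0, {"Retry-After": "15"}))
```

Inside the retry loop this would replace the fixed wait, for example `time.sleep(retry_delay(attempt, e.response.headers))`.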

Recipe: Extract Study Summary

def extract_summary(study):
    proto = study.get('protocolSection', {})
    ident = proto.get('identificationModule', {})
    status = proto.get('statusModule', {})
    design = proto.get('designModule', {})
    return {
        'nct_id': ident.get('nctId'),
        'title': ident.get('officialTitle') or ident.get('briefTitle'),
        'status': status.get('overallStatus'),
        'phases': design.get('phases', []),
        'enrollment': design.get('enrollmentInfo', {}).get('count'),
        'last_update': status.get('lastUpdatePostDateStruct', {}).get('date')
    }

# Usage
for study in results['studies'][:3]:
    s = extract_summary(study)
    print(f"{s['nct_id']}: {s['status']} | Phase: {s['phases']} | n={s['enrollment']}")

Recipe: Safe Field Navigation

def safe_get(study, *keys, default='N/A'):
    """Navigate nested study JSON safely."""
    current = study
    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
        else:
            return default
        if current is None:
            return default
    return current

# Usage — handles missing fields gracefully
nct = safe_get(study, 'protocolSection', 'identificationModule', 'nctId')
phases = safe_get(study, 'protocolSection', 'designModule', 'phases', default=[])
enrollment = safe_get(study, 'protocolSection', 'designModule', 'enrollmentInfo', 'count')

Key Parameters

| Parameter | Endpoint | Default | Description |
| --- | --- | --- | --- |
| query.cond | search | - | Condition/disease search term |
| query.intr | search | - | Intervention/drug search term |
| query.locn | search | - | Geographic location filter |
| query.spons | search | - | Sponsor/organization filter |
| query.term | search | - | General full-text search |
| filter.overallStatus | search | all | Comma-separated status values |
| filter.phase | search | all | Comma-separated phase values |
| pageSize | search | 10 | Results per page (max 1000) |
| sort | search | relevance | {field}:asc or {field}:desc |
| format | both | json | json or csv |
| timeout | (client) | 30s | Set in requests call |

Troubleshooting

| Problem | Cause | Solution |
| --- | --- | --- |
| 429 Too Many Requests | Rate limit exceeded (~50/min) | Wait 60s; use max pageSize=1000; implement exponential backoff |
| Empty studies array | No trials match filters | Broaden search (remove status/phase filters); check spelling |
| 400 Bad Request | Invalid parameter value | Verify status/phase values match enumeration exactly (e.g., RECRUITING not recruiting) |
| Missing resultsSection | Trial has no posted results | Check study['hasResults'] before accessing results |
| KeyError on nested field | Not all trials have all modules | Use .get() with defaults or safe_get helper (see Recipes) |
| Pagination stops early | nextPageToken absent | All results retrieved; check totalCount vs collected count |
| CSV format differs from JSON | Different field structure | CSV flattens nested structure; use JSON for programmatic access |
| Timeout on large exports | CSV with many results | Increase timeout; paginate with pageSize=1000 instead |

Best Practices

  • Use maximum page size (1000) for bulk retrieval to minimize request count against rate limit
  • Always check hasResults before accessing resultsSection — most trials have no posted results
  • Navigate safely with .get() chains — not all trials populate all modules (especially contactsLocationsModule, armsInterventionsModule)
  • Specify multiple status values with commas (e.g., RECRUITING,NOT_YET_RECRUITING) — don't make separate requests per status
  • Use sort=LastUpdatePostDate:desc by default — returns most recently updated trials first
  • Interpret dates carefully: lastUpdatePostDateStruct.date is an ISO 8601 string, and the accompanying type field marks the date as ACTUAL or ESTIMATED
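Date structs can be read into Python dates as sketched below. This assumes the `date` string is either `YYYY-MM-DD` or the shorter `YYYY-MM`; partial dates are an assumption about registry data, not something stated above. `parse_date_struct` is a hypothetical helper and the sample values are placeholders.

```python
from datetime import date


def parse_date_struct(struct):
    """Return (date, is_estimated) from a {'date': ..., 'type': ...} struct."""
    if not struct or not struct.get("date"):
        return None, False
    parts = [int(piece) for piece in struct["date"].split("-")]
    while len(parts) < 3:
        parts.append(1)  # a missing month or day defaults to the first
    return date(*parts[:3]), struct.get("type") == "ESTIMATED"


print(parse_date_struct({"date": "2026-02-20", "type": "ACTUAL"}))
print(parse_date_struct({"date": "2027-06", "type": "ESTIMATED"}))
```

Returning the estimated flag alongside the date keeps projected completion dates from being mistaken for confirmed ones in downstream analysis.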

Related Skills

  • pubmed-database — Published literature search complementary to trial registry data
  • chembl-database-bioactivity — Compound bioactivity data for drugs under investigation
  • bioservices-multi-database — Alternative database access via unified Python interface

References

Bundled Resources

Self-contained entry. Original total: 866 lines (SKILL.md 507 + api_reference.md 359). Scripts: 216 lines (query_clinicaltrials.py).

Original file disposition:

  • SKILL.md (507 lines) → Core API modules 1-7 (condition, intervention, location, sponsor, details, pagination, CSV export). "Core Capabilities" sections 1-10 consolidated: Search by Condition → Module 1, Search by Intervention → Module 2, Geographic Search → Module 3, Search by Sponsor → Module 4, Retrieve Detailed Study → Module 5, Pagination → Module 6, Data Export → Module 7, Combined Query → Workflow 1, Extract Summary → Recipe. Promotional section stripped (rule 4). "Resources" section stub → removed, content consolidated inline. Per-use-case disposition: Patient Matching → When to Use bullet + Workflow 1; Research Analysis → When to Use + Workflow 2; Drug Tracking → When to Use + Module 2; Geographic Search → Module 3; Sponsor Tracking → Module 4 + Workflow 3; Data Export → Module 7; Trial Monitoring → When to Use bullet; Eligibility Screening → Module 5
  • references/api_reference.md (359 lines) → Fully consolidated inline: endpoint parameters → Key Concepts "Query Parameters Reference" table; status/phase values → Key Concepts tables; response structure → Key Concepts "Response Data Structure" table; HTTP error codes → Troubleshooting table; rate limit guidance → Prerequisites + Best Practices; use cases → duplicated main SKILL.md examples, absorbed into Core API; data standards (ISO 8601, CommonMark) → Prerequisites note. Error handling patterns → Recipes "Rate-Limited Bulk Search"
  • scripts/query_clinicaltrials.py (216 lines) → Helper function pattern: search_studies() → Quick Start ct_search() helper; get_study_details() → Module 5 inline; search_with_all_results() → Module 6 pagination pattern; extract_study_summary() → Recipe "Extract Study Summary". Thin-wrapper shortcut applied — each function was a thin wrapper around requests.get()

Retention: ~465 lines / 866 original (excl. scripts) = ~54%.

Install

Requires askill CLI v1.0+

AI Quality Score

87/100 (analyzed 2/23/2026)

High-quality technical reference skill for querying ClinicalTrials.gov API v2. Comprehensive coverage of search, retrieval, pagination, and CSV export with well-structured Python code examples. Excellent clarity with tables and clear section organization. Includes practical workflows for multi-criteria discovery, results analysis, and sponsor portfolio comparison. The skill is domain-tailored in placement but content is broadly reusable. Minor improvement opportunity: expand error handling details. Strong bonus from having 'When to Use' section, structured steps, useful tags, dedicated skills folder, and high-density accurate reference content.


Metadata

License: unknown
Version: -
Updated: 2/20/2026
Publisher: jaechang-hits

Tags

api, database, github-actions, observability