askill
chatgpt-search

chatgpt-searchSafety --Repository

Search ChatGPT conversation exports using SQLite FTS5 (SQLite full-text search). BM25-ranked full-text search (relevance scoring) with TF-IDF keywords (term-weighted key phrases), date/role/model/language filtering, and conversation browsing. Use when agent needs to search past ChatGPT conversations by topic, find specific discussions, browse conversation history, or find conversations by extracted keywords. Do NOT use for non-ChatGPT knowledge bases — use a dedicated document search tool. Do NOT use for Apple Notes or Obsidian — use a dedicated document search tool.

7 stars
1.2k downloads
Updated 2/19/2026

Package Files

Loading files...
SKILL.md

chatgpt-search

SQLite FTS5 (SQLite full-text search) engine for ChatGPT conversation exports. BM25-ranked full-text search (relevance scoring) with title boosting, code separation, TF-IDF (term-frequency/inverse-document-frequency) keyword extraction, and filtering by date, role, model, and language.

Setup

cd /path/to/skills/chatgpt-search
./scripts/setup.sh /path/to/your/conversations.json
export PYTHONPATH=/path/to/skills/chatgpt-search/src
  • Claude Code: copy this skill folder into .claude/skills/chatgpt-search/
  • Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the chatgpt-search skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.

Repo: ./ Data: <your-export-path>/conversations.json Default DB: ~/.chatgpt-search/index.db

Quick Start

cd . && ./scripts/setup.sh <your-export-path>/conversations.json
export PYTHONPATH=./src
python -m chatgpt_search.cli "your topic query" --limit 10

Decision Tree

Need to search past ChatGPT conversations?
  |
  +-- Know a topic/keyword? --> Full-text search: "query"
  |     +-- Want only user messages? --> add --role user
  |     +-- Want a specific model's responses? --> add --model gpt-5
  |     +-- Want a date range? --> add --since 2025-01 --until 2025-06
  |     +-- Want a specific language? --> add --lang ru
  |
  +-- Know a conversation ID? --> --conversation <id> (or partial ID)
  |
  +-- Want to explore keywords?
  |     +-- Top corpus keywords --> --keywords
  |     +-- Keywords for a conversation --> --keywords --keywords-conversation <id>
  |
  +-- Want corpus overview? --> --stats
  |
  +-- Need to search non-ChatGPT docs? --> Use your project's document search skill
  +-- Need to search Apple Notes/Obsidian? --> Use a dedicated document search tool
  +-- Need web search? --> Use web-search skill (optional companion, not required)

Setup

cd . && ./scripts/setup.sh <your-export-path>/conversations.json

This installs dependencies (scikit-learn, langdetect) and builds the index from the provided conversations.json location. Rebuild takes ~26 seconds on the full corpus (1,514 conversations, 16,689 messages).

CLI Reference

# Set PYTHONPATH (or install the package)
export PYTHONPATH=./src

# --- Search ---

# Full-text search
python -m chatgpt_search.cli "transformer attention"

# Date filtering
python -m chatgpt_search.cli "kubernetes" --since 2025-01
python -m chatgpt_search.cli "pytorch" --since 2025-06 --until 2025-12

# Role filtering (search only user messages or assistant responses)
python -m chatgpt_search.cli "pricing strategy" --role user

# Model filtering (partial match)
python -m chatgpt_search.cli "code review" --model gpt-5
python -m chatgpt_search.cli "reasoning" --model o3

# Language filtering
python -m chatgpt_search.cli "machine learning" --lang en
python -m chatgpt_search.cli "обучение" --lang ru

# Phrase queries (exact match)
python -m chatgpt_search.cli '"attention is all you need"'

# Prefix queries
python -m chatgpt_search.cli "transfor*"

# Limit results
python -m chatgpt_search.cli "topic" --limit 5
python -m chatgpt_search.cli "topic" -n 50

# --- Browse ---

# Browse a full conversation
python -m chatgpt_search.cli --conversation <conversation-id>
python -m chatgpt_search.cli -c <partial-id>

# --- Keyword Exploration ---

# Top keywords across the corpus (by total TF-IDF score)
python -m chatgpt_search.cli --keywords

# Keywords for a specific conversation
python -m chatgpt_search.cli --keywords --keywords-conversation <conversation-id>

# --- Corpus Info ---

# Corpus statistics (conversations, messages, keywords, models, dates)
python -m chatgpt_search.cli --stats

# --- Index Management ---

# Rebuild index (includes TF-IDF enrichment)
python -m chatgpt_search.cli --rebuild --export /path/to/conversations.json

# Custom database location
python -m chatgpt_search.cli --db /path/to/index.db "query"

Search Syntax

FTS5 query syntax (SQLite full-text query operators) is supported:

SyntaxExampleMeaning
Simple termstransformer attentionImplicit AND
Phrase"attention is all"Exact phrase match
Prefixtransfor*Words starting with "transfor"
ORpytorch OR tensorflowEither term
NOTpython NOT javaExclude term

Architecture

  • Engine: SQLite FTS5 (SQLite full-text search) with BM25 ranking (relevance scoring)
  • Indexing: Message-level rows, conversation metadata joined at query time
  • Boosting: Title at 10x weight, content at 1x, code at 0.5x
  • Tokenizer: Porter stemmer + Unicode61 (handles diacritics)
  • TF-IDF: scikit-learn TfidfVectorizer (term-weighting), unigrams + bigrams, code blocks stripped, top-10 keywords per conversation, min_df=2 for larger language groups and min_df=1 for small groups, max_df=0.8
  • Language Detection: langdetect per message, 15 languages supported
  • Parser: Canonical thread extraction via current_node backward traversal
  • Code separation: Fenced code blocks extracted to separate field
  • PUA cleanup: Unicode Private Use Area (PUA) citation markers stripped
  • Citeturn cleanup: ChatGPT citation markup (citeturn0search1, etc.) stripped

Performance

Tested on 149MB export (1,514 conversations, 16,689 messages):

MetricValue
Full index build (with TF-IDF)~26 seconds
TF-IDF extraction alone~3 seconds
Database size~89 MB
Keywords extracted15,085
Search latency<50ms

Anti-Patterns

Do NOTDo instead
Use for non-ChatGPT document searchUse your project's document search skill
Use for Apple Notes or ObsidianUse a dedicated document search tool
Expect semantic searchThis is lexical BM25 -- use exact terms, expand synonyms manually
Search single common words ("the", "is")Use qualifying terms to narrow results
Forget to rebuild after new exportRun --rebuild after importing new conversations.json
Expect TF-IDF keywords on fresh/tiny corporaSmall groups use min_df=1, but tiny exports can still yield sparse keywords

Error Handling

SymptomCauseFix
"Database not found"Index not builtRun --rebuild --export /path/to/conversations.json
No keyword resultsCorpus too small or low textual signalNormal for small exports; rebuild with more data
"Invalid search query"FTS5 syntax errorCheck query syntax; avoid unmatched quotes
scikit-learn warning during buildscikit-learn not installedRun python3 -m pip install scikit-learn

Bundled Resources Index

PathWhatWhen to load
./UPDATES.mdStructured changelog for AI agentsWhen checking for new features or updates
./UPDATE-GUIDE.mdInstructions for AI agents performing updatesWhen updating this skill
./references/installation-guide.mdDetailed install walkthrough for Claude Code and Codex CLIFirst-time setup or environment repair
./README.mdLocal package and development notesWhen debugging setup or extending the CLI
./scripts/setup.shOne-command dependency setup and index bootstrapDuring first-time setup or rebuild reset
./src/chatgpt_search/Search/index implementation modulesWhen patching ranking, parsing, or filters
./tests/Coverage for parser/index/search behaviorBefore refactors and when validating fixes

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

AI review pending.

Metadata

Licenseunknown
Version-
Updated2/19/2026
Publisherbuildoak

Tags

ci-cddatabasellmobservability