osgrep - Semantic Code Search
Semantic search tool for exploring local files using natural language queries instead of regex patterns
What is osgrep?
osgrep replaces traditional grep and find commands with AI-powered natural language queries. It understands code concepts rather than just matching strings, making it ideal for code discovery and conceptual exploration.
Key Features
- Semantic searching: Ask questions in plain language rather than using regex patterns
- Live indexing: Background server automatically keeps search index current
- Structured output: The --json flag returns organized results with file paths, line numbers, relevance scores, and content snippets
- Local & Private: Uses transformers.js for 100% local embeddings with no remote API calls
- Auto-Isolated Indexes: Each repository automatically gets its own separate index
- Adaptive Performance: Throttles indexing based on system resources to prevent overheating
When to Use This Skill
Use osgrep when you need to:
- Find code based on concepts rather than exact string matches
- Explore unfamiliar codebases quickly
- Locate implementation patterns across a large codebase
- Answer "where do we handle X?" type questions
- Discover similar code patterns or architectural approaches
Example Queries
Natural language queries that work well with osgrep:
# Authentication and security
osgrep --json "How are user authentication tokens validated?"
osgrep --json "Where do we verify permissions?"
# Error handling
osgrep --json "Where do we handle retries or backoff?"
osgrep --json "How are errors logged and reported?"
# Data flow
osgrep --json "Where is user data persisted?"
osgrep --json "How do we cache API responses?"
# Architecture patterns
osgrep --json "dependency injection setup"
osgrep --json "middleware configuration"
Essential Commands
Basic Search
# Default search (returns up to 25 results)
osgrep --json "your question"
# Search within specific path
osgrep --json "your question" path/to/directory
Controlling Results
# Limit total results
osgrep --json -m 10 "your question"
# Get more matches per file (default is 1)
osgrep --json --per-file 3 "your question"
# Combine both limits
osgrep --json -m 20 --per-file 2 "your question"
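When osgrep is driven from a script rather than typed by hand, the flags above can be assembled programmatically. The following is a minimal sketch, not part of osgrep itself: the helper name build_osgrep_command and its parameter names are hypothetical, and it uses only the flags documented in this section (--json, -m, --per-file, plus an optional path).

```python
import shlex
from typing import List, Optional


def build_osgrep_command(
    query: str,
    path: Optional[str] = None,
    max_results: Optional[int] = None,
    per_file: Optional[int] = None,
) -> List[str]:
    """Assemble an osgrep argument list (hypothetical helper, not an osgrep API)."""
    cmd = ["osgrep", "--json"]
    if max_results is not None:
        # -m limits the total number of results
        cmd += ["-m", str(max_results)]
    if per_file is not None:
        # --per-file raises the number of matches returned per file
        cmd += ["--per-file", str(per_file)]
    cmd.append(query)
    if path is not None:
        # An optional trailing path restricts the search to that directory
        cmd.append(path)
    return cmd


cmd = build_osgrep_command("your question", max_results=20, per_file=2)
print(shlex.join(cmd))  # osgrep --json -m 20 --per-file 2 'your question'
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with free-form queries.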
Server Management
# Start the background server (auto-indexes and watches for changes)
osgrep serve
# Manual indexing
osgrep index
# Check indexed repositories
osgrep list
# Verify installation
osgrep doctor
Output Format
When using --json, osgrep returns structured data:
{
  "results": [
    {
      "file": "src/auth/validator.ts",
      "line": 42,
      "score": 0.89,
      "content": "function validateToken(token: string) { ... }"
    }
  ]
}
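A script consuming this output might parse it and keep only the stronger matches. The sketch below assumes the JSON shape shown above and nothing more; the helper name top_hits and the 0.5 score cutoff are hypothetical choices for illustration, not osgrep defaults.

```python
import json

# Sample payload copied from the output format shown above.
SAMPLE_OUTPUT = """
{
  "results": [
    {
      "file": "src/auth/validator.ts",
      "line": 42,
      "score": 0.89,
      "content": "function validateToken(token: string) { ... }"
    }
  ]
}
"""


def top_hits(raw_json, min_score=0.5):
    """Return (file, line, score) tuples at or above min_score, best first.

    min_score is an illustrative cutoff, not a value defined by osgrep.
    """
    results = json.loads(raw_json).get("results", [])
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return [(r["file"], r["line"], r["score"]) for r in kept]


for file, line, score in top_hits(SAMPLE_OUTPUT):
    print(f"{file}:{line} (score {score:.2f})")  # src/auth/validator.ts:42 (score 0.89)
```

In real use, the raw JSON would come from the standard output of an osgrep --json call instead of the embedded sample.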
Recommended Workflow
- Start with a natural language query using --json:
  osgrep --json "Where do we handle database migrations?"
- Review the JSON output to determine if it answers your question
  - Check relevance scores (higher is better)
  - Look at file paths to understand context
  - Read snippets to verify relevance
- Only open full files if you need additional context
  - Use the file paths from results
  - Increase --per-file if you need more context from specific files
- Refine queries if initial findings lack clarity
  - Make queries more specific
  - Adjust result limits (-m and --per-file)
  - Try different phrasings
Installation & Setup
# Install globally
npm install -g osgrep
# Download embedding models (~150MB, one-time setup)
osgrep setup
# Install Claude Code integration
osgrep install-claude-code
Configuration
Ignoring Files
Create .osgrepignore in your repository root to exclude paths:
# Example .osgrepignore
node_modules/
dist/
*.test.ts
coverage/
osgrep also respects .gitignore automatically.
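To get a feel for which files the example patterns would exclude, here is a rough approximation in Python. It assumes .osgrepignore follows gitignore-style globbing, which this document does not spell out, and it deliberately implements only two simplified rules (directory names and file-name globs), not full gitignore semantics. The helper is_ignored is hypothetical and is not how osgrep itself evaluates patterns.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the example .osgrepignore above.
PATTERNS = ["node_modules/", "dist/", "*.test.ts", "coverage/"]


def is_ignored(path, patterns=PATTERNS):
    """Approximate ignore check (simplified; not full gitignore semantics).

    Patterns ending in "/" match any directory component of the path;
    all other patterns are globbed against the file name only.
    """
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything inside a directory of that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            return True
    return False


for candidate in [
    "src/auth/validator.ts",
    "src/auth/validator.test.ts",
    "node_modules/lodash/index.js",
]:
    print(candidate, is_ignored(candidate))
```

Under these simplified rules, only src/auth/validator.ts would remain eligible for indexing.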
Environment Variables
MXBAI_STORE: Override store names for manual index isolation
Technical Details
- Chunking: Uses tree-sitter for smart code chunking by function/class boundaries
- Search Algorithm: Reciprocal Rank Fusion combining vector search with keyword matching
- Performance: Adaptive throttling monitors RAM and CPU to maintain system stability
- Index Isolation: Repositories automatically isolated based on Git remote URL or directory name
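Reciprocal Rank Fusion, mentioned above, merges several ranked lists by giving each item a score of 1 / (k + rank) per list and summing. The sketch below is a generic textbook version, not osgrep's actual implementation: the constant k = 60 is the value commonly used in the literature and is an assumption here, and the file paths in the two rankings are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)), rank starting at 1.

    k=60 is a conventional default, assumed here; osgrep's value is not documented above.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item name for determinism.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical rankings from a vector search and a keyword search.
vector_hits = ["auth/validator.ts", "auth/session.ts", "utils/jwt.ts"]
keyword_hits = ["utils/jwt.ts", "auth/validator.ts", "routes/login.ts"]

for path, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
    print(f"{score:.4f}  {path}")
```

Items that rank well in both lists (here auth/validator.ts and utils/jwt.ts) rise above items that appear in only one, which is the reason for combining vector and keyword results this way.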
Tips for Better Results
- Be specific: "JWT token validation logic" works better than "auth stuff"
- Use domain terms: "GraphQL resolver" is better than "API handler"
- Start broad, then narrow: Begin with high-level concepts, then drill down
- Increase per-file limit: When you find the right file but need more context
- Use the server: osgrep serve keeps indexes fresh and searches fast (<50ms)
Limitations
- Requires initial indexing (automatic on first search)
- Embedding models are a ~150MB download (one-time)
- Best results on well-structured code with clear function/class boundaries
- Natural language queries work better than code snippets
License
Apache License 2.0
Source
Based on osgrep by Ryan D'Onofrio
- GitHub: https://github.com/Ryandonofrio3/osgrep
- Built upon concepts from mgrep by MixedBread
