
detecting-ai-code · Safety: 90 · Repository

Use when auditing code for AI authorship, reviewing acquisitions or contractor work, verifying academic integrity, or during code review. Provides a systematic tiered framework for detecting both fully AI-generated and AI-assisted code, with confidence scoring.

Updated 1/12/2026

SKILL.md

Detecting AI-Generated Code

Overview

Systematic framework for detecting AI-generated and AI-assisted code. Uses tiered signal detection with confidence scoring.

Core principle: Check metadata FIRST (highest confidence), then comments, then code patterns. Multiple signals compound confidence.

Detection Framework (Check in Order)

Tier 1: Metadata Signals (Highest Confidence)

ALWAYS CHECK THESE FIRST - they're definitive.

| Signal | What to Look For | Confidence |
| --- | --- | --- |
| Git headers | `Co-Authored-By: Claude`, `Co-Authored-By: GitHub Copilot` | Definitive |
| Commit patterns | Uniform "Add X functionality" pattern across all commits | High |
| Generated footers | "Generated with [Claude Code]", "Created by Copilot" | Definitive |
| Config files | `.cursorrules`, `CLAUDE.md` in repo | High |
| README template | Exact Claude/GPT README structure (see below) | High |
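As a minimal sketch, the config-file signal from the table can be checked mechanically. This assumes only the two file names listed above and looks only at the repo root:

```python
from pathlib import Path

def ai_config_files(repo: str = ".") -> list[str]:
    """Return AI-tool config files present in the repo root.

    Only the two names from the Tier-1 table are checked; other
    tools may use other file names.
    """
    names = [".cursorrules", "CLAUDE.md"]
    return [n for n in names if (Path(repo) / n).exists()]
```

Any non-empty result is a High-confidence Tier-1 signal on its own.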

AI README Template Pattern:

# Project Name
## Overview
[Formal description]
## Features
- Feature 1
- Feature 2
## Installation
[Perfect instructions]
## API Endpoints (table)
## Environment Variables (table)
## Contributing
## License

If README follows this EXACT structure with perfect tables and no personality → High AI probability.
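A hedged sketch of the README check: compare a README's `##` headings against the template order above. The heading names are taken verbatim from the template; real READMEs may vary in heading level or wording, so treat an exact match as one signal, not proof:

```python
import re

# Heading sequence from the AI README template above; order matters.
TEMPLATE_HEADINGS = ["Overview", "Features", "Installation",
                     "API Endpoints", "Environment Variables",
                     "Contributing", "License"]

def readme_matches_template(text: str) -> bool:
    """Return True if the README's ## headings follow the
    template order exactly (a High-confidence signal)."""
    headings = re.findall(r"^##\s+(.+)$", text, flags=re.MULTILINE)
    # Strip trailing annotations like "(table)" before comparing.
    cleaned = [re.sub(r"\s*\(.*\)$", "", h).strip() for h in headings]
    return cleaned == TEMPLATE_HEADINGS
```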

AI Commit Message Patterns:

Add user authentication with JWT tokens
Add password recovery functionality
Add comprehensive documentation      ← "comprehensive" is a strong AI tell
Implement feature X
Update component Y
Fix bug in Z

Mechanical uniformity + "Add/Implement/Update/Fix" pattern → AI signal.

Human commit patterns (for contrast):

wip: payments
fix typo in readme
initial payment setup
refactor auth - cleanup
oops forgot the tests
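The two git-history checks above can be sketched as follows. The function takes commit messages as plain strings (e.g. from `git log --format=%B`, split per commit) rather than invoking git itself; the trailer names come from the Tier-1 table, and the uniformity ratio is an illustrative metric, not a fixed threshold:

```python
import re
from collections import Counter

# Definitive Tier-1 trailer; co-author names taken from the table above.
AI_TRAILER = re.compile(r"^Co-Authored-By:.*(Claude|Copilot)",
                        re.IGNORECASE | re.MULTILINE)

def score_git_history(messages: list[str]) -> dict:
    """Score a list of full commit messages for the two Tier-1
    git signals: AI co-author trailers and uniform subject lines."""
    trailers = sum(bool(AI_TRAILER.search(m)) for m in messages)
    # Share of commits whose subject starts with the same verb
    # (the mechanical Add/Implement/Update/Fix pattern).
    firsts = Counter(m.split()[0] for m in messages if m.strip())
    top = firsts.most_common(1)[0][1] if firsts else 0
    return {
        "ai_trailer_commits": trailers,  # definitive if > 0
        "uniform_prefix_ratio": top / len(messages) if messages else 0.0,
    }
```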

Tier 2: Comment Patterns (High Confidence)

| Signal | Example | Confidence |
| --- | --- | --- |
| Narrating comments | "First, we...", "Here we...", "Now we..." | High |
| Restating code | `// Initialize array above` followed by `const arr = []` | High |
| Tutorial style | "This function will...", "Let me explain..." | High |
| Perfect grammar | No typos, formal English in all comments | Medium |
| Over-documentation | 20-line JSDoc for a 5-line function | High |

Verbatim AI Phrases:

  • "This file contains..."
  • "This function handles..."
  • "Here we define..."
  • "Now we check..."
  • "Finally, we return..."
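A rough scanner for these phrases in `//` and `#` comments. The phrase list is taken from above; the regex is illustrative, not exhaustive:

```python
import re

# Verbatim narrating phrases from the list above; a hit is a Tier-2 signal.
NARRATING = re.compile(
    r"\b(This (file contains|function (will|handles))|Here we define"
    r"|Now we check|Finally,? we return|First,? we|Let me explain)\b",
    re.IGNORECASE,
)

def narrating_comments(source: str) -> list[str]:
    """Return comment lines that use narrating/tutorial phrasing."""
    hits = []
    for line in source.splitlines():
        m = re.search(r"(?://|#)\s*(.*)", line)
        if m and NARRATING.search(m.group(1)):
            hits.append(line.strip())
    return hits
```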

Tier 3: Code Structure (Medium Confidence)

| Signal | Example | Confidence |
| --- | --- | --- |
| Over-engineering | Comprehensive error handling for impossible cases | Medium |
| Template repetition | Same pattern copied exactly across files | Medium |
| Perfect consistency | Identical formatting across entire codebase | Low-Medium |
| Defensive coding | Null checks on values that can't be null | Medium |

Important: Clean code alone is NOT evidence of AI. Experienced developers also write clean code.

Tier 4: Anti-Signals (What Humans Do, AI Doesn't)

These DECREASE AI probability:

| Human Signal | Why |
| --- | --- |
| TODO/FIXME comments | AI rarely leaves work incomplete |
| Debug code left in | `console.log`, commented-out code |
| Shortcuts/abbreviations | `amt`, `curr`, `intl` instead of full words |
| Domain jargon | Industry-specific terms, internal naming |
| Stack Overflow attribution | "grabbed from SO", "copied from [link]" |
| Inconsistent formatting | Mixed styles within a file |
| Pragmatic compromises | "prod will have proper checks" |
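A sketch that counts these anti-signals in a source file. The patterns mirror the table above; `console.log` assumes JavaScript sources, so adapt per language:

```python
import re

# Human tells from the table above; each hit DECREASES AI confidence.
ANTI_SIGNALS = {
    "todo_fixme": re.compile(r"\b(TODO|FIXME)\b"),
    "debug_left_in": re.compile(r"console\.log\("),
    "shortcuts": re.compile(r"\b(amt|curr|intl)\b"),
    "so_attribution": re.compile(
        r"(grabbed|copied) from (SO|Stack Overflow)", re.IGNORECASE),
}

def count_anti_signals(source: str) -> dict:
    """Count occurrences of each human anti-signal in a source file."""
    return {name: len(rx.findall(source))
            for name, rx in ANTI_SIGNALS.items()}
```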

Confidence Scoring

| Signals Found | Confidence Level | Recommendation |
| --- | --- | --- |
| 1 Tier-1 signal | High | Flag for review |
| 2+ Tier-2 signals | High | Likely AI-generated |
| Tier-2 + Tier-3 combined | Medium-High | Probably AI-assisted |
| Only Tier-3 signals | Low-Medium | Investigate more |
| Anti-signals present | Reduces confidence | Human likely involved |

Compound scoring: Multiple signals from different tiers = higher confidence than multiple signals from same tier.
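One way to encode the scoring table, with illustrative thresholds that are assumptions rather than part of the framework itself:

```python
def compound_confidence(tier1: int, tier2: int, tier3: int,
                        anti: int) -> str:
    """Map per-tier signal counts to the confidence table above.

    The anti-signal threshold (>= 2) is an illustrative assumption.
    """
    if anti >= 2 and tier1 == 0:
        return "low - human likely involved"
    if tier1 >= 1:
        return "high - flag for review"
    if tier2 >= 2:
        return "high - likely AI-generated"
    if tier2 >= 1 and tier3 >= 1:
        return "medium-high - probably AI-assisted"
    if tier3 >= 1:
        return "low-medium - investigate more"
    return "insufficient evidence"
```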

AI-Assisted Detection (Partial AI Use)

Look for:

  • Style jumps - Casual code with formal JSDoc blocks
  • Git history jumps - "Add comprehensive documentation" as separate commit
  • Quality inconsistency - Some files over-documented, others bare
  • Misplaced docs - JSDoc on library calls, not wrapper functions
  • Parameter mismatches - Docs describe different params than code has

This pattern suggests: a human wrote the code, then asked AI to add the documentation.
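The parameter-mismatch check can be sketched by comparing `@param` names in a JSDoc block against the signature of the function that immediately follows. This is a simplified regex approach assuming plain `function name(...)` declarations, not a full parser:

```python
import re

def jsdoc_param_mismatch(source: str) -> list[tuple[set, set]]:
    """Find JSDoc blocks whose @param names differ from the parameters
    of the function that follows them (an AI-assistance tell).

    Returns (documented, actual) name-set pairs for each mismatch.
    """
    pattern = re.compile(
        r"/\*\*(.*?)\*/\s*function\s+\w+\s*\(([^)]*)\)", re.DOTALL)
    mismatches = []
    for doc, params in pattern.findall(source):
        documented = set(re.findall(r"@param\s+(?:\{[^}]*\}\s*)?(\w+)", doc))
        actual = {p.strip() for p in params.split(",") if p.strip()}
        if documented != actual:
            mismatches.append((documented, actual))
    return mismatches
```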

Common Mistakes

| Mistake | Reality |
| --- | --- |
| "Clean code = AI" | Experienced developers write clean code too |
| Only checking code | Metadata is the highest-confidence signal |
| Missing git history | "Add comprehensive documentation" commits reveal AI use |
| Binary thinking | AI-assisted is different from fully AI-generated |
| Single-signal certainty | Compound multiple signals before concluding |

Quick Reference Workflow

  1. Check git log - Look for Co-Authored-By, uniform commit patterns
  2. Check README - Template structure? Perfect tables? No personality?
  3. Check file headers - Generated footers? "This file contains..." comments?
  4. Check code comments - Narrating style? Over-documentation?
  5. Check for anti-signals - TODOs? Shortcuts? Debug code?
  6. Score confidence - Compound signals from multiple tiers
  7. Report with evidence - List specific signals found

When NOT to Flag

  • Clean, well-written code (without other signals)
  • Standard patterns for solved problems (EventEmitter, validators)
  • Good documentation (if style matches rest of codebase)
  • Experienced developer output

Clean code without Tier-1 or Tier-2 signals = insufficient evidence.



Metadata

License: unknown
Version: -
Updated: 1/12/2026
Publisher: galihcitta

Tags

api, github, github-actions, llm, security