Thought-Based Reasoning Techniques
Quick Reference
| Technique | When to Use | Typical Accuracy Gain (varies by task and model) |
|---|---|---|
| Zero-shot CoT | Quick reasoning, no examples | +20-60% |
| Few-shot CoT | Have good examples, need consistency | +30-70% |
| Self-Consistency | High-stakes, need confidence | +10-20% over CoT |
| Tree of Thoughts | Complex problems needing exploration | +50-70% on hard tasks |
| Least-to-Most | Multi-step with clear subproblems | +30-80% |
| ReAct | Tasks needing external information | +15-35% |
| PAL | Math/computation (code execution) | +10-15% |
| Reflexion | Iterative improvement from errors | +10-20% |
Decision Matrix
- Need examples? No → Zero-shot CoT ("Let's think step by step"). Yes → Few-shot CoT.
- From Few-shot CoT: need computation? Yes → PAL.
- From Zero-shot CoT: need higher accuracy? No → Done. Yes → Self-Consistency (5-10 samples, majority vote).
- From Self-Consistency: still not enough? No → Done. Yes → work through the following questions in order:
  - Problem decomposable? Yes → Least-to-Most. No → next question.
  - Need exploration? Yes → Tree of Thoughts. No → next question.
  - Need external info? Yes → ReAct. No → next question.
  - Need iteration? Yes → Reflexion. No → Use CoT.
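The decision flow above can also be written as a small function. This is only a sketch: the `Task` fields and the `choose_technique` name are hypothetical, and returning "Few-shot CoT" when no computation is needed is an assumption, since the flow does not spell out that branch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical yes/no answers to the questions in the decision flow."""
    needs_examples: bool = False
    needs_computation: bool = False
    needs_higher_accuracy: bool = False
    still_not_enough: bool = False
    decomposable: bool = False
    needs_exploration: bool = False
    needs_external_info: bool = False
    needs_iteration: bool = False


def choose_technique(task: Task) -> str:
    """Walk the decision flow and return the name of the suggested technique."""
    if task.needs_examples:
        # Assumption: stay with Few-shot CoT when no computation is needed.
        return "PAL" if task.needs_computation else "Few-shot CoT"
    if not task.needs_higher_accuracy:
        return "Zero-shot CoT"
    if not task.still_not_enough:
        return "Self-Consistency"
    if task.decomposable:
        return "Least-to-Most"
    if task.needs_exploration:
        return "Tree of Thoughts"
    if task.needs_external_info:
        return "ReAct"
    if task.needs_iteration:
        return "Reflexion"
    return "CoT"


print(choose_technique(Task(needs_higher_accuracy=True, still_not_enough=True,
                            needs_external_info=True)))
```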
Core Techniques
1. Zero-shot CoT
Append "Let's think step by step" — no examples needed.
Variants: "Let's break this down." / "Let's approach this systematically." / "First, let me understand the problem..."
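As a minimal sketch, the trigger phrase can be appended with a small helper. The function name and the `Q:`/`A:` layout are illustrative assumptions, not a required format.

```python
COT_TRIGGER = "Let's think step by step."


def zero_shot_cot_prompt(question: str, trigger: str = COT_TRIGGER) -> str:
    """Return the question followed by a reasoning trigger (hypothetical helper)."""
    return f"Q: {question.strip()}\nA: {trigger}"


print(zero_shot_cot_prompt("A train covers 60 km in 45 minutes. What is its speed in km/h?"))
```

Any of the variants listed above can be passed as `trigger` to compare phrasings on the same question.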
2. Few-shot CoT
Provide 2-5 examples with intermediate reasoning steps (not just Q→A).
Q: Roger has 5 tennis balls. He buys 2 cans of 3. How many now?
A: Started with 5. 2 cans × 3 = 6. 5 + 6 = 11. The answer is 11.
Q: [YOUR QUESTION]
A:
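One possible way to assemble such a prompt programmatically is sketched below. The `few_shot_cot_prompt` helper is hypothetical; it simply joins worked exemplars and leaves the final `A:` open for the model.

```python
from typing import List, Tuple

# Each exemplar pairs a question with an answer that shows its reasoning.
EXEMPLARS: List[Tuple[str, str]] = [
    (
        "Roger has 5 tennis balls. He buys 2 cans of 3. How many now?",
        "Started with 5. 2 cans × 3 = 6. 5 + 6 = 11. The answer is 11.",
    ),
]


def few_shot_cot_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Join worked exemplars, then append the new question with an open answer."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in exemplars]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


print(few_shot_cot_prompt(EXEMPLARS, "A shelf holds 4 rows of 6 books. 5 are removed. How many remain?"))
```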
3. Self-Consistency
Sample N responses at temperature > 0, extract the final answer from each, and take a majority vote.
- Use at least 5-10 samples; returns diminish past ~20.
- The agreement level among samples doubles as a confidence measure.
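The voting step can be sketched as follows. The `sample` callable stands in for a model call at temperature > 0 and is hypothetical, as is the assumption that each response ends with "The answer is N".

```python
import re
from collections import Counter
from typing import Callable, Optional, Tuple


def extract_answer(text: str) -> Optional[str]:
    """Pull the number after 'answer is' (assumed response format)."""
    match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", text)
    return match.group(1) if match else None


def self_consistency(sample: Callable[[], str], n: int = 5) -> Tuple[Optional[str], float]:
    """Draw n responses, majority-vote the answers, and report the agreement level."""
    answers = [extract_answer(sample()) for _ in range(n)]
    answers = [a for a in answers if a is not None]
    if not answers:
        return None, 0.0
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n  # share of all samples that agree


# Canned responses stand in for real model samples.
responses = iter([
    "5 + 6 = 11. The answer is 11.",
    "2 × 3 = 6, plus 5. The answer is 11.",
    "5 + 2 + 3 + 2. The answer is 12.",
    "I am not sure.",
    "So the answer is 11.",
])
print(self_consistency(lambda: next(responses), n=5))
```

Unparseable responses count against the agreement level here, which keeps the confidence estimate conservative.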
4. Tree of Thoughts (ToT)
Tree search over reasoning paths with self-evaluation at each node.
- Generate 3-5 candidate next steps
- Evaluate each: "sure" / "maybe" / "impossible"
- BFS/DFS with beam width, prune "impossible"
- Use for: puzzles, creative tasks, problems where CoT accuracy is below 50%
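The search loop described above might look like the following breadth-first sketch. The `propose`, `evaluate`, and `is_goal` callables are hypothetical; in practice the first two would be model calls. A toy puzzle (reach 10 from 1 using "+3" and "*2") stands in for a real task.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, steps taken so far)
SCORE = {"sure": 2, "maybe": 1}


def tree_of_thoughts(root: State,
                     propose: Callable[[State], List[State]],
                     evaluate: Callable[[State], str],
                     is_goal: Callable[[State], bool],
                     beam_width: int = 2,
                     max_depth: int = 4) -> Optional[State]:
    """Breadth-first search that prunes 'impossible' states and keeps a beam."""
    frontier = [root]
    for _ in range(max_depth):
        scored = []
        for state in frontier:
            for child in propose(state):
                if is_goal(child):
                    return child
                verdict = evaluate(child)
                if verdict != "impossible":
                    scored.append((SCORE[verdict], child))
        # Stable sort: "sure" states first, generation order kept within ties.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [child for _, child in scored[:beam_width]]
        if not frontier:
            return None
    return None


TARGET = 10


def propose(state: State) -> List[State]:
    value, steps = state
    return [(value + 3, steps + ("+3",)), (value * 2, steps + ("*2",))]


def evaluate(state: State) -> str:
    value = state[0]
    if value > TARGET:
        return "impossible"
    if value + 3 == TARGET or value * 2 == TARGET:
        return "sure"  # one step away from the target
    return "maybe"


print(tree_of_thoughts((1, ()), propose, evaluate, lambda s: s[0] == TARGET))
```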
5. Least-to-Most
Two stages: decompose → solve sequentially.
- "To solve X, we need to first solve: [subproblem A], then [subproblem B]"
- Solve A → feed A's answer into B → feed B's answer into the next subproblem → final answer
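The two stages can be sketched as a loop that carries earlier answers forward. Both `decompose` and `solve` are hypothetical callables (model calls in practice), and the sketch assumes the last subproblem restates the original question.

```python
from typing import Callable, List, Tuple

Context = List[Tuple[str, int]]  # (subproblem, answer) pairs solved so far


def least_to_most(question: str,
                  decompose: Callable[[str], List[str]],
                  solve: Callable[[str, Context], int]) -> int:
    """Solve subproblems in order, passing every earlier answer to the next one."""
    context: Context = []
    for sub in decompose(question):
        context.append((sub, solve(sub, context)))
    # Assumption: the final subproblem is the original question itself.
    return context[-1][1]


# Toy stand-ins for the two model calls.
def toy_decompose(question: str) -> List[str]:
    return ["How many apples does Anna have?",
            "How many apples do they have together?"]


def toy_solve(sub: str, context: Context) -> int:
    known = dict(context)
    if sub == "How many apples does Anna have?":
        return 5 + 2  # Elsa's 5 apples plus 2 more
    return 5 + known["How many apples does Anna have?"]


question = "Elsa has 5 apples. Anna has 2 more than Elsa. How many do they have together?"
print(least_to_most(question, toy_decompose, toy_solve))
```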
6. ReAct (Reasoning + Acting)
Interleave Thought → Action → Observation loops.
Thought 1: I need to find X
Action 1: Search[X]
Observation 1: [result]
Thought 2: Now I need Y based on what I learned
Action 2: Search[Y]
...
Action N: Finish[answer]
Reduces hallucination by grounding in external knowledge.
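A minimal sketch of the loop, assuming the model emits lines in the `Action N: Tool[argument]` format shown above. The `llm` callable, the `Search` tool, and the scripted outputs are all hypothetical stand-ins.

```python
import re
from typing import Callable, Dict, Optional

ACTION_RE = re.compile(r"Action \d+: (\w+)\[(.*)\]")


def react(question: str,
          llm: Callable[[str, int], str],
          tools: Dict[str, Callable[[str], str]],
          max_steps: int = 5) -> Optional[str]:
    """Alternate model output and tool observations until Finish[...] appears."""
    transcript = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        output = llm(transcript, step)  # expected: "Thought n: ...\nAction n: ..."
        transcript += output + "\n"
        match = ACTION_RE.search(output)
        if match is None:
            return None  # no parseable action; give up
        name, arg = match.groups()
        if name == "Finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"Unknown action: {name}"
        transcript += f"Observation {step}: {observation}\n"
    return None


# Scripted outputs and a dictionary lookup replace a real model and search API.
SCRIPT = [
    "Thought 1: I need to find who wrote Hamlet.\nAction 1: Search[author of Hamlet]",
    "Thought 2: Now I need the author's birthplace.\nAction 2: Search[William Shakespeare birthplace]",
    "Thought 3: Stratford-upon-Avon is in England.\nAction 3: Finish[England]",
]
FACTS = {
    "author of Hamlet": "William Shakespeare",
    "William Shakespeare birthplace": "Stratford-upon-Avon, England",
}


def scripted_llm(transcript: str, step: int) -> str:
    return SCRIPT[step - 1]


def search(query: str) -> str:
    return FACTS.get(query, "No result")


print(react("In which country was the author of Hamlet born?", scripted_llm, {"Search": search}))
```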
7. PAL (Program-Aided)
Have the model generate Python code instead of natural-language reasoning, then execute the code to obtain the answer.
- Eliminates arithmetic errors
- Requires code interpreter
- Best for: math, symbolic manipulation, data processing
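The execution step can be sketched as below. The `run_pal` helper and the convention that generated code stores its result in a variable named `answer` are assumptions, and the hard-coded string stands in for model output.

```python
def run_pal(generated_code: str, result_name: str = "answer"):
    """Execute model-written code and return the variable holding the result.

    Emptying __builtins__ only blocks casual misuse; it is NOT a security
    sandbox. Untrusted code should run in an isolated process or container.
    """
    namespace = {"__builtins__": {}}
    exec(generated_code, namespace)
    return namespace[result_name]


# Stand-in for code a model might write for the tennis-ball question.
GENERATED = """
tennis_balls = 5
cans = 2
balls_per_can = 3
answer = tennis_balls + cans * balls_per_can
"""

print(run_pal(GENERATED))
```

The arithmetic is done by the interpreter, so the model only has to get the program structure right.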
8. Reflexion
After failure: generate verbal reflection → store in memory → retry with insights.
Attempt 1: [failed]
Reflection: Failed because X. Next time I should Y.
Attempt 2: [uses reflection, succeeds]
The Reflexion paper reports 91% pass@1 on HumanEval, versus 80% for GPT-4 alone.
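The retry loop can be sketched as follows. The `attempt`, `evaluate`, and `reflect` callables are hypothetical (the first and last would be model calls), and the toy task of writing an even-number check stands in for a real coding problem.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int], bool]


def reflexion(task: str,
              attempt: Callable[[str, List[str]], Candidate],
              evaluate: Callable[[Candidate], Tuple[bool, str]],
              reflect: Callable[[str], str],
              max_trials: int = 3) -> Tuple[Optional[Candidate], int]:
    """Retry a task, feeding verbal reflections on past failures into each attempt."""
    memory: List[str] = []
    for trial in range(1, max_trials + 1):
        candidate = attempt(task, memory)
        passed, feedback = evaluate(candidate)
        if passed:
            return candidate, trial
        memory.append(reflect(feedback))  # store the lesson for the next trial
    return None, max_trials


# Toy stand-ins: the first attempt is buggy, later attempts use the reflection.
def toy_attempt(task: str, memory: List[str]) -> Candidate:
    if not memory:
        return lambda n: n % 2 == 1  # buggy: tests for odd numbers
    return lambda n: n % 2 == 0


def toy_evaluate(fn: Candidate) -> Tuple[bool, str]:
    failures = [n for n in (0, 1, 2, 7) if fn(n) != (n % 2 == 0)]
    return not failures, f"wrong result for inputs {failures}"


def toy_reflect(feedback: str) -> str:
    return f"Failed because: {feedback}. Next time I should re-check the parity test."


solution, trials = reflexion("Write is_even(n)", toy_attempt, toy_evaluate, toy_reflect)
print(trials, solution(4))
```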
Task → Technique Matching
| Task Type | Best Techniques |
|---|---|
| Math/Logic | CoT, PAL, Self-Consistency |
| Multi-hop QA | ReAct, Least-to-Most |
| Creative/Puzzles | Tree of Thoughts |
| Code generation | PAL, Reflexion |
| Iterative tasks | Reflexion |
| General reasoning | Zero-shot CoT → Few-shot CoT |
Combining Techniques
- ReAct + Self-Consistency → robust factual answers
- ToT + PAL → complex computational exploration
- Least-to-Most + Reflexion → hard multi-step problems
Common Mistakes
| Mistake | Fix |
|---|---|
| CoT for simple lookups | Reserve for multi-step reasoning |
| Too few Self-Consistency samples | Use 5-10 minimum |
| Generic "think step by step" without validating output | Check reasoning quality, not just presence |
| ToT on linear problems | Use only when exploration/backtracking needed |
| PAL without code execution | Ensure interpreter is available |
| Poor few-shot exemplars | Verify that each exemplar's reasoning and final answer are correct |
Best Practices
- Start simple — Zero-shot CoT first, escalate only if needed
- Use clear step markers — "Step 1:", "First,", "Therefore,"
- Include diverse exemplars covering edge cases
- Add verification — "Let me verify..." at the end
- Match cost to stakes — Self-Consistency/ToT only when accuracy matters enough to justify compute
