Technique: Root Cause Tracing
Concept: Bugs are rarely where they explode. The explosion (Stack Trace) is just where the bad data finally violated a constraint.
The Protocol
Do not fix the line in the stack trace yet. Follow the data upstream.
Step 1: Identify the "Contraband"
What specific value caused the crash?
- Example: A `null` user ID.
- Example: An empty string `""` where a date was expected.
- Example: A value of `-1` in a price field.
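One way to make the contraband explicit is to check the crashing record against the constraints it was supposed to satisfy and report the first value that breaks one. The sketch below is a minimal illustration, not a prescribed tool: `find_contraband`, `CONSTRAINTS`, and the field names `user_id`, `order_date`, and `price` are all hypothetical, chosen only to mirror the three examples above.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical constraints mirroring the three examples above.
# Each maps a field name to a predicate that returns True for a valid value.
CONSTRAINTS: Dict[str, Callable[[Any], bool]] = {
    "user_id": lambda v: v is not None,                          # must not be null
    "order_date": lambda v: isinstance(v, str) and v != "",      # must not be empty
    "price": lambda v: isinstance(v, (int, float)) and v >= 0,   # must not be negative
}


def find_contraband(record: Dict[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return (field, value) for the first value that violates a constraint, or None."""
    for field, is_valid in CONSTRAINTS.items():
        value = record.get(field)
        if not is_valid(value):
            return field, value
    return None


if __name__ == "__main__":
    crashing_record = {"user_id": 42, "order_date": "", "price": 19.99}
    print(find_contraband(crashing_record))  # ('order_date', '')
```

Naming the exact field and value gives you something concrete to search for as you walk upstream.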
Step 2: The Upstream Walk (The "5 Whys")
Open your IDE. Use "Find Usages" or "Call Hierarchy."
- Level 0 (Crash Site): `processPayment(user.id)` -> Crashed because `user.id` is null.
  - Question: Who called `processPayment`?
- Level 1: `CheckoutService.submitOrder` called it.
  - Question: Where did `CheckoutService` get the `user` object?
- Level 2: It was passed in from `SessionManager.getCurrentUser()`.
  - Question: Why did `SessionManager` return a user object with a null ID?
- Level 3: `SessionManager` hydrated it from the Redis Cache.
  - Question: Why is the data in Redis corrupt?
- Level 4 (Root Cause): The `LoginHandler` wrote the user to Redis before the database generated the ID.
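The walk above can be reproduced in a few lines. This is a Python sketch under stated assumptions, not the real system: a plain dict stands in for Redis, `FakeDatabase` stands in for the database, and the snake_case names (`process_payment`, `submit_order`, `get_current_user`) are Python stand-ins for the identifiers in the walk. `crash_frames` is a hypothetical helper that triggers the crash and lists the functions in its stack trace.

```python
import traceback
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class User:
    name: str
    id: Optional[int] = None  # filled in by the database on insert


class FakeDatabase:
    """In-memory stand-in for the database that generates IDs."""

    def __init__(self) -> None:
        self._next_id = 1

    def insert(self, user: User) -> None:
        user.id = self._next_id
        self._next_id += 1


class LoginHandler:
    """Level 4 (root cause): caches the user before the ID exists."""

    def __init__(self, db: FakeDatabase, cache: Dict[str, User]) -> None:
        self.db = db
        self.cache = cache

    def login(self, name: str) -> None:
        user = User(name)
        # A real cache stores a serialized copy, modeled here with replace().
        self.cache[name] = replace(user)  # bug: user.id is still None
        self.db.insert(user)              # the ID arrives too late


class SessionManager:
    """Levels 2 and 3: hydrates the user from the cache."""

    def __init__(self, cache: Dict[str, User]) -> None:
        self.cache = cache

    def get_current_user(self, name: str) -> User:
        return self.cache[name]


def process_payment(user_id: Optional[int]) -> str:
    """Level 0 (crash site): the constraint the bad value finally violates."""
    if user_id is None:
        raise ValueError("user_id is null")
    return f"charged user {user_id}"


class CheckoutService:
    """Level 1: passes the user along without inspecting it."""

    def __init__(self, session: SessionManager) -> None:
        self.session = session

    def submit_order(self, name: str) -> str:
        user = self.session.get_current_user(name)
        return process_payment(user.id)


def crash_frames() -> List[str]:
    """Reproduce the crash and return the function names in its stack trace."""
    cache: Dict[str, User] = {}
    LoginHandler(FakeDatabase(), cache).login("ada")
    checkout = CheckoutService(SessionManager(cache))
    try:
        checkout.submit_order("ada")
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


if __name__ == "__main__":
    print(crash_frames())
```

In this sketch the stack trace names `submit_order` and `process_payment` but never `login` or `get_current_user`: both had already returned by the time the crash happened. That is why the walk follows the data rather than the trace.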
Step 3: Verify the Origin
- Hypothesis: The bug is in `LoginHandler` (Level 4), not `processPayment` (Level 0).
- Test: Fix the `LoginHandler` ordering.
- Result: The crash at Level 0 disappears because the data is now correct.
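A rough Python sketch of this verification, assuming the same simplified model as the walk: a dict in place of Redis and a trivial `db_insert` in place of the database. The names `login_buggy`, `login_fixed`, and `checkout` are hypothetical. Only the write order in the login step changes; the Level 0 function is left untouched.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional


@dataclass
class User:
    name: str
    id: Optional[int] = None


def db_insert(user: User) -> None:
    """Stand-in for the database insert that generates the ID."""
    user.id = 1


def login_buggy(name: str, cache: Dict[str, User]) -> None:
    user = User(name)
    cache[name] = replace(user)  # cached copy taken before the ID exists
    db_insert(user)


def login_fixed(name: str, cache: Dict[str, User]) -> None:
    user = User(name)
    db_insert(user)              # ID generated first
    cache[name] = replace(user)  # cached copy now carries the ID


def process_payment(user_id: Optional[int]) -> str:
    """Level 0, deliberately unchanged: still rejects a null ID."""
    if user_id is None:
        raise ValueError("user_id is null")
    return f"charged user {user_id}"


def checkout(login: Callable[[str, Dict[str, User]], None]) -> str:
    """Log in with the given handler, then pay using the cached user."""
    cache: Dict[str, User] = {}
    login("ada", cache)
    return process_payment(cache["ada"].id)


if __name__ == "__main__":
    print(checkout(login_fixed))  # charged user 1
    try:
        checkout(login_buggy)
    except ValueError as exc:
        print(f"buggy ordering still crashes: {exc}")
```

If the crash had persisted with `login_fixed`, the hypothesis would be wrong and the walk would need to continue upstream.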
Common Traps
- The "Null Check" Trap: Adding `if (user.id == null) return;` at Level 0 fixes the crash, but it creates a "Zombie Order" (an order that fails silently). Do not do this.
- The "Sanitizer" Trap: Cleaning the string at Level 1 hides the fact that the database at Level 5 is corrupt.
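To see how the "Null Check" trap produces a zombie order, consider this minimal Python sketch. The `Ledger` class and the functions `process_payment_guarded`, `submit_order`, and `zombie_orders` are hypothetical, written only to illustrate the failure mode.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Ledger:
    orders: List[str] = field(default_factory=list)         # submitted order IDs
    payments: Dict[str, int] = field(default_factory=dict)  # order ID -> paying user ID


def process_payment_guarded(ledger: Ledger, order_id: str, user_id: Optional[int]) -> None:
    """Level 0 with the null-check patch applied."""
    if user_id is None:
        return  # the crash is gone, and so is any sign that something went wrong
    ledger.payments[order_id] = user_id


def submit_order(ledger: Ledger, order_id: str, user_id: Optional[int]) -> None:
    process_payment_guarded(ledger, order_id, user_id)
    ledger.orders.append(order_id)  # no exception was raised, so the order is recorded


def zombie_orders(ledger: Ledger) -> List[str]:
    """Orders that were recorded but never paid for."""
    return [order_id for order_id in ledger.orders if order_id not in ledger.payments]


if __name__ == "__main__":
    ledger = Ledger()
    submit_order(ledger, "A-1", 7)
    submit_order(ledger, "A-2", None)  # bad data from upstream
    print(zombie_orders(ledger))  # ['A-2']
```

Nothing crashes and nothing is logged, yet order `A-2` exists without a payment, and the upstream bug that produced the null ID is still in place.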
Rule: Fix the data generation, not the data consumption.
