Automation Suitability Classifier
Overview
Classify workflow steps into automation categories while preventing enthusiasm from overriding prudence. The question isn't "can we automate this?" but "what happens when the automation is wrong?"
Core principle: High volume amplifies errors, not just efficiency. A 1% error rate on 1,000 items/day is 10 problems daily.
Classification Levels
| Level | Definition | When to Use |
|---|---|---|
| Fully Automatable | No human in loop | ALL four criteria score LOW |
| AI-Assisted | Human reviews/approves | ANY criterion scores MEDIUM |
| Human-Only | AI may inform, human decides | ANY criterion scores HIGH |
| Not Worth Touching | Leave as-is | Automation risk > current risk |
The Four Criteria
Score each criterion as LOW / MEDIUM / HIGH:
1. Error Tolerance
What happens when the automation makes a mistake?
| Score | Impact | Examples |
|---|---|---|
| LOW | Easily corrected, no lasting harm | Wrong email subject, minor data typo |
| MEDIUM | Requires effort to fix, some harm | Incorrect routing, delayed processing |
| HIGH | Difficult/impossible to reverse, significant harm | Regulatory filing error, wrong payment, legal exposure |
2. Regulatory Sensitivity
Does this touch regulated data or processes?
| Score | Triggers | Examples |
|---|---|---|
| LOW | No regulatory nexus | Internal scheduling, non-financial data |
| MEDIUM | Regulatory reporting affected downstream | Data feeding into compliance reports |
| HIGH | Direct regulatory touchpoint | AML thresholds, tax IDs, SAR triggers, KYC data, audit trails |
Red flags: Tax ID, SSN, $10K+ amounts, "compliance", "regulatory filing", account type classification
3. Explainability Requirement
Must we explain WHY this decision was made?
| Score | Context | Examples |
|---|---|---|
| LOW | No one asks why | Internal workflow routing |
| MEDIUM | May need to justify to stakeholders | Customer service decisions |
| HIGH | Regulators, auditors, or courts may require explanation | Fraud classification, credit decisions, complaint handling |
4. Downstream Blast Radius
How far do errors propagate?
| Score | Scope | Examples |
|---|---|---|
| LOW | Contained to this step | Isolated data entry |
| MEDIUM | Affects related processes | Incorrect categorization affecting routing |
| HIGH | Triggers external actions or irreversible outcomes | Regulatory filings, payments, legal responses, customer communications |
Classification Flowchart
digraph classify {
rankdir=TB;
node [shape=box];
score [label="Score all 4 criteria"];
any_high [label="Any HIGH?" shape=diamond];
any_medium [label="Any MEDIUM?" shape=diamond];
risk_check [label="Automation risk >\ncurrent risk?" shape=diamond];
full [label="FULLY\nAUTOMATABLE" style=filled fillcolor=lightgreen];
assisted [label="AI-ASSISTED\nONLY" style=filled fillcolor=lightyellow];
human [label="HUMAN-ONLY" style=filled fillcolor=orange];
notworth [label="NOT WORTH\nTOUCHING" style=filled fillcolor=lightgray];
score -> any_high;
any_high -> risk_check [label="yes"];
any_high -> any_medium [label="no"];
any_medium -> assisted [label="yes"];
any_medium -> full [label="no"];
risk_check -> notworth [label="yes"];
risk_check -> human [label="no"];
}
Output Format
step: [Step Name]
classification: [FULLY_AUTOMATABLE | AI_ASSISTED | HUMAN_ONLY | NOT_WORTH_TOUCHING]
criteria_scores:
error_tolerance: [LOW|MEDIUM|HIGH]
error_tolerance_rationale: "[What happens when wrong]"
regulatory_sensitivity: [LOW|MEDIUM|HIGH]
regulatory_sensitivity_rationale: "[Specific regulatory touchpoints]"
explainability_requirement: [LOW|MEDIUM|HIGH]
explainability_rationale: "[Who might ask why, in what context]"
blast_radius: [LOW|MEDIUM|HIGH]
blast_radius_rationale: "[What downstream effects, how far]"
classification_rationale: "[Why this level given the scores]"
# REQUIRED if not FULLY_AUTOMATABLE:
automation_blockers:
- "[Specific blocker 1]"
- "[Specific blocker 2]"
# REQUIRED if NOT_WORTH_TOUCHING:
risk_comparison:
current_risk: "[What can go wrong today]"
automation_risk: "[What could go wrong with automation]"
why_worse: "[Why automation risk exceeds current risk]"
Probing Questions
Before scoring, ask these about EVERY step:
Error Tolerance:
- What happens if this is wrong?
- How would we know it's wrong?
- How hard is it to fix?
Regulatory Sensitivity:
- Does this touch tax IDs, SSNs, or financial identifiers?
- Are there dollar thresholds that trigger reporting (e.g., $10K AML)?
- Does this feed into any regulatory filings?
- Could this affect account classification (retail/institutional, qualified/non-qualified)?
Explainability:
- If a customer complained, could we explain why?
- If a regulator asked, could we show the decision logic?
- Is this legally discoverable?
Blast Radius:
- What happens next with this output?
- Who else uses this data?
- Can the effects be reversed?
Rationalizations to Reject
| Enthusiastic Claim | Why It's Wrong | Counter |
|---|---|---|
| "High volume makes automation obvious" | High volume amplifies errors | Calculate: X% error × volume = Y problems/day |
| "Current human error rate is high" | Automation errors are different, often systematic | Human errors are random; automation errors repeat |
| "It's just data entry" | Data has meaning; wrong data has consequences | What regulatory implications does this data have? |
| "Add a human checkpoint" | Checkpoints fail under volume | Humans rubber-stamp at scale |
| "The stakeholder says it's simple" | Stakeholders underestimate complexity | Probe for hidden business logic |
| "We can always fix errors later" | Some errors can't be fixed | What's irreversible about this? |
| "AI accuracy is better than humans" | Aggregate accuracy hides distribution | Where does AI fail? Are those high-stakes? |
Red Flags in Your Assessment
If you find yourself thinking these, STOP:
- "This is a textbook case for automation"
- "The stakeholder is right"
- "Modern AI/ML can handle this"
- "We just need validation rules"
- "95% accuracy is good enough"
- "Human-only with AI support" (this is just AI-Assisted with extra words)
- "Bordering on not worth touching" (if you're saying this, it probably IS not worth touching)
These thoughts indicate optimism bias. Re-run the criteria scoring with more skepticism.
Special note on "bordering on": If your instinct says a step is borderline between two classifications, choose the more conservative one. "Bordering on NOT_WORTH_TOUCHING" means NOT_WORTH_TOUCHING.
When to Use "Not Worth Touching"
You MUST consider NOT_WORTH_TOUCHING when ANY of these apply:
| Trigger | Why Automation Makes It Worse |
|---|---|
| Undocumented tribal knowledge | Automation will miss the workarounds that make it work |
| Recent regulatory inquiry/finding | Regulators are watching; automation failure = bad optics |
| Process compensates for system bugs | Automation encodes bugs instead of fixing them |
| High staff turnover in this role | Knowledge capture should precede automation |
| Pending regulatory changes | Automating to current rules creates rework |
| Single error caused major incident | Zero tolerance means automation risk is unacceptable |
NOT_WORTH_TOUCHING is not failure. It's honest recognition that:
- Some processes need fixing before automating
- Some automation creates more risk than it solves
- The right answer to "should we automate?" is sometimes "not yet" or "never"
If you score ANY criterion as HIGH and the current process works despite its pain points, strongly consider NOT_WORTH_TOUCHING.
The goal is not maximum automation. The goal is appropriate automation.
Financial Services Context
In regulated financial services, the consequences of automation errors include:
- Regulatory fines and sanctions
- Mandatory disclosure and remediation
- Personal liability for executives
- License risk
A single misclassified SAR trigger, unreported CTR, or misrouted fraud complaint can create regulatory exposure that dwarfs any efficiency gain.
Default to conservative classification. It's easier to upgrade a Human-Only to AI-Assisted after gaining confidence than to explain to regulators why your fully automated process failed.
