Automation Suitability Classifier

Overview

Classify workflow steps into automation categories while preventing enthusiasm from overriding prudence. The question isn't "can we automate this?" but "what happens when the automation is wrong?"

Core principle: High volume amplifies errors, not just efficiency. A 1% error rate on 1,000 items/day is 10 problems daily.

Classification Levels

Level	Definition	When to Use
Fully Automatable	No human in loop	ALL four criteria score LOW
AI-Assisted	Human reviews/approves	ANY criterion scores MEDIUM
Human-Only	AI may inform, human decides	ANY criterion scores HIGH
Not Worth Touching	Leave as-is	Automation risk > current risk

The Four Criteria

Score each criterion as LOW / MEDIUM / HIGH:

1. Error Tolerance

What happens when the automation makes a mistake?

Score	Impact	Examples
LOW	Easily corrected, no lasting harm	Wrong email subject, minor data typo
MEDIUM	Requires effort to fix, some harm	Incorrect routing, delayed processing
HIGH	Difficult/impossible to reverse, significant harm	Regulatory filing error, wrong payment, legal exposure

2. Regulatory Sensitivity

Does this touch regulated data or processes?

Score	Triggers	Examples
LOW	No regulatory nexus	Internal scheduling, non-financial data
MEDIUM	Regulatory reporting affected downstream	Data feeding into compliance reports
HIGH	Direct regulatory touchpoint	AML thresholds, tax IDs, SAR triggers, KYC data, audit trails

Red flags: Tax ID, SSN, $10K+ amounts, "compliance", "regulatory filing", account type classification

3. Explainability Requirement

Must we explain WHY this decision was made?

Score	Context	Examples
LOW	No one asks why	Internal workflow routing
MEDIUM	May need to justify to stakeholders	Customer service decisions
HIGH	Regulators, auditors, or courts may require explanation	Fraud classification, credit decisions, complaint handling

4. Downstream Blast Radius

How far do errors propagate?

Score	Scope	Examples
LOW	Contained to this step	Isolated data entry
MEDIUM	Affects related processes	Incorrect categorization affecting routing
HIGH	Triggers external actions or irreversible outcomes	Regulatory filings, payments, legal responses, customer communications

Classification Flowchart

digraph classify {
    rankdir=TB;
    node [shape=box];

    score [label="Score all 4 criteria"];
    any_high [label="Any HIGH?" shape=diamond];
    any_medium [label="Any MEDIUM?" shape=diamond];
    risk_check [label="Automation risk >\ncurrent risk?" shape=diamond];

    full [label="FULLY\nAUTOMATABLE" style=filled fillcolor=lightgreen];
    assisted [label="AI-ASSISTED\nONLY" style=filled fillcolor=lightyellow];
    human [label="HUMAN-ONLY" style=filled fillcolor=orange];
    notworth [label="NOT WORTH\nTOUCHING" style=filled fillcolor=lightgray];

    score -> any_high;
    any_high -> risk_check [label="yes"];
    any_high -> any_medium [label="no"];
    any_medium -> assisted [label="yes"];
    any_medium -> full [label="no"];
    risk_check -> notworth [label="yes"];
    risk_check -> human [label="no"];
}

Output Format

step: [Step Name]
classification: [FULLY_AUTOMATABLE | AI_ASSISTED | HUMAN_ONLY | NOT_WORTH_TOUCHING]

criteria_scores:
  error_tolerance: [LOW|MEDIUM|HIGH]
  error_tolerance_rationale: "[What happens when wrong]"

  regulatory_sensitivity: [LOW|MEDIUM|HIGH]
  regulatory_sensitivity_rationale: "[Specific regulatory touchpoints]"

  explainability_requirement: [LOW|MEDIUM|HIGH]
  explainability_rationale: "[Who might ask why, in what context]"

  blast_radius: [LOW|MEDIUM|HIGH]
  blast_radius_rationale: "[What downstream effects, how far]"

classification_rationale: "[Why this level given the scores]"

# REQUIRED if not FULLY_AUTOMATABLE:
automation_blockers:
  - "[Specific blocker 1]"
  - "[Specific blocker 2]"

# REQUIRED if NOT_WORTH_TOUCHING:
risk_comparison:
  current_risk: "[What can go wrong today]"
  automation_risk: "[What could go wrong with automation]"
  why_worse: "[Why automation risk exceeds current risk]"

Probing Questions

Before scoring, ask these about EVERY step:

Error Tolerance:

What happens if this is wrong?
How would we know it's wrong?
How hard is it to fix?

Regulatory Sensitivity:

Does this touch tax IDs, SSNs, or financial identifiers?
Are there dollar thresholds that trigger reporting (e.g., $10K AML)?
Does this feed into any regulatory filings?
Could this affect account classification (retail/institutional, qualified/non-qualified)?

Explainability:

If a customer complained, could we explain why?
If a regulator asked, could we show the decision logic?
Is this legally discoverable?

Blast Radius:

What happens next with this output?
Who else uses this data?
Can the effects be reversed?

Rationalizations to Reject

Enthusiastic Claim	Why It's Wrong	Counter
"High volume makes automation obvious"	High volume amplifies errors	Calculate: X% error × volume = Y problems/day
"Current human error rate is high"	Automation errors are different, often systematic	Human errors are random; automation errors repeat
"It's just data entry"	Data has meaning; wrong data has consequences	What regulatory implications does this data have?
"Add a human checkpoint"	Checkpoints fail under volume	Humans rubber-stamp at scale
"The stakeholder says it's simple"	Stakeholders underestimate complexity	Probe for hidden business logic
"We can always fix errors later"	Some errors can't be fixed	What's irreversible about this?
"AI accuracy is better than humans"	Aggregate accuracy hides distribution	Where does AI fail? Are those high-stakes?

Red Flags in Your Assessment

If you find yourself thinking these, STOP:

"This is a textbook case for automation"
"The stakeholder is right"
"Modern AI/ML can handle this"
"We just need validation rules"
"95% accuracy is good enough"
"Human-only with AI support" (this is just AI-Assisted with extra words)
"Bordering on not worth touching" (if you're saying this, it probably IS not worth touching)

These thoughts indicate optimism bias. Re-run the criteria scoring with more skepticism.

Special note on "bordering on": If your instinct says a step is borderline between two classifications, choose the more conservative one. "Bordering on NOT_WORTH_TOUCHING" means NOT_WORTH_TOUCHING.

When to Use "Not Worth Touching"

You MUST consider NOT_WORTH_TOUCHING when ANY of these apply:

Trigger	Why Automation Makes It Worse
Undocumented tribal knowledge	Automation will miss the workarounds that make it work
Recent regulatory inquiry/finding	Regulators are watching; automation failure = bad optics
Process compensates for system bugs	Automation encodes bugs instead of fixing them
High staff turnover in this role	Knowledge capture should precede automation
Pending regulatory changes	Automating to current rules creates rework
Single error caused major incident	Zero tolerance means automation risk is unacceptable

NOT_WORTH_TOUCHING is not failure. It's honest recognition that:

Some processes need fixing before automating
Some automation creates more risk than it solves
The right answer to "should we automate?" is sometimes "not yet" or "never"

If you score ANY criterion as HIGH and the current process works despite its pain points, strongly consider NOT_WORTH_TOUCHING.

The goal is not maximum automation. The goal is appropriate automation.

Financial Services Context

In regulated financial services, the consequences of automation errors include:

Regulatory fines and sanctions
Mandatory disclosure and remediation
Personal liability for executives
License risk

A single misclassified SAR trigger, unreported CTR, or misrouted fraud complaint can create regulatory exposure that dwarfs any efficiency gain.

Default to conservative classification. It's easier to upgrade a Human-Only to AI-Assisted after gaining confidence than to explain to regulators why your fully automated process failed.

automation-suitability-classifierSafety 100Repository

Package Files

Automation Suitability Classifier

Overview

Classification Levels

The Four Criteria

1. Error Tolerance

2. Regulatory Sensitivity

3. Explainability Requirement

4. Downstream Blast Radius

Classification Flowchart

Output Format

Probing Questions

Rationalizations to Reject

Red Flags in Your Assessment

When to Use "Not Worth Touching"

Financial Services Context

Install

AI Quality Score

Metadata

Tags

automation-suitability-classifierSafety 100Repository ShareFavorite skill

Package Files

Automation Suitability Classifier

Overview

Classification Levels

The Four Criteria

1. Error Tolerance

2. Regulatory Sensitivity

3. Explainability Requirement

4. Downstream Blast Radius

Classification Flowchart

Output Format

Probing Questions

Rationalizations to Reject

Red Flags in Your Assessment

When to Use "Not Worth Touching"

Financial Services Context

Install

AI Quality Score

Metadata

Tags

automation-suitability-classifierSafety 100Repository