Data Classification

Purpose

Classify data elements by sensitivity tier, define handling requirements for each tier, and map PII flows to ensure appropriate protections are applied throughout the data lifecycle.

Scope Constraints

Reads data models, schemas, and architecture documentation for classification analysis. Does not modify schemas, apply encryption, or access production data stores directly.

Inputs

Data model or schema being analyzed
Data elements and their sources (user input, system generated, third-party)
Storage and processing architecture
Integration points and data sharing arrangements
Applicable regulatory context (from compliance review if available)

Input Sanitization

No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.

Procedure

Progress Checklist

Step 1: Inventory data elements
Step 2: Classify by sensitivity tier
Step 3: Map PII flows
Step 4: Define handling requirements per tier
Step 5: Identify cross-boundary data transfers
Step 6: Specify encryption and masking requirements

Step 1: Inventory Data Elements

Catalog every data element in scope. For each element, document:

Name and description: What is the field and what does it represent?
Source: User-provided, system-generated, derived, or third-party
Format: Free text, structured (email, phone), numeric, binary, etc.
Volume: Approximate record count and growth rate
Current protections: Encryption at rest/transit, access controls, masking

Step 2: Classify by Sensitivity Tier

Assign each data element to a sensitivity tier:

Public: Information intended for public access. No PII. No competitive sensitivity. Example: marketing copy, public API docs, open-source code.
Internal: Not public, but low impact if disclosed. Non-identifying operational data. Example: internal project names, non-sensitive configs, aggregated metrics.
Confidential: Business-sensitive or indirect PII. Disclosure causes material harm. Example: email addresses, IP addresses, financial reports, API keys, internal strategies.
Restricted: Direct PII, protected health information, payment data, credentials. Disclosure triggers regulatory obligations. Example: SSN, medical records, credit card numbers, passwords, biometric data.

Step 3: Map PII Flows

For each PII element, trace the complete data flow:

Collection: Where does it enter the system? What consent covers it?
Transit: How does it move between services? Is it encrypted in transit?
Processing: Which services touch it? Is the access minimized to what's necessary?
Storage: Where is it persisted? Is it encrypted at rest? Is it in the right geography?
Sharing: Does it leave the system boundary? Under what contractual protections?
Deletion: How is it removed? Does deletion cascade to all copies?

Step 4: Define Handling Requirements Per Tier

Specify the minimum controls required for each sensitivity tier:

Access control: Who can read/write/delete? What authentication is required?
Encryption: At rest, in transit, application-level? Key management approach?
Masking/redaction: In logs, error messages, API responses, UI displays?
Retention: Maximum retention period, automated expiry mechanism?
Audit: What access events must be logged? What level of detail?
Backup/recovery: Backup frequency, encryption of backups, geographic constraints?

Step 5: Identify Cross-Boundary Data Transfers

Document every case where data moves across trust boundaries:

Internal service to external API (analytics, payment processors, email services)
Production to non-production environments (test data, staging copies)
Geographic transfers (EU to US, cross-region replication)
Employee access (admin tools, support dashboards, database queries)
For each transfer: document the justification, contractual protections (DPA, SCCs), and technical safeguards

Step 6: Specify Encryption and Masking Requirements

Define specific technical controls for each data element:

Encryption at rest: AES-256 for Restricted, AES-128 minimum for Confidential
Encryption in transit: TLS 1.2+ for all tiers, mTLS for Restricted inter-service
Application-level encryption: Field-level encryption for Restricted elements in shared databases
Log masking: Restricted fields never appear in logs; Confidential fields are masked/truncated
Display masking: SSN shows last 4 only, credit cards show last 4 only, emails partially masked in admin views
Test data: Restricted and Confidential data must be synthesized or anonymized for non-production

Compaction resilience: If context was lost during a long session, re-read the Inputs section to reconstruct what system is being analyzed, check the Progress Checklist for completed steps, then resume from the earliest incomplete step.

Output Format