dct-generate

Use this skill when the user wants to create synthetic test data, generate fake datasets, create mock data for testing, produce realistic data with specific patterns, or needs sample data with custom schemas. Triggers include "generate test data", "create fake data", "mock dataset", "synthetic data", "generate sample records", "create test data", "fake users", "mock data", or any request for test data with specific fields and relationships.


DCT Generate - Create Synthetic Data

Generate realistic test data with customizable schemas and field types.

When to Use

Use this skill when you need to:

  • Create test datasets for development
  • Generate mock data for demos
  • Produce synthetic data for testing ETL pipelines
  • Create data with specific distributions
  • Generate data with referential integrity

Installation

which dct || (go build -o dct && chmod +x ./dct)

Usage

dct gen <schema> [flags]

Arguments

  • schema: JSON schema as a file path or inline JSON string

Flags

  • -n, --lines <number>: Number of rows to generate (default: 1)
  • -f, --format <format>: Output format - csv, ndjson (default: csv)
  • -o, --outfile <file>: Output file path (default: stdout)

Examples

From schema file:

dct gen schema.json -n 1000 -o test_data.csv

Inline schema:

dct gen '[{"field":"name","source":"firstNames"}]' -n 100

NDJSON output:

dct gen schema.json -n 500 -f ndjson -o output.ndjson

Generate to stdout:

dct gen users-schema.json -n 10

Schema Format

Array of field objects:

[
  {
    "field": "column_name",
    "source": "source_type",
    "config": { ... }
  }
]
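Before handing a schema to dct, its shape can be checked locally. The following is a minimal Python sketch that tests only what this section states: the schema is an array of objects, each with a string `field` and `source`, and an optional `config` object. The function name `check_schema` is hypothetical, and dct's own validation rules may be stricter or different.

```python
import json


def check_schema(text):
    """Return a list of problems found in a dct-style schema string.

    Only checks the structure described in this document; it does not
    know which source names or config keys dct actually accepts.
    """
    schema = json.loads(text)
    if not isinstance(schema, list):
        return ["schema must be a JSON array of field objects"]

    problems = []
    for i, entry in enumerate(schema):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        for key in ("field", "source"):
            if not isinstance(entry.get(key), str):
                problems.append(f"entry {i}: missing string '{key}'")
        if "config" in entry and not isinstance(entry["config"], dict):
            problems.append(f"entry {i}: 'config' must be an object")
    return problems


print(check_schema('[{"field": "name", "source": "firstNames"}]'))
```

An empty list means the structure looks right; it does not guarantee that dct will accept the source names or config values.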

Available Data Sources

Random Generators

  • randomBool - Boolean true/false

    {"field": "active", "source": "randomBool"}
    
  • randomEnum - Random value from list

    {"field": "status", "source": "randomEnum", "config": {"values": ["pending", "active", "inactive"]}}
    
  • randomAscii - Random ASCII string

    {"field": "code", "source": "randomAscii", "config": {"length": 10}}
    
  • randomUniformInt - Uniform integer distribution

    {"field": "age", "source": "randomUniformInt", "config": {"min": 18, "max": 65}}
    
  • randomNormal - Normal/Gaussian distribution

    {"field": "score", "source": "randomNormal", "config": {"mean": 100, "std": 15}}
    
  • randomPoisson - Poisson distribution

    {"field": "events", "source": "randomPoisson", "config": {"lambda": 5}}
    
  • randomDatetime - Random date/time

    {"field": "created_at", "source": "randomDatetime", "config": {"min": "2024-01-01 00:00:00", "max": "2024-12-31 23:59:59", "tz": "UTC"}}
    
  • randomDate - Random date

    {"field": "birth_date", "source": "randomDate", "config": {"min": "1980-01-01", "max": "2005-12-31"}}
    
  • randomTime - Random time

    {"field": "meeting_time", "source": "randomTime", "config": {"min": "09:00:00", "max": "17:00:00"}}
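To get a feel for what the distribution parameters above control, the following Python sketch draws samples using the same parameter names. It relies on Python's standard library and is not dct's implementation; the helper names are hypothetical, and whether dct treats `max` as inclusive is not stated in this document.

```python
import math
import random


def sample_uniform_int(rng, lo, hi):
    # randint includes both endpoints; dct's handling of "max" may differ.
    return rng.randint(lo, hi)


def sample_normal(rng, mean, std):
    # Unbounded: values far from the mean are rare but possible.
    return rng.gauss(mean, std)


def sample_poisson(rng, lam):
    # Knuth's multiplication method; adequate for small lambda.
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(42)  # fixed seed so runs are repeatable
ages = [sample_uniform_int(rng, 18, 65) for _ in range(1000)]
scores = [sample_normal(rng, 100, 15) for _ in range(1000)]
events = [sample_poisson(rng, 5) for _ in range(1000)]

print(min(ages), max(ages))
print(round(sum(scores) / len(scores), 1))
print(round(sum(events) / len(events), 2))
```

The uniform samples stay within the stated bounds, while the normal and Poisson samples cluster around `mean` and `lambda` respectively without a hard upper limit.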
    

Data Generators

  • uuid - UUID v4

    {"field": "id", "source": "uuid"}
    
  • firstNames - Random first names

    {"field": "first_name", "source": "firstNames"}
    
  • lastNames - Random last names

    {"field": "last_name", "source": "lastNames"}
    
  • companies - Company names

    {"field": "company", "source": "companies"}
    
  • emails - Email addresses

    {"field": "email", "source": "emails"}
    

Derived Fields

Create computed fields using the Expr language. In the examples below, "fields" lists the columns that the expression references:

{
  "field": "full_name",
  "source": "derived",
  "config": {
    "fields": ["first_name", "last_name"],
    "expression": "first_name + ' ' + last_name"
  }
}

Complex expressions:

{
  "field": "display_name",
  "source": "derived",
  "config": {
    "fields": ["first_name", "last_name", "company"],
    "expression": "first_name + ' ' + last_name + ' (' + company + ')'"
  }
}
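The effect of a derived field on a single row can be sketched in Python. Here a Python lambda stands in for the Expr expression, so this illustrates the idea only and is not how dct evaluates expressions; `add_derived` is a hypothetical helper.

```python
def add_derived(row, field, inputs, compute):
    """Return a copy of row with `field` computed from the named input columns."""
    missing = [name for name in inputs if name not in row]
    if missing:
        raise KeyError(f"derived field {field!r} needs missing columns: {missing}")
    values = {name: row[name] for name in inputs}
    return {**row, field: compute(**values)}


row = {"first_name": "Jane", "last_name": "Doe", "company": "Acme"}

# Equivalent of: first_name + ' ' + last_name
row = add_derived(
    row, "full_name", ["first_name", "last_name"],
    lambda first_name, last_name: first_name + " " + last_name,
)

# Equivalent of: first_name + ' ' + last_name + ' (' + company + ')'
row = add_derived(
    row, "display_name", ["first_name", "last_name", "company"],
    lambda first_name, last_name, company: f"{first_name} {last_name} ({company})",
)

print(row["full_name"])
print(row["display_name"])
```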

Complete Schema Example

[
  {"field": "id", "source": "uuid"},
  {"field": "first_name", "source": "firstNames"},
  {"field": "last_name", "source": "lastNames"},
  {"field": "email", "source": "emails"},
  {"field": "age", "source": "randomUniformInt", "config": {"min": 18, "max": 65}},
  {"field": "department", "source": "randomEnum", "config": {"values": ["Engineering", "Sales", "Marketing", "HR"]}},
  {"field": "salary", "source": "randomNormal", "config": {"mean": 75000, "std": 15000}},
  {"field": "is_active", "source": "randomBool"},
  {
    "field": "full_name",
    "source": "derived",
    "config": {
      "fields": ["first_name", "last_name"],
      "expression": "first_name + ' ' + last_name"
    }
  }
]
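In a schema like the one above, a derived field is only meaningful if every column it lists is defined somewhere in the schema. The Python sketch below checks that; it assumes the `derived` source and `config.fields` layout shown in this document, and the function name is hypothetical. Whether dct also requires inputs to appear before the derived field is not stated here, so ordering is not checked.

```python
import json


def undefined_derived_inputs(schema):
    """Map each derived field to the input columns it names that are not defined."""
    defined = {entry["field"] for entry in schema}
    problems = {}
    for entry in schema:
        if entry.get("source") != "derived":
            continue
        inputs = entry.get("config", {}).get("fields", [])
        missing = [name for name in inputs if name not in defined]
        if missing:
            problems[entry["field"]] = missing
    return problems


schema = json.loads("""
[
  {"field": "first_name", "source": "firstNames"},
  {"field": "last_name", "source": "lastNames"},
  {"field": "full_name", "source": "derived",
   "config": {"fields": ["first_name", "last_name"],
              "expression": "first_name + ' ' + last_name"}}
]
""")

print(undefined_derived_inputs(schema))
```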

Best Practices

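  • Generate a small sample first (-n 10) to verify the schema before producing a large file
  • Use derived fields to create realistic relationships between columns
  • Use NDJSON format for nested/complex data
  • Save schemas to files for reuse
  • Match the distribution to the data: randomNormal for measurements such as scores or salaries, randomPoisson for event counts, randomUniformInt for bounded integer ranges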
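The "small sample first" habit can be scripted. The sketch below builds a `dct gen` command from the flags documented in this section (`-n`, `-f`, `-o`) and runs it only if a `dct` binary is on the PATH. The helper names `gen_command` and `preview` are hypothetical.

```python
import shutil
import subprocess


def gen_command(schema_path, lines=10, fmt="csv", outfile=None):
    """Build the argument list for `dct gen` using the documented flags."""
    cmd = ["dct", "gen", schema_path, "-n", str(lines), "-f", fmt]
    if outfile:
        cmd += ["-o", outfile]
    return cmd


def preview(schema_path, lines=10):
    """Return a small sample as text, or None if dct is not installed."""
    if shutil.which("dct") is None:
        return None
    result = subprocess.run(
        gen_command(schema_path, lines),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


print(" ".join(gen_command("schema.json")))
```

Once the preview looks right, the same helper can produce the full run, for example `gen_command("schema.json", lines=1000, outfile="test_data.csv")`.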

Output Formats

CSV (default):

id,first_name,age
550e8400-e29b-41d4-a716-446655440000,John,34

NDJSON:

{"id":"550e8400-e29b-41d4-a716-446655440000","first_name":"John","age":34}
{"id":"550e8400-e29b-41d4-a716-446655440001","first_name":"Jane","age":28}
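The two formats differ in how a consumer sees value types. The Python sketch below parses the sample output shown above with the standard library: CSV values arrive as strings, while NDJSON keeps JSON types such as numbers.

```python
import csv
import io
import json

csv_text = (
    "id,first_name,age\n"
    "550e8400-e29b-41d4-a716-446655440000,John,34\n"
)
ndjson_text = (
    '{"id":"550e8400-e29b-41d4-a716-446655440000","first_name":"John","age":34}\n'
    '{"id":"550e8400-e29b-41d4-a716-446655440001","first_name":"Jane","age":28}\n'
)

# CSV: one dict per row, every value is a string.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# NDJSON: one JSON object per non-empty line, types preserved.
ndjson_rows = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

print(type(csv_rows[0]["age"]).__name__)
print(type(ndjson_rows[0]["age"]).__name__)
```

When reading CSV output, numeric columns need an explicit conversion such as `int(row["age"])`.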

Related Skills

  • dct-peek: Verify generated data looks correct
  • dct-infer: Check schema of generated data
  • dct-diff: Compare generated data with production samples



Metadata

License: unknown
Version: -
Updated: 2/14/2026
Publisher: andrew-a-hale

Tags

ci-cd, testing