DCT Generate - Create Synthetic Data

Generate realistic test data with customizable schemas and field types.

When to Use

Use this skill when you need to:

Create test datasets for development
Generate mock data for demos
Produce synthetic data for testing ETL pipelines
Create data with specific distributions
Generate data with referential integrity

Installation

which dct || (go build -o dct && chmod +x ./dct)

Usage

dct gen <schema> [flags]

Arguments

schema: JSON schema as a file path or inline JSON string

Flags

-n, --lines <number>: Number of rows to generate (default: 1)
-f, --format <format>: Output format - csv, ndjson (default: csv)
-o, --outfile <file>: Output file path (default: stdout)

Examples

From schema file:

dct gen schema.json -n 1000 -o test_data.csv

Inline schema:

dct gen '[{"field":"name","source":"firstNames"}]' -n 100

NDJSON output:

dct gen schema.json -n 500 -f ndjson -o output.ndjson

Generate to stdout:

dct gen users-schema.json -n 10
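
When dct is driven from a script, the command line can be assembled from the flags documented above. The following Python sketch shows one way to do that. The helper names `build_gen_command` and `run_gen` are hypothetical and not part of dct, and `run_gen` assumes a `dct` binary is available on PATH.

```python
import json
import subprocess


def build_gen_command(schema, lines=1, fmt="csv", outfile=None):
    """Assemble the argv list for `dct gen` from the documented flags.

    `schema` is either a path to a schema file (str) or a list of field
    objects, which is serialized and passed as an inline JSON string.
    """
    if fmt not in ("csv", "ndjson"):
        raise ValueError(f"unsupported format: {fmt}")
    schema_arg = schema if isinstance(schema, str) else json.dumps(schema)
    cmd = ["dct", "gen", schema_arg, "-n", str(lines), "-f", fmt]
    if outfile is not None:
        cmd += ["-o", outfile]
    return cmd


def run_gen(schema, **kwargs):
    """Run dct and return its stdout (hypothetical helper; needs dct on PATH)."""
    result = subprocess.run(
        build_gen_command(schema, **kwargs),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with inline JSON schemas.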

Schema Format

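A schema is a JSON array of field objects. Each object names the output column (field), the data source that produces its values (source), and optional source-specific settings (config):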

[
  {
    "field": "column_name",
    "source": "source_type",
    "config": { ... }
  }
]
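
A structural check before calling dct can catch malformed schemas early. The sketch below is a hypothetical helper, not part of dct, and it only checks the shape described above. The duplicate-name check is an extra assumption: dct may or may not reject repeated field names.

```python
def check_schema(schema):
    """Return a list of problems found in a schema (empty list if none)."""
    if not isinstance(schema, list):
        return ["schema must be a JSON array of field objects"]
    problems = []
    seen = set()
    for i, entry in enumerate(schema):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        # "field" and "source" appear in every example; treat them as required.
        for key in ("field", "source"):
            value = entry.get(key)
            if not isinstance(value, str) or not value:
                problems.append(f"entry {i}: missing or empty '{key}'")
        # "config" is optional, but must be an object when present.
        if "config" in entry and not isinstance(entry["config"], dict):
            problems.append(f"entry {i}: 'config' must be an object")
        # Assumption: repeated column names are unintended.
        name = entry.get("field")
        if isinstance(name, str) and name:
            if name in seen:
                problems.append(f"entry {i}: duplicate field '{name}'")
            seen.add(name)
    return problems
```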

Available Data Sources

Random Generators

randomBool - Boolean true/false

{"field": "active", "source": "randomBool"}

randomEnum - Random value from list

{"field": "status", "source": "randomEnum", "config": {"values": ["pending", "active", "inactive"]}}

randomAscii - Random ASCII string

{"field": "code", "source": "randomAscii", "config": {"length": 10}}

randomUniformInt - Uniform integer distribution

{"field": "age", "source": "randomUniformInt", "config": {"min": 18, "max": 65}}

randomNormal - Normal/Gaussian distribution

{"field": "score", "source": "randomNormal", "config": {"mean": 100, "std": 15}}

randomPoisson - Poisson distribution

{"field": "events", "source": "randomPoisson", "config": {"lambda": 5}}

randomDatetime - Random date/time

{"field": "created_at", "source": "randomDatetime", "config": {"min": "2024-01-01 00:00:00", "max": "2024-12-31 23:59:59", "tz": "UTC"}}

randomDate - Random date

{"field": "birth_date", "source": "randomDate", "config": {"min": "1980-01-01", "max": "2005-12-31"}}

randomTime - Random time

{"field": "meeting_time", "source": "randomTime", "config": {"min": "09:00:00", "max": "17:00:00"}}
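
To get a feel for what a distribution config will produce before generating a large file, the parameters can be previewed with Python's standard library. This is only an approximation under stated assumptions: it is not dct's implementation, the `preview` helper is hypothetical, and treating `min`/`max` as inclusive bounds is an assumption.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson sample with Knuth's method (fine for small lambda)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def preview(entry, n=1000, seed=0):
    """Sample n values resembling what a schema entry describes."""
    rng = random.Random(seed)
    cfg = entry.get("config", {})
    source = entry["source"]
    if source == "randomUniformInt":
        # Assumption: both bounds are inclusive.
        return [rng.randint(cfg["min"], cfg["max"]) for _ in range(n)]
    if source == "randomNormal":
        return [rng.gauss(cfg["mean"], cfg["std"]) for _ in range(n)]
    if source == "randomPoisson":
        return [sample_poisson(rng, cfg["lambda"]) for _ in range(n)]
    raise ValueError(f"no preview available for source: {source}")


scores = preview({"field": "score", "source": "randomNormal",
                  "config": {"mean": 100, "std": 15}})
print(min(scores), max(scores))
```

Note that a normal distribution is unbounded, so a field such as a salary or score can occasionally fall far from the mean; checking the minimum and maximum of a sample helps decide whether the chosen parameters are realistic.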

Data Generators

uuid - UUID v4
```
{"field": "id", "source": "uuid"}
```

firstNames - Random first names

{"field": "first_name", "source": "firstNames"}

lastNames - Random last names

{"field": "last_name", "source": "lastNames"}

companies - Company names

{"field": "company", "source": "companies"}

emails - Email addresses
```
{"field": "email", "source": "emails"}
```

Derived Fields

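Create computed fields using the Expr language. In the examples here, fields lists the other fields the expression reads, and expression is the Expr expression to evaluate: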

{
  "field": "full_name",
  "source": "derived",
  "config": {
    "fields": ["first_name", "last_name"],
    "expression": "first_name + ' ' + last_name"
  }
}

Complex expressions:

{
  "field": "display_name",
  "source": "derived",
  "config": {
    "fields": ["first_name", "last_name", "company"],
    "expression": "first_name + ' ' + last_name + ' (' + company + ')'"
  }
}
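
Because a derived field depends on other fields by name, a typo in `fields` or `expression` is easy to make. The sketch below is a hypothetical pre-flight check, not part of dct. It assumes expressions use only plain identifiers, operators, and single-quoted string literals as in the examples above, so Expr function calls would be reported as unknown names.

```python
import re


def check_derived(schema):
    """Return problems with derived fields (empty list if none)."""
    defined = {entry["field"] for entry in schema}
    problems = []
    for entry in schema:
        if entry.get("source") != "derived":
            continue
        cfg = entry.get("config", {})
        listed = set(cfg.get("fields", []))
        # Every listed input should exist somewhere in the schema.
        for name in sorted(listed - defined):
            problems.append(f"{entry['field']}: '{name}' is not defined in the schema")
        # Drop single-quoted string literals, then collect identifiers.
        stripped = re.sub(r"'[^']*'", "", cfg.get("expression", ""))
        used = set(re.findall(r"[A-Za-z_]\w*", stripped))
        for name in sorted(used - listed):
            problems.append(f"{entry['field']}: '{name}' is used but not listed in 'fields'")
    return problems
```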

Complete Schema Example

[
  {"field": "id", "source": "uuid"},
  {"field": "first_name", "source": "firstNames"},
  {"field": "last_name", "source": "lastNames"},
  {"field": "email", "source": "emails"},
  {"field": "age", "source": "randomUniformInt", "config": {"min": 18, "max": 65}},
  {"field": "department", "source": "randomEnum", "config": {"values": ["Engineering", "Sales", "Marketing", "HR"]}},
  {"field": "salary", "source": "randomNormal", "config": {"mean": 75000, "std": 15000}},
  {"field": "is_active", "source": "randomBool"},
  {
    "field": "full_name",
    "source": "derived",
    "config": {
      "fields": ["first_name", "last_name"],
      "expression": "first_name + ' ' + last_name"
    }
  }
]

Best Practices

Generate a small sample first (-n 10) to verify the schema
Use derived fields to create realistic relationships
Use NDJSON format (-f ndjson) for nested/complex data
Save schemas to files for reuse
Use appropriate distributions for realistic data

Output Formats

CSV (default):

id,first_name,age
550e8400-e29b-41d4-a716-446655440000,John,34

NDJSON:

{"id":"550e8400-e29b-41d4-a716-446655440000","first_name":"John","age":34}
{"id":"550e8400-e29b-41d4-a716-446655440001","first_name":"Jane","age":28}
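
Both formats can be read back with Python's standard library. The sketch below parses the sample outputs shown above; the helper names are hypothetical, and the CSV sample is extended with a second row mirroring the NDJSON sample.

```python
import csv
import io
import json

CSV_SAMPLE = """id,first_name,age
550e8400-e29b-41d4-a716-446655440000,John,34
550e8400-e29b-41d4-a716-446655440001,Jane,28
"""

NDJSON_SAMPLE = """{"id":"550e8400-e29b-41d4-a716-446655440000","first_name":"John","age":34}
{"id":"550e8400-e29b-41d4-a716-446655440001","first_name":"Jane","age":28}
"""


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_ndjson(text):
    """Parse NDJSON text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


csv_rows = read_csv(CSV_SAMPLE)
ndjson_rows = read_ndjson(NDJSON_SAMPLE)
print(type(csv_rows[0]["age"]).__name__)     # str
print(type(ndjson_rows[0]["age"]).__name__)  # int
```

CSV values come back as strings, while NDJSON preserves JSON types such as numbers, which is one reason to prefer NDJSON when downstream code depends on types.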

Related Skills

dct-peek: Verify generated data looks correct
dct-infer: Check schema of generated data
dct-diff: Compare generated data with production samples
