Nixtla Schema Mapper
Transform data sources into the Nixtla-compatible schema (unique_id, ds, y).
Overview
This skill automates data transformation:
- Column inference: Detects timestamp, target, and ID columns
- Code generation: Python modules for CSV/SQL/Parquet/dbt
- Schema contracts: Documentation with validation rules
- Quality checks: Validates transformed data
Prerequisites
Required:
- Python 3.8+
- pandas
Optional:
- pyarrow: For Parquet support
- sqlalchemy: For SQL sources
- dbt-core: For dbt models
Installation:
pip install pandas pyarrow sqlalchemy
Instructions
Step 1: Identify Data Source
Supported formats:
- CSV/Parquet files
- SQL tables or queries
- dbt models
Step 2: Analyze Schema
python {baseDir}/scripts/analyze_schema.py --input data/sales.csv
Output:
Detected columns:
Timestamp: 'date' (datetime64)
Target: 'sales' (float64)
Series ID: 'store_id' (object)
Exogenous: price, promotion
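The source does not show how analyze_schema.py decides which column plays which role. The following is a minimal sketch of one possible heuristic, assuming dtype checks plus a hypothetical list of name hints (`TARGET_HINTS`, `infer_columns`, and `_parses_as_dates` are illustrative names, not part of the skill):

```python
import warnings

import pandas as pd

# Hypothetical name hints; the real analyze_schema.py may use different rules.
TARGET_HINTS = ("y", "target", "sales", "revenue", "amount", "value")


def _parses_as_dates(series: pd.Series) -> bool:
    """Return True if every value in the column parses as a datetime."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence format-inference warnings
        parsed = pd.to_datetime(series, errors="coerce")
    return bool(parsed.notna().all())


def infer_columns(df: pd.DataFrame) -> dict:
    """Guess the role of each column with simple dtype and name heuristics."""
    numeric = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
    others = [c for c in df.columns if c not in numeric]

    # Timestamp: first datetime column, or first non-numeric column that parses as dates.
    timestamp = next(
        (c for c in others
         if pd.api.types.is_datetime64_any_dtype(df[c]) or _parses_as_dates(df[c])),
        None,
    )
    # Target: a numeric column with a hinted name, else the first numeric column.
    target = next((c for c in numeric if c.lower() in TARGET_HINTS),
                  numeric[0] if numeric else None)
    # Series ID: first remaining non-numeric column.
    series_id = next((c for c in others if c != timestamp), None)
    exogenous = [c for c in df.columns if c not in (timestamp, target, series_id)]
    return {"timestamp": timestamp, "target": target,
            "series_id": series_id, "exogenous": exogenous}


sample = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02"],
    "store_id": ["store_1", "store_1"],
    "price": [9.99, 9.49],
    "sales": [120.0, 135.5],
    "promotion": [0, 1],
})
roles = infer_columns(sample)
print(roles)
```

A heuristic like this misclassifies numeric series IDs as exogenous variables, which is one reason the explicit --id_col, --date_col, and --target_col flags in the next step are useful.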
Step 3: Generate Transformation
python {baseDir}/scripts/generate_transform.py \
--input data/sales.csv \
--id_col store_id \
--date_col date \
--target_col sales \
--output data/transform/to_nixtla_schema.py
Step 4: Create Schema Contract
python {baseDir}/scripts/create_contract.py \
--mapping mapping.json \
--output NIXTLA_SCHEMA_CONTRACT.md
Step 5: Validate Transformation
python data/transform/to_nixtla_schema.py
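The exact checks the generated module runs are not listed here. As a rough sketch, assuming the usual expectations for a (unique_id, ds, y) frame, validation might look like the following (`validate_nixtla_schema` is a hypothetical helper, not the skill's own function):

```python
import pandas as pd


def validate_nixtla_schema(df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frame passed."""
    missing = [c for c in ("unique_id", "ds", "y") if c not in df.columns]
    if missing:
        # The remaining checks need all three columns, so stop early.
        return [f"missing columns: {missing}"]

    problems = []
    if not pd.api.types.is_datetime64_any_dtype(df["ds"]):
        problems.append("ds is not a datetime column")
    if not pd.api.types.is_numeric_dtype(df["y"]):
        problems.append("y is not numeric")
    if df[["unique_id", "ds", "y"]].isna().any().any():
        problems.append("null values present in unique_id, ds, or y")
    if df.duplicated(subset=["unique_id", "ds"]).any():
        problems.append("duplicate (unique_id, ds) pairs")
    return problems


good = pd.DataFrame({
    "unique_id": ["a", "a"],
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "y": [1.0, 2.0],
})
# Same frame, but both rows share one timestamp for the same series.
bad = good.assign(ds=pd.to_datetime(["2024-01-01", "2024-01-01"]))

good_problems = validate_nixtla_schema(good)
bad_problems = validate_nixtla_schema(bad)
print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.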
Output
- data/transform/to_nixtla_schema.py: Transformation module
- NIXTLA_SCHEMA_CONTRACT.md: Schema documentation
- nixtla_data.csv: Transformed data (optional)
Error Handling
- Error: No timestamp column detected
  Solution: Specify manually with --date_col
- Error: Multiple target candidates
  Solution: Specify manually with --target_col
- Error: Date parsing failed
  Solution: Specify format with --date_format "%Y-%m-%d"
- Error: Non-numeric target column
  Solution: Check for string values, use pd.to_numeric(errors='coerce')
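The last two fixes can be tried directly in pandas. This small sketch uses made-up sample values to show an explicit date format and numeric coercion; the column names are chosen only for illustration:

```python
import pandas as pd

# Made-up sample: one target value is a non-numeric string.
raw = pd.DataFrame({
    "ds": ["2024-01-01", "2024-01-02", "2024-01-03"],
    "y": ["10.5", "n/a", "12"],
})

# An explicit format avoids ambiguous date inference (same pattern as --date_format).
raw["ds"] = pd.to_datetime(raw["ds"], format="%Y-%m-%d")

# Strings that cannot be parsed become NaN instead of raising an error.
raw["y"] = pd.to_numeric(raw["y"], errors="coerce")

# Inspect the rows that failed conversion before deciding to drop or fill them.
bad_rows = raw[raw["y"].isna()]
print(bad_rows)
```

Coercion silently turns bad values into NaN, so it is worth reviewing `bad_rows` rather than dropping them blindly.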
Examples
Example 1: CSV Transformation
python {baseDir}/scripts/generate_transform.py \
--input sales.csv \
--id_col product_id \
--date_col timestamp \
--target_col revenue
Generated code:
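import pandas as pd

def to_nixtla_schema(path="sales.csv"):
    df = pd.read_csv(path)
    # Rename the source columns to the Nixtla schema names
    df = df.rename(columns={
        'product_id': 'unique_id',
        'timestamp': 'ds',
        'revenue': 'y'
    })
    # Parse timestamps so ds is a datetime column
    df['ds'] = pd.to_datetime(df['ds'])
    return df[['unique_id', 'ds', 'y']]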
Example 2: SQL Source
python {baseDir}/scripts/generate_transform.py \
--sql "SELECT * FROM daily_sales" \
--connection postgresql://localhost/db \
--id_col store_id \
--date_col sale_date \
--target_col amount
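The code generated for a SQL source is not shown here. By analogy with Example 1, it might resemble the sketch below. To keep it runnable without a database server, it assumes an in-memory SQLite table in place of the PostgreSQL connection, and `to_nixtla_schema_sql` is a hypothetical name:

```python
import sqlite3

import pandas as pd


def to_nixtla_schema_sql(conn, query="SELECT * FROM daily_sales"):
    """Run the query and map its columns to (unique_id, ds, y)."""
    df = pd.read_sql_query(query, conn)
    df = df.rename(columns={
        "store_id": "unique_id",
        "sale_date": "ds",
        "amount": "y",
    })
    df["ds"] = pd.to_datetime(df["ds"])
    return df[["unique_id", "ds", "y"]]


# Stand-in database: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (store_id TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [("s1", "2024-01-01", 100.0), ("s1", "2024-01-02", 110.0)],
)
result = to_nixtla_schema_sql(conn)
conn.close()
print(result)
```

For PostgreSQL, the same function should work with a SQLAlchemy engine or connection passed as `conn`, which is why sqlalchemy appears among the optional prerequisites.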
Resources
- Scripts: {baseDir}/scripts/
- Templates: {baseDir}/assets/templates/
- Nixtla Schema Docs: https://nixtla.github.io/statsforecast/
Related Skills:
- nixtla-timegpt-lab: Use transformed data for forecasting
- nixtla-experiment-architect: Reference in experiments
