askill
data-science

data-scienceSafety 90Repository

Data analysis, SQL queries, BigQuery operations, and data insights. Use for data analysis tasks and queries.

74 stars
1.5k downloads
Updated 2/27/2026

Package Files

Loading files...
SKILL.md

Data Science

Data analysis, SQL, and insights generation.

When to Use

  • Writing SQL queries
  • Data analysis and exploration
  • Creating visualizations
  • Statistical analysis
  • ETL and data pipelines

SQL Patterns

Common Queries

-- Aggregation with window functions
SELECT
    user_id,
    order_date,
    amount,
    SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) as running_total,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as recency_rank
FROM orders;

-- CTEs for readability
WITH monthly_stats AS (
    SELECT
        DATE_TRUNC('month', created_at) as month,
        COUNT(*) as total_orders,
        SUM(amount) as revenue
    FROM orders
    GROUP BY 1
),
growth AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_revenue,
        (revenue - LAG(revenue) OVER (ORDER BY month)) / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) as growth_rate
    FROM monthly_stats
)
SELECT * FROM growth;

BigQuery Specifics

-- Partitioned table query
SELECT *
FROM `project.dataset.events`
WHERE DATE(_PARTITIONTIME) BETWEEN '2024-01-01' AND '2024-01-31';

-- UNNEST for arrays
SELECT
    user_id,
    item
FROM `project.dataset.orders`,
UNNEST(items) as item;

-- Approximate counts for large data
SELECT APPROX_COUNT_DISTINCT(user_id) as unique_users
FROM `project.dataset.events`;

Python Analysis

import pandas as pd
import numpy as np

# Load and explore
df = pd.read_csv('data.csv')
df.info()
df.describe()

# Clean and transform
df['date'] = pd.to_datetime(df['date'])
df = df.dropna(subset=['required_field'])
df['category'] = df['category'].fillna('Unknown')

# Aggregate
summary = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'date': ['min', 'max']
}).round(2)

# Visualize
import matplotlib.pyplot as plt
df.groupby('date')['value'].sum().plot(figsize=(12, 6))
plt.title('Daily Values')
plt.savefig('chart.png', dpi=150, bbox_inches='tight')

Statistical Analysis

from scipy import stats

# Hypothesis testing
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Correlation
correlation = df['x'].corr(df['y'])

# Regression
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X, y)
print(f"R² = {model.score(X, y):.3f}")

Output Format

## Analysis Summary

**Question:** [What we're trying to answer]
**Data Source:** [Tables/files used]
**Date Range:** [Time period]

### Key Findings

1. [Finding with supporting metric]
2. [Finding with supporting metric]

### Visualization

[Chart description or embedded image]

### Recommendations

- [Actionable insight]

Examples

Input: "Analyze user retention" Action: Query cohort data, calculate retention rates, visualize trends

Input: "Find top customers" Action: Write SQL for RFM analysis, segment users, summarize findings

Install

Download ZIP
Requires askill CLI v1.0+

AI Quality Score

79/100Analyzed 3/1/2026

Well-structured data science skill with comprehensive SQL patterns (including BigQuery specifics), Python analysis code, and statistical methods. Includes clear When-to-Use section, good code examples, and reusable patterns. Slightly held back by missing environment setup details and deep path nesting. High practical value for data analysis tasks.

90
82
85
70
78

Metadata

Licenseunknown
Version-
Updated2/27/2026
Publisherhtlin222

Tags

databaseobservability