askill
python-data-wrangling

Safety score: 70

Modern data wrangling with pandas and polars. Use this skill when working with tabular data, choosing between pandas and polars, or writing idiomatic data manipulation code. Covers method chaining, idiomatic operations, performance considerations, and migration between libraries.

1 star
1.2k downloads
Updated 2/2/2026

SKILL.md

Python Data Wrangling

Modern patterns for pandas and polars data manipulation.

Decision Matrix: Pandas vs Polars

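| Factor | Pandas | Polars | Winner |
| --- | --- | --- | --- |
| Data size | <1GB | >1GB, especially >10GB | Polars for large data |
| Query optimization | No | Yes (lazy evaluation) | Polars |
| Ecosystem integration | Vast (sklearn, viz) | Growing | Pandas for ML/viz |
| API familiarity | De facto DataFrame standard | Expression-based (Rust core), less widely known | Pandas for teams |
| Performance | Good | Excellent (often 2-10x faster) | Polars |
| Memory usage | Higher | Lower | Polars |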

General guidance:

  • Use pandas when: <1GB data, heavy ML/viz integration, team familiarity critical
  • Use polars when: >1GB data, performance critical, greenfield projects

Modern Pandas Patterns

Method Chaining

Chain operations for readability

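import pandas as pd

# Each step returns a new DataFrame, so the pipeline reads top to bottom
result = (
    df
    .assign(
        total=lambda x: x["price"] * x["quantity"],
        date=lambda x: pd.to_datetime(x["date"]),
    )
    .query("total > 100")
    .groupby("category")
    # Named aggregation avoids MultiIndex columns in the output
    .agg(total_sum=("total", "sum"), total_mean=("total", "mean"))
    .sort_values("total_sum", ascending=False)  # sort the aggregated result
    .reset_index()
)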

See pandas-method-chaining.md for:

  • Lambda vs direct assignment
  • Pipe with custom functions
  • Handling complex transformations
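A minimal sketch of .pipe() with custom functions. The helpers add_total and top_n and the orders data are hypothetical; the point is that any function taking a DataFrame first and returning a DataFrame slots into a chain, and extra keyword arguments are forwarded by .pipe().

```python
import pandas as pd

# Hypothetical helper: DataFrame in, DataFrame out, with default column names.
def add_total(df: pd.DataFrame, price_col: str = "price", qty_col: str = "quantity") -> pd.DataFrame:
    return df.assign(total=df[price_col] * df[qty_col])

# Hypothetical helper: keep the n rows with the largest values in `col`.
def top_n(df: pd.DataFrame, n: int, col: str) -> pd.DataFrame:
    return df.nlargest(n, col)

orders = pd.DataFrame({
    "category": ["a", "b", "a", "c"],
    "price": [10.0, 25.0, 40.0, 5.0],
    "quantity": [3, 2, 5, 10],
})

result = (
    orders
    .pipe(add_total)                  # relies on the default column names
    .pipe(top_n, n=2, col="total")    # keyword arguments are passed through
    .reset_index(drop=True)
)
print(result)
```

Because each helper is an ordinary function, it can be unit-tested on its own and reused across pipelines.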

Idiomatic Operations

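import numpy as np
import pandas as pd

# Use .loc for explicit indexing (row mask and column label in one call)
df.loc[df["score"] > 80, "grade"] = "A"

# Use .pipe() for custom transformations
# (normalize_columns and remove_duplicates are your own DataFrame -> DataFrame functions)
result = df.pipe(normalize_columns).pipe(remove_duplicates)

# Use .assign() for new columns; each lambda receives the DataFrame being built
df = df.assign(
    log_value=lambda x: np.log(x["value"]),  # natural log, expects value > 0
    is_high=lambda x: x["value"] > x["value"].median()
)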
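The snippet above assumes normalize_columns and remove_duplicates already exist. Here is a self-contained sketch with hypothetical implementations of both and made-up sample data, so the three idioms can run end to end.

```python
import numpy as np
import pandas as pd

# Hypothetical helper: lower-case names and replace spaces with underscores.
def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))

# Hypothetical helper: drop exact duplicate rows and renumber the index.
def remove_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    return df.drop_duplicates().reset_index(drop=True)

raw = pd.DataFrame({
    "Student Name": ["Ann", "Bob", "Bob", "Cy"],
    "Score": [91, 75, 75, 82],
    "Value": [1.0, 10.0, 10.0, 100.0],
})

df = raw.pipe(normalize_columns).pipe(remove_duplicates)

# "grade" does not exist yet, so rows outside the mask are filled with NaN.
df.loc[df["score"] > 80, "grade"] = "A"

df = df.assign(
    log_value=lambda x: np.log(x["value"]),
    is_high=lambda x: x["value"] > x["value"].median(),
)
print(df)
```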

GroupBy Patterns

# Named aggregations (pandas 0.25+)
summary = df.groupby("category").agg(
    total_sales=("sales", "sum"),
    avg_sales=("sales", "mean"),
    num_transactions=("sales", "count")
)

See pandas-groupby-patterns.md for:

  • Window functions
  • Multiple grouping levels
  • Custom aggregations

Polars Patterns

Lazy Evaluation

Use lazy API for query optimization

import polars as pl

result = (
    pl.scan_csv("data.csv")  # Lazy
    .filter(pl.col("value") > 100)
    .group_by("category")
    .agg([
        pl.col("sales").sum().alias("total_sales"),
        pl.col("sales").mean().alias("avg_sales")
    ])
    .collect()  # Execute
)

See polars-lazy-evaluation.md for:

  • When lazy helps vs hurts
  • Streaming for huge data
  • Query plan inspection

Polars Expressions

Use expressions for vectorized operations

result = df.select([
    pl.col("name"),
    (pl.col("salary") * 1.1).alias("new_salary"),
    pl.when(pl.col("age") > 30)
      .then(pl.lit("senior"))
      .otherwise(pl.lit("junior"))
      .alias("level")
])

See polars-expressions.md for:

  • Expression composition
  • when().then().otherwise() patterns
  • List and struct operations

Migration Guide

Pandas → Polars

| Pandas | Polars | Notes |
| --- | --- | --- |
| df["col"] | df["col"] or pl.col("col") | Expressions preferred |
| df[df["x"] > 5] | df.filter(pl.col("x") > 5) | Method-based |
| df.groupby("x").agg({"y": "sum"}) | df.group_by("x").agg(pl.col("y").sum()) | Expression-based |

See migration-pandas-polars.md for complete patterns.

Performance Tips

Pandas Performance

# Use vectorized operations
df["result"] = df["a"] + df["b"]  # Good

# Use categorical for low-cardinality columns
df["category"] = df["category"].astype("category")

# Use eval for complex expressions on large DataFrames (little or no gain on small ones)
df.eval("total = price * quantity", inplace=True)
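A sketch that measures the categorical tip on synthetic data (random values, fixed seed) and checks that DataFrame.eval produces the same column as plain vectorized arithmetic. Exact byte counts depend on the pandas version, so only the comparison is meaningful.

```python
import numpy as np
import pandas as pd

n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "category": rng.choice(["north", "south", "east", "west"], size=n),
    "price": rng.uniform(1, 100, size=n),
    "quantity": rng.integers(1, 10, size=n),
})

# deep=True counts the string payloads, not just the pointers.
as_object = df["category"].memory_usage(deep=True)
df["category"] = df["category"].astype("category")
as_category = df["category"].memory_usage(deep=True)

# Vectorized arithmetic and eval compute the same values.
df["total"] = df["price"] * df["quantity"]
df.eval("total_eval = price * quantity", inplace=True)

print(f"object: {as_object:,} bytes, category: {as_category:,} bytes")
```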

Polars Performance

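# Use scan_* instead of read_* to get a LazyFrame (nothing is read yet)
lf = pl.scan_csv("data.csv")

# Use streaming for data larger than memory
# (newer Polars releases spell this lf.collect(engine="streaming"))
result = lf.collect(streaming=True)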

Anti-Patterns to Avoid

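| Avoid | Use Instead |
| --- | --- |
| df.iterrows() | Vectorized operations |
| Chained indexing: df["a"]["b"] = x | A single .loc call: df.loc["b", "a"] = x |
| Growing DataFrames in loops | Collect pieces in a list, then call pd.concat() once after the loop |
| Mixed types in columns | Consistent types |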
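A minimal sketch of three of the fixes from the table, on invented data: build a list and concatenate once, replace iterrows() with column arithmetic, and write through a single .loc call instead of chained indexing.

```python
import pandas as pd

# Collect pieces in a list and concatenate once, instead of re-concatenating
# a growing DataFrame on every iteration.
chunks = []
for batch in range(3):
    chunks.append(pd.DataFrame({"batch": [batch] * 2, "value": [batch * 10, batch * 10 + 1]}))
combined = pd.concat(chunks, ignore_index=True)

# Row-by-row iteration (slow) versus a vectorized column operation (fast).
slow = [row["value"] * 2 for _, row in combined.iterrows()]
combined["doubled"] = combined["value"] * 2

# Instead of chained indexing like combined["value"][0] = -1,
# pass the row label and column label to .loc together.
combined.loc[0, "value"] = -1
print(combined)
```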

Sources: pandas User Guide, Polars documentation

Install

Requires askill CLI v1.0+

AI Quality Score

78/100 (analyzed 4/5/2026)

Comprehensive technical reference covering pandas and polars data wrangling. Includes decision matrix, method chaining, lazy evaluation, migration guide, and anti-patterns. Well-structured with clear tables and code examples. Scores high on reusability and clarity. Lacks step-by-step actionability and references external files. Good quality reference content but less actionable as standalone skill.


Metadata

License: unknown
Version: -
Publisher: justanesta

Tags

api