IterableData Development
Quick Setup
pip install -e ".[dev]"
pytest --verbose
ruff check iterable tests
ruff format iterable tests
Code Style
- Python 3.10+ with type hints where appropriate
- Max line length: 120 characters
- Use
rufffor linting and formatting - Double quotes for strings consistently
- Always use context managers for file operations
- Import order: standard library, third-party, local imports
Project Structure
iterable/helpers/- Utility functions (detect, schema, utils)iterable/datatypes/- Format-specific implementationsiterable/codecs/- Compression codec implementationsiterable/engines/- Processing engines (DuckDB, internal)iterable/convert/- Format conversion utilitiesiterable/pipeline/- Data pipeline processingtests/- Test suite (one test file per format/feature)
Import Patterns
- Main entry:
from iterable.helpers.detect import open_iterable - Format-specific:
from iterable.datatypes.csv import CSVIterable - Codecs:
from iterable.codecs.gzipcodec import GZIPCodec - Always use
open_iterable()for user-facing code
File Handling
- Always use context managers:
with open_iterable('file.csv') as source: - Never call
.close()when usingwithstatements - Reset iterators with
.reset()method when needed - Handle compression automatically via filename detection
Error Handling
- Format detection failures: provide helpful error messages
- Missing optional dependencies: raise clear ImportError with installation instructions
- Invalid file formats: raise appropriate exceptions (ValueError, TypeError)
- Always handle file I/O errors gracefully
Code Conventions
- Use
open_iterable()for automatic format detection - Prefer bulk operations (
read_bulk,write_bulk) for performance - Use DuckDB engine when appropriate (CSV, JSONL files)
- Handle encoding automatically via
chardetor user specification
Pre-Commit Checks
Before committing:
- Run tests:
pytest --verbose - Run linter:
ruff check iterable tests - Format code:
ruff format iterable tests - Type check:
mypy iterable(warnings allowed)
Quality Tools
- Security:
bandit -r iterable -ll - Dead code:
vulture iterable --min-confidence 80 - Complexity:
radon cc iterable --min B - Coverage:
pytest --cov=iterable --cov-report=html
Known Constraints
- DuckDB engine supports: CSV, JSONL, JSON formats and GZIP, ZStandard codecs
- Large files: use streaming (iterator interface) to avoid memory issues
- XML parsing: requires
iterableargs={'tagname': 'item'} - Some formats require optional dependencies (see
pyproject.toml)
