R Targets Package Skill
This skill helps you effectively use the targets R package for building reproducible, scalable data analysis pipelines.
Core Concepts
What is targets?
targets is a Make-like pipeline tool for R that:
- Skips costly runtime for tasks already up to date
- Orchestrates computation with implicit parallel computing
- Abstracts files as R objects
- Tracks dependencies automatically through static code analysis
Key Files
- _targets.R: The target script file that defines your pipeline. Must return a list of target objects.
- _targets/: Data store containing:
  - _targets/meta/meta: Target metadata (text file)
  - _targets/objects/: Target output data
  - _targets/workspaces/: Debug workspaces for errored targets
Quick Start
Basic Pipeline Structure
# _targets.R
library(targets)
library(tarchetypes)
tar_source() # Sources R/ directory
tar_option_set(packages = c("readr", "dplyr", "ggplot2")) # readr provides read_csv()
list(
tar_target(file, "data.csv", format = "file"),
tar_target(data, read_csv(file)),
tar_target(model, fit_model(data)),
tar_target(plot, create_plot(model, data))
)
Essential Commands
# Run the pipeline
tar_make()
# Check what would run
tar_outdated()
# Visualize dependencies
tar_visnetwork()
# List all targets with commands
tar_manifest()
# Read target results
tar_read(target_name)
tar_load(target_name) # Loads into environment
# Clean up
tar_destroy() # Remove entire _targets/ directory
tar_delete(target_name) # Delete specific targets
tar_invalidate(target_name) # Remove metadata only
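The commands above can be tried end to end in a throwaway directory. The following is a minimal sketch, not a recommended project layout: the targets raw and total are toy names invented for illustration, and the pipeline is written with tar_script() only so the example is self-contained.

```r
library(targets)

# Work in a temporary project directory so no real files are touched.
project <- tempfile("targets_demo_")
dir.create(project)
old_wd <- setwd(project)

# Write a small _targets.R (toy targets, for illustration only).
tar_script({
  list(
    tar_target(raw, data.frame(x = 1:10, y = 2 * (1:10))),
    tar_target(total, sum(raw$y))
  )
}, ask = FALSE)

before <- tar_outdated()  # targets that would run on the next tar_make()
tar_make()                # builds raw, then total
after <- tar_outdated()   # what is still outdated after the run
total <- tar_read(total)  # read the stored result back into the session

setwd(old_wd)
```

Running tar_make() a second time in the same directory should skip both targets, since neither the code nor the upstream data has changed.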
Best Practices
Target Design
A good target should:
- Create a dataset, analyze a dataset, or summarize an analysis
- Be large enough to save meaningful time when skipped
- Be small enough that some targets can skip while others run
- Have no side effects (except file targets with format = "file")
- Return a single, meaningful, saveable value
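The file-target exception can be sketched as follows. write_summary() is a hypothetical helper, not part of targets: it writes a CSV as its one side effect and returns the path, which is the value a format = "file" target is expected to return.

```r
# Hypothetical helper: writes a CSV and returns its path so that a
# format = "file" target can track the file on disk.
write_summary <- function(data, path) {
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  write.csv(data, path, row.names = FALSE)
  path
}

# In _targets.R this might be declared as (sketch, not run here):
# tar_target(
#   summary_file,
#   write_summary(data, "output/summary.csv"),
#   format = "file"
# )

# The helper can be exercised on its own, outside any pipeline.
out <- write_summary(head(mtcars), file.path(tempdir(), "output", "summary.csv"))
```

Returning the path rather than the data keeps the target's value small and lets the pipeline detect changes to the file itself.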
Function-Oriented Workflows
Define functions in R/ directory, not inline in _targets.R:
# R/functions.R
get_data <- function(file) {
read_csv(file) %>%
filter(!is.na(value))
}
fit_model <- function(data) {
lm(outcome ~ predictor, data)
}
# _targets.R
library(targets)
tar_source()
tar_option_set(packages = c("readr", "dplyr")) # read_csv(), filter(), %>%
list(
tar_target(data, get_data("data.csv")),
tar_target(model, fit_model(data))
)
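Because fit_model() is an ordinary function, it can be checked on toy data before it ever runs inside the pipeline. A minimal sketch, assuming a made-up data frame with the outcome and predictor columns the function expects:

```r
# Same shape as the fit_model() defined in R/functions.R above.
fit_model <- function(data) {
  lm(outcome ~ predictor, data = data)
}

# Toy data (an assumption for illustration): outcome = 1 + 2 * predictor.
toy <- data.frame(predictor = 1:10)
toy$outcome <- 1 + 2 * toy$predictor

model <- fit_model(toy)
estimates <- unname(coef(model))  # intercept and slope
```

Checks like this are harder to write when the modeling code lives inline in a tar_target() call.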
Storage Formats
Choose appropriate formats for your data:
| Format | Best For | Requirements |
|---|---|---|
| "rds" (default) | General R objects | base R |
| "qs" | Large/general objects | qs2 package |
| "feather" | Data frames | arrow package |
| "parquet" | Large data frames | arrow package |
| "file" | External files | Returns file path |
tar_option_set(format = "qs") # Global setting
# OR
tar_target(data, get_data(), format = "qs") # Per-target
Dynamic Branching
Dynamic branching creates targets at runtime based on data:
list(
tar_target(samples, c("A", "B", "C")),
tar_target(
analysis,
analyze_sample(samples),
pattern = map(samples) # Creates 3 branches
),
tar_target(
combined,
combine_results(analysis) # Auto-aggregates branches
)
)
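A runnable variant of the pipeline above may help show what the branches and the aggregate look like. This is a sketch only: analyze_sample() and combine_results() are replaced with paste0() and paste() as stand-ins, and the pipeline runs in a temporary directory.

```r
library(targets)

# Temporary project directory for the demonstration.
project <- tempfile("targets_branching_")
dir.create(project)
old_wd <- setwd(project)

tar_script({
  list(
    tar_target(samples, c("A", "B", "C")),
    # One branch per element of samples (stand-in for analyze_sample()).
    tar_target(analysis, paste0("result_", samples), pattern = map(samples)),
    # Referring to analysis without a pattern receives all branches at once.
    tar_target(combined, paste(analysis, collapse = ","))
  )
}, ask = FALSE)

tar_make()

analysis <- tar_read(analysis)  # aggregated value of all branches
combined <- tar_read(combined)

setwd(old_wd)
```

If samples later gains a fourth element, only the new branch and combined should need to rerun.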
Pattern Types
- map(x, y): One branch per tuple of elements
- cross(x, y): One branch per combination
- slice(x, index = c(1, 3)): Branch over specific indices
- head(x, n = 5): First n elements
- tail(x, n = 5): Last n elements
- sample(x, n = 5): Random sample
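Branch counts for a pattern can be previewed with tar_pattern() without running any pipeline. In this sketch, x and y are placeholder target names and the numbers are assumed branch counts for each upstream target; the result has one row per branch that would be created.

```r
library(targets)

# map() pairs branches element by element (equal lengths assumed here).
mapped <- tar_pattern(map(x, y), x = 3, y = 3)

# cross() creates one branch per combination of x and y branches.
crossed <- tar_pattern(cross(x, y), x = 3, y = 2)

# head() keeps only the first n branches of x.
first_two <- tar_pattern(head(x, n = 2), x = 5)

branch_counts <- c(
  map = nrow(mapped),
  cross = nrow(crossed),
  head = nrow(first_two)
)
```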
Iteration Modes
- "vector" (default): Uses vctrs::vec_slice() and vctrs::vec_c()
- "list": Uses [[ for slicing and list() for aggregation
- "group": Branch over dplyr::group_by() row groups (use with tar_group())
Static Branching with tarchetypes
Static branching creates targets before the pipeline runs using metaprogramming:
library(tarchetypes)
values <- tibble::tibble(
method = rlang::syms(c("method1", "method2")),
dataset = c("data1", "data2")
)
tar_map(
values = values,
tar_target(analysis, method(dataset)),
tar_target(summary, summarize(analysis))
)
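To see which targets static branching generates, the pipeline can be inspected with tar_manifest() before anything runs. The sketch below uses a made-up values table with a single dataset column and a toy command; it relies on tar_map() appending each row's values to the target name, which is the default naming behavior as far as I know.

```r
library(targets)

# Temporary project directory for the demonstration.
project <- tempfile("targets_static_")
dir.create(project)
old_wd <- setwd(project)

tar_script({
  library(tarchetypes)
  # Toy values (an assumption for illustration): one target per label.
  values <- data.frame(dataset = c("small", "large"))
  list(
    tar_map(
      values = values,
      tar_target(label_length, nchar(dataset))
    )
  )
}, ask = FALSE)

# The targets exist as soon as the script is parsed; no tar_make() is needed.
static_names <- tar_manifest()$name

setwd(old_wd)
```

Because the names are fixed up front, each generated target appears individually in tar_visnetwork() and can be read with tar_read() like any other target.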
Debugging Workflow
Step 1: Check Error Details
tar_meta(fields = error, complete_only = TRUE)
Step 2: Reproduce Error Locally
tar_load_globals() # Load functions and packages
tar_load(target_name) # Load dependencies
# Run the errored function
Step 3: Interactive Debugging (if needed)
# Add browser() to your function
tar_make(callr_function = NULL, use_crew = FALSE)
See references/TROUBLESHOOTING.md for detailed error solutions.
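The steps above can be rehearsed on a deliberately failing pipeline. This is a sketch under stated assumptions: the targets input and broken are invented for illustration, and workspace_on_error = TRUE is set explicitly so that a workspace is saved under _targets/workspaces/ when the target fails.

```r
library(targets)

# Temporary project directory for the demonstration.
project <- tempfile("targets_debug_")
dir.create(project)
old_wd <- setwd(project)

tar_script({
  tar_option_set(workspace_on_error = TRUE)
  list(
    tar_target(input, 1:3),
    # Fails on purpose so there is an error to inspect.
    tar_target(broken, stop("boom: ", length(input)))
  )
}, ask = FALSE)

# tar_make() itself raises an error when a target fails, so wrap it.
try(tar_make(), silent = TRUE)

errors <- tar_meta(fields = error, complete_only = TRUE)  # Step 1
saved <- tar_workspaces()  # names of targets with a saved workspace

# tar_workspace(broken) would then load the target's dependencies into
# the session so the failing command can be rerun interactively.

setwd(old_wd)
```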
Advanced Topics (See References)
- Troubleshooting: references/TROUBLESHOOTING.md - Solutions by error message
- Patterns: references/PATTERNS.md - Common workflow recipes
- Advanced Features: references/ADVANCED.md - Custom formats, CAS, metadata
- HPC Integration: references/HPC_INTEGRATION.md - Parallel computing with crew
- Package Development: references/PACKAGE_DEVELOPMENT.md - targets in R packages
- Function Reference: references/FUNCTION_CATEGORIES.md - Organized API reference
- Migration: references/MIGRATION.md - From drake to targets
Useful Utilities
# Check dependencies
tar_deps(your_function)
# Test branching patterns
tar_pattern(map(x, y), x = 3, y = 3) # map() pairs branches, so x and y need equal lengths
# Get target metadata
tar_meta(targets_only = TRUE)
tar_meta(fields = c("name", "status", "time", "error"))
# Monitor progress
tar_poll() # Continuous refresh
tar_progress() # Current status
tar_watch() # Shiny app
# Validate pipeline
tar_validate() # Check for errors
tar_glimpse() # Lightweight dependency graph (no status information)
