Workflows¶
SO Campaign Manager supports several types of workflows for different analysis tasks.
Overview¶
Workflows are the fundamental units of computation in SO Campaign Manager. Each workflow:
Defines a specific analysis task
Specifies resource requirements
Includes environment configuration
Can have dependencies on other workflows
Available Workflows¶
ML Mapmaking¶
Maximum likelihood mapmaking creates maps from time-ordered data using iterative algorithms.
Purpose: Generate high-quality maps with proper noise modeling and systematics mitigation.
Configuration Example:
[campaign.ml-mapmaking]
context = "file:///path/to/context.yaml"
area = "file:///path/to/area.fits"
output_dir = "/path/to/output"
bands = "f090"
wafer = "ws0"
comps = "TQU"
maxiter = 10
query = "obs_id='1575600533.1575611468.ar5_1'"
tiled = 1
site = "act"
Key Parameters:
context: Context file defining data selection and processing parameters
area: FITS file defining the sky area to map
bands: Frequency bands to process (“f090”, “f150”, etc.)
wafer: Detector wafer identifier
comps: Map components (“T” for temperature only, “TQU” for temperature and polarization)
maxiter: Maximum number of iterations for convergence
query: SQL-like query for data selection
tiled: Whether to use tiled processing (0 or 1)
Resource Requirements:
Memory-intensive (typically 64-128 GB per process)
Can benefit from multiple cores for linear algebra operations
Disk I/O intensive for large datasets
SAT Simulation¶
Small Aperture Telescope (SAT) simulation workflows generate synthetic observations using toast_so_sim.
Purpose: Create realistic simulated timestreams for validation and systematics studies.
Configuration Example:
[campaign.sat-sims]
output_dir = "/path/to/output"
schedule = "/path/to/schedule.txt"
bands = "SAT_f090"
wafer_slots = "w25"
sample_rate = 37
sim_noise = false
scan_map = false
sim_atmosphere = false
sim_sss = false
sim_hwpss = false
Key Parameters:
output_dir: Directory for simulation output
schedule: Observation schedule file
bands: Frequency band (e.g. SAT_f090, SAT_f150)
wafer_slots: Wafer slot identifier (e.g. w25)
sample_rate: Detector sample rate in Hz (default: 37)
sim_noise: Enable noise simulation (boolean)
scan_map: Enable map scanning (boolean)
sim_atmosphere: Enable atmosphere simulation (boolean)
sim_sss: Enable spin-synchronous signal simulation (boolean)
sim_hwpss: Enable HWP synchronous signal simulation (boolean)
pixels_healpix_radec_nside: HEALPix resolution (default: 512)
Power Spectra¶
Power spectrum estimation workflow using PSpipe.
Purpose: Compute power spectra from maps produced by the mapmaking pipeline.
Configuration Example:
[campaign.power-spectra]
subcommand = "/path/to/script.py"
script_args = ["/path/to/paramfile.dict"]
script_flags = ["simulate-syst", "simulate-lens"]
Key Parameters:
subcommand: Path to the PSpipe Python script to run
script_args: Positional arguments passed to the script (list)
script_flags: Boolean flags passed as --flag (list)
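To make the mapping from these three parameters to a command line concrete, here is a small sketch. The `build_command` function is hypothetical and the `python` executable is an assumption; the launcher that SO Campaign Manager actually uses may differ. It only illustrates the documented convention that positional arguments are passed in order and each entry in `script_flags` becomes `--flag`.

```python
import shlex


def build_command(subcommand, script_args, script_flags, executable="python"):
    """Assemble a shell command line (illustrative only, not socm API)."""
    argv = [executable, subcommand]
    argv += script_args                             # positional arguments, in order
    argv += [f"--{flag}" for flag in script_flags]  # each flag becomes --flag
    return shlex.join(argv)                         # quote safely for a shell


cmd = build_command(
    "/path/to/script.py",
    ["/path/to/paramfile.dict"],
    ["simulate-syst", "simulate-lens"],
)
print(cmd)
# python /path/to/script.py /path/to/paramfile.dict --simulate-syst --simulate-lens
```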
Resource Requirements:
Scales with the number of map products being cross-correlated
Some stages (e.g. mode-coupling matrix) are MPI-parallel and benefit from many ranks
ML Null Tests¶
Statistical tests to validate mapmaking results by creating maps from data splits.
Purpose: Detect systematic errors and validate noise models by checking that null maps (differences between splits) are consistent with noise.
All null tests share the following common parameters:
chunk_nobs: Number of observations per chunk used to define splits
context, area, output_dir, query: Same as ML Mapmaking
Types of Null Tests:
Mission Tests¶
Splits observations in time to test for time-dependent systematics.
[campaign.ml-null-tests.mission-tests]
chunk_nobs = 10
nsplits = 8
Observations are sorted by timestamp, grouped into chunks of chunk_nobs, and
distributed across nsplits splits in a time-interleaved fashion.
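The chunk-then-interleave procedure described above can be sketched as follows. Assigning chunk `i` to split `i % nsplits` (round-robin) is an assumption about what "time-interleaved" means here, and `interleaved_splits` is a hypothetical name, not a function from the package.

```python
def interleaved_splits(timestamps, chunk_nobs, nsplits):
    """Sort observations in time, chunk them, and deal chunks round-robin."""
    ordered = sorted(timestamps)
    # Group consecutive observations into chunks of chunk_nobs.
    chunks = [ordered[i:i + chunk_nobs] for i in range(0, len(ordered), chunk_nobs)]
    splits = [[] for _ in range(nsplits)]
    # Assumed interleaving: chunk 0 -> split 0, chunk 1 -> split 1, and so on.
    for index, chunk in enumerate(chunks):
        splits[index % nsplits].extend(chunk)
    return splits


# Twelve observations with timestamps 0..11, two per chunk, three splits.
splits = interleaved_splits(range(12), chunk_nobs=2, nsplits=3)
print(splits)
# [[0, 1, 6, 7], [2, 3, 8, 9], [4, 5, 10, 11]]
```

Each split then samples the whole time range, so slow drifts affect all splits roughly equally.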
Wafer Tests¶
Splits observations by detector wafer to test for detector-dependent systematics.
[campaign.ml-null-tests.wafer-tests]
chunk_nobs = 10
nsplits = 8
Observations are grouped by wafer slot and maps are produced per-wafer for comparison.
Direction Tests¶
Splits observations by scan direction (rising, setting, or middle azimuth) to test for
scan-synchronous systematics. Always uses nsplits = 2.
[campaign.ml-null-tests.direction-tests]
chunk_nobs = 10
Observations are classified by azimuth center into rising (az < 180°), setting (az > 180°), or middle (az ≈ 180°) groups, and time-interleaved splits are created within each group.
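The azimuth classification can be illustrated with a short function. This is a sketch under stated assumptions: the source only says "az ≈ 180°" for the middle group, so the `tolerance_deg` parameter and its 5° default are hypothetical, as is the function name.

```python
def classify_direction(az_center_deg, tolerance_deg=5.0):
    """Classify an observation by its azimuth center (degrees)."""
    az = az_center_deg % 360.0  # normalize into [0, 360)
    # Assumed definition of "az ≈ 180°": within tolerance_deg of 180.
    if abs(az - 180.0) <= tolerance_deg:
        return "middle"
    return "rising" if az < 180.0 else "setting"


# Group a few example azimuth centers by their classification.
groups = {}
for az in [90.0, 270.0, 181.0, 120.0]:
    groups.setdefault(classify_direction(az), []).append(az)
print(groups)
# {'rising': [90.0, 120.0], 'setting': [270.0], 'middle': [181.0]}
```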
PWV Tests¶
Splits observations by precipitable water vapour (PWV) level to test for atmosphere-dependent systematics.
[campaign.ml-null-tests.pwv-tests]
chunk_nobs = 10
nsplits = 2
Observations are ordered by PWV value and interleaved into splits, separating low-PWV from high-PWV conditions.
Day/Night Tests¶
Splits observations into daytime and nighttime subsets to test for solar-related systematics.
[campaign.ml-null-tests.day-night-tests]
chunk_nobs = 10
nsplits = 2
Observations are classified as day or night based on their timestamp and maps are produced separately for each condition.
Elevation Tests¶
Splits observations by telescope elevation to test for elevation-dependent systematics such as ground pickup or atmospheric gradients.
[campaign.ml-null-tests.elevation-tests]
chunk_nobs = 10
nsplits = 2
Observations are sorted by elevation center and distributed across splits.
Moon Rise/Set Tests¶
Splits observations by whether the Moon is rising or setting during the observation, to test for Moon-related contamination correlated with lunar phase angle.
[campaign.ml-null-tests.moonrise-set-tests]
chunk_nobs = 10
nsplits = 2
Moon Close Tests¶
Splits observations by proximity to the Moon to test for near-field Moon sidelobe contamination.
[campaign.ml-null-tests.moon-close-tests]
chunk_nobs = 10
nsplits = 2
Sun Close Tests¶
Splits observations by proximity to the Sun to test for near-field Sun sidelobe contamination.
[campaign.ml-null-tests.sun-close-tests]
chunk_nobs = 10
nsplits = 2
Creating Custom Workflows¶
To create a new workflow type:
Inherit from base Workflow class:
from socm.core.models import Workflow

class MyCustomWorkflow(Workflow):
    # Define additional parameters
    custom_param: str
    threshold: float = 0.5
Implement required methods:
def get_command(self, **kwargs) -> str:
    """Return the command to execute."""
    return f"{self.executable} {self.subcommand}"

def get_arguments(self, **kwargs) -> str:
    """Return command arguments."""
    return f"--param {self.custom_param} --threshold {self.threshold}"
Register the workflow:
from socm.workflows import registered_workflows
registered_workflows['my-custom'] = MyCustomWorkflow
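To see how the three steps fit together without installing the package, the sketch below substitutes a small stand-in for `socm.core.models.Workflow`. The stand-in (a plain dataclass with only `executable` and `subcommand` fields) and the local `registered_workflows` dictionary are assumptions made for illustration; the real base class may define more fields and validation.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """Stand-in for socm.core.models.Workflow (assumed fields only)."""
    executable: str
    subcommand: str


@dataclass
class MyCustomWorkflow(Workflow):
    # Additional parameters specific to this workflow
    custom_param: str
    threshold: float = 0.5

    def get_command(self, **kwargs) -> str:
        """Return the command to execute."""
        return f"{self.executable} {self.subcommand}"

    def get_arguments(self, **kwargs) -> str:
        """Return command arguments."""
        return f"--param {self.custom_param} --threshold {self.threshold}"


# Local stand-in for the registry imported from socm.workflows.
registered_workflows = {"my-custom": MyCustomWorkflow}

workflow = registered_workflows["my-custom"](
    executable="python", subcommand="analyze.py", custom_param="demo"
)
full_command = f"{workflow.get_command()} {workflow.get_arguments()}"
print(full_command)
# python analyze.py --param demo --threshold 0.5
```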
Workflow Dependencies¶
Workflows can depend on outputs from other workflows. The campaign manager handles:
Dependency resolution - Ensures workflows run in the correct order
Resource optimization - Schedules dependent workflows as early as possible using the HEFT (Heterogeneous Earliest Finish Time) algorithm
For TOML-based campaigns, subcampaigns provide a grouping mechanism. For explicit stage-by-stage dependency graphs, use the DAG YAML format:
stages:
  preprocess:
    executable: python -u
    script: preprocess.py
    depends: null
    resources:
      memory: 48G
      ranks: 1
      threads: 4
      runtime: 10m
  mapmaking:
    executable: python -u
    script: mapmaking.py
    depends:
      - preprocess
    resources:
      ranks: 14
      threads: 8
      memory: 128G
      runtime: 60m
  spectra:
    executable: python -u
    script: spectra.py
    depends:
      - mapmaking
    resources:
      ranks: 4
      threads: 4
      memory: 32G
      runtime: 20m
See the User Guide DAG section and examples/dag.yml for a full annotated example.
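Dependency resolution on a stage graph like the one above amounts to a topological sort. The sketch below uses Python's standard-library `graphlib` to derive a valid execution order from the `depends` entries. It illustrates ordering only, not the HEFT scheduling the campaign manager performs, and `execution_order` is a hypothetical helper rather than part of the package.

```python
from graphlib import TopologicalSorter

# Stage definitions mirroring the YAML example (resources omitted for brevity).
stages = {
    "spectra": {"script": "spectra.py", "depends": ["mapmaking"]},
    "mapmaking": {"script": "mapmaking.py", "depends": ["preprocess"]},
    "preprocess": {"script": "preprocess.py", "depends": None},
}


def execution_order(stages):
    """Return stage names so that every stage follows its dependencies."""
    # Map each stage to its predecessors; `depends: null` becomes an empty list.
    graph = {name: (spec.get("depends") or []) for name, spec in stages.items()}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


order = execution_order(stages)
print(order)
# ['preprocess', 'mapmaking', 'spectra']
```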
Best Practices¶
Resource Sizing¶
Memory: Allocate 20-50% more than estimated need
Runtime: Set conservative estimates to avoid queue timeouts
Cores: Balance between parallelization and memory per core
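The sizing guidance above reduces to simple arithmetic, sketched here. Both helper names are hypothetical, and the 25% headroom in the example is just one value from the suggested 20-50% range.

```python
import math


def padded_memory_gb(estimated_gb, headroom=0.25):
    """Add fractional headroom to a memory estimate and round up to whole GB."""
    if headroom < 0:
        raise ValueError("headroom must be non-negative")
    return math.ceil(estimated_gb * (1 + headroom))


def memory_per_core_gb(total_gb, cores):
    """Memory available to each core when total_gb is shared evenly."""
    return total_gb / cores


request_gb = padded_memory_gb(96, headroom=0.25)  # 96 GB estimate plus 25%
per_core = memory_per_core_gb(request_gb, cores=8)
print(request_gb, per_core)
# 120 15.0
```

Doubling the core count at a fixed memory request halves the memory available per core, which is the trade-off the last item refers to.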
Data Management¶
Use fast local storage for temporary files
Ensure output directories have sufficient space
Clean up intermediate files when possible
Configuration¶
Use descriptive workflow names for tracking
Document custom parameters in configuration files
Test workflows on small datasets first
Monitoring¶
Check log files for workflow progress
Monitor resource usage to optimize future runs
Validate outputs before proceeding to dependent workflows
Troubleshooting¶
Common Issues¶
- Memory Errors:
Increase memory allocation
Reduce data chunk size
Use tiled processing for large areas
- Timeout Errors:
Increase runtime estimates
Check for hung processes
Optimize algorithm parameters
- Dependency Errors:
Verify input file paths
Check workflow ordering
Ensure dependent outputs exist
- Environment Issues:
Verify environment variables
Check module availability
Validate file permissions
Performance Tips¶
Use SSD storage for temporary files
Optimize number of MPI ranks vs threads
Consider memory bandwidth limitations
Profile workflows to identify bottlenecks