Workflows¶

SO Campaign Manager supports several types of workflows for different analysis tasks.

Overview¶

Workflows are the fundamental units of computation in SO Campaign Manager. Each workflow:

Defines a specific analysis task
Specifies resource requirements
Includes environment configuration
Can have dependencies on other workflows

Available Workflows¶

ML Mapmaking¶

Maximum likelihood mapmaking creates maps from time-ordered data using iterative algorithms.

Purpose: Generate high-quality maps with proper noise modeling and systematics mitigation.

Configuration Example:

[campaign.ml-mapmaking]
context = "file:///path/to/context.yaml"
area = "file:///path/to/area.fits"
output_dir = "/path/to/output"
bands = "f090"
wafer = "ws0"
comps = "TQU"
maxiter = 10
query = "obs_id='1575600533.1575611468.ar5_1'"
tiled = 1
site = "act"

Key Parameters:

context: Context file defining data selection and processing parameters
area: FITS file defining the sky area to map
bands: Frequency bands to process (“f090”, “f150”, etc.)
wafer: Detector wafer identifier
comps: Map components (“T” for temperature only, “TQU” for temperature and polarization)
maxiter: Maximum number of iterations for convergence
query: SQL-like query for data selection
tiled: Whether to use tiled processing (0 or 1)
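
A configuration like the one above can be sanity-checked in Python before a campaign is submitted. The following is a minimal sketch, assuming Python 3.11+ for the standard-library tomllib (with the third-party tomli package as a fallback). The helper name check_mapmaking_config is hypothetical and not part of SO Campaign Manager, and the checks only reflect the values documented in this section.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ImportError:  # older interpreters: third-party backport with the same API
    import tomli as tomllib

EXAMPLE = '''
[campaign.ml-mapmaking]
context = "file:///path/to/context.yaml"
area = "file:///path/to/area.fits"
output_dir = "/path/to/output"
bands = "f090"
wafer = "ws0"
comps = "TQU"
maxiter = 10
query = "obs_id='1575600533.1575611468.ar5_1'"
tiled = 1
site = "act"
'''


def check_mapmaking_config(text: str) -> dict:
    """Parse a campaign TOML string and sanity-check the ML mapmaking table.

    Hypothetical helper: the accepted values below are taken only from the
    parameter descriptions in this page, not from the real validation code.
    """
    cfg = tomllib.loads(text)["campaign"]["ml-mapmaking"]
    problems = []
    if cfg.get("comps") not in {"T", "TQU"}:
        problems.append(f"unexpected comps: {cfg.get('comps')!r}")
    if cfg.get("tiled") not in (0, 1):
        problems.append(f"tiled must be 0 or 1, got {cfg.get('tiled')!r}")
    maxiter = cfg.get("maxiter")
    if not isinstance(maxiter, int) or maxiter < 1:
        problems.append(f"maxiter must be a positive integer, got {maxiter!r}")
    if problems:
        raise ValueError("; ".join(problems))
    return cfg


cfg = check_mapmaking_config(EXAMPLE)
print(cfg["bands"], cfg["maxiter"])
```

Collecting every problem before raising means a single run reports all issues in the table rather than only the first one.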

Resource Requirements:

Memory-intensive (typically 64-128 GB per process)
Can benefit from multiple cores for linear algebra operations
Disk I/O intensive for large datasets

SAT Simulation¶

Small Aperture Telescope (SAT) simulation workflows for generating synthetic observations using toast_so_sim.

Purpose: Create realistic simulated timestreams for validation and systematics studies.

Configuration Example:

[campaign.sat-sims]
output_dir = "/path/to/output"
schedule = "/path/to/schedule.txt"
bands = "SAT_f090"
wafer_slots = "w25"
sample_rate = 37
sim_noise = false
scan_map = false
sim_atmosphere = false
sim_sss = false
sim_hwpss = false

Key Parameters:

output_dir: Directory for simulation output
schedule: Observation schedule file
bands: Frequency band (e.g. SAT_f090, SAT_f150)
wafer_slots: Wafer slot identifier (e.g. w25)
sample_rate: Detector sample rate in Hz (default: 37)
sim_noise: Enable noise simulation (boolean)
scan_map: Enable map scanning (boolean)
sim_atmosphere: Enable atmosphere simulation (boolean)
sim_sss: Enable scan-synchronous signal simulation (boolean)
sim_hwpss: Enable HWP synchronous signal simulation (boolean)
pixels_healpix_radec_nside: HEALPix resolution (default: 512)

Power Spectra¶

Power spectrum estimation workflow using PSpipe.

Purpose: Compute power spectra from maps produced by the mapmaking pipeline.

Configuration Example:

[campaign.power-spectra]
subcommand = "/path/to/script.py"
script_args = ["/path/to/paramfile.dict"]
script_flags = ["simulate-syst", "simulate-lens"]

Key Parameters:

subcommand: Path to the PSpipe Python script to run
script_args: Positional arguments passed to the script (list)
script_flags: Boolean flags passed as --flag (list)
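
To make the roles of the three parameters concrete, the sketch below assembles a command line from them. It is an illustration under assumptions: the build_command helper, the "python" executable, and the ordering (positional arguments first, then flags) are hypothetical and may differ from what SO Campaign Manager actually does.

```python
import shlex


def build_command(subcommand, script_args, script_flags, executable="python"):
    """Assemble a shell-safe command line.

    Assumed layout (not confirmed by the source):
    <executable> <subcommand> <script_args...> --<flag> --<flag> ...
    """
    parts = [executable, subcommand, *script_args]
    parts += [f"--{flag}" for flag in script_flags]
    # shlex.join quotes any element that contains shell-special characters.
    return shlex.join(parts)


cmd = build_command(
    "/path/to/script.py",
    ["/path/to/paramfile.dict"],
    ["simulate-syst", "simulate-lens"],
)
print(cmd)
# python /path/to/script.py /path/to/paramfile.dict --simulate-syst --simulate-lens
```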

Resource Requirements:

Scales with the number of map products being cross-correlated
Some stages (e.g. mode-coupling matrix) are MPI-parallel and benefit from many ranks

ML Null Tests¶

Statistical tests to validate mapmaking results by creating maps from data splits.

Purpose: Detect systematic errors and validate noise models by checking that null maps (differences between splits) are consistent with noise.

All null tests share the following common parameters:

chunk_nobs: Number of observations per chunk used to define splits
context, area, output_dir, query: Same as ML Mapmaking

Types of Null Tests:

Mission Tests¶

Splits observations in time to test for time-dependent systematics.

[campaign.ml-null-tests.mission-tests]
chunk_nobs = 10
nsplits = 8

Observations are sorted by timestamp, grouped into chunks of chunk_nobs, and distributed across nsplits splits in a time-interleaved fashion.
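
The chunk-and-interleave step can be sketched as follows. This is one plausible reading of the description above (round-robin assignment of consecutive chunks to splits), not the project's actual implementation. The function name and the (obs_id, timestamp) tuple layout are assumptions made for illustration.

```python
def interleaved_splits(observations, chunk_nobs, nsplits):
    """Distribute observations across splits in a time-interleaved way.

    observations: iterable of (obs_id, timestamp) pairs (assumed layout).
    Returns a list of `nsplits` lists of obs_ids.
    """
    # 1. Sort by timestamp.
    ordered = sorted(observations, key=lambda obs: obs[1])
    # 2. Group into consecutive chunks of chunk_nobs observations.
    chunks = [
        ordered[i:i + chunk_nobs] for i in range(0, len(ordered), chunk_nobs)
    ]
    # 3. Assign chunk k to split k mod nsplits (round-robin interleaving).
    splits = [[] for _ in range(nsplits)]
    for k, chunk in enumerate(chunks):
        splits[k % nsplits].extend(obs_id for obs_id, _ in chunk)
    return splits


# Twelve fake observations, deliberately given out of time order.
fake_obs = [(f"obs{t:02d}", t) for t in reversed(range(12))]
for index, split in enumerate(interleaved_splits(fake_obs, chunk_nobs=2, nsplits=3)):
    print(index, split)
```

With chunk_nobs = 2 and nsplits = 3, each split receives every third pair of consecutive observations, so all splits sample the whole time range.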

Wafer Tests¶

Splits observations by detector wafer to test for detector-dependent systematics.

[campaign.ml-null-tests.wafer-tests]
chunk_nobs = 10
nsplits = 8

Observations are grouped by wafer slot and maps are produced per-wafer for comparison.

Direction Tests¶

Splits observations by scan direction (rising, setting, or middle azimuth) to test for scan-synchronous systematics. Always uses nsplits = 2.

[campaign.ml-null-tests.direction-tests]
chunk_nobs = 10

Observations are classified by azimuth center into rising (az < 180°), setting (az > 180°), or middle (az ≈ 180°) groups, and time-interleaved splits are created within each group.
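
The azimuth classification can be sketched as below. The source does not say how close to 180° counts as "middle", so the tol_deg tolerance and its default value are hypothetical, as is the function name.

```python
def classify_direction(az_center_deg, tol_deg=5.0):
    """Classify an observation by its azimuth center.

    tol_deg is an assumed half-width of the "middle" band around 180 degrees;
    the real threshold is not given in this page.
    """
    az = az_center_deg % 360.0  # normalize into [0, 360)
    if abs(az - 180.0) <= tol_deg:
        return "middle"
    return "rising" if az < 180.0 else "setting"


for az in (90.0, 182.0, 270.0):
    print(az, classify_direction(az))
```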

PWV Tests¶

Splits observations by precipitable water vapour (PWV) level to test for atmosphere-dependent systematics.

[campaign.ml-null-tests.pwv-tests]
chunk_nobs = 10
nsplits = 2

Observations are ordered by PWV value and interleaved into splits, separating low-PWV from high-PWV conditions.

Day/Night Tests¶

Splits observations into daytime and nighttime subsets to test for solar-related systematics.

[campaign.ml-null-tests.day-night-tests]
chunk_nobs = 10
nsplits = 2

Observations are classified as day or night based on their timestamp and maps are produced separately for each condition.

Elevation Tests¶

Splits observations by telescope elevation to test for elevation-dependent systematics such as ground pickup or atmospheric gradients.

[campaign.ml-null-tests.elevation-tests]
chunk_nobs = 10
nsplits = 2

Observations are sorted by elevation center and distributed across splits.

Moon Rise/Set Tests¶

Splits observations by whether the Moon is rising or setting during the observation, to test for Moon-related contamination that depends on the Moon's position in the sky.

[campaign.ml-null-tests.moonrise-set-tests]
chunk_nobs = 10
nsplits = 2

Moon Close Tests¶

Splits observations by proximity to the Moon to test for near-field Moon sidelobe contamination.

[campaign.ml-null-tests.moon-close-tests]
chunk_nobs = 10
nsplits = 2

Sun Close Tests¶

Splits observations by proximity to the Sun to test for near-field Sun sidelobe contamination.

[campaign.ml-null-tests.sun-close-tests]
chunk_nobs = 10
nsplits = 2

Creating Custom Workflows¶

To create a new workflow type:

Inherit from the base Workflow class:

from socm.core.models import Workflow

class MyCustomWorkflow(Workflow):
    # Define additional parameters
    custom_param: str
    threshold: float = 0.5

Implement required methods:

# These methods belong inside the MyCustomWorkflow class body.
def get_command(self, **kwargs) -> str:
    """Return the command to execute."""
    return f"{self.executable} {self.subcommand}"

def get_arguments(self, **kwargs) -> str:
    """Return command arguments."""
    return f"--param {self.custom_param} --threshold {self.threshold}"

Register the workflow:

from socm.workflows import registered_workflows

registered_workflows['my-custom'] = MyCustomWorkflow

Workflow Dependencies¶

Workflows can depend on outputs from other workflows. The campaign manager handles:

Dependency resolution - Ensures workflows run in the correct order
Resource optimization - Schedules dependent workflows as early as possible using the HEFT (Heterogeneous Earliest Finish Time) algorithm

For TOML-based campaigns, subcampaigns provide a grouping mechanism. For explicit stage-by-stage dependency graphs, use the DAG YAML format:

stages:
  preprocess:
    executable: python -u
    script: preprocess.py
    depends: null
    resources:
      memory: 48G
      ranks: 1
      threads: 4
      runtime: 10m

  mapmaking:
    executable: python -u
    script: mapmaking.py
    depends:
      - preprocess
    resources:
      ranks: 14
      threads: 8
      memory: 128G
      runtime: 60m

  spectra:
    executable: python -u
    script: spectra.py
    depends:
      - mapmaking
    resources:
      ranks: 4
      threads: 4
      memory: 32G
      runtime: 20m

See the User Guide DAG section and examples/dag.yml for a full annotated example.
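
The ordering implied by the depends fields can be sketched with the standard library's graphlib (Python 3.9+). This only illustrates dependency resolution; it is not the campaign manager's scheduler and does not model HEFT or resources. The stages dictionary mirrors the YAML above by hand, and execution_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hand-written mirror of the `stages` mapping from the YAML example.
stages = {
    "preprocess": {"depends": None},
    "mapmaking": {"depends": ["preprocess"]},
    "spectra": {"depends": ["mapmaking"]},
}


def execution_order(stages):
    """Return stage names in an order that respects every `depends` entry."""
    # Map each stage to the set of stages it depends on (null means none).
    graph = {name: set(spec.get("depends") or []) for name, spec in stages.items()}
    # Reject dependencies on stages that are not defined.
    unknown = {dep for deps in graph.values() for dep in deps} - set(graph)
    if unknown:
        raise KeyError(f"undefined stages referenced: {sorted(unknown)}")
    # static_order() yields predecessors first; a cycle raises graphlib.CycleError.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(stages))
# ['preprocess', 'mapmaking', 'spectra']
```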

Best Practices¶

Resource Sizing¶

Memory: Allocate 20-50% more than the estimated need
Runtime: Set generous runtime estimates so that jobs are not terminated at the queue's time limit
Cores: Balance between parallelization and memory per core
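
As a worked example of the sizing guidance, the sketch below pads a memory estimate by a safety margin and reports the resulting memory per core. Both helper names and the default margin are hypothetical; they only illustrate the arithmetic.

```python
import math


def padded_memory_gb(estimate_gb, margin=0.3):
    """Return the estimate increased by `margin` (0.2-0.5 per the guidance), rounded up."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    return math.ceil(estimate_gb * (1 + margin))


def memory_per_core_gb(total_gb, cores):
    """Memory available to each core if the allocation is shared evenly."""
    return total_gb / cores


request = padded_memory_gb(64, margin=0.25)
print(request)                         # 80
print(memory_per_core_gb(request, 8))  # 10.0
```

Doubling the core count for the same request halves the memory per core, which is the trade-off the Cores item refers to.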

Data Management¶

Use fast local storage for temporary files
Ensure output directories have sufficient space
Clean up intermediate files when possible

Configuration¶

Use descriptive workflow names for tracking
Document custom parameters in configuration files
Test workflows on small datasets first

Monitoring¶

Check log files for workflow progress
Monitor resource usage to optimize future runs
Validate outputs before proceeding to dependent workflows

Troubleshooting¶

Common Issues¶

Memory Errors:

Increase memory allocation
Reduce data chunk size
Use tiled processing for large areas

Timeout Errors:

Increase runtime estimates
Check for hung processes
Optimize algorithm parameters

Dependency Errors:

Verify input file paths
Check workflow ordering
Ensure dependent outputs exist

Environment Issues:

Verify environment variables
Check module availability
Validate file permissions

Performance Tips¶

Use SSD storage for temporary files
Optimize number of MPI ranks vs threads
Consider memory bandwidth limitations
Profile workflows to identify bottlenecks
