User Guide

This guide covers configuring, running, and troubleshooting campaigns with SO Campaign Manager.

Configuration System

SO Campaign Manager uses TOML configuration files to define campaigns. The configuration is structured in several sections:

Campaign Section

The main campaign section defines global settings:

[campaign]
deadline = "2d"  # Campaign deadline (supports: "1d", "2h", "30m", etc.)

[campaign.resources]
nodes = 4              # Number of compute nodes
cores-per-node = 112   # CPU cores per node
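
To make the deadline format concrete, here is a minimal sketch of converting strings such as "2d" or "30m" into a duration. The function name parse_deadline is hypothetical, and only the three suffixes shown in the comment above ("d", "h", "m") are handled; the parser inside SO Campaign Manager may accept more forms.

```python
import re
from datetime import timedelta

# Assumed unit suffixes: only the ones the guide lists explicitly.
_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_deadline(text: str) -> timedelta:
    """Convert a string such as "2d" or "30m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhm])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised deadline: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


print(parse_deadline("2d").total_seconds())  # 172800.0
```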

Workflow Sections

Each workflow type has its own section. Currently supported workflows:

  • ml-mapmaking - Maximum likelihood mapmaking

  • sat-sims - SAT observation simulation

  • power-spectra - Power spectrum estimation (PSpipe)

  • ml-null-tests.mission-tests - Time-split null tests

  • ml-null-tests.wafer-tests - Detector wafer split null tests

  • ml-null-tests.direction-tests - Scan direction split null tests

  • ml-null-tests.pwv-tests - Precipitable water vapour split null tests

  • ml-null-tests.day-night-tests - Day/night split null tests

  • ml-null-tests.moonrise-set-tests - Moon rise/set split null tests

  • ml-null-tests.elevation-tests - Telescope elevation split null tests

  • ml-null-tests.sun-close-tests - Sun proximity split null tests

  • ml-null-tests.moon-close-tests - Moon proximity split null tests

Example ML Mapmaking Configuration:

[campaign.ml-mapmaking]
context = "file:///path/to/context.yaml"
area = "file:///path/to/area.fits"
output_dir = "/path/to/output"
bands = "f090"
wafer = "ws0"
comps = "TQU"
maxiter = 10
query = "obs_id='1575600533.1575611468.ar5_1'"
tiled = 1
site = "act"

[campaign.ml-mapmaking.resources]
ranks = 1
threads = 32
memory = "80000"  # Memory in MB
runtime = "80000" # Runtime in seconds

[campaign.ml-mapmaking.environment]
MOBY2_TOD_STAGING_PATH = "/tmp/"
DOT_MOBY2 = "/path/to/act_dot_moby2"
SOTODLIB_SITECONFIG = "/path/to/site.yaml"

Resource Management

Computing Resources

Define the target computing resource with a Resource object (Python):

resource = Resource(
    name="tiger3",                # Resource identifier
    nodes=4,                      # Available nodes
    cores_per_node=112,           # Cores per node
    memory_per_node=100000000,    # Memory per node (bytes)
    default_queue="tiger-test",   # SLURM queue
    maximum_walltime=3600000      # Max walltime (seconds)
)

Resource Allocation

Each workflow can specify its resource requirements:

  • ranks: Number of MPI ranks

  • threads: Number of OpenMP threads

  • memory: Memory requirement (MB)

  • runtime: Expected runtime (seconds)

The campaign manager automatically:

  1. Calculates total resource needs

  2. Optimizes job allocation

  3. Submits to appropriate SLURM queues
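
The first step above can be illustrated with a rough sketch. It assumes each rank uses one core per thread and that all workflows run at the same time; the WorkflowRequest class and fits_allocation function are hypothetical names, not part of SO Campaign Manager, and the real planner may account for memory and runtime as well.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRequest:
    """Hypothetical container for the four per-workflow resource fields."""
    name: str
    ranks: int      # MPI ranks
    threads: int    # OpenMP threads per rank
    memory_mb: int  # memory requirement in MB
    runtime_s: int  # expected runtime in seconds

    @property
    def cores(self) -> int:
        # Assumption: one core per thread, per rank.
        return self.ranks * self.threads


def fits_allocation(requests, nodes: int, cores_per_node: int):
    """Return (total cores requested, whether they fit on the allocation)."""
    total_cores = sum(request.cores for request in requests)
    return total_cores, total_cores <= nodes * cores_per_node


requests = [WorkflowRequest("ml-mapmaking", ranks=1, threads=32,
                            memory_mb=80000, runtime_s=80000)]
print(fits_allocation(requests, nodes=4, cores_per_node=112))  # (32, True)
```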

Workflow Types

ML Mapmaking Workflow

Maximum likelihood mapmaking for creating maps from time-ordered data.

Required Parameters:

  • context: Path to context file

  • area: Path to area definition file

  • output_dir: Output directory

  • bands: Frequency bands (e.g., “f090”, “f150”)

  • wafer: Wafer identifier

  • comps: Components to map (“T”, “TQU”, etc.)

  • query: Data selection query

Optional Parameters:

  • maxiter: Maximum iterations (default: 100)

  • tiled: Use tiled processing (0 or 1)

  • site: Observatory site
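
A quick pre-flight check for the required parameters might look like the sketch below. It works on the nested dictionary that a TOML parser returns (for example tomllib.load in Python 3.11 and later); the function name missing_parameters is hypothetical, and the validation that SO Campaign Manager performs itself may be stricter.

```python
# Required keys, copied from the list above.
REQUIRED = ("context", "area", "output_dir", "bands", "wafer", "comps", "query")


def missing_parameters(config: dict) -> list:
    """List required ml-mapmaking keys absent from a parsed configuration."""
    section = config.get("campaign", {}).get("ml-mapmaking", {})
    return [key for key in REQUIRED if key not in section]


# Dictionary shaped like a parsed [campaign.ml-mapmaking] section.
config = {
    "campaign": {
        "ml-mapmaking": {
            "context": "file:///path/to/context.yaml",
            "area": "file:///path/to/area.fits",
            "output_dir": "/path/to/output",
            "bands": "f090",
        }
    }
}
print(missing_parameters(config))  # ['wafer', 'comps', 'query']
```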

ML Null Tests Workflow

Statistical null tests for validating mapmaking results.

Mission Tests:

[campaign.ml-null-tests.mission-tests]
chunk_nobs = 10  # Chunk size in days
nsplits = 8      # Number of splits (must be a multiple of 2)

Wafer Tests:

[campaign.ml-null-tests.wafer-tests]
chunk_nobs = 10  # Chunk size in days
nsplits = 8      # Number of splits

DAG-based Workflow Configuration

In addition to TOML-based campaigns, SO Campaign Manager supports defining workflows as a Directed Acyclic Graph (DAG) in a YAML file. This format suits pipeline-style workflows whose stages have explicit dependencies on one another.

DAG YAML Format

paramfile: &paramfile /path/to/paramfile.dict

campaign:
  deadline: 24h
  resource: tiger3
  execution_schema: remote
  requested_resources: 3359

stages:
  stage_one:
    executable: python -u
    script: /path/to/script_one.py
    script-args:
      - *paramfile
    depends: null
    resources:
      memory: 48G
      ranks: 1
      threads: 4
      runtime: 10m

  stage_two:
    executable: python -u
    script: /path/to/script_two.py
    script-args:
      - *paramfile
    depends:
      - stage_one
    resources:
      ranks: 14
      threads: 8
      memory: 8G
      runtime: 30m

  stage_three:
    executable: python -u
    script: /path/to/script_three.py
    script-kwargs:
      start: 0
      stop: 10
    script-flags:
      - simulate-syst
    depends:
      - stage_one
      - stage_two
    resources:
      ranks: 17
      memory: 600G
      threads: 4
      runtime: 15m

Stage Fields

  • executable: The interpreter or binary to run (e.g. python -u)

  • script: Path to the script to execute

  • script-args: Positional arguments passed to the script (list)

  • script-kwargs: Keyword arguments passed as --key=value flags (mapping)

  • script-flags: Boolean flags passed as --flag (list)

  • depends: List of stage names this stage depends on, or null for no dependencies

  • resources: Per-stage resource requirements (memory, ranks, threads, runtime)
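
The following sketch shows how the stage fields above could combine into a single command line. The build_command function is hypothetical, and the ordering (positional arguments, then keyword arguments, then flags) is an assumption; the command that SO Campaign Manager actually assembles may differ.

```python
import shlex


def build_command(stage: dict) -> list:
    """Assemble an argument list from one stage mapping (assumed ordering)."""
    # "python -u" becomes ["python", "-u"], followed by the script path.
    command = shlex.split(stage["executable"]) + [stage["script"]]
    # Positional arguments are passed through unchanged.
    command += [str(arg) for arg in stage.get("script-args") or []]
    # Keyword arguments become --key=value.
    kwargs = stage.get("script-kwargs") or {}
    command += [f"--{key}={value}" for key, value in kwargs.items()]
    # Boolean flags become --flag.
    command += [f"--{flag}" for flag in stage.get("script-flags") or []]
    return command


stage_three = {
    "executable": "python -u",
    "script": "/path/to/script_three.py",
    "script-kwargs": {"start": 0, "stop": 10},
    "script-flags": ["simulate-syst"],
}
print(build_command(stage_three))
# ['python', '-u', '/path/to/script_three.py', '--start=0', '--stop=10', '--simulate-syst']
```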

Dependency Resolution

Stages with depends: null are independent and can run immediately. Stages that list other stages under depends are scheduled only after all of their dependencies have completed successfully. The planner constructs the full DAG and uses HEFT (Heterogeneous Earliest Finish Time) scheduling, a list-scheduling heuristic, to choose the execution order.
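
As an illustration of the dependency rule only, the sketch below orders the three example stages with the standard-library graphlib module (Python 3.9 and later). It is a plain topological sort, not HEFT: it ignores resources and runtimes, and the execution_order function is a hypothetical name.

```python
from graphlib import TopologicalSorter

# The "depends" entries from the example DAG; None means no dependencies.
stages = {
    "stage_one": None,
    "stage_two": ["stage_one"],
    "stage_three": ["stage_one", "stage_two"],
}


def execution_order(depends: dict) -> list:
    """Return stage names so that every stage follows its dependencies."""
    graph = {name: set(deps or []) for name, deps in depends.items()}
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(stages))  # ['stage_one', 'stage_two', 'stage_three']
```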

Running a DAG Campaign

socm -t campaign.yaml

An annotated example is available in the repository at examples/dag.yml.

Campaign Policies

Time Policy

The “time” policy minimizes total campaign completion time by:

  1. Analyzing workflow dependencies

  2. Optimizing parallel execution

  3. Balancing resource utilization

Currently, “time” is the only supported policy, but the architecture supports adding new policies.

Environment Variables

Common environment variables for SO workflows:

  • MOBY2_TOD_STAGING_PATH: Temporary storage path

  • DOT_MOBY2: Moby2 configuration directory

  • SOTODLIB_SITECONFIG: Site configuration file
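
Before launching a campaign it can help to confirm which of these variables are set in the current shell. The sketch below is a standalone check with a hypothetical function name (unset_variables); it does not reflect how SO Campaign Manager applies the [campaign.<workflow>.environment] section.

```python
import os
from collections.abc import Mapping

# Variable names copied from the list above.
COMMON_ENV = ("MOBY2_TOD_STAGING_PATH", "DOT_MOBY2", "SOTODLIB_SITECONFIG")


def unset_variables(environ: Mapping = os.environ) -> list:
    """Return the common variables that are missing or empty in environ."""
    return [name for name in COMMON_ENV if not environ.get(name)]


missing = unset_variables()
if missing:
    print("Not set:", ", ".join(missing))
```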

Command Line Interface

Basic Usage

socm -t /path/to/campaign.toml

Options

  • -t, --toml: Path to the campaign configuration file (required). DAG campaigns pass their YAML file through the same option.

The CLI automatically:

  1. Validates configuration

  2. Creates workflow instances

  3. Sets up resources

  4. Executes the campaign

Monitoring and Logging

The campaign manager provides detailed logging of:

  • Configuration validation

  • Workflow creation

  • Resource allocation

  • Job submission

  • Execution progress

  • Error handling

Logs are written to stdout and can be redirected as needed.

Best Practices

Configuration

  1. Use absolute paths for all file references

  2. Test configurations with small datasets first

  3. Set realistic deadlines based on data volume

  4. Monitor resource usage to optimize future runs

Resource Management

  1. Right-size resources - don’t over-allocate

  2. Consider queue limits when setting runtime

  3. Use appropriate memory estimates to avoid OOM errors

  4. Test on development queues before production runs

Troubleshooting

Common Issues

Configuration Errors:

Check the TOML (or, for DAG campaigns, YAML) syntax and confirm that all required parameters are present.

Resource Allocation Failures:

Verify that the SLURM queue is available and that the request is within its limits.

Workflow Execution Errors:

Check that environment variables are set and that file paths exist.

Out of Memory Errors:

Increase the memory allocation or reduce the data chunk size.

Getting Help

  • Check this documentation for configuration examples

  • Review example configurations in the examples/ directory

  • Examine log output for specific error messages

  • File issues on the GitHub repository for bugs or feature requests