# Distributed Scheduling

> Scale evaluations across a SLURM cluster with `mill schedule`.

## How it works

`mill schedule` builds a job matrix (models × tasks × `n_shots`), drops every job whose results already exist in the output cache, and submits the remaining jobs as a SLURM array job. Each array worker evaluates one `(model, task, n_shot)` combination and writes its results to the shared `output_dir`.
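
A minimal sketch of the matrix-and-filter step, written as hypothetical shell functions (`job_matrix` and `pending_jobs` are illustrative names, not part of Mill). It assumes completed jobs can be listed as one tab-separated `model task n_shot` line each, which stands in for Mill's real output cache, and it does not model benchmark expansion such as `mmlu` into its subject tasks.

```bash
# Hypothetical sketch, not Mill's code: emit one tab-separated
# (model, task, n_shot) line per job from comma-separated lists.
job_matrix() {
  models=$1 tasks=$2 shots=$3
  saved_ifs=$IFS
  IFS=','                      # split the comma-separated lists below
  for m in $models; do
    for t in $tasks; do
      for s in $shots; do
        printf '%s\t%s\t%s\n' "$m" "$t" "$s"
      done
    done
  done
  IFS=$saved_ifs               # restore word splitting for the caller
}

# Read jobs on stdin and drop those listed in a "completed" file
# (same tab-separated format). The file format is an assumption.
pending_jobs() {
  completed_file=$1
  [ -r "$completed_file" ] || { echo "cannot read $completed_file" >&2; return 1; }
  # -v invert, -x whole-line match, -F fixed strings, -f patterns from file;
  # grep exits 1 when every job is already done, which is not an error here
  grep -vxF -f "$completed_file" || true
}
```

Piping `job_matrix` through `wc -l` gives the sweep size, which is simply the product of the three list lengths.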

## Cluster configuration

On first run, Mill copies its bundled `clusters.yaml` to `~/.cache/mill/clusters.yaml`. Edit it to match your cluster:

```yaml
clusters:
  my-cluster:
    hostname_pattern: "login*"   # matched against socket.gethostname()
    partition: gpu
    account: my_project
    gpu_type: A100               # informational label, included in the gres spec
    gpus_per_node: 1
    mem: "64G"
    queue_limit: 200             # cap on jobs queued at once (throttles the array)
    max_array_len: 100           # max simultaneous array tasks
```
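
For orientation, here is how the fields above could plausibly map onto standard `sbatch` options. The mapping is an assumption (Mill generates its own submission script), `sbatch_header` is a hypothetical helper, and `queue_limit` is left out of this sketch. Following the comment in the example config, `max_array_len` is treated as SLURM's `%N` array throttle.

```bash
# Hypothetical mapping from a clusters.yaml entry to #SBATCH directives.
# Arguments: partition account gpu_type gpus_per_node mem n_jobs max_array_len
sbatch_header() {
  partition=$1 account=$2 gpu_type=$3 gpus=$4 mem=$5 n_jobs=$6 max_array=$7
  printf '#SBATCH --partition=%s\n' "$partition"
  printf '#SBATCH --account=%s\n' "$account"
  # SLURM gres syntax is name[:type]:count, e.g. gpu:A100:1
  printf '#SBATCH --gres=gpu:%s:%s\n' "$gpu_type" "$gpus"
  printf '#SBATCH --mem=%s\n' "$mem"
  # Array indices 0..n_jobs-1; "%N" caps how many tasks run at once
  printf '#SBATCH --array=0-%s%%%s\n' "$((n_jobs - 1))" "$max_array"
}
```

For the 114-job `mmlu` sweep on the example cluster, `sbatch_header gpu my_project A100 1 64G 114 100` prints, among other lines, `#SBATCH --gres=gpu:A100:1` and `#SBATCH --array=0-113%100`.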

There is no walltime field: Mill computes the SLURM time limit automatically from the
number of evals per job. To change the per-eval time budget, pass `--minutes_per_eval`
(raise it for heavy generative tasks such as `mmlu_pro`):

```bash
mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct mmlu_pro \
  --minutes_per_eval 240
```
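
As a worked example, assume the limit is simply evals per job × `--minutes_per_eval` (Mill's actual formula may add padding or a minimum). The hypothetical `slurm_time_limit` helper converts that total into SLURM's `days-hours:minutes:seconds` format for `--time`.

```bash
# Sketch under the ASSUMPTION time_limit = evals_per_job * minutes_per_eval.
slurm_time_limit() {
  evals=$1 minutes_per_eval=$2
  total=$((evals * minutes_per_eval))   # total budget in minutes
  days=$((total / 1440))                # 1440 minutes per day
  hours=$(((total % 1440) / 60))
  mins=$((total % 60))
  printf '%d-%02d:%02d:00\n' "$days" "$hours" "$mins"
}
```

With one eval at 240 minutes this prints `0-04:00:00`; ten such evals give `1-16:00:00`.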

Use `--cache_dir` to make Mill read its cache (including `clusters.yaml`) from a directory other than `~/.cache/mill`, for example one shared by your team:

```bash
mill --cache_dir /shared/mill-config schedule ...
```
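
To share one edited cluster config, you can seed the shared directory before pointing `--cache_dir` at it. This assumes Mill looks for `clusters.yaml` directly inside `--cache_dir`, mirroring the default `~/.cache/mill/clusters.yaml` layout; `seed_cache_dir` is a hypothetical helper.

```bash
# Copy an edited clusters.yaml into a (possibly new) shared cache directory.
# ASSUMPTION: Mill expects <cache_dir>/clusters.yaml.
seed_cache_dir() {
  if [ "$#" -ne 2 ]; then
    echo "usage: seed_cache_dir SRC_CLUSTERS_YAML DEST_DIR" >&2
    return 2
  fi
  src=$1 dest_dir=$2
  [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/clusters.yaml"   # overwrites any existing copy
}
```

For example: `seed_cache_dir ~/.cache/mill/clusters.yaml /shared/mill-config`.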

## Basic usage

```bash
mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct \
  mmlu \
  --n_shots 0,5
```

Arguments are positional: `schedule <models> <tasks>`. The `mmlu` benchmark
expands to its 57 subject tasks, so this sweep is 1 model × 57 tasks × 2 n-shot values = 114 jobs.

## Selecting the cluster

```bash
# Auto-detect by hostname (default)
mill --output_dir /scratch/results schedule meta-llama/Meta-Llama-3-8B-Instruct mmlu --cluster auto

# Explicitly name a cluster from clusters.yaml
mill --output_dir /scratch/results schedule meta-llama/Meta-Llama-3-8B-Instruct mmlu --cluster my-cluster
```
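
Auto-detection compares each cluster's `hostname_pattern` with the current hostname. A minimal sketch, assuming shell-glob semantics (which `"login*"` suggests) and first-match-wins ordering; neither is confirmed here, and `detect_cluster` is a hypothetical helper that reads `name pattern` pairs on stdin. The shell's `hostname` stands in for Python's `socket.gethostname()`.

```bash
# Print the first cluster whose glob pattern matches the given hostname.
# Input lines: "<cluster-name> <hostname_pattern>"
detect_cluster() {
  host=$1
  while read -r name pattern; do
    # An unquoted $pattern in a case label is matched as a glob
    case $host in
      $pattern) echo "$name"; return 0 ;;
    esac
  done
  return 1   # no cluster matched this host
}
```

`printf 'my-cluster login*\n' | detect_cluster "$(hostname)"` prints `my-cluster` on a host such as `login03` and exits non-zero on any host the pattern does not match.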

## Dry run — preview without submitting

```bash
mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct \
  mmlu,mmlu_pro \
  --n_shots 0,5 \
  --dry_run
```

`--dry_run` prints the full job table without submitting anything, so you can verify the sweep before committing GPU hours.

## Local sequential run

`--local` skips SLURM and runs every job sequentially in the current process, which is useful for debugging:

```bash
mill --output_dir ./results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct \
  mmlu \
  --local
```

## Virtual environments in SLURM jobs

```bash
mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct mmlu \
  --venv_path /home/user/.venvs/mill
```

The SLURM worker activates this venv before running `mill eval`.
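
The worker's behavior can be sketched as a small wrapper, assuming a standard `venv` layout with `bin/activate`. `run_in_venv` is a hypothetical name; Mill writes its own job script. Because workers run on compute nodes, the venv path has to be on a filesystem those nodes can see.

```bash
# Activate a virtual environment, then run the given command inside it.
run_in_venv() {
  venv=$1; shift
  if [ ! -f "$venv/bin/activate" ]; then
    echo "no venv at $venv (it must exist on the compute nodes)" >&2
    return 1
  fi
  . "$venv/bin/activate" || return 1   # sets VIRTUAL_ENV and prepends PATH
  "$@"                                 # e.g. the `mill eval` call for this task
}
```

Note that sourcing `activate` changes the calling shell's environment, which is what a one-shot job script wants.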

## Custom task paths in SLURM workers

If your tasks live outside the Mill package, pass extra directories so the worker can discover them:

```bash
mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct my_task \
  --extra_task_paths /home/user/my-tasks
```

## Checking completion

After the job array finishes:

```bash
mill --output_dir /scratch/results collect \
  --models meta-llama/Meta-Llama-3-8B-Instruct \
  --tasks mmlu,mmlu_pro \
  --n_shots 0,5
```

The `--check` flag (default on) lists any missing `(model, task, n_shot)` combinations so you can resubmit stragglers.
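
Because `mill schedule` skips combinations already in the output cache, re-running the original sweep should resubmit only the stragglers that `collect` reports. A small wrapper, assuming you reuse the exact arguments of the first submission; `resubmit_stragglers` and its `PREVIEW` switch are hypothetical conveniences, while the flags are the ones documented on this page.

```bash
# Re-run the original sweep; completed combinations are filtered by Mill.
# Set PREVIEW=1 to pass --dry_run and only print the job table.
resubmit_stragglers() {
  out_dir=$1 models=$2 tasks=$3 n_shots=$4
  if [ "${PREVIEW:-0}" = 1 ]; then
    mill --output_dir "$out_dir" schedule "$models" "$tasks" --n_shots "$n_shots" --dry_run
  else
    mill --output_dir "$out_dir" schedule "$models" "$tasks" --n_shots "$n_shots"
  fi
}
```

For example, `PREVIEW=1 resubmit_stragglers /scratch/results meta-llama/Meta-Llama-3-8B-Instruct mmlu,mmlu_pro 0,5` previews the resubmission; drop `PREVIEW=1` to submit.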
