Skip to main content

How it works

mill schedule generates a job matrix (models × tasks × n_shots), filters out already-completed jobs from the output cache, then submits a SLURM array job. Each array worker evaluates one (model, task, n_shot) combination and writes results to the shared output_dir.

Cluster configuration

On first run, Mill copies its bundled clusters.yaml to ~/.cache/mill/clusters.yaml. Edit it to match your cluster:
clusters:
  my-cluster:
    hostname_pattern: "login*"   # matched against socket.gethostname()
    partition: gpu
    account: my_project
    gpu_type: A100               # informational label, included in the gres spec
    gpus_per_node: 1
    mem: "64G"
    queue_limit: 200             # cap on jobs queued at once (throttles the array)
    max_array_len: 100           # max simultaneous array tasks
There’s no walltime field — the SLURM time limit is computed automatically from the number of evals per job. Override the per-eval budget with --minutes_per_eval (raise it for heavy generative tasks like mmlu_pro):
mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct mmlu_pro \
  --minutes_per_eval 240
Use --cache_dir to point Mill at a different location:
mill --cache_dir /shared/mill-config schedule ...

Basic usage

mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct \
  mmlu \
  --n_shots 0,5
Arguments are positional: schedule <models> <tasks>. Passing the mmlu benchmark expands to its 57 subject tasks, so this sweep is 57 tasks × 2 n-shot values = 114 jobs.

Selecting the cluster

# Auto-detect by hostname (default)
mill --output_dir /scratch/results schedule meta-llama/Meta-Llama-3-8B-Instruct mmlu --cluster auto

# Explicitly name a cluster from clusters.yaml
mill --output_dir /scratch/results schedule meta-llama/Meta-Llama-3-8B-Instruct mmlu --cluster my-cluster

Dry run — preview without submitting

mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct \
  mmlu,mmlu_pro \
  --n_shots 0,5 \
  --dry_run
Prints the full job table so you can verify the sweep before committing GPU hours.

Local sequential run

Skip SLURM and run all jobs in the current process — useful for debugging:
mill --output_dir ./results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct \
  mmlu \
  --local

Virtual environments in SLURM jobs

mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct mmlu \
  --venv_path /home/user/.venvs/mill
The SLURM worker activates this venv before running mill eval.

Custom task paths in SLURM workers

If your tasks live outside the Mill package, pass extra directories so the worker can discover them:
mill --output_dir /scratch/results schedule \
  meta-llama/Meta-Llama-3-8B-Instruct my_task \
  --extra_task_paths /home/user/my-tasks

Checking completion

After the job array finishes:
mill --output_dir /scratch/results collect \
  --models meta-llama/Meta-Llama-3-8B-Instruct \
  --tasks mmlu,mmlu_pro \
  --n_shots 0,5
The --check flag (default on) lists any missing (model, task, n_shot) combinations so you can resubmit stragglers.