Quickstart - Mill

Prerequisites

Python 3.10+
A GPU (for local Hugging Face or vLLM models) or an API key (for LiteLLM backends)
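
A quick way to confirm these prerequisites before installing is a small standard-library check. This is only a sketch and not part of Mill: the `check_prerequisites` helper is hypothetical, looking for `nvidia-smi` on the PATH is just a heuristic for an NVIDIA GPU, and the environment variable names are common LiteLLM provider keys that may not match your backend.

```python
import os
import shutil
import sys

# Assumption: common provider variables read by LiteLLM; adjust for your backend.
API_KEY_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def check_prerequisites(version_info=sys.version_info, environ=os.environ,
                        which=shutil.which):
    """Return a dict of booleans describing whether the prerequisites look met."""
    # Mill requires Python 3.10 or newer.
    python_ok = tuple(version_info[:2]) >= (3, 10)
    # Heuristic only: NVIDIA driver tools on PATH suggest a usable GPU.
    gpu_visible = which("nvidia-smi") is not None
    # Any non-empty API key variable counts as "an API key is configured".
    api_key_set = any(environ.get(name) for name in API_KEY_VARS)
    return {
        "python_ok": python_ok,
        "gpu_visible": gpu_visible,
        "api_key_set": api_key_set,
        # Either a GPU or an API key is enough, as long as Python is recent.
        "ready": python_ok and (gpu_visible or api_key_set),
    }


print(check_prerequisites())
```

The function takes its inputs as parameters so it can be exercised without touching the real environment.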

1. Install

Install directly from GitHub; no clone is required:

pip install "mill-eval @ git+https://github.com/haideraltahan/Mill.git"

2. Run a text evaluation

mill eval "meta-llama/Meta-Llama-3-8B-Instruct[dtype=bfloat16,batch_size=8]" mmlu \
  --output_dir ./results

Mill streams progress to your terminal and writes a Feather file to ./results/ when done.
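
To run the same evaluation for several models, the command can be assembled in a script. The sketch below only reuses the argument layout shown in this step (`mill eval MODEL TASK --output_dir DIR` with options in square brackets after the model name). The helper names `model_spec` and `eval_command` are hypothetical, and passing a bare model name without brackets is an assumption.

```python
import shlex


def model_spec(name, **options):
    """Format a model argument as name[key=value,...], as in the command above."""
    if not options:
        # Assumption: a bare model name with no bracketed options is accepted.
        return name
    joined = ",".join(f"{key}={value}" for key, value in options.items())
    return f"{name}[{joined}]"


def eval_command(model, task, output_dir):
    """Build the argument list for one `mill eval` run."""
    return ["mill", "eval", model, task, "--output_dir", output_dir]


models = [
    model_spec("meta-llama/Meta-Llama-3-8B-Instruct",
               dtype="bfloat16", batch_size=8),
]

for model in models:
    cmd = eval_command(model, "mmlu", "./results")
    # Print a shell-quoted version for inspection; to actually launch it,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids shell quoting problems with the square brackets in the model argument.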

3. View results

mill --output_dir ./results collect --metric acc

The collect command renders a table of scores in your terminal. Pass --metric to choose which metric to show. MMLU reports acc, so the command above prints:

Mill results — performance (acc)
┌─────────────────────────────────────┬────────┐
│ model                               │ mmlu   │
├─────────────────────────────────────┼────────┤
│ meta-llama/Meta-Llama-3-8B-Instruct │ 0.6398 │
└─────────────────────────────────────┴────────┘
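
The Feather file written by the evaluation step can also be inspected directly, for example with pandas. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the files are assumed to use the conventional `.feather` extension, and Mill's column layout is not documented here, so the code only reports each file's shape and column names rather than assuming specific fields.

```python
from pathlib import Path


def find_feather_files(output_dir):
    """Return Feather files under output_dir, sorted by path."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    # Assumption: result files end in .feather.
    return sorted(root.rglob("*.feather"))


def load_results(output_dir):
    """Load every Feather file into a DataFrame, keyed by its path."""
    import pandas as pd  # requires pandas with pyarrow installed

    return {path: pd.read_feather(path)
            for path in find_feather_files(output_dir)}


def describe_results(output_dir="./results"):
    """Print the shape and column names of each results file."""
    for path, frame in load_results(output_dir).items():
        print(path, frame.shape, list(frame.columns))
```

Calling `describe_results("./results")` after an evaluation lists which columns are available before you write any further analysis.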

4. Browse available tasks

mill ls

This opens a full-screen TUI browser. Use ↑ ↓ to navigate, Tab to switch between Benchmarks and Tasks, and Enter to copy a task name to your clipboard.

Next steps

Text evaluation guide: Few-shot, custom metrics, and n-shot sweeps.

Vision evaluation guide: Multimodal models with image/video inputs.

Distributed scheduling: Scale across a SLURM cluster.

CLI reference: Full flag documentation.
