# Mill

> Unified multi-modal evaluation framework for text, image, video, and audio benchmarks.

## Docs

- [Changelog](https://pymill.com/docs/changelog.md): New features, improvements, and fixes in Mill — newest first.
- [Add a benchmark](https://pymill.com/docs/contributing/add-a-benchmark.md): Port a new benchmark into Mill — guided, validated, and documented.
- [Add a model backend](https://pymill.com/docs/contributing/add-a-model.md): Wire a new model backend into Mill — pick the interface, register, document.
- [Distributed Scheduling](https://pymill.com/docs/guides/distributed.md): Scale evaluations across a SLURM cluster with mill schedule.
- [Text Evaluation](https://pymill.com/docs/guides/text-evaluation.md): Run text benchmarks with local HF models, vLLM, or API backends.
- [Vision Evaluation](https://pymill.com/docs/guides/vision-evaluation.md): Evaluate multimodal models on image and video benchmarks.
- [Installation](https://pymill.com/docs/installation.md): Install Mill and optional backend extras.
- [Introduction](https://pymill.com/docs/introduction.md): Mill — a unified multi-modal evaluation framework for text, image, video, and audio benchmarks.
- [Quickstart](https://pymill.com/docs/quickstart.md): Run your first evaluation in under five minutes.
- [CLI Reference](https://pymill.com/docs/reference/cli.md): Complete flag documentation for the mill command.
- [Models](https://pymill.com/docs/reference/models.md): Configure HuggingFace, vLLM, and API model backends.
- [Output Types](https://pymill.com/docs/reference/output-types.md): The three OutputType values and how Mill queries the model for each.
- [Tasks](https://pymill.com/docs/reference/tasks.md): Built-in benchmarks and how to write custom tasks.
- [CIFAR-10](https://pymill.com/docs/reproducibility/cifar10.md): CIFAR-10 zero-shot image classification reproduced with Mill (CLIP and vision-language models).
- [Clotho-AQA](https://pymill.com/docs/reproducibility/clotho_aqa.md): Clotho-AQA single-word audio question answering reproduced with Mill.
- [ImageNet](https://pymill.com/docs/reproducibility/imagenet.md): ImageNet-1k reproduced with Mill — zero-shot image classification (CLIP) and generative MCQ (VLMs).
- [MMLU](https://pymill.com/docs/reproducibility/mmlu.md): MMLU reproduced with Mill (Qwen3-0.6B-Base) versus the Qwen3 Technical Report.
- [MMLU-Pro](https://pymill.com/docs/reproducibility/mmlu-pro.md): MMLU-Pro reproduced with Mill — generative chain-of-thought, 10-option multiple choice.
- [MMMU-Pro](https://pymill.com/docs/reproducibility/mmmu_pro.md): MMMU-Pro (standard, 10 options) multimodal multiple-choice reproduced with Mill.
- [Overview](https://pymill.com/docs/reproducibility/overview.md): How Mill reproduces published benchmark numbers, and a template for adding new ones.