- `urbansound8k`: zero-shot for CLAP-style audio-text encoders. Each clip is scored against the 10 class names by audio-text similarity, using the standard `This is a sound of {c}.` prompt.
- `urbansound8k_generative`: a generative rendering for audio-language models. The model hears the clip and the 10 candidate categories and answers with a category name, which is matched back to a class.
Dataset: danavery/urbansound8K; benchmark by Salamon et al. (2014). UrbanSound8K ships as 10 folds for supervised cross-validation, but zero-shot evaluation needs no training split: the encoder isn’t trained, so all 8,732 clips are scored (matching how CLAP-family papers report US8K zero-shot).
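The zero-shot scoring described above can be sketched in a few lines. This is a minimal illustration, not Mill's implementation: it assumes the audio and text embeddings have already been produced by a CLAP-style encoder, and it uses toy one-hot vectors in their place. The class list is the label set from the UrbanSound8K dataset card; replacing underscores with spaces in the prompt is an assumption, and all function names here are hypothetical.

```python
import math

# UrbanSound8K class labels as listed on the dataset card (not stated in this page).
CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]

PROMPT = "This is a sound of {c}."


def build_prompts(classes):
    """Render one text prompt per class; underscores become spaces (an assumption)."""
    return [PROMPT.format(c=name.replace("_", " ")) for name in classes]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(audio_emb, text_embs):
    """Return the index of the prompt embedding most similar to the clip embedding."""
    sims = [cosine(audio_emb, t) for t in text_embs]
    return max(range(len(sims)), key=sims.__getitem__)


def accuracy(preds, labels):
    """Fraction of predictions that equal their label."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy stand-ins for encoder outputs: one-hot "text" embeddings and one "clip".
prompts = build_prompts(CLASSES)
toy_text = [[1.0 if i == j else 0.0 for j in range(10)] for i in range(10)]
toy_clip = [0.1] * 10
toy_clip[3] = 0.9  # closest to the class at index 3
pred = zero_shot_predict(toy_clip, toy_text)
print(prompts[pred])  # This is a sound of dog bark.
```

In a real run the toy vectors would be replaced by the encoder's audio and text embeddings, and `accuracy` would be taken over all 8,732 clips.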
Evaluation configuration
| Hyperparameter | Value |
|---|---|
| Benchmark | urbansound8k (auto-picks urbansound8k for CLAP, urbansound8k_generative for audio-LMs) |
| Dataset | danavery/urbansound8K (train, 8,732 clips, 10 classes) |
| n-shots | 0 |
| Task type | ZERO_SHOT_CLASSIFICATION (CLAP) / GENERATIVE_QA (audio-LM) |
| Prompt | This is a sound of {c}. (CLAP) / list-of-categories instruction (audio-LM) |
| Metric | acc / urbansound8k_gen_acc |
| Backend | CLAP (clap) / HuggingFace audio-LM (hf) |
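For the audio-LM rendering, the model's free-text answer has to be mapped back to one of the 10 classes before accuracy can be computed. The page does not specify how that matching is done, so the sketch below is only one plausible approach: normalise the answer and accept it when exactly one class name appears in it. The helper names (`normalize`, `match_answer`, `gen_accuracy`) are hypothetical, and the class list comes from the UrbanSound8K dataset card rather than from this page.

```python
import re

# UrbanSound8K class labels as listed on the dataset card (not stated in this page).
CLASSES = [
    "air_conditioner", "car_horn", "children_playing", "dog_bark", "drilling",
    "engine_idling", "gun_shot", "jackhammer", "siren", "street_music",
]


def normalize(text):
    """Lowercase and collapse punctuation and underscores into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_answer(answer, classes=CLASSES):
    """Return the single class named in the answer, or None if zero or several match."""
    padded = f" {normalize(answer)} "
    hits = [c for c in classes if f" {normalize(c)} " in padded]
    return hits[0] if len(hits) == 1 else None


def gen_accuracy(answers, labels):
    """Fraction of answers that match their gold label; unmatched answers count as wrong."""
    return sum(match_answer(a) == y for a, y in zip(answers, labels)) / len(labels)


print(match_answer("The sound is a Dog bark."))  # dog_bark
print(match_answer("siren or car horn"))         # None (ambiguous)
```

Treating ambiguous or unmatched answers as incorrect is a conservative choice; a more lenient matcher would raise the score without any change in the model.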
Reproduce
Results
CLAP — clap-htsat-unfused
75.16% ± 0.46
zero-shot audio-text similarity
Published UrbanSound8K zero-shot accuracy for LAION-CLAP is ≈76–77% (LAION-CLAP), reported for the fused 630k-audioset checkpoint with the same `This is a sound of {c}.` prompt. Mill evaluates the HuggingFace `laion/clap-htsat-unfused` checkpoint and measures 75.16%. As with ESC-50, the residual versus the paper is the documented accuracy drop of the HuggingFace-converted CLAP weights relative to the original `laion_clap` checkpoint (LAION-AI/CLAP #126), compounded by the paper’s use of the stronger fused checkpoint. All numbers are 10-way (chance = 10%).
Per-model results
| Model | Rendering | Mill | Reference |
|---|---|---|---|
| laion/clap-htsat-unfused | Zero-shot (acc) | 75.16% ± 0.46 | ≈76–77% (LAION-CLAP); gap is the fused-vs-unfused checkpoint + HF-weight-conversion drop (#126) |
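The page does not say how the ± 0.46 is computed. As a sanity check, it is consistent with a binomial standard error over the 8,732 scored clips; the short calculation below is offered under that assumption, not as a description of Mill's method.

```python
import math

n_clips = 8732   # clips scored, from the dataset row above
acc = 0.7516     # reported zero-shot accuracy

# Binomial standard error of a proportion: sqrt(p * (1 - p) / n).
std_err = math.sqrt(acc * (1 - acc) / n_clips)

print(f"{100 * acc:.2f}% ± {100 * std_err:.2f}")  # 75.16% ± 0.46
```

On this reading, the 1–2 point gap to the published 76–77% is two or more standard errors, so sampling noise alone is unlikely to account for it. That fits the checkpoint and weight-conversion explanation given above.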