ZTF / ALeRCE stamp benchmark

This site documents a vision–language benchmark built from Zwicky Transient Facility (ZTF) alerts as served by the ALeRCE broker. It targets structured prediction on real astronomical “stamps”: science, template, and difference images, plus broker-style metadata.

Models receive a single RGB montage PNG per alert and return JSON with AstroAlertBench-style Parts A–C (metadata extraction, self-reported reasoning quality, and a three-stage → five-way classification cascade).
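As a rough illustration of handling such a reply, the sketch below pulls a JSON object out of free-form model text and reports which top-level parts are absent. The key names `part_a`, `part_b`, and `part_c` and the helper `parse_reply` are hypothetical placeholders; the actual schema is defined by the benchmark, not by this example.

```python
import json

# Hypothetical top-level keys standing in for Parts A-C; the real schema may differ.
REQUIRED_PARTS = ("part_a", "part_b", "part_c")


def parse_reply(text):
    """Parse the outermost brace-delimited span of a model reply as JSON.

    Returns (payload, missing), where payload is the parsed dict (or None if
    nothing parseable was found) and missing lists the absent required keys.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None, list(REQUIRED_PARTS)
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None, list(REQUIRED_PARTS)
    if not isinstance(payload, dict):
        return None, list(REQUIRED_PARTS)
    missing = [key for key in REQUIRED_PARTS if key not in payload]
    return payload, missing
```

Treating an unparseable reply as "all parts missing" keeps malformed outputs in the evaluation as failures rather than silently dropping them.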

See example figures, heatmaps, and code blocks on the Visualizations page; GitHub / Hugging Face / Zooniverse links are on Resources.

Why it exists

Classifying variable and transient events from survey alerts is a core step in time-domain astronomy. This benchmark fixes a per-class pool of high-confidence examples, PNG montages suited to VLMs, and a deterministic scorer so different models are comparable on the same inputs and gold labels.
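To make the idea of a deterministic scorer concrete, here is a minimal sketch that compares JSONL predictions against gold labels and reports overall and per-class accuracy. It is not the benchmark's own scorer: the field names `alert_id` and `label`, the functions `load_jsonl` and `score`, and the choice of plain accuracy as the metric are all assumptions made for illustration.

```python
import io
import json
from collections import defaultdict


def load_jsonl(stream):
    """Parse one JSON object per non-empty line of a text stream."""
    return [json.loads(line) for line in stream if line.strip()]


def score(predictions, gold, id_key="alert_id", label_key="label"):
    """Return overall and per-class accuracy against the gold labels.

    The id_key and label_key defaults are hypothetical field names.
    Alerts with no prediction count as wrong, so the result depends
    only on the two inputs.
    """
    pred_by_id = {row[id_key]: row.get(label_key) for row in predictions}
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in gold:
        cls = row[label_key]
        total[cls] += 1
        if pred_by_id.get(row[id_key]) == cls:
            correct[cls] += 1
    per_class = {cls: correct[cls] / total[cls] for cls in sorted(total)}
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    return {"overall": overall, "per_class": per_class}


if __name__ == "__main__":
    # Placeholder class names; the benchmark's real labels are not assumed here.
    gold = load_jsonl(io.StringIO(
        '{"alert_id": "a1", "label": "class_x"}\n'
        '{"alert_id": "a2", "label": "class_y"}\n'
    ))
    preds = load_jsonl(io.StringIO('{"alert_id": "a1", "label": "class_x"}\n'))
    print(score(preds, gold))
```

Because the per-class pool and gold labels are fixed, a function of this shape yields the same numbers for the same prediction file on every run, which is what makes results from different models comparable.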

What you need locally

Large assets (FITS cutouts, optional PNG trees) are not always shipped with the git repository. The reproduction page explains how to download these assets, build montages, run inference, and score JSONL outputs.

Released under the same terms as the accompanying paper repository.