返回 Skill 列表
extension
分类: 开发与工程无需 API Key

Ml Model Eval Benchmark

利用加权指标和确定性排名输出比较候选模型,适用于基准排行榜和模型推广决策。

person作者: 0x-professorhubclawhub

ML Model Eval Benchmark

Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

Workflow

  1. Define metric weights and accepted metric ranges.
  2. Ingest model metrics for each candidate.
  3. Compute weighted score and ranking.
  4. Export leaderboard and promotion recommendation.

Use Bundled Resources

  • Run scripts/benchmark_models.py to generate benchmark outputs.
  • Read references/benchmarking-guide.md for weighting and tie-break guidance.

Guardrails

  • Keep metric names and scales consistent across candidates.
  • Record weighting assumptions in the output.