Launched LLM Evaluation Platform

Shipped an end-to-end evaluation pipeline for LLM features:

  • automatic metrics and human ratings
  • safety checks and dashboards
  • CI hooks and daily trend reporting
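
As an illustration of how a CI hook of this kind could work, the sketch below scores a batch of model outputs with a simple automatic metric and reports whether the batch clears a threshold. It is a minimal sketch, not the platform's actual code: the exact-match metric, the names exact_match and gate, and the 0.8 threshold are all assumptions chosen for the example.

```python
from statistics import mean

# Hypothetical pass threshold for this example; not taken from the platform.
MIN_MEAN_SCORE = 0.8


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def gate(predictions, references, threshold=MIN_MEAN_SCORE):
    """Score a batch and return (passed, mean_score).

    passed is True when the mean exact-match score meets the threshold.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("cannot score an empty batch")
    scores = [exact_match(p, r) for p, r in zip(predictions, references)]
    score = mean(scores)
    return score >= threshold, score
```

A CI step could call gate on a fixed evaluation set and exit with a non-zero status when passed is False, which would block the merge. The same mean score, logged once per day, could feed a trend report.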