Evaluation Results
| DATE | MODEL | EVAL | JUDGE | PASS | FAIL | NEEDS REVIEW | TOTAL |
|---|---|---|---|---|---|---|---|
| 2026-01-14 | gpt-4o / asl-formatted-v2 | asl-evals-section-a-closed | AI Judge | 46 (70%) | 14 (21%) | 6 (9%) | 66 |
| 2026-01-14 | gpt-4o / asl-formatted-v2 | asl-evals-section-a-closed | Human Review | 51 (77%) | 15 (23%) | 0 (0%) | 66 |
| 2026-01-13 | gpt-4o / asl-formatted-v2 | asl-evals-section-b-closed | AI Judge | 26 (70%) | 2 (5%) | 9 (24%) | 37 |
| 2026-01-13 | gpt-4o / asl-formatted-v2 | asl-evals-section-b-closed | Human Review | 34 (92%) | 3 (8%) | 0 (0%) | 37 |
Column Notes:
- Model: Base model / fine-tuned model name
- Eval: Evaluation file corresponding to a section of the rulebook
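
The percentages in the table appear to be each count divided by TOTAL, rounded to the nearest whole percent. The following is a minimal sketch of that arithmetic under that assumption; the function name `rate_summary` and the dictionary keys are hypothetical and are not taken from the evaluation tooling.

```python
def rate_summary(passed: int, failed: int, needs_review: int) -> dict[str, str]:
    """Format each count as 'count (percent)' relative to the row total.

    Assumption: TOTAL is the sum of the three counts, and percentages are
    rounded to the nearest whole number, as the table values suggest.
    """
    total = passed + failed + needs_review
    if total == 0:
        raise ValueError("no evaluated items")
    counts = {"pass": passed, "fail": failed, "needs_review": needs_review}
    # The ".0%" format multiplies by 100 and rounds to zero decimal places.
    return {name: f"{count} ({count / total:.0%})" for name, count in counts.items()}


# First row of the table: section A, AI Judge.
print(rate_summary(46, 14, 6))
```

Because each percentage is rounded independently, a row's percentages need not sum to exactly 100 (the section B AI Judge row sums to 99).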