chure
An early TypeScript SDK for defining and running text-based LLM benchmarks through OpenRouter.
chure focuses on a small, direct workflow: write prompt-based evals, run them against one or more models through OpenRouter, and inspect the results.
What It Does
The SDK currently supports text-only benchmarks, scored with built-in exact-match and substring evaluators or with custom evaluator functions for project-specific logic. It can write benchmark results to JSON and includes a pretty-printer for reviewing output in the terminal.
Technical Implementation
Benchmark Definitions
Benchmarks are defined as typed TypeScript objects. Each benchmark can include a system prompt and a set of eval cases with prompts, expected answers, and evaluator rules.
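To make the shape of a benchmark concrete, here is a minimal sketch of what such a typed definition might look like. The names `Benchmark`, `EvalCase`, `Evaluator`, `systemPrompt`, and `evals` are assumptions made for illustration and are not taken from chure's actual exports.

```typescript
// Hypothetical shapes: chure's real exported types may differ.
type Evaluator =
  | { type: "exact" }
  | { type: "includes" }
  | { type: "custom"; fn: (output: string, expected: string) => boolean };

interface EvalCase {
  prompt: string;
  expected: string;
  evaluator: Evaluator;
}

interface Benchmark {
  name: string;
  systemPrompt?: string; // optional, shared by every eval case
  evals: EvalCase[];
}

// A small benchmark with one exact-match case and one substring case.
const basics: Benchmark = {
  name: "basics",
  systemPrompt: "Answer as briefly as possible.",
  evals: [
    { prompt: "What is 2 + 2?", expected: "4", evaluator: { type: "exact" } },
    {
      prompt: "What is the capital of France?",
      expected: "Paris",
      evaluator: { type: "includes" },
    },
  ],
};
```

Modeling the evaluator as a discriminated union is one possible design: the compiler then rejects a case that names an unknown evaluator or omits the function for a custom one.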
Evaluators
The evaluator layer supports exact-match checks, substring inclusion checks, and custom function evaluators for cases that need project-specific scoring logic.
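The three evaluator kinds can be sketched as plain functions that take the model output and the expected answer. This is an illustration only: the names `exactMatch`, `includes`, and `withinTolerance` are hypothetical, and trimming whitespace before an exact comparison is an assumption, not documented chure behavior.

```typescript
// Hypothetical evaluator signature: returns true when the output passes.
type EvaluatorFn = (output: string, expected: string) => boolean;

// Exact match. Trimming surrounding whitespace is an assumption of this sketch.
const exactMatch: EvaluatorFn = (output, expected) =>
  output.trim() === expected.trim();

// Substring inclusion: passes if the expected text appears anywhere in the output.
const includes: EvaluatorFn = (output, expected) => output.includes(expected);

// Example of a custom evaluator: numeric answers within an absolute tolerance.
const withinTolerance =
  (tolerance: number): EvaluatorFn =>
  (output, expected) => {
    const actual = Number.parseFloat(output);
    const target = Number.parseFloat(expected);
    return (
      Number.isFinite(actual) &&
      Number.isFinite(target) &&
      Math.abs(actual - target) <= tolerance
    );
  };
```

A custom evaluator like `withinTolerance(0.01)` is useful when a strict string comparison would reject answers that are numerically acceptable, such as "3.14159" against an expected "3.14".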
OpenRouter Integration
The runner accepts an OpenRouter API key and a model list, making it possible to compare multiple LLMs against the same benchmark set.
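As a rough sketch of what a multi-model run involves, the code below sends the same messages to each model through OpenRouter's OpenAI-compatible chat completions endpoint. The function names `complete` and `askAllModels` and the `FetchLike` type are hypothetical and are not chure's API; the transport is passed in as a parameter so the sketch can be exercised without network access.

```typescript
// Hypothetical sketch: none of these names come from chure itself.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Minimal subset of the fetch API, injected so the transport can be swapped out.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Send one chat completion request and return the text of the first choice.
async function complete(
  fetchImpl: FetchLike,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): Promise<string> {
  const response = await fetchImpl(OPENROUTER_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages }),
  });
  if (!response.ok) {
    throw new Error(`OpenRouter request failed for ${model}: HTTP ${response.status}`);
  }
  const data = (await response.json()) as {
    choices?: { message?: { content?: string } }[];
  };
  // Fall back to an empty string if the response carries no text.
  return data.choices?.[0]?.message?.content ?? "";
}

// Ask every model the same question and collect the answers by model ID.
async function askAllModels(
  fetchImpl: FetchLike,
  apiKey: string,
  models: string[],
  messages: ChatMessage[],
): Promise<Record<string, string>> {
  const answers: Record<string, string> = {};
  for (const model of models) {
    answers[model] = await complete(fetchImpl, apiKey, model, messages);
  }
  return answers;
}
```

In a runtime with a built-in `fetch` (for example Node 18 or later), the global `fetch` can be passed as `fetchImpl`. The models are queried sequentially here for simplicity; a real runner might issue requests concurrently and apply rate limiting.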
Key Features
- Prompt-based benchmark definitions with TypeScript types
- Multi-model runs through OpenRouter
- Exact match, includes, and custom evaluators
- JSON output for saving benchmark results
- Pretty-printed summaries for terminal review
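
The JSON output and terminal summary listed above could be produced along the following lines. This is a sketch under assumptions: the result shape (`ModelResult`, `CaseResult`) and the helpers `toJson` and `formatSummary` are hypothetical, and chure's actual output format may differ.

```typescript
// Hypothetical result shape: chure's actual output format may differ.
interface CaseResult {
  prompt: string;
  expected: string;
  output: string;
  passed: boolean;
}

interface ModelResult {
  model: string;
  cases: CaseResult[];
}

// Serialize results as indented JSON, ready to be written to a file.
function toJson(results: ModelResult[]): string {
  return JSON.stringify(results, null, 2);
}

// Build a one-line-per-model summary with pass counts and a percentage.
function formatSummary(results: ModelResult[]): string {
  return results
    .map((r) => {
      const passed = r.cases.filter((c) => c.passed).length;
      const total = r.cases.length;
      // Guard against division by zero for a model with no cases.
      const percent = total === 0 ? 0 : Math.round((passed / total) * 100);
      return `${r.model}: ${passed}/${total} passed (${percent}%)`;
    })
    .join("\n");
}
```

The string returned by `toJson` can be saved with any file API (for example `writeFileSync` from `node:fs`), and the string returned by `formatSummary` can be printed directly to the terminal.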