chure — Angel Gabriel Peña

chure is an early-stage TypeScript SDK for defining and running prompt-based LLM benchmarks through OpenRouter. It focuses on a small, direct workflow: write evals, run them against one or more models, and inspect the results.

What It Does

The SDK currently supports text-only model benchmarks with simple evaluator options and a path for custom evaluation logic. It can write benchmark results to JSON and includes a pretty-printer for reviewing output in the terminal.

Technical Implementation

Benchmark Definitions

Benchmarks are defined as typed TypeScript objects. Each benchmark can include a system prompt and a set of eval cases, where each case has a prompt, an expected answer, and an evaluator rule.
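
As a rough illustration of what a typed benchmark definition could look like, the sketch below declares a benchmark with a system prompt and two eval cases. The names EvalCase, Benchmark, systemPrompt, cases, and evaluator are assumptions made for this example and may not match chure's actual types.

```typescript
// Hypothetical shapes for illustration only; chure's real type names may differ.
interface EvalCase {
  prompt: string;
  expected: string;
  // Either a built-in rule name or a custom scoring function.
  evaluator: "exact" | "includes" | ((output: string, expected: string) => boolean);
}

interface Benchmark {
  name: string;
  systemPrompt?: string; // optional, shared by every case
  cases: EvalCase[];
}

// A small benchmark: the compiler rejects cases with missing or misspelled fields.
const capitals: Benchmark = {
  name: "capitals",
  systemPrompt: "Answer with the city name only.",
  cases: [
    { prompt: "What is the capital of France?", expected: "Paris", evaluator: "exact" },
    { prompt: "Name the capital of Japan.", expected: "Tokyo", evaluator: "includes" },
  ],
};
```

Keeping the definition as a plain object means a typo in a field name or evaluator rule is caught at compile time, before any model is called.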

Evaluators

The evaluator layer supports exact-match checks, substring inclusion checks, and custom function evaluators for cases that need project-specific scoring logic.
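
The three evaluator kinds can be sketched as a single dispatch function. This is a minimal sketch under assumptions: the Evaluator type, the evaluate function, and the withinOne example are hypothetical names, and the exact-match rule is taken here to mean strict string equality with no trimming or case folding.

```typescript
// Hypothetical evaluator type: a built-in rule name or a custom function.
type Evaluator =
  | "exact"
  | "includes"
  | ((output: string, expected: string) => boolean);

// Score one model output against the expected answer.
function evaluate(evaluator: Evaluator, output: string, expected: string): boolean {
  if (typeof evaluator === "function") {
    return evaluator(output, expected); // custom, project-specific logic
  }
  if (evaluator === "exact") {
    return output === expected; // assumption: strict equality, no normalization
  }
  return output.includes(expected); // "includes": substring check
}

// Example custom evaluator: accept numeric answers within 1 of the expected value.
const withinOne: Evaluator = (output: string, expected: string): boolean =>
  Math.abs(Number(output) - Number(expected)) <= 1;
```

With this shape, evaluate("includes", "The capital is Paris.", "Paris") passes while evaluate("exact", "Paris.", "Paris") does not, which shows why substring checks are often the more forgiving choice for free-form model output.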

OpenRouter Integration

The runner accepts an OpenRouter API key and a model list, making it possible to compare multiple LLMs against the same benchmark set.
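
To show how one prompt might fan out across a model list, the sketch below builds one OpenRouter chat-completions request per model without sending it. The helper names buildChatRequest and buildRequests and the ChatRequest shape are assumptions for this example, not chure's runner API; the endpoint URL and Bearer authorization header follow OpenRouter's public chat-completions API, and the model IDs in the usage note are only examples.

```typescript
// Hypothetical request shape; pass url and init to fetch() to send it.
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Build a single chat-completions request for one model.
function buildChatRequest(
  apiKey: string,
  model: string,
  prompt: string,
  systemPrompt?: string
): ChatRequest {
  const messages = [
    // Include the system message only when a system prompt is given.
    ...(systemPrompt ? [{ role: "system", content: systemPrompt }] : []),
    { role: "user", content: prompt },
  ];
  return {
    url: OPENROUTER_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Same prompt, one request per model, so results are directly comparable.
function buildRequests(
  apiKey: string,
  models: string[],
  prompt: string,
  systemPrompt?: string
): ChatRequest[] {
  return models.map((model) => buildChatRequest(apiKey, model, prompt, systemPrompt));
}
```

For example, buildRequests(key, ["openai/gpt-4o-mini", "anthropic/claude-3.5-sonnet"], "What is the capital of France?") yields two requests that differ only in the model field. Each can be sent with fetch(req.url, req.init), and the reply text is read from choices[0].message.content in the JSON response.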

Key Features

Prompt-based benchmark definitions with TypeScript types
Multi-model runs through OpenRouter
Exact match, includes, and custom evaluators
JSON output for saving benchmark results
Pretty-printed summaries for terminal review
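
The last two features, JSON output and terminal summaries, can be sketched as follows. This is an illustrative sketch only: CaseResult, summarize, and the sample data are hypothetical and do not reflect chure's actual result format or printer output.

```typescript
// Hypothetical per-case result record.
interface CaseResult {
  model: string;
  prompt: string;
  passed: boolean;
}

// Produce one "model: passed/total passed" line per model, in first-seen order.
function summarize(results: CaseResult[]): string {
  const byModel: Record<string, { passed: number; total: number }> = {};
  for (const r of results) {
    const stats = byModel[r.model] ?? { passed: 0, total: 0 };
    stats.total += 1;
    if (r.passed) stats.passed += 1;
    byModel[r.model] = stats;
  }
  return Object.keys(byModel)
    .map((model) => `${model}: ${byModel[model].passed}/${byModel[model].total} passed`)
    .join("\n");
}

// Made-up sample data for the example.
const sampleResults: CaseResult[] = [
  { model: "model-a", prompt: "q1", passed: true },
  { model: "model-a", prompt: "q2", passed: true },
  { model: "model-b", prompt: "q1", passed: true },
  { model: "model-b", prompt: "q2", passed: false },
];

// Serialized form suitable for saving to a results file.
const resultsJson = JSON.stringify(sampleResults, null, 2);

console.log(summarize(sampleResults));
```

In a Node.js script, the resultsJson string could be saved with fs.writeFileSync, which keeps the raw per-case records available for later analysis while the summary stays short enough to scan in a terminal.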