Private beta

Training-grade synthetic data, on demand.

Ore is a usage-based API that generates structured tabular data with a recursive, self-improving agent. Define a schema, and the agent generates, critiques, and refines the records, producing data your models can actually train on.

How it works

01

Define your schema

Declare field names, types, value distributions, and constraints in JSON. An SDK is planned.

02

Agent generates

The agent produces an initial batch of records that satisfy your schema.

03

Agent critiques

It audits its own output for statistical coherence, distributional drift, and constraint violations.

04

Agent refines

Failing records are regenerated and the critique loop repeats. More quality_passes means higher fidelity.
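The loop in steps 02 to 04 can be sketched in a few lines of Python. This is an illustrative stand-in, not Ore's agent: generate_record is a hypothetical noisy generator that sometimes emits out-of-range ages, and violates checks only a range constraint, whereas Ore's critique also covers statistical coherence and distributional drift.

```python
import random


def generate_record(rng: random.Random) -> dict:
    """Hypothetical noisy generator: ages range over 10-90, so some fall outside 18-80."""
    return {"age": rng.randint(10, 90), "plan": rng.choice(["free", "pro", "enterprise"])}


def violates(record: dict) -> bool:
    """Critique check (simplified): True if the record breaks the 18-80 age constraint."""
    return not (18 <= record["age"] <= 80)


def generate_dataset(rows: int, quality_passes: int, seed: int = 0) -> tuple[list[dict], int]:
    """Generate, then critique and refine up to `quality_passes` times.

    Returns the records and the number of violations still present.
    """
    rng = random.Random(seed)
    records = [generate_record(rng) for _ in range(rows)]  # 02: generate
    for _ in range(quality_passes):
        failing = [i for i, r in enumerate(records) if violates(r)]  # 03: critique
        if not failing:
            break
        for i in failing:  # 04: refine by regenerating only the failing records
            records[i] = generate_record(rng)
    remaining = sum(violates(r) for r in records)
    return records, remaining
```

Comparing generate_dataset(10_000, 0) with generate_dataset(10_000, 3) shows the trade-off quality_passes controls: each extra pass spends another round of generation on the failing rows and leaves fewer violations behind.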

One JSON schema. Thousands of coherent rows.

Declare field types, value distributions, and constraints. Ore enforces them across every row, then checks the output for statistical coherence before delivering the dataset.

  • Tabular CSV, JSONL, and Parquet output
  • Log-normal, uniform, and categorical distributions
  • Referential integrity across related fields
  • Configurable quality passes — trade latency for fidelity
{
  "schema": {
    "fields": [
      { "name": "user_id",  "type": "uuid" },
      { "name": "age",      "type": "integer", "min": 18, "max": 80 },
      { "name": "plan",     "type": "enum",
        "values": ["free", "pro", "enterprise"],
        "weights": [0.70, 0.25, 0.05] },
      { "name": "ltv_usd",  "type": "float", "min": 0, "max": 5000,
        "distribution": "log-normal" }
    ]
  },
  "rows": 50000,
  "quality_passes": 3
}
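To make the schema concrete, here is a local sketch that samples rows matching the example above using only Python's standard library. It is not Ore's generator and runs no critique pass. The log-normal parameters (LOGNORMAL_MU, LOGNORMAL_SIGMA) and the uniform default for floats without a "distribution" key are assumptions, since the schema leaves them unspecified.

```python
import random
import uuid

# The example schema, as a Python dict.
SCHEMA = {
    "fields": [
        {"name": "user_id", "type": "uuid"},
        {"name": "age", "type": "integer", "min": 18, "max": 80},
        {"name": "plan", "type": "enum",
         "values": ["free", "pro", "enterprise"],
         "weights": [0.70, 0.25, 0.05]},
        {"name": "ltv_usd", "type": "float", "min": 0, "max": 5000,
         "distribution": "log-normal"},
    ]
}

# Assumed log-normal parameters (mean and std dev of the underlying normal).
LOGNORMAL_MU, LOGNORMAL_SIGMA = 5.0, 1.5


def sample_field(field: dict, rng: random.Random):
    t = field["type"]
    if t == "uuid":
        # Seeded random bits formatted as a version-4 UUID, so runs are reproducible.
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))
    if t == "integer":
        return rng.randint(field["min"], field["max"])  # inclusive bounds
    if t == "enum":
        return rng.choices(field["values"], weights=field["weights"])[0]
    if t == "float" and field.get("distribution") == "log-normal":
        # Rejection sampling: redraw until the value lands inside [min, max].
        while True:
            x = rng.lognormvariate(LOGNORMAL_MU, LOGNORMAL_SIGMA)
            if field["min"] <= x <= field["max"]:
                return round(x, 2)
    if t == "float":
        # Assumed default: uniform over [min, max].
        return rng.uniform(field["min"], field["max"])
    raise ValueError(f"unsupported field type: {t}")


def sample_rows(schema: dict, rows: int, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    return [{f["name"]: sample_field(f, rng) for f in schema["fields"]}
            for _ in range(rows)]
```

Rejection sampling is used instead of clipping so that out-of-range draws do not pile up at the 5000 boundary.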

Pricing

Pay for what you generate.

  • Free: 10,000 rows / mo, no card required
  • Developer: $0.001 / row beyond the free tier
  • Enterprise: volume pricing, SLA, and private deployment
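A rough Developer-tier estimate, assuming the free 10,000 rows apply to each month's total usage and every row beyond them is billed at $0.001. For a single 50,000-row request in a month with no other usage, that is (50,000 − 10,000) × $0.001 = $40. Enterprise volume pricing is not modeled here.

```python
from decimal import Decimal

FREE_ROWS_PER_MONTH = 10_000
PRICE_PER_ROW_USD = Decimal("0.001")  # Decimal avoids float rounding in currency math


def monthly_cost_usd(rows: int) -> Decimal:
    """Estimated Developer-tier cost for one month's total generated rows."""
    billable = max(0, rows - FREE_ROWS_PER_MONTH)
    return billable * PRICE_PER_ROW_USD
```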