Training-grade synthetic data, on demand.

Ore is a usage-based API in which a recursively self-improving agent generates structured tabular data. Define a schema; the agent generates, critiques, and refines, producing data your models will actually train on.


How it works

Define your schema

Declare field names, types, value distributions, and constraints. Schemas are written in JSON today; an SDK is planned.

Agent generates

The agent produces an initial batch of records that satisfy your schema.

Agent critiques

It audits its own output for statistical coherence, distributional drift, and constraint violations.

Agent refines

Failing records are regenerated and the critique loop repeats. A higher quality_passes value means more trips through the loop: higher fidelity, at the cost of latency.
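
The generate, critique, refine loop can be sketched in a few lines of Python. This is a toy illustration, not Ore's implementation: the `generate`, `critique`, `refine`, and `run` helpers and the single `age` field are hypothetical, and the critique here checks only a range constraint, whereas the agent described above also audits statistical coherence and distributional drift.

```python
import random

# Hypothetical single-field schema, used only for this sketch.
AGE_MIN, AGE_MAX = 18, 80


def generate(n, rng):
    """Produce an initial batch; deliberately sloppy so that some rows fail."""
    return [{"age": rng.randint(AGE_MIN - 10, AGE_MAX + 10)} for _ in range(n)]


def critique(rows):
    """Return the indexes of rows that violate the range constraint."""
    return [i for i, row in enumerate(rows)
            if not AGE_MIN <= row["age"] <= AGE_MAX]


def refine(rows, failing, rng):
    """Regenerate only the failing rows; passing rows are left untouched."""
    for i in failing:
        rows[i] = {"age": rng.randint(AGE_MIN - 2, AGE_MAX + 2)}
    return rows


def run(n, quality_passes, seed=0):
    """Generate n rows, then critique and refine up to quality_passes times."""
    rng = random.Random(seed)
    rows = generate(n, rng)
    for _ in range(quality_passes):
        failing = critique(rows)
        if not failing:
            break  # nothing left to fix
        rows = refine(rows, failing, rng)
    return rows, len(critique(rows))


if __name__ == "__main__":
    for passes in (0, 1, 3):
        _, remaining = run(1000, passes)
        print(f"quality_passes={passes}: {remaining} failing rows")
```

Because only failing rows are regenerated, each extra pass in this sketch can shrink the set of failures but never grow it, which is the intuition behind trading latency for fidelity.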

One JSON schema. Thousands of coherent rows.

Declare field types, value distributions, and constraints. Ore enforces them on every row, then checks the output for statistical coherence before delivering the dataset.

Tabular CSV, JSONL, and Parquet output
Log-normal, uniform, and categorical distributions
Referential integrity across related fields
Configurable quality passes — trade latency for fidelity

{
  "schema": {
    "fields": [
      { "name": "user_id",  "type": "uuid" },
      { "name": "age",      "type": "integer", "min": 18, "max": 80 },
      { "name": "plan",     "type": "enum",
        "values": ["free", "pro", "enterprise"],
        "weights": [0.70, 0.25, 0.05] },
      { "name": "ltv_usd",  "type": "float", "min": 0, "max": 5000,
        "distribution": "log-normal" }
    ]
  },
  "rows": 50000,
  "quality_passes": 3
}
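
Before sending a request like the one above, it can be worth checking it locally. The Python sketch below shows one plausible pre-flight check. `validate_request` is a hypothetical helper, not part of any Ore SDK, and the rules it applies (weights matching values in length and summing to 1, min not exceeding max, positive row count) are assumptions inferred from the example rather than taken from a published specification.

```python
import json
import math

# The request body from the example above, parsed from JSON.
REQUEST = json.loads("""
{
  "schema": {
    "fields": [
      { "name": "user_id",  "type": "uuid" },
      { "name": "age",      "type": "integer", "min": 18, "max": 80 },
      { "name": "plan",     "type": "enum",
        "values": ["free", "pro", "enterprise"],
        "weights": [0.70, 0.25, 0.05] },
      { "name": "ltv_usd",  "type": "float", "min": 0, "max": 5000,
        "distribution": "log-normal" }
    ]
  },
  "rows": 50000,
  "quality_passes": 3
}
""")


def validate_request(request):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for field in request["schema"]["fields"]:
        name = field.get("name", "<unnamed>")
        # Assumed rule: a numeric range must not be inverted.
        if "min" in field and "max" in field and field["min"] > field["max"]:
            problems.append(f"{name}: min exceeds max")
        if field.get("type") == "enum":
            values = field.get("values", [])
            weights = field.get("weights")
            if not values:
                problems.append(f"{name}: enum has no values")
            if weights is not None:
                # Assumed rule: one weight per value, forming a probability vector.
                if len(weights) != len(values):
                    problems.append(f"{name}: weights and values differ in length")
                elif not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
                    problems.append(f"{name}: weights do not sum to 1")
    if request.get("rows", 0) <= 0:
        problems.append("rows must be positive")
    if request.get("quality_passes", 0) < 0:
        problems.append("quality_passes must not be negative")
    return problems


print(validate_request(REQUEST))  # prints [] for the example request
```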

Pricing

Pay for what you generate.

Free

10,000 rows / mo

No card required

Developer

$0.001 / row

Beyond free tier

Enterprise

Volume pricing

SLA · private deployment
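
As a rough worked example of the usage-based tiers, the Python sketch below estimates a monthly bill. It assumes the first 10,000 rows each month are free and every further row costs $0.001. Whether quality passes or output format affect the price is not stated here, so the hypothetical `estimate_monthly_cost` helper ignores them.

```python
from decimal import Decimal

# Figures taken from the Free and Developer tiers listed above.
FREE_ROWS_PER_MONTH = 10_000
PRICE_PER_ROW_USD = Decimal("0.001")


def estimate_monthly_cost(rows):
    """Estimate the USD cost of generating `rows` rows in one month.

    Assumption: only rows beyond the free monthly allowance are billed.
    """
    billable = max(0, rows - FREE_ROWS_PER_MONTH)
    return billable * PRICE_PER_ROW_USD


print(estimate_monthly_cost(50_000))  # 40.000
```

Under these assumptions, a single 50,000-row request in an otherwise idle month bills 40,000 rows, or about $40.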