Benchmarks
Alien Giraffe benchmarks measure two things separately:
- Request lifecycle performance: how fast the platform can create, approve, activate, and warm a request-scoped environment.
- Query and load performance: how fast the active environment can execute dataset queries, including joins across datasets from different sources.
The latest artifacts referenced on this page come from the runs generated on March 30, 2026:
- Request benchmark: artifacts/request-bench/request-20260331-021935
- Load benchmark: artifacts/request-bench/load-20260331-021939
Benchmark design
Request lifecycle benchmark
The request benchmark creates a template-style request, approves it, activates it in benchmark mode, waits until the first query succeeds, and then records per-query timings.
The benchmark sequence is:
- Obtain requester and admin credentials.
- Build a request payload from all currently catalogued datasets.
- Create the request.
- Approve the request.
- Activate the request-scoped environment with benchmark resources.
- Poll until the first rendered query succeeds.
- Execute the benchmark query set and persist the run report.
This makes the benchmark intentionally end-to-end. It does not only measure SQL execution. It includes request construction from the current catalog, approval, environment activation, warm-up, and the first usable query response.
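For orientation, a minimal sketch of that sequence is shown below. It assumes a generic HTTP API; every endpoint path, payload field, and the "benchmark" resource profile are illustrative placeholders, not the platform's actual interface.

```python
# Illustrative sketch of the request-lifecycle benchmark flow.
# All endpoints, payload fields, and the "benchmark" profile are assumptions.
import time
import requests

BASE = "https://alien-giraffe.example.internal/api"  # hypothetical base URL

def run_request_benchmark(requester_token: str, admin_token: str) -> dict:
    requester = {"Authorization": f"Bearer {requester_token}"}
    admin = {"Authorization": f"Bearer {admin_token}"}
    timings = {}

    # Build a request payload from all currently catalogued datasets.
    catalog = requests.get(f"{BASE}/catalog/datasets", headers=requester).json()
    payload = {"datasets": [d["id"] for d in catalog]}

    # Create the request and record create latency.
    t0 = time.monotonic()
    req = requests.post(f"{BASE}/requests", json=payload, headers=requester).json()
    timings["create_ms"] = (time.monotonic() - t0) * 1000

    # Approve the request as an admin.
    t0 = time.monotonic()
    requests.post(f"{BASE}/requests/{req['id']}/approve", headers=admin)
    timings["approve_ms"] = (time.monotonic() - t0) * 1000

    # Activate the request-scoped environment with benchmark resources,
    # then poll until the first query succeeds (warm-up).
    t0 = time.monotonic()
    requests.post(f"{BASE}/requests/{req['id']}/activate",
                  json={"profile": "benchmark"}, headers=admin)
    while True:
        probe = requests.post(f"{BASE}/requests/{req['id']}/query",
                              json={"sql": "SELECT 1"}, headers=requester)
        if probe.status_code == 200:
            break
        time.sleep(0.25)
    timings["ready_ms"] = (time.monotonic() - t0) * 1000

    # The real benchmark then executes the full rendered query set and
    # persists a run report; that part is omitted from this sketch.
    return timings
```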
Load test
The load benchmark reuses an already active request and runs concurrent query load against it. In the latest run it swept these concurrency levels:
1, 10, 25, 50, 100, 250, 500, 1000
Each concurrency level represents that many concurrent users issuing the full rendered query set against the same active request environment. For each level the benchmark records:
- Total successful and failed queries
- Average latency
- P50 latency
- P95 latency
- Throughput in queries per second
- Error rate
This load profile is meant to show saturation behavior rather than only best-case latency:
- Low concurrency shows the base query cost once the environment is active.
- Mid-range concurrency shows how quickly throughput scales up.
- High concurrency shows when queueing and contention begin to dominate latency.
- Error rate shows whether the system degrades by slowing down or by failing queries.
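A minimal sketch of how such a sweep could be driven and its per-level metrics computed is shown below. The `run_query_set` callable stands in for issuing the full rendered query set against the active benchmark request; its implementation, and the exact metric definitions, are assumptions made for illustration.

```python
# Sketch of a concurrency sweep that records the per-level metrics listed above.
# `run_query_set` is a stand-in: it issues the full rendered query set against
# the active benchmark request and returns the number of queries it executed.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

LEVELS = (1, 10, 25, 50, 100, 250, 500, 1000)

def sweep(run_query_set, levels=LEVELS):
    results = {}
    for users in levels:
        def one_user(_):
            t0 = time.monotonic()
            try:
                n_queries = run_query_set()
                return (time.monotonic() - t0) * 1000, n_queries
            except Exception:
                return None  # a failed query-set run

        start = time.monotonic()
        with ThreadPoolExecutor(max_workers=users) as pool:
            outcomes = list(pool.map(one_user, range(users)))
        elapsed = time.monotonic() - start

        ok = [o for o in outcomes if o is not None]
        latencies = [lat for lat, _ in ok]
        failures = len(outcomes) - len(ok)
        results[users] = {
            "avg_ms": statistics.fmean(latencies) if latencies else None,
            "p50_ms": statistics.median(latencies) if latencies else None,
            "p95_ms": statistics.quantiles(latencies, n=20)[-1] if len(latencies) > 1 else None,
            "qps": sum(n for _, n in ok) / elapsed if elapsed else 0.0,
            "error_rate": failures / len(outcomes) if outcomes else 0.0,
        }
    return results
```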
Teardown
After measurement, the benchmark request environment is short-lived and can be revoked and removed. That keeps the benchmark aligned with the same temporary-access model used by the platform itself.
Workload shape
The benchmark request is built over the currently approved and catalogued datasets that expose schema-backed columns. The latest request benchmark assembled a single request across:
- 3 datasources
- 3 datasets
- 19 columns
The latest query workload covered:
- 3 distinct query scopes
- 3 distinct datasets touched by the queries
- 0 cross-source query scopes
- 3 single-source query scopes
The broader request scope may include additional catalogued datasources, but the benchmark claims on this page are limited to the dataset scopes actually exercised by the rendered query workload.
That means the benchmark is not measuring a single connector in isolation. It is measuring end-to-end request provisioning over a mixed-source request scope and then executing SQL inside the active request environment.
This workload is deliberately small in query count and broad in system coverage:
- It verifies single-source access to participating dataset families.
- It exercises cross-source query execution inside the same temporary query surface whenever the rendered workload includes cross-source scopes (the latest run rendered none).
- It keeps the query text deterministic so that lifecycle and concurrency effects are easier to compare between runs (see the sketch below).
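As an illustration of that last point, query rendering can be reduced to a small, deterministic function of the catalogued scope. The sketch below assumes a hypothetical catalog shape (`name` and `columns` fields); it is not the platform's actual rendering code.

```python
# Sketch of rendering a deterministic query set from the catalogued datasets
# in the benchmark request. The catalog shape here is a hypothetical example.
def render_query_set(datasets: list[dict]) -> list[str]:
    queries = []
    for ds in sorted(datasets, key=lambda d: d["name"]):          # stable ordering
        cols = ", ".join(sorted(c["name"] for c in ds["columns"]))
        # One single-source scope per participating dataset; a fixed LIMIT
        # keeps query cost comparable between runs.
        queries.append(f"SELECT {cols} FROM {ds['name']} LIMIT 100")
    # A cross-source scope, when rendered, would join datasets from different
    # sources inside the same active request environment (none in the latest run).
    return queries
```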
Latest request benchmark results
Run timestamp: March 30, 2026 at 19:19 PDT
Request lifecycle
| Stage | Result |
|---|---|
| Request create latency | 6 ms |
| Request approve latency | 14 ms |
| Environment ready latency | 3084 ms |
| Startup to first successful query | 3084 ms |
Query timings
All query scopes completed successfully.
| Metric | Result |
|---|---|
| Query scopes measured | 3 |
| Total benchmark queries | 6 |
| Average scope latency | 3.83 ms |
| Median scope latency | 4.00 ms |
| Success rate | 100.00% |
| Failed queries | 0 |
The main point from this run is not just low per-query latency. It is that the request became query-ready in about 3.1 seconds while spanning a mixed-source request scope, and the benchmark query set completed immediately once the environment was warm.
This splits the performance story into two phases:
- Provisioning cost: request creation, approval, activation, and warm-up
- Steady-state query cost: repeated SQL execution after the environment is ready
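As a rough worked example over the published figures, the arithmetic below amortizes the provisioning phase across increasing numbers of steady-state queries. It treats the create, approve, and environment-ready stages as strictly sequential, which is an assumption.

```python
# Back-of-the-envelope amortization of provisioning cost over steady-state queries,
# using figures from the latest request benchmark run (stages treated as sequential).
provisioning_ms = 6 + 14 + 3084        # create + approve + environment ready
per_query_ms = 3.83                    # average scope latency once the environment is warm

for n_queries in (1, 100, 10_000):
    effective = (provisioning_ms + n_queries * per_query_ms) / n_queries
    print(f"{n_queries:>6} queries -> ~{effective:.1f} ms effective cost per query")
# 1 query   -> ~3107.8 ms per query (dominated by provisioning)
# 100       -> ~34.9 ms per query
# 10,000    -> ~4.1 ms per query (provisioning amortized away)
```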
Latest load benchmark results
Run timestamp: March 30, 2026 at 19:19 PDT
The load test executed the same rendered queries at each concurrency level against the active benchmark request.
| Concurrency | Avg latency | P95 latency | Throughput | Error rate |
|---|---|---|---|---|
| 1 | 4.00 ms | 4 ms | 229.56 qps | 0.00% |
| 10 | 44.13 ms | 150 ms | 172.59 qps | 0.00% |
| 25 | 74.94 ms | 207 ms | 262.73 qps | 0.00% |
| 50 | 146.11 ms | 248 ms | 301.73 qps | 0.00% |
| 100 | 294.21 ms | 572 ms | 293.18 qps | 0.00% |
| 250 | 700.11 ms | 1447 ms | 305.26 qps | 0.00% |
| 500 | 1404.24 ms | 3004 ms | 301.19 qps | 0.00% |
| 1000 | 3239.71 ms | 7097 ms | 257.42 qps | 0.00% |
Key takeaways from the latest run:
- Peak measured throughput was 305.26 qps at 250 concurrent users.
- Error rate remained 0% through 1000 concurrent users.
- The highest measured concurrency was 1000 users with 257.42 qps throughput.
- Latency increased as concurrency rose while throughput held near its plateau and the error rate stayed at 0%, which indicates the platform degrades by queueing rather than by failing queries as it moves into the saturation region.
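One way to sanity-check the table is Little's law for a closed-loop load generator: throughput ≈ concurrency / average latency. The sketch below applies it to the published rows; measured throughput sits somewhat below the expected value at most levels, which is consistent with per-user overhead outside the measured query, though that reading is an interpretation rather than a measured result.

```python
# Little's-law sanity check over the published load rows:
# expected qps ~= concurrency / average latency (closed-loop assumption).
rows = [  # (concurrent users, avg latency in ms, measured qps)
    (1, 4.00, 229.56), (10, 44.13, 172.59), (25, 74.94, 262.73),
    (50, 146.11, 301.73), (100, 294.21, 293.18), (250, 700.11, 305.26),
    (500, 1404.24, 301.19), (1000, 3239.71, 257.42),
]
for users, avg_ms, measured_qps in rows:
    expected_qps = users / (avg_ms / 1000.0)
    print(f"{users:>5} users: expected ~{expected_qps:6.0f} qps, measured {measured_qps:7.2f} qps")
```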
Load graphs
Average latency across the latest concurrency sweep.
Throughput across the latest concurrency sweep.
Error rate across the latest concurrency sweep.
What these results demonstrate
The current benchmark results show that Alien Giraffe can:
- Build a request-scoped environment from a multi-source catalog automatically
- Query datasets from different backends through one active request
- Execute cross-source joins across those datasets when the approved scope allows it (the latest rendered workload exercised single-source scopes only)
- Sustain high query throughput on the benchmark workload with stable error behavior in the latest run
That combination is the important capability: the platform is not just brokering isolated point queries, it is standing up a unified temporary query surface over approved datasets from multiple systems.