Hire for agency, not algorithms

Hire engineers who solve your problems.

We recreate real tickets and PRs from your backlog as interview problems, with no sensitive data exposed. See every edit, every AI prompt, and every decision, not just a score.


Candidate experience


Problem: CAR-523 · PR #1098


tmp dirs leaking after failed runs??? disk almost full again

● In Progress

INC-3201 — got paged, execution host at 94% disk. runs failing with No space left on device.

/tmp/canary/ has 3000+ dirs from old runs that were never deleted. seems worse on high-volume days.

Workspace: solution.py · tests.py · README.md

from sandbox import create_sandbox, write_files
from sandbox import run_tests, cleanup

def execute_run(req):
    sandbox = create_sandbox(req["run_id"])
    write_files(sandbox, req["code"])
    result = run_tests(sandbox, req["timeout"])
    cleanup(sandbox)
    return result

Tests: 2/5 passing

✓ successful run returns output
✓ cleanup after success
✗ cleanup after timeout: Sandbox dir still exists
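The failing test points at a likely cause: `cleanup(sandbox)` only runs if `run_tests` returns normally, so a run that times out (or raises anything else) skips it and leaves its directory behind in /tmp/canary/. The following is a minimal sketch of one fix, assuming `run_tests` signals a timeout by raising. It moves cleanup into a `finally` block. The `sandbox` helpers are hypothetical stand-ins built on `tempfile`, `subprocess`, and `shutil` so the example runs on its own, because the problem does not show the real module.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for the `sandbox` module; the real module is not shown.
def create_sandbox(run_id):
    # One scratch directory per run, similar to the /tmp/canary/ dirs in the ticket.
    return tempfile.mkdtemp(prefix=f"canary-{run_id}-")

def write_files(sandbox, code):
    with open(os.path.join(sandbox, "main.py"), "w") as f:
        f.write(code)

def run_tests(sandbox, timeout):
    # subprocess.run kills the child and raises TimeoutExpired if it overruns.
    proc = subprocess.run(
        [sys.executable, "main.py"],
        cwd=sandbox,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.stdout

def cleanup(sandbox):
    shutil.rmtree(sandbox, ignore_errors=True)

def execute_run(req):
    sandbox = create_sandbox(req["run_id"])
    try:
        write_files(sandbox, req["code"])
        return run_tests(sandbox, req["timeout"])
    finally:
        # Runs whether run_tests returns or raises (e.g. on timeout),
        # so a failed run no longer leaves its directory behind.
        cleanup(sandbox)
```

Because `finally` does not swallow the exception, callers still see the timeout. If the remaining tests expect `execute_run` to return a failure result instead, catching `subprocess.TimeoutExpired` inside the `try` and returning one is an alternative. The `finally` clause still guarantees cleanup either way.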

The technical interview, rebuilt for how engineering actually works.

Problems built from your backlog

Connect GitHub or Jira and we recreate a real ticket or PR as a sandboxed interview problem. Candidates get the context they need to work the problem, with no sensitive data and no access to your actual codebase.

A full session replay, not a score

Every edit, every AI prompt, and every dead end is recorded. Hiring teams see how a candidate actually works through a problem, not just whether they reached the right answer.

Measures agency, not memorization

An unfamiliar codebase, a vague ticket, no obvious root cause: the way real work shows up. Canary scores how a candidate navigates ambiguity, uses their tools, and drives to a fix, which is the agency that matters in day-to-day engineering work.


Get early access.